Perceptual audio coding
    1.
    Granted invention patent
    Status: in force

    Publication number: US06704705B1

    Publication date: 2004-03-09

    Application number: US09146752

    Application date: 1998-09-04

    CPC classification number: G10L19/032 G10L2019/0013

    Abstract: A method and apparatus for perceptual audio coding. The method and apparatus provide high-quality sound for coding rates down to and below 1 bit/sample for a wide variety of input signals including speech, music and background noise. The invention provides a new distortion measure for coding the input speech and training the codebooks, where the distortion measure is based on a masking spectrum of the input frequency spectrum. The invention also provides a method for direct calculation of masking thresholds from a modified discrete cosine transform of the input signal. The invention also provides a predictive and non-predictive vector quantizer for determining the energy of the coefficients representing the frequency spectrum. As well, the invention provides a split vector quantizer for quantizing the fine structure of coefficients representing the frequency spectrum. Bit allocation for the split vector quantizer is based on the masking threshold. The split vector quantizer also makes use of embedded codebooks. Furthermore, the invention makes use of a new transient detection method for selection of input windows.

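The abstract states that bit allocation for the split vector quantizer is based on the masking threshold, but gives no procedure. The sketch below shows one generic greedy scheme, not the patented method: each bit goes to the band whose quantization noise most exceeds its mask. The function name, the per-band dB inputs, and the 6.02 dB-per-bit rule of thumb are assumptions made for illustration.

```python
def allocate_bits(energy_db, mask_db, total_bits, db_per_bit=6.02):
    """Greedy allocation: give each bit to the band with the worst noise-to-mask ratio.

    Hypothetical sketch. With zero bits, the noise in a band is taken to equal
    the band energy; each extra bit is assumed to lower it by db_per_bit.
    """
    bits = [0] * len(energy_db)
    nmr = [e - m for e, m in zip(energy_db, mask_db)]  # noise-to-mask ratio in dB
    for _ in range(total_bits):
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        bits[worst] += 1
        nmr[worst] -= db_per_bit
    return bits


# Three bands: the first is far above its mask, the last is already masked.
print(allocate_bits([60.0, 40.0, 20.0], [30.0, 35.0, 30.0], total_bits=6))
```

Bands whose energy already lies below the masking threshold receive bits last, which is the intended effect of driving allocation from the mask rather than from raw energy.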

    Method of detecting silence in a packetized voice stream
    2.
    Granted invention patent
    Status: lapsed

    Publication number: US06535844B1

    Publication date: 2003-03-18

    Application number: US09580788

    Application date: 2000-05-30

    CPC classification number: G10L25/78

    Abstract: A method and apparatus for detecting silence in voice packets. A packet energy calculator calculates a smoothed energy value for each packet of voice data to be transmitted. A noise level detector adaptively calculates noise values during periods of said silence. A silent packet detector compares the energy value to the noise value and if it is less than the noise value and less than a predetermined silence ceiling value then silence is indicated. Also, if the energy value is less than a predetermined silence noise value then silence is also indicated.

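The decision rule in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the exponential smoothing, the margin applied to the noise estimate, and every numeric constant are hypothetical, since the abstract gives no values.

```python
from dataclasses import dataclass


@dataclass
class SilenceDetector:
    # All constants are hypothetical tuning values, not taken from the patent.
    smoothing: float = 0.5           # weight of the previous smoothed energy
    noise_adapt: float = 0.1         # adaptation rate of the noise estimate
    noise_margin: float = 2.0        # noise value = margin * noise estimate
    silence_ceiling: float = 1000.0  # "predetermined silence ceiling value"
    silence_floor: float = 10.0      # "predetermined silence noise value"
    energy: float = 0.0              # smoothed packet energy
    noise_estimate: float = 100.0    # adaptive noise level

    def is_silent(self, samples):
        """Return True if the packet of PCM samples is classified as silence."""
        raw = sum(s * s for s in samples) / max(len(samples), 1)
        self.energy = self.smoothing * self.energy + (1 - self.smoothing) * raw
        noise_value = self.noise_margin * self.noise_estimate
        silent = self.energy < self.silence_floor or (
            self.energy < noise_value and self.energy < self.silence_ceiling
        )
        if silent:
            # Adapt the noise estimate only during periods of silence.
            self.noise_estimate += self.noise_adapt * (self.energy - self.noise_estimate)
        return silent


detector = SilenceDetector()
print(detector.is_silent([10] * 160))    # quiet packet
print(detector.is_silent([1000] * 160))  # loud packet
```

Updating the noise estimate only when a packet is judged silent keeps speech energy from inflating the noise level.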

    Process for removing voice from stereo recordings
    3.
    Granted invention patent
    Status: in force

    Publication number: US06405163B1

    Publication date: 2002-06-11

    Application number: US09405941

    Application date: 1999-09-27

    Applicant: Jean Laroche

    Inventor: Jean Laroche

    CPC classification number: H04S5/005 H04S3/008 H04S2400/05

    Abstract: A method and apparatus for removing or amplifying voice or other signals panned to the center of a stereo recording utilizes frequency domain techniques to calculate a frequency dependent gain factor based on the difference between the frequency domain spectra of the stereo channels.

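As a rough illustration of a frequency-dependent gain derived from the difference between the two channel spectra, the sketch below processes a single short frame with a naive DFT. The gain formula |L - R| / (|L| + |R|) is an assumption chosen so that centre-panned content (L = R) maps to gain 0 and hard-panned content maps to gain near 1; the abstract does not specify the actual gain law, and a practical system would use overlapping windowed frames and an FFT.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (adequate for a short demo frame)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT, returning the real part of each output sample."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def attenuate_center(left, right, eps=1e-12):
    """Scale every frequency bin by a gain that vanishes where the channels agree."""
    spec_l, spec_r = dft(left), dft(right)
    # Hypothetical gain law: 0 for identical bins, close to 1 for one-sided bins.
    gains = [abs(l - r) / (abs(l) + abs(r) + eps) for l, r in zip(spec_l, spec_r)]
    out_l = idft([g * l for g, l in zip(gains, spec_l)])
    out_r = idft([g * r for g, r in zip(gains, spec_r)])
    return out_l, out_r


tone = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
center_l, _ = attenuate_center(tone, tone)        # centre-panned: removed
side_l, _ = attenuate_center(tone, [0.0] * 16)    # hard left: preserved
print(max(abs(v) for v in center_l), max(abs(v) for v in side_l))
```

Replacing `g` with `1 - g` would amplify centre-panned content instead, which corresponds to the "amplifying" case the abstract mentions.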

    Method and apparatus for a tunable high-resolution spectral estimator
    4.
    Granted invention patent
    Status: lapsed

    Publication number: US06400310B1

    Publication date: 2002-06-04

    Application number: US09176984

    Application date: 1998-10-22

    CPC classification number: G10L25/48 G10L19/06 G10L25/12

    Abstract: A high resolution spectral estimator (HREE) filter coupled to a spectral plotter processes either Doppler frequencies provided from the output of a pulse-Doppler radar or a frequency based output provided by a Fourier transformer coupled to a sensing device to allow the spectral plotter to determine the power frequency spectrum of either the pulse-Doppler radar output or sensing device output. The HREE filter preferably comprises a bank of first order filters tuned to a pre-selected frequency, a covariance estimator coupled to the filter bank for estimating filter covariances, and a decoder coupled to the covariance estimator for producing a plurality of filter parameters. Further, it is preferable that the filters comprising the filter bank be adjustable to permit their being tuned to a desired frequency based on a priori information.


    System, method and article of manufacture for an emotion detection system improving emotion recognition
    5.
    Granted invention patent
    Status: in force

    Publication number: US06353810B1

    Publication date: 2002-03-05

    Application number: US09387037

    Application date: 1999-08-31

    CPC classification number: G10L17/26

    Abstract: A voice signal and an emotion associated therewith are provided. Then, the emotion associated with the voice signal is determined in an automated manner and subsequently stored. Next, a user-determined emotion associated with the voice signal is received from a user. The automatically determined emotion and the user-determined emotion are then compared.


    User barge-in enablement in large vocabulary speech recognition systems

    Publication number: US06246986B1

    Publication date: 2001-06-12

    Application number: US09223945

    Application date: 1998-12-31

    CPC classification number: G10L15/22 G10L2015/088

    Abstract: An interactive voice response unit which provides beneficial operation by including means to handle unconstrained input such as natural speech and to allow barge-in includes a prompter, a recognizer of speech signals, a meaningful phrase detector and classifier, and a turn-taking module, all under control of a dialog manager. In the course of listening to user input while outputting a voiced message, the voice response unit processes the received signal and ascertains whether it is receiving an utterance that is intended to interrupt the prompt, or merely noise or an utterance that is not meant to be used by the arrangement. The unit is sensitive to the speed and context of the speech provided by the user and is thus able to distinguish between a situation where a speaker is merely pausing and a situation where a speaker is done speaking.

    Method and apparatus for controlling multiple speech engines in an in-vehicle speech recognition system
    7.
    Granted invention patent
    Status: lapsed

    Publication number: US06230138B1

    Publication date: 2001-05-08

    Application number: US09605253

    Application date: 2000-06-28

    CPC classification number: G10L15/26 G10L15/22 G10L2015/223 G10L2015/228

    Abstract: Disclosed herein is a method and apparatus for controlling a speech recognition system on board an automobile. The automobile has one or more voice activated accessories and a passenger cabin with a number of seating locations. The speech recognition system has a plurality of microphones and push-to-talk controls corresponding to the seating locations for inputting speech commands and location identifying signals, respectively. The speech recognition system also includes multiple speech engines recognizing speech commands for operating the voice activated accessories. A selector is coupled to the speech engines and push-to-talk controls for selecting the speech engine best suited for the current speaking location. A speech processor coupled to the speech engine selector is used to recognize the speech commands and transmit the commands to the voice activated accessory.


    Modeling and projecting emotion and personality from a computer user interface
    8.
    Granted invention patent
    Status: lapsed

    Publication number: US06212502B1

    Publication date: 2001-04-03

    Application number: US09109232

    Application date: 1998-06-30

    CPC classification number: H04N21/466 G10L17/26 H04N21/4663

    Abstract: The invention is embodied in a computer user interface including an observer capable of observing user behavior, an agent capable of conveying emotion and personality by exhibiting corresponding behavior to a user, and a network linking user behavior observed by said observer and emotion and personality conveyed by said agent. The network can include an observing network facilitating inferencing user emotional and personality states from the behavior observed by the observer as well as an agent network facilitating inferencing of agent behavior from emotion and personality states to be conveyed by the agent. In addition, a policy module can dictate to the agent network desired emotion and personality states to be conveyed by the agent based upon user emotion and personality states inferred by the observing network. Typically, each network is a stochastic model. Each stochastic model is preferably a Bayesian network, so that the observing network is a first Bayesian network while the agent network is a second Bayesian network. Generally, the first and second Bayesian networks are similar copies of one another. Each of the two Bayesian networks includes a first layer of multi-state nodes representing respective emotional and personality variables, and a second layer of multi-state nodes representing respective behavioral variables. Each one of the nodes includes probabilities linking each state in the one node with states of others of the nodes. More specifically, each one of the nodes in the first layer includes probabilities linking the states of the one first layer node to the states of nodes in the second layer. Similarly, each one of the nodes in the second layer includes probabilities linking the states of the one second layer node to states of nodes in the first layer.

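To make the two-layer idea concrete, the sketch below infers a distribution over one first-layer emotion node from one observed second-layer behavior node using Bayes' rule. The state names and all probabilities are invented for illustration; the abstract does not list any, and a full network would combine many nodes per layer.

```python
def posterior(prior, likelihood, observed):
    """Return P(state | observed behavior) for a single emotion node.

    prior:      {state: P(state)}
    likelihood: {state: {behavior: P(behavior | state)}}
    """
    unnormalized = {s: prior[s] * likelihood[s][observed] for s in prior}
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical numbers: one emotion node, one behavior node.
prior = {"happy": 0.5, "sad": 0.5}
likelihood = {
    "happy": {"fast_speech": 0.8, "slow_speech": 0.2},
    "sad": {"fast_speech": 0.3, "slow_speech": 0.7},
}

print(posterior(prior, likelihood, "fast_speech"))
```

The same conditional table, read in the forward direction, could drive the agent side: given an emotion state to convey, sample a behavior according to P(behavior | state).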

    Method and system for providing network based transcription services
    9.
    Granted invention patent
    Status: lapsed

    Publication number: US06175822B1

    Publication date: 2001-01-16

    Application number: US09093011

    Application date: 1998-06-05

    Inventor: Bryce Alan Jones

    Abstract: A method and system of providing network based transcription of free-form speech signals. A speech signal is recorded as a digital audio file in a storage medium and is then streamed over a data network to a client terminal for transcription. As the speech signal arrives, it is buffered in memory at the client terminal while a streaming player application plays the signal to a transcriptionist. The transcriptionist then conveniently listens to and transcribes the speech signal as it is being played. The invention advantageously avoids the need to physically transfer and download the full digital audio to a transcriptionist computer or to transport physical storage media, such as tapes or CD-ROM from the place of recording to the place where the recorded voice signals will be transcribed.


    Controlled access to audio signals based on objectionable audio content detected via sound recognition
    10.
    Granted invention patent
    Status: in force

    Publication number: US06829582B1

    Publication date: 2004-12-07

    Application number: US09685784

    Application date: 2000-10-10

    CPC classification number: H04N21/4394 H04N21/4542

    Abstract: An apparatus, program product, and method restrict access to objectionable audio content in an audio or audio/video transmission using sound recognition. Sound recognition may be performed, for example, to detect and control access to objectionable non-spoken audio content, e.g., by detecting violent sounds such as screams, explosions, gun shots, sirens, punches, kicks and/or other non-spoken content such as sexually-suggestive sounds. In addition, occurrences of objectionable audio content detected in an audio transmission may be tracked so that access to the audio transmission may be controlled responsive to the identification of multiple occurrences of objectionable audio content. Furthermore, access control over detected objectionable audio content in an audio transmission may result in inhibition of access to a program associated with the audio transmission.

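The occurrence-tracking behavior described in this abstract might look like the sketch below. It assumes a separate sound recognizer already emits a text label per audio segment; the class name, the labels, and the threshold are hypothetical, and recognition itself is out of scope here.

```python
class AccessController:
    """Blocks a program once enough objectionable sounds have been detected."""

    def __init__(self, objectionable_labels, max_occurrences):
        self.objectionable = set(objectionable_labels)
        self.max_occurrences = max_occurrences  # hypothetical threshold
        self.count = 0
        self.blocked = False

    def observe(self, label):
        """Record one recognized sound label; return True while access is allowed."""
        if label in self.objectionable:
            self.count += 1
            if self.count >= self.max_occurrences:
                # Inhibit the whole program, not just the current segment.
                self.blocked = True
        return not self.blocked


controller = AccessController({"scream", "explosion", "gunshot"}, max_occurrences=2)
for label in ["music", "scream", "speech", "explosion", "music"]:
    print(label, controller.observe(label))
```

Once the threshold is reached the controller stays blocked for later segments, mirroring the abstract's point that detection may inhibit access to the associated program.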
