Text-to-speech using clustered context-dependent phoneme-based units
    1.
    Granted patent
    Status: Expired

    Publication number: US6163769A

    Publication date: 2000-12-19

    Application number: US949138

    Filing date: 1997-10-02

    IPC classification: G10L13/06 G10L13/00

    CPC classification: G10L13/07

    Abstract: A text-to-speech system includes a storage device for storing a clustered set of context-dependent phoneme-based units of a target speaker. In one embodiment, decision trees are used wherein each decision tree based context-dependent phoneme-based unit is arranged based on context of at least one immediately preceding and succeeding phoneme. At least one of the context-dependent phoneme-based units represents other non-stored context-dependent phoneme units of similar sound due to similar contexts. A text analyzer obtains a string of phonetic symbols representative of text to be converted to speech. A concatenation module selects stored decision tree based context-dependent phoneme-based units from the set of decision tree based context-dependent phoneme-based units based on the context of the phonetic symbols and synthesizes the selected phoneme-based units to generate speech corresponding to the text.
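
The lookup described in this abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Node` structure, the phoneme classes, the unit names such as `t_unit_1`, and the single hand-built tree are hypothetical and are not taken from the patent. The point is only that several different left/right contexts end at the same leaf, so one stored unit stands in for contexts that were never stored separately.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical phoneme class used by the tree's questions.
VOWELS = frozenset({"aa", "ae", "iy", "uw"})


@dataclass
class Node:
    side: str                  # "left" or "right": which neighbour the question tests
    members: frozenset         # phoneme class the neighbour is tested against
    yes: Union["Node", str]    # subtree or leaf (a stored unit name)
    no: Union["Node", str]


def find_unit(tree, left, right):
    """Walk the tree for one centre phoneme and return the stored unit at the leaf."""
    node = tree
    while isinstance(node, Node):
        neighbour = left if node.side == "left" else right
        node = node.yes if neighbour in node.members else node.no
    return node


# One hypothetical tree per centre phoneme; only "t" is modelled here.
TREES = {
    "t": Node("left", VOWELS,
              yes=Node("right", VOWELS, yes="t_unit_1", no="t_unit_2"),
              no="t_unit_3"),
}


def select_units(phonemes):
    """Map a phoneme string to stored unit names using left/right context."""
    padded = ["sil"] + list(phonemes) + ["sil"]
    units = []
    for i in range(1, len(padded) - 1):
        left, centre, right = padded[i - 1], padded[i], padded[i + 1]
        tree = TREES.get(centre)
        if tree is None:
            # Assumption: phonemes without a tree fall back to a default unit.
            units.append(f"{centre}_unit_default")
        else:
            units.append(find_unit(tree, left, right))
    return units


print(select_units(["aa", "t", "iy"]))
```

In this toy tree the contexts aa-t-iy and ae-t-uw both resolve to `t_unit_1`, which is the sharing behaviour the abstract attributes to clustered units.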


    Method and system for dynamically adjusted training for speech recognition
    2.
    Granted patent
    Status: Expired

    Publication number: US5963903A

    Publication date: 1999-10-05

    Application number: US673435

    Filing date: 1996-06-28

    CPC classification: G10L15/063 G10L2015/0635

    Abstract: A method and system for dynamically selecting words for training a speech recognition system. The speech recognition system models each phoneme using a hidden Markov model and represents each word as a sequence of phonemes. The training system ranks each phoneme for each frame according to the probability that the corresponding codeword will be spoken as part of the phoneme. The training system collects spoken utterances for which the corresponding word is known. The training system then aligns the codewords of each utterance with the phoneme that it is recognized to be part of. The training system then calculates an average rank for each phoneme using the aligned codewords for the aligned frames. Finally, the training system selects words for training that contain phonemes with a low rank.
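
The ranking and selection steps in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the probability table `prob`, the already-aligned `(codeword, phoneme)` frames, the lexicon, and the threshold are all hypothetical inputs, and "low rank" is interpreted here as a large average numeric rank (rank 1 being the most probable phoneme for a codeword).

```python
from collections import defaultdict


def rank_of(phoneme, codeword, prob):
    """1-based rank of `phoneme` among all phonemes for this codeword (1 = most probable)."""
    ordered = sorted(prob, key=lambda p: prob[p][codeword], reverse=True)
    return ordered.index(phoneme) + 1


def average_ranks(aligned_frames, prob):
    """Average, per phoneme, the rank it received on the frames aligned to it."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for codeword, phoneme in aligned_frames:
        totals[phoneme] += rank_of(phoneme, codeword, prob)
        counts[phoneme] += 1
    return {p: totals[p] / counts[p] for p in totals}


def select_training_words(lexicon, avg_rank, threshold):
    """Pick words containing at least one poorly ranked phoneme."""
    weak = {p for p, r in avg_rank.items() if r > threshold}
    return sorted(w for w, phones in lexicon.items() if weak & set(phones))


# Hypothetical P(codeword | phoneme) table with two codewords (0 and 1).
prob = {
    "k": {0: 0.7, 1: 0.2},
    "ae": {0: 0.2, 1: 0.5},
    "t": {0: 0.1, 1: 0.3},
}
# Hypothetical alignment of observed codewords to the phonemes of known words.
aligned_frames = [(0, "k"), (1, "ae"), (0, "t"), (1, "t")]
lexicon = {"cat": ["k", "ae", "t"], "back": ["b", "ae", "k"]}

avg = average_ranks(aligned_frames, prob)
print(avg)
print(select_training_words(lexicon, avg, threshold=2.0))
```

With this toy data the phoneme "t" averages rank 2.5, so only "cat" is proposed for further training.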


    Method and system for correcting misrecognized spoken words or phrases
    3.
    Granted patent
    Status: Expired

    Publication number: US5829000A

    Publication date: 1998-10-27

    Application number: US741696

    Filing date: 1996-10-31

    IPC classification: G10L15/06 G10L15/22 G01L5/06

    CPC classification: G10L15/22

    Abstract: A method and system for editing words that have been misrecognized. The system allows a speaker to specify a number of alternative words to be displayed in a correction window by resizing the correction window. The system also displays the words in the correction window in alphabetical order. A preferred system eliminates the possibility, when a misrecognized word is respoken, that the respoken utterance will be again recognized as the same misrecognized word. This elimination occurs based on the probabilities of alternative words associated with both the misrecognized utterance and the respoken utterance. The system, when operating with a word processor, allows the speaker to specify the amount of speech that is buffered before transferring to the word processor. The system also uses a word correction metaphor or a phrase correction metaphor.
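
Two behaviours from this abstract lend themselves to a short sketch: excluding the rejected word when an utterance is respoken, and listing the alternatives alphabetically in a window of limited size. The abstract does not say how the two probability lists are combined, so the product rule, the `floor` value for missing words, and the function names below are assumptions made purely for illustration.

```python
def recognize_respoken(first_alts, respoken_alts, rejected, floor=1e-6):
    """Choose a word for the respoken utterance, never returning the rejected word.

    first_alts / respoken_alts map candidate words to probabilities for the
    original and the respoken utterance. Combining them by multiplication is
    an assumption of this sketch.
    """
    candidates = (set(first_alts) | set(respoken_alts)) - {rejected}
    if not candidates:
        return None

    def score(word):
        return first_alts.get(word, floor) * respoken_alts.get(word, floor)

    return max(candidates, key=score)


def correction_window_words(alts, rows):
    """Keep the `rows` most probable alternatives and show them alphabetically."""
    top = sorted(alts, key=alts.get, reverse=True)[:rows]
    return sorted(top)


first = {"write": 0.5, "right": 0.3, "rite": 0.2}
respoken = {"write": 0.45, "right": 0.4, "rite": 0.15}

print(recognize_respoken(first, respoken, rejected="write"))
print(correction_window_words(first, rows=2))
```

Here `rows` stands in for the number of lines that fit in the resized correction window.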


    Multi-sensory speech detection system
    4.
    Granted patent
    Status: Expired

    Publication number: US07383181B2

    Publication date: 2008-06-03

    Application number: US10629278

    Filing date: 2003-07-29

    IPC classification: G10L15/00

    Abstract: The present invention combines a conventional audio microphone with an additional speech sensor that provides a speech sensor signal based on an input. The speech sensor signal is generated based on an action undertaken by a speaker during speech, such as facial movement, bone vibration, throat vibration, throat impedance changes, etc. A speech detector component receives an input from the speech sensor and outputs a speech detection signal indicative of whether a user is speaking. The speech detector generates the speech detection signal based on the microphone signal and the speech sensor signal.
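
The abstract states only that the detection signal depends on both the microphone signal and the speech sensor signal. One simple way to picture that, offered as a hypothetical sketch and not as the patented detector, is to compute per-frame energy on each channel and flag speech only when both exceed a threshold. The frame length, the thresholds, and the AND rule are all assumptions.

```python
def frame_energy(samples, frame_len):
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_speech(mic, sensor, frame_len, mic_thresh, sensor_thresh):
    """Per-frame speech flags: True only when both channels show enough energy."""
    mic_e = frame_energy(mic, frame_len)
    sensor_e = frame_energy(sensor, frame_len)
    return [m > mic_thresh and s > sensor_thresh for m, s in zip(mic_e, sensor_e)]


# Toy signals: frame 1 is loud on the microphone only (for example, background
# noise), frame 2 is active on both channels, frame 3 is silent.
mic = [0.5, -0.5, 0.6, -0.6, 0.0, 0.0]
sensor = [0.0, 0.0, 0.4, -0.4, 0.0, 0.0]

print(detect_speech(mic, sensor, frame_len=2, mic_thresh=0.1, sensor_thresh=0.1))
```

The first frame is rejected even though the microphone is loud, because the sensor that follows the speaker's own activity shows no energy.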


    Method and apparatus for multi-sensory speech enhancement
    5.
    Granted patent
    Status: In force

    Publication number: US07447630B2

    Publication date: 2008-11-04

    Application number: US10724008

    Filing date: 2003-11-26

    IPC classification: G10L21/02

    Abstract: A method and system use an alternative sensor signal received from a sensor other than an air conduction microphone to estimate a clean speech value. The estimation uses either the alternative sensor signal alone, or in conjunction with the air conduction microphone signal. The clean speech value is estimated without using a model trained from noisy training data collected from an air conduction microphone. Under one embodiment, correction vectors are added to a vector formed from the alternative sensor signal in order to form a filter, which is applied to the air conduction microphone signal to produce the clean speech estimate. In other embodiments, the pitch of a speech signal is determined from the alternative sensor signal and is used to decompose an air conduction microphone signal. The decomposed signal is then used to determine a clean signal estimate.


    Method and system of runtime acoustic unit selection for speech synthesis
    6.
    Granted patent
    Status: Expired

    Publication number: US5913193A

    Publication date: 1999-06-15

    Application number: US648808

    Filing date: 1996-04-30

    CPC classification: G10L13/07

    Abstract: The present invention pertains to a concatenative speech synthesis system and method which produces a more natural sounding speech. The system provides for multiple instances of each acoustic unit which can be used to generate a speech waveform representing a linguistic expression. The multiple instances are formed during an analysis or training phase of the synthesis process and are limited to a robust representation of the highest probability instances. The provision of multiple instances enables the synthesizer to select the instance which closely resembles the desired instance thereby eliminating the need to alter the stored instance to match the desired instance. This in essence minimizes the spectral distortion between the boundaries of adjacent instances thereby producing more natural sounding speech.
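
The idea of choosing, for each acoustic unit, the stored instance that keeps boundary distortion small can be sketched with dynamic programming. The abstract does not specify the search procedure or the distortion measure, so the following is an illustration under assumptions: each instance is reduced to a hypothetical `(start_vector, end_vector)` pair of spectral features, and distortion is taken to be the Euclidean distance between the end of one instance and the start of the next.

```python
import math


def boundary_distortion(prev_end, next_start):
    """Assumed distortion measure: Euclidean distance across the join."""
    return math.dist(prev_end, next_start)


def select_instances(candidates):
    """Pick one instance per unit so that total boundary distortion is minimal.

    candidates: one list per unit, each holding (start_vector, end_vector) pairs.
    Returns the chosen instance index for every unit.
    """
    # best[j] = (lowest total cost, path) for sequences ending in instance j.
    best = [(0.0, [j]) for j in range(len(candidates[0]))]
    for prev, cur in zip(candidates, candidates[1:]):
        new_best = []
        for j, (start, _end) in enumerate(cur):
            cost, path = min(
                (best[i][0] + boundary_distortion(prev[i][1], start), best[i][1])
                for i in range(len(prev))
            )
            new_best.append((cost, path + [j]))
        best = new_best
    return min(best)[1]


# Toy data with one-dimensional "spectra" for three consecutive units.
units = [
    [((0.0,), (1.0,)), ((0.0,), (3.0,))],   # unit A: two stored instances
    [((2.9,), (5.0,)), ((1.2,), (4.0,))],   # unit B: two stored instances
    [((4.1,), (6.0,))],                     # unit C: one stored instance
]

print(select_instances(units))
```

A purely greedy choice would join the second instance of A to the first instance of B (distortion 0.1) and then pay 0.9 at the next boundary; searching over whole sequences finds the path with total distortion 0.3 instead.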


    FORCE-FEEDBACK WITHIN TELEPRESENCE
    8.
    Patent application
    Status: In force

    Publication number: US20100306647A1

    Publication date: 2010-12-02

    Application number: US12472579

    Filing date: 2009-05-27

    IPC classification: G06F3/01 G06F3/048

    CPC classification: G06F3/016

    Abstract: The claimed subject matter provides a system and/or a method that facilitates replicating a telepresence session with a real world physical meeting. A telepresence session can be initiated within a communication framework that includes two or more virtually represented users that communicate therein. A trigger component can monitor the telepresence session in real time to identify a participant interaction with an object, wherein the object is at least one of a real world physical object or a virtually represented object within the telepresence session. A feedback component can implement a force feedback to at least one participant within the telepresence session based upon the identified participant interaction with the object, wherein the force feedback is employed via a device associated with at least one participant.


    Use of a unified language model
    10.
    Granted patent
    Status: Expired

    Publication number: US07013265B2

    Publication date: 2006-03-14

    Application number: US11003121

    Filing date: 2004-12-03

    IPC classification: G06F17/27 G10L15/18 G10L11/00

    CPC classification: G10L15/193 G10L15/197

    Abstract: A language processing system includes a unified language model. The unified language model comprises a plurality of context-free grammars having non-terminal tokens representing semantic or syntactic concepts and terminals, and an N-gram language model having non-terminal tokens. A language processing module capable of receiving an input signal indicative of language accesses the unified language model to recognize the language. The language processing module generates hypotheses for the received language as a function of words of the unified language model and/or provides an output signal indicative of the language and at least some of the semantic or syntactic concepts contained therein.
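
How an N-gram model with non-terminal tokens can work together with context-free grammars is easier to see in a toy example. The sketch below is an assumption-laden illustration, not the patented model: the grammar is flattened to a table of word sequences per non-terminal, the N-gram is a bigram, the probabilities are invented, and the greedy longest-match segmentation is a simplification.

```python
import math

# Hypothetical grammar: each non-terminal maps word sequences to
# P(word sequence | non-terminal).
CFGS = {"<CITY>": {("new", "york"): 0.5, ("seattle",): 0.5}}

# Hypothetical bigram P(token | previous token); tokens are words or non-terminals.
BIGRAM = {
    ("<s>", "fly"): 1.0,
    ("fly", "to"): 0.8,
    ("to", "<CITY>"): 0.6,
    ("<CITY>", "</s>"): 0.9,
}


def tokenize(words):
    """Replace grammar-covered spans with their non-terminal (greedy longest match).

    Returns (token, expansion probability) pairs; plain words get probability 1.0.
    """
    out, i = [], 0
    while i < len(words):
        match = None
        for nonterminal, rules in CFGS.items():
            for seq, p in rules.items():
                covers = tuple(words[i:i + len(seq)]) == seq
                if covers and (match is None or len(seq) > len(match[1])):
                    match = (nonterminal, seq, p)
        if match:
            out.append((match[0], match[2]))
            i += len(match[1])
        else:
            out.append((words[i], 1.0))
            i += 1
    return out


def log_prob(words):
    """Log probability: bigram over tokens times grammar expansion probabilities."""
    prev, total = "<s>", 0.0
    for token, p_expansion in tokenize(words) + [("</s>", 1.0)]:
        p = BIGRAM.get((prev, token), 0.0) * p_expansion
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
        prev = token
    return total


print(tokenize("fly to new york".split()))
print(math.exp(log_prob("fly to new york".split())))
```

The token sequence also carries the concept label (`<CITY>`), which corresponds to the abstract's point that the output can indicate semantic or syntactic concepts alongside the words.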
