System and method for increasing recognition rates of in-vocabulary words by improving pronunciation modeling
    1.
    Type: Granted patent
    Status: In force

    Publication number: US08892441B2

    Publication date: 2014-11-18

    Application number: US13311512

    Filing date: 2011-12-05

    IPC classification: G10L15/187 G10L15/06

    Abstract: The present disclosure relates to systems, methods, and computer-readable media for generating a lexicon for use with speech recognition. The method includes overgenerating potential pronunciations based on symbolic input, identifying potential pronunciations in a speech recognition context, and storing the identified potential pronunciations in a lexicon. Overgenerating potential pronunciations can include establishing a set of conversion rules for short sequences of letters, converting portions of the symbolic input into a number of possible lexical pronunciation variants based on the set of conversion rules, modeling the possible lexical pronunciation variants in one of a weighted network and a list of phoneme lists, and iteratively retraining the set of conversion rules based on improved pronunciations. Symbolic input can include multiple examples of a same spoken word. Speech data can be labeled explicitly or implicitly and can include words as text and recorded audio.
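
    The overgeneration step described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the rule table `RULES`, the function names, and the phoneme symbols are hypothetical, and the later steps (selecting pronunciations with a speech recognizer, retraining the rules) are not shown. The sketch assumes conversion rules that map one- or two-letter sequences to candidate phonemes and models the variants as a list of phoneme lists.

```python
from itertools import product

# Hypothetical toy conversion rules: short letter sequences -> candidate phonemes.
RULES = {
    "ph": ["f"],
    "p": ["p"],
    "h": ["hh"],
    "o": ["ow", "aa"],
    "t": ["t"],
}


def segmentations(word, rules):
    """Yield every split of `word` into chunks of 1 or 2 letters that have a rule."""
    if not word:
        yield []
        return
    for size in (2, 1):
        chunk = word[:size]
        if len(chunk) == size and chunk in rules:
            for rest in segmentations(word[size:], rules):
                yield [chunk] + rest


def overgenerate(word, rules):
    """Return all pronunciation variants as a sorted list of phoneme tuples."""
    variants = set()
    for chunks in segmentations(word, rules):
        # Take one phoneme choice per chunk, in every possible combination.
        for choice in product(*(rules[c] for c in chunks)):
            variants.add(tuple(choice))
    return sorted(variants)


if __name__ == "__main__":
    for pron in overgenerate("photo", RULES):
        print(" ".join(pron))
```

    The list is deliberately too large: most variants are wrong, and the method relies on a later recognition pass over labeled speech to keep only the plausible ones.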

    SYSTEM AND METHOD FOR DISCRIMINATIVE PRONUNCIATION MODELING FOR VOICE SEARCH
    2.
    Type: Patent application
    Status: In force

    Publication number: US20100125457A1

    Publication date: 2010-05-20

    Application number: US12274025

    Filing date: 2008-11-19

    IPC classification: G10L15/04

    CPC classification: G10L15/063 G10L2015/025

    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable media for speech recognition. The method includes receiving speech utterances; assigning a pronunciation weight to each unit of speech in the speech utterances, each respective pronunciation weight being normalized at a unit-of-speech level to sum to 1; for each received speech utterance, optimizing the pronunciation weight by (1) identifying word and phone alignments and corresponding likelihood scores, and (2) discriminatively adapting the pronunciation weight to minimize classification errors; and recognizing additional received speech utterances using the optimized pronunciation weights. A unit of speech can be a sentence, a word, a context-dependent phone, a context-independent phone, or a syllable. The method can further include discriminatively adapting pronunciation weights based on an objective function. The objective function can be maximum mutual information (MMI), maximum likelihood (MLE) training, minimum classification error (MCE) training, or other functions known to those of skill in the art. Speech utterances can be names. The speech utterances can be received as part of a multimodal search or input. The step of discriminatively adapting pronunciation weights can further include stochastically modeling pronunciations.
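
    The normalization constraint in this abstract (weights for each unit of speech sum to 1) can be shown with a small sketch. The `adapt` function is a deliberately simplistic, hypothetical stand-in for the discriminative step: it is not MMI or MCE training, only a multiplicative update followed by renormalization. All names and numbers are illustrative assumptions.

```python
def normalize(weights):
    """Rescale the pronunciation weights of each word so they sum to 1."""
    normalized = {}
    for word, prons in weights.items():
        total = sum(prons.values())
        normalized[word] = {pron: w / total for pron, w in prons.items()}
    return normalized


def adapt(weights, word, correct_pron, competing_pron, step=0.5):
    """Toy update (assumption, not MCE/MMI): reward the pronunciation found in
    the correct hypothesis, penalize the one in the competing hypothesis,
    then renormalize so the word's weights still sum to 1."""
    updated = {w: dict(prons) for w, prons in weights.items()}
    updated[word][correct_pron] *= 1.0 + step
    updated[word][competing_pron] *= 1.0 - step
    return normalize(updated)


if __name__ == "__main__":
    raw = {"tomato": {"t ah m ey t ow": 3.0, "t ah m aa t ow": 1.0}}
    weights = normalize(raw)
    print(weights)
    print(adapt(weights, "tomato", "t ah m aa t ow", "t ah m ey t ow"))
```

    Renormalizing after every update keeps the weights comparable across words, which is the property the abstract requires of each unit of speech.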

    System and method for discriminative pronunciation modeling for voice search
    3.
    Type: Granted patent
    Status: In force

    Publication number: US08296141B2

    Publication date: 2012-10-23

    Application number: US12274025

    Filing date: 2008-11-19

    IPC classification: G10L15/04

    CPC classification: G10L15/063 G10L2015/025

    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable media for speech recognition. The method includes receiving speech utterances; assigning a pronunciation weight to each unit of speech in the speech utterances, each respective pronunciation weight being normalized at a unit-of-speech level to sum to 1; for each received speech utterance, optimizing the pronunciation weight by (1) identifying word and phone alignments and corresponding likelihood scores, and (2) discriminatively adapting the pronunciation weight to minimize classification errors; and recognizing additional received speech utterances using the optimized pronunciation weights. A unit of speech can be a sentence, a word, a context-dependent phone, a context-independent phone, or a syllable. The method can further include discriminatively adapting pronunciation weights based on an objective function. The objective function can be maximum mutual information (MMI), maximum likelihood (MLE) training, minimum classification error (MCE) training, or other functions known to those of skill in the art. Speech utterances can be names. The speech utterances can be received as part of a multimodal search or input. The step of discriminatively adapting pronunciation weights can further include stochastically modeling pronunciations.

    System and Method for Increasing Recognition Rates of In-Vocabulary Words By Improving Pronunciation Modeling
    4.
    Type: Patent application
    Status: In force

    Publication number: US20120078617A1

    Publication date: 2012-03-29

    Application number: US13311512

    Filing date: 2011-12-05

    IPC classification: G06F17/21

    Abstract: The present disclosure relates to systems, methods, and computer-readable media for generating a lexicon for use with speech recognition. The method includes receiving symbolic input as labeled speech data, overgenerating potential pronunciations based on the symbolic input, identifying potential pronunciations in a speech recognition context, and storing the identified potential pronunciations in a lexicon. Overgenerating potential pronunciations can include establishing a set of conversion rules for short sequences of letters, converting portions of the symbolic input into a number of possible lexical pronunciation variants based on the set of conversion rules, modeling the possible lexical pronunciation variants in one of a weighted network and a list of phoneme lists, and iteratively retraining the set of conversion rules based on improved pronunciations. Symbolic input can include multiple examples of a same spoken word. Speech data can be labeled explicitly or implicitly and can include words as text and recorded audio.

    System and method for increasing recognition rates of in-vocabulary words by improving pronunciation modeling
    5.
    Type: Granted patent
    Status: In force

    Publication number: US08095365B2

    Publication date: 2012-01-10

    Application number: US12328436

    Filing date: 2008-12-04

    IPC classification: G10L13/08

    Abstract: The present disclosure relates to systems, methods, and computer-readable media for generating a lexicon for use with speech recognition. The method includes receiving symbolic input as labeled speech data, overgenerating potential pronunciations based on the symbolic input, identifying potential pronunciations in a speech recognition context, and storing the identified potential pronunciations in a lexicon. Overgenerating potential pronunciations can include establishing a set of conversion rules for short sequences of letters, converting portions of the symbolic input into a number of possible lexical pronunciation variants based on the set of conversion rules, modeling the possible lexical pronunciation variants in one of a weighted network and a list of phoneme lists, and iteratively retraining the set of conversion rules based on improved pronunciations. Symbolic input can include multiple examples of a same spoken word. Speech data can be labeled explicitly or implicitly and can include words as text and recorded audio.

    SYSTEM AND METHOD FOR INCREASING RECOGNITION RATES OF IN-VOCABULARY WORDS BY IMPROVING PRONUNCIATION MODELING
    6.
    Type: Patent application
    Status: In force

    Publication number: US20100145704A1

    Publication date: 2010-06-10

    Application number: US12328436

    Filing date: 2008-12-04

    IPC classification: G10L13/08

    Abstract: Disclosed herein are systems, methods, and computer-readable media for generating a lexicon for use with speech recognition. The method includes receiving symbolic input as labeled speech data, overgenerating potential pronunciations based on the symbolic input, identifying best potential pronunciations in a speech recognition context, and storing the identified best potential pronunciations in a lexicon. Overgenerating potential pronunciations can include establishing a set of conversion rules for short sequences of letters, converting portions of the symbolic input into a number of possible lexical pronunciation variants based on the set of conversion rules, modeling the possible lexical pronunciation variants in one of a weighted network and a list of phoneme lists, and iteratively retraining the set of conversion rules based on improved pronunciations. Symbolic input can include multiple examples of a same spoken word. Speech data can be labeled explicitly or implicitly and can include words as text and recorded audio.

    System and method for supplemental speech recognition by identified idle resources
    7.
    Type: Granted patent
    Status: In force

    Publication number: US08346549B2

    Publication date: 2013-01-01

    Application number: US12631131

    Filing date: 2009-12-04

    IPC classification: G10L15/00

    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for improving automatic speech recognition performance. A system practicing the method identifies idle speech recognition resources and establishes a supplemental speech recognizer on the idle resources based on overall speech recognition demand. The supplemental speech recognizer can differ from a main speech recognizer, and, along with the main speech recognizer, can be associated with a particular speaker. The system performs speech recognition on speech received from the particular speaker in parallel with the main speech recognizer and the supplemental speech recognizer and combines results from the main and supplemental speech recognizer. The system recognizes the received speech based on the combined results. The system can use beam adjustment in place of or in combination with a supplemental speech recognizer. A scheduling algorithm can tailor a particular combination of speech recognition resources and release the supplemental speech recognizer based on increased demand.
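
    The parallel recognition and result combination described in this abstract might look roughly like the sketch below. The abstract does not say how results are combined, so averaging per-hypothesis scores is an assumption made here for illustration. Both recognizers are hypothetical stubs that return fixed n-best lists; idle-resource detection and scheduling are not modeled.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def main_recognizer(audio):
    """Stub for the main recognizer: returns a fixed (text, score) n-best list."""
    return [("call john", 0.6), ("call joan", 0.4)]


def supplemental_recognizer(audio):
    """Stub for a recognizer started on idle resources."""
    return [("call joan", 0.7), ("call john", 0.3)]


def recognize(audio, recognizers):
    """Run all recognizers in parallel and return the best combined hypothesis."""
    with ThreadPoolExecutor(max_workers=len(recognizers)) as pool:
        results = list(pool.map(lambda rec: rec(audio), recognizers))
    combined = defaultdict(float)
    for nbest in results:
        for text, score in nbest:
            # Assumed combination rule: average the scores across recognizers.
            combined[text] += score / len(recognizers)
    return max(combined, key=combined.get)


if __name__ == "__main__":
    print(recognize(b"", [main_recognizer]))
    print(recognize(b"", [main_recognizer, supplemental_recognizer]))
```

    Passing the recognizers as a list means the supplemental one can be dropped when demand rises, leaving the main recognizer to run alone.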

    Adapting language models with a bit mask for a subset of related words
    8.
    Type: Granted patent
    Status: In force

    Publication number: US08589163B2

    Publication date: 2013-11-19

    Application number: US12631111

    Filing date: 2009-12-04

    IPC classification: G10L15/06 G10L15/12 G10L15/22

    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for performing speech recognition based on a masked language model. A system configured to practice the method receives a masked language model including a plurality of words, wherein a bit mask identifies whether each of the plurality of words is allowed or disallowed with regard to an adaptation subset, receives input speech, generates a speech recognition lattice based on the received input speech using the masked language model, removes from the generated lattice words identified as disallowed by the bit mask for the adaptation subset, and recognizes the received speech based on the lattice. Alternatively, during the generation step, the system can add only words indicated as allowed by the bit mask. The bit mask can be separate from or incorporated as part of the masked language model. The system can dynamically update the adaptation subset and bit mask.
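
    The bit-mask filtering in this abstract can be sketched as follows. The vocabulary, the lattice representation (a flat list of `(start, end, word, score)` arcs), and all function names are assumptions made for illustration; a real lattice and language model would be considerably richer. The mask here is kept separate from the model, which is one of the two options the abstract mentions.

```python
# Hypothetical vocabulary of the language model; bit i of a mask refers to VOCAB[i].
VOCAB = ["call", "john", "joan", "pizza"]
WORD_ID = {word: i for i, word in enumerate(VOCAB)}


def make_mask(allowed_words):
    """Build an integer bit mask with one bit set per allowed word."""
    mask = 0
    for word in allowed_words:
        mask |= 1 << WORD_ID[word]
    return mask


def is_allowed(word, mask):
    """Check the bit that corresponds to `word`."""
    return bool((mask >> WORD_ID[word]) & 1)


def prune_lattice(arcs, mask):
    """Remove lattice arcs whose word is disallowed for the adaptation subset."""
    return [arc for arc in arcs if is_allowed(arc[2], mask)]


if __name__ == "__main__":
    lattice = [
        (0, 1, "call", -1.0),
        (1, 2, "john", -2.0),
        (1, 2, "joan", -2.5),
        (1, 2, "pizza", -4.0),
    ]
    contacts_mask = make_mask(["call", "john"])
    print(prune_lattice(lattice, contacts_mask))
```

    Updating the adaptation subset only requires rebuilding the mask with `make_mask`; the vocabulary and the lattice code stay unchanged.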

    SYSTEM AND METHOD FOR RESTRICTING LARGE LANGUAGE MODELS
    9.
    Type: Patent application
    Status: In force

    Publication number: US20110137653A1

    Publication date: 2011-06-09

    Application number: US12631111

    Filing date: 2009-12-04

    IPC classification: G10L15/00

    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for performing speech recognition based on a masked language model. A system configured to practice the method receives a masked language model including a plurality of words, wherein a bit mask identifies whether each of the plurality of words is allowed or disallowed with regard to an adaptation subset, receives input speech, generates a speech recognition lattice based on the received input speech using the masked language model, removes from the generated lattice words identified as disallowed by the bit mask for the adaptation subset, and recognizes the received speech based on the lattice. Alternatively, during the generation step, the system can add only words indicated as allowed by the bit mask. The bit mask can be separate from or incorporated as part of the masked language model. The system can dynamically update the adaptation subset and bit mask.
