System and method for personalization of acoustic models for automatic speech recognition
    2.
    Granted patent (status: in force)

    Publication No.: US09026444B2

    Publication date: 2015-05-05

    Application No.: US12561005

    Filing date: 2009-09-16

    IPC classification: G10L15/22 G10L15/07 G10L15/06

    Abstract: Disclosed herein are methods, systems, and computer-readable storage media for automatic speech recognition. The method includes selecting a speaker independent model, and selecting a quantity of speaker dependent models, the quantity of speaker dependent models being based on available computing resources, the selected models including the speaker independent model and the quantity of speaker dependent models. The method also includes recognizing an utterance using each of the selected models in parallel, and selecting a dominant speech model from the selected models based on recognition accuracy using the group of selected models. The system includes a processor and modules configured to control the processor to perform the method. The computer-readable storage medium includes instructions for causing a computing device to perform the steps of the method.
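
The selection steps in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the recognizer callables, the `free_workers` resource measure, and the use of a confidence score as a stand-in for recognition accuracy are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical recognizer type: audio bytes -> (transcript, confidence in [0, 1]).
Recognizer = Callable[[bytes], Tuple[str, float]]
NamedModel = Tuple[str, Recognizer]


def select_models(si_model: NamedModel,
                  sd_models: List[NamedModel],
                  free_workers: int) -> List[NamedModel]:
    """Keep the speaker-independent model and add as many speaker-dependent
    models as the spare workers allow (one worker is reserved for the SI model)."""
    quantity = max(0, min(len(sd_models), free_workers - 1))
    return [si_model] + sd_models[:quantity]


def recognize_in_parallel(models: List[NamedModel],
                          audio: bytes) -> Dict[str, Tuple[str, float]]:
    """Run every selected model on the same utterance concurrently."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(fn, audio) for name, fn in models}
        return {name: future.result() for name, future in futures.items()}


def dominant_model(results: Dict[str, Tuple[str, float]]) -> str:
    """Pick the model with the highest confidence (assumed accuracy proxy)."""
    return max(results, key=lambda name: results[name][1])
```

In this sketch the number of speaker dependent models shrinks to zero when only one worker is free, so the speaker independent model is always evaluated.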

    System and method for handling repeat queries due to wrong ASR output by modifying an acoustic, a language and a semantic model
    3.
    Granted patent (status: in force)

    Publication No.: US08990085B2

    Publication date: 2015-03-24

    Application No.: US12570757

    Filing date: 2009-09-30

    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for handling expected repeat speech queries or other inputs. The method causes a computing device to detect a misrecognized speech query from a user, determine a tendency of the user to repeat speech queries based on previous user interactions, and adapt a speech recognition model based on the determined tendency before an expected repeat speech query. The method can further include recognizing the expected repeat speech query from the user based on the adapted speech recognition model. Adapting the speech recognition model can include modifying an acoustic model, a language model, and a semantic model. Adapting the speech recognition model can also include preparing a personalized search speech recognition model for the expected repeat query based on usage history and entries in a recognition lattice. The method can include retaining unmodified speech recognition models with adapted speech recognition models.
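
One way to picture the "tendency to repeat" decision is the sketch below. The abstract does not say how the tendency is computed, so the history format, the ratio used as the tendency, and the `threshold` parameter are hypothetical choices for illustration only.

```python
from typing import List, Tuple

# Hypothetical history entry: (query_was_misrecognized, user_repeated_the_query).
Interaction = Tuple[bool, bool]


def repeat_tendency(history: List[Interaction]) -> float:
    """Fraction of past misrecognized queries that the user then repeated."""
    followups = [repeated for misrecognized, repeated in history if misrecognized]
    return sum(followups) / len(followups) if followups else 0.0


def should_adapt(misrecognized_now: bool,
                 history: List[Interaction],
                 threshold: float = 0.5) -> bool:
    """Adapt the models ahead of the expected repeat only when the current
    query was misrecognized and the user tends to repeat such queries."""
    return misrecognized_now and repeat_tendency(history) >= threshold
```

A caller would run `should_adapt` right after detecting a misrecognition and, if it returns true, prepare the adapted models while keeping the unmodified ones available.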

    System and method for standardized speech recognition infrastructure

    Publication No.: US08374867B2

    Publication date: 2013-02-12

    Application No.: US12618371

    Filing date: 2009-11-13

    IPC classification: G10L15/06

    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next, the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.
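
The fallback order described here (supervised, then unsupervised, then generic) can be sketched as a simple lookup chain. The dictionary stores, the per-user keying of the unsupervised models, and the string model identifiers are assumptions for illustration, not details taken from the patent.

```python
from typing import Dict


def retrieve_model(user_id: str,
                   supervised: Dict[str, str],
                   unsupervised: Dict[str, str],
                   generic: str) -> str:
    """Return the most specific model available for this user."""
    model = supervised.get(user_id)
    if model is not None:
        return model          # user-specific supervised model
    model = unsupervised.get(user_id)
    if model is not None:
        return model          # unsupervised model (assumed keyed by user)
    return generic            # generic fallback model


# Usage with hypothetical model identifiers.
supervised = {"alice": "sup-alice"}
unsupervised = {"alice": "unsup-alice", "bob": "unsup-bob"}
print(retrieve_model("bob", supervised, unsupervised, "generic-en"))
```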

    System and method for supplemental speech recognition by identified idle resources
    6.
    Granted patent (status: in force)

    Publication No.: US08346549B2

    Publication date: 2013-01-01

    Application No.: US12631131

    Filing date: 2009-12-04

    IPC classification: G10L15/00

    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for improving automatic speech recognition performance. A system practicing the method identifies idle speech recognition resources and establishes a supplemental speech recognizer on the idle resources based on overall speech recognition demand. The supplemental speech recognizer can differ from a main speech recognizer, and, along with the main speech recognizer, can be associated with a particular speaker. The system performs speech recognition on speech received from the particular speaker in parallel with the main speech recognizer and the supplemental speech recognizer and combines results from the main and supplemental speech recognizer. The system recognizes the received speech based on the combined results. The system can use beam adjustment in place of or in combination with a supplemental speech recognizer. A scheduling algorithm can tailor a particular combination of speech recognition resources and release the supplemental speech recognizer based on increased demand.
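
A rough sketch of the two decisions in this abstract follows: whether idle capacity justifies a supplemental recognizer, and how to combine the two recognizers' outputs. The abstract does not specify either rule, so the `reserve` margin and the confidence-sum voting below are assumed placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical hypothesis type: (transcript, confidence in [0, 1]).
Hypothesis = Tuple[str, float]


def should_run_supplemental(total_workers: int,
                            busy_workers: int,
                            reserve: int = 1) -> bool:
    """Start (or keep) a supplemental recognizer only while more than
    `reserve` workers are idle; otherwise it should be released."""
    return total_workers - busy_workers > reserve


def combine(main: List[Hypothesis], supplemental: List[Hypothesis]) -> str:
    """Sum confidences per transcript across both recognizers and return
    the transcript with the highest combined score."""
    scores: Dict[str, float] = defaultdict(float)
    for text, confidence in main + supplemental:
        scores[text] += confidence
    return max(scores, key=lambda text: scores[text])
```

Summing confidences lets a transcript that both recognizers rank moderately well beat one that only a single recognizer favors, which is one plausible reading of "combines results".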

    Systems and methods of providing modified media content
    7.
    Granted patent (status: in force)

    Publication No.: US08312492B2

    Publication date: 2012-11-13

    Application No.: US11725591

    Filing date: 2007-03-19

    IPC classification: H04N7/173 G06F15/00 G10L11/00

    Abstract: A method and system of providing media content are disclosed. In a particular embodiment, the method includes receiving media content from a content source at a set-top box device. The media content includes video data having a first playback rate and audio data having the first playback rate. The method further includes transforming the audio data via a non-linear transformation to produce modified audio data having a second playback rate, modifying the video data to produce modified video data having the second playback rate, and synchronizing the modified audio data and the modified video data to produce modified media content having the second playback rate. A network-based media content storage device and associated logic to provide adjusted-rate audio content are also disclosed.

    System and method of word lattice augmentation using a pre/post vocalic consonant distinction
    8.
    Granted patent (status: in force)

    Publication No.: US08024191B2

    Publication date: 2011-09-20

    Application No.: US11930999

    Filing date: 2007-10-31

    IPC classification: G10L15/04

    CPC classification: G10L25/78 G10L15/02

    Abstract: Systems and methods are provided for recognizing speech in a spoken dialogue system. The method includes receiving input speech having a pre-vocalic consonant or a post-vocalic consonant, generating at least one output lattice that calculates a first score by comparing the input speech to a training model to provide a result, and distinguishing between the pre-vocalic consonant and the post-vocalic consonant in the input speech. A second score is calculated by measuring a similarity between the pre-vocalic consonant or the post-vocalic consonant in the input speech and the first score. At least one category is determined for the pre-vocalic match or mismatch or the post-vocalic match or mismatch by using the second score, and the results of an automated speech recognition (ASR) system are refined by using the at least one category for the pre-vocalic match or mismatch or the post-vocalic match or mismatch.

    MULTI-STATE BARGE-IN MODELS FOR SPOKEN DIALOG SYSTEMS
    9.
    Patent application (status: in force)

    Publication No.: US20090112599A1

    Publication date: 2009-04-30

    Application No.: US11930619

    Filing date: 2007-10-31

    Applicant: Andrej Ljolje

    Inventor: Andrej Ljolje

    IPC classification: G10L11/00 G10L15/00

    Abstract: Disclosed are systems, methods and computer-readable media for applying a multi-state barge-in acoustic model in a spoken dialog system comprising the steps of (1) presenting a prompt to a user from the spoken dialog system, (2) receiving an audio speech input from the user during the presentation of the prompt, (3) accumulating the audio speech input from the user, (4) applying a non-speech component having at least two one-state Hidden Markov Models (HMMs) to the audio speech input from the user, (5) applying a speech component having at least five three-state HMMs to the audio speech input from the user, in which each of the five three-state HMMs represents a different phonetic category, (6) determining whether the audio speech input is a barge-in-speech input from the user, and (7) if the audio speech input is determined to be the barge-in-speech input from the user, terminating the presentation of the prompt.

    SYSTEM AND METHOD OF WORD LATTICE AUGMENTATION USING A PRE/POST VOCALIC CONSONANT DISTINCTION
    10.
    Patent application (status: in force)

    Publication No.: US20090112591A1

    Publication date: 2009-04-30

    Application No.: US11930999

    Filing date: 2007-10-31

    IPC classification: G10L15/00

    CPC classification: G10L25/78 G10L15/02

    Abstract: Disclosed are systems and methods for recognizing speech in a spoken dialogue system. The method includes (1) receiving an input speech having at least one pre-vocalic consonant or at least one post-vocalic consonant, (2) generating at least one output lattice that calculates a first score by comparing the input speech to a training model to provide a result, (3) distinguishing between the at least one pre-vocalic consonant and the at least one post-vocalic consonant in the input speech, (4) calculating a second score by measuring a similarity between the at least one pre-vocalic consonant or the at least one post-vocalic consonant in the input speech and the first score, (5) determining at least one category for at least one pre-vocalic match or mismatch or at least one post-vocalic match or mismatch by using the second score, and (6) refining the results of an automated speech recognition (ASR) system by using the at least one category for at least one pre-vocalic match or mismatch or at least one post-vocalic match or mismatch.
