Low latency real-time vocal tract length normalization
    1.
    Granted patent
    Legal status: in force

    Publication No.: US08909527B2

    Publication date: 2014-12-09

    Application No.: US12490634

    Filing date: 2009-06-24

    IPC classification: G10L15/12 G10L15/06 G10L15/02

    Abstract: A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker specific segments, and for each speaker specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively performing the steps of selecting another warping factor and generating another warped spectral data, comparing the other warped spectral data with the speech model, and if the other warping factor produces a closer match to the speech model, saving the other warping factor as the best warping factor for the speaker specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method.
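
    The per-segment search described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the linear frequency warp, the squared-distance `match_score`, and all function names are assumptions chosen for clarity. A real system would compare warped features against an acoustic model rather than a single reference spectrum.

```python
from typing import Dict, List, Sequence


def warp_spectrum(spectrum: Sequence[float], alpha: float) -> List[float]:
    """Linearly warp the frequency axis: output bin k reads input position k * alpha."""
    n = len(spectrum)
    warped = []
    for k in range(n):
        pos = min(k * alpha, n - 1)  # clamp so we never read past the top bin
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        # Linear interpolation between the two neighbouring bins.
        warped.append((1.0 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped


def match_score(warped: Sequence[float], model: Sequence[float]) -> float:
    """Hypothetical closeness measure: negative squared distance (higher is closer)."""
    return -sum((w - m) ** 2 for w, m in zip(warped, model))


def best_warping_factor(spectrum: Sequence[float],
                        model: Sequence[float],
                        factors: Sequence[float]) -> float:
    """Try each candidate factor and keep the one whose warped spectrum matches best."""
    best_alpha = factors[0]
    best_score = match_score(warp_spectrum(spectrum, best_alpha), model)
    for alpha in factors[1:]:
        score = match_score(warp_spectrum(spectrum, alpha), model)
        if score > best_score:  # closer match: save as the new best factor
            best_alpha, best_score = alpha, score
    return best_alpha


def estimate_per_speaker(segments: Dict[str, Sequence[float]],
                         model: Sequence[float],
                         factors: Sequence[float]) -> Dict[str, float]:
    """Run the search once per speaker-specific segment (keys are assumed speaker IDs)."""
    return {spk: best_warping_factor(spec, model, factors)
            for spk, spec in segments.items()}
```

    Calling `estimate_per_speaker({"spk_a": spectrum_a}, model, [0.9, 1.0, 1.1])` returns one saved factor per speaker. The candidate grid here is arbitrary; the abstract does not specify how candidate factors are chosen.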

    LOW LATENCY REAL-TIME VOCAL TRACT LENGTH NORMALIZATION
    2.
    Patent application
    Legal status: in force

    Publication No.: US20090259465A1

    Publication date: 2009-10-15

    Application No.: US12490634

    Filing date: 2009-06-24

    IPC classification: G10L15/02

    Abstract: A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker specific segments, and for each speaker specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively performing the steps of selecting another warping factor and generating another warped spectral data, comparing the other warped spectral data with the speech model, and if the other warping factor produces a closer match to the speech model, saving the other warping factor as the best warping factor for the speaker specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method.

    Low latency real-time vocal tract length normalization
    3.
    Granted patent
    Legal status: in force

    Publication No.: US07567903B1

    Publication date: 2009-07-28

    Application No.: US11034535

    Filing date: 2005-01-12

    Abstract: A method and apparatus for performing speech recognition are provided. A Vocal Tract Length Normalized acoustic model for a speaker is generated from training data. Speech recognition is performed on a first recognition input to determine a first best hypothesis. A first Vocal Tract Length Normalization factor is estimated based on the first best hypothesis. Speech recognition is performed on a second recognition input using the Vocal Tract Length Normalized acoustic model to determine another best hypothesis. Another Vocal Tract Length Normalization factor is estimated based on the other best hypothesis and at least one previous best hypothesis.
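
    One way to read the last step, in which each new factor is estimated from the current best hypothesis together with earlier ones, is as a running accumulation of per-factor scores across utterances. The sketch below illustrates only that reading. The class name, the idea of summing per-utterance scores, and the score values themselves are assumptions and are not taken from the patent.

```python
from typing import Dict, Sequence


class RunningWarpEstimator:
    """Accumulates per-factor scores over utterances (hypothetical scheme)."""

    def __init__(self, factors: Sequence[float]) -> None:
        # One running total per candidate normalization factor.
        self.totals: Dict[float, float] = {alpha: 0.0 for alpha in factors}

    def update(self, utterance_scores: Dict[float, float]) -> float:
        """Add the scores obtained for the latest best hypothesis and
        return the factor that is best over all utterances seen so far.

        Keys of utterance_scores must be factors passed to __init__.
        """
        for alpha, score in utterance_scores.items():
            self.totals[alpha] += score
        return max(self.totals, key=lambda a: self.totals[a])
```

    After the first utterance the estimate depends on that utterance alone; each later call to `update` folds the new hypothesis into the accumulated evidence, so the estimate can change without reprocessing earlier audio.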

    System and method for standardized speech recognition infrastructure

    Publication No.: US08374867B2

    Publication date: 2013-02-12

    Application No.: US12618371

    Filing date: 2009-11-13

    IPC classification: G10L15/06

    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next, the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.
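
    The three-level fallback in this abstract (supervised, then unsupervised, then generic) can be sketched as a simple lookup chain. The dictionaries keyed by user ID, the string model handles, and the function name are illustrative assumptions; the abstract does not say how models are stored or identified.

```python
from typing import Dict, Optional


def select_speech_model(user_id: str,
                        supervised: Dict[str, str],
                        unsupervised: Dict[str, str],
                        generic: str) -> str:
    """Return the most specific model available for this user."""
    # 1. Prefer a user-specific supervised model.
    model: Optional[str] = supervised.get(user_id)
    # 2. Otherwise fall back to an unsupervised model, if one exists.
    if model is None:
        model = unsupervised.get(user_id)
    # 3. Otherwise use the generic model.
    if model is None:
        model = generic
    return model
```

    The recognizer would then be run on the received speech with whichever model this returns.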

    System and method for supplemental speech recognition by identified idle resources
    7.
    Granted patent
    Legal status: in force

    Publication No.: US08346549B2

    Publication date: 2013-01-01

    Application No.: US12631131

    Filing date: 2009-12-04

    IPC classification: G10L15/00

    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for improving automatic speech recognition performance. A system practicing the method identifies idle speech recognition resources and establishes a supplemental speech recognizer on the idle resources based on overall speech recognition demand. The supplemental speech recognizer can differ from a main speech recognizer, and, along with the main speech recognizer, can be associated with a particular speaker. The system performs speech recognition on speech received from the particular speaker in parallel with the main speech recognizer and the supplemental speech recognizer and combines results from the main and supplemental speech recognizer. The system recognizes the received speech based on the combined results. The system can use beam adjustment in place of or in combination with a supplemental speech recognizer. A scheduling algorithm can tailor a particular combination of speech recognition resources and release the supplemental speech recognizer based on increased demand.

    Systems and methods of providing modified media content
    8.
    Granted patent
    Legal status: in force

    Publication No.: US08312492B2

    Publication date: 2012-11-13

    Application No.: US11725591

    Filing date: 2007-03-19

    IPC classification: H04N7/173 G06F15/00 G10L11/00

    Abstract: A method and system of providing media content is disclosed. In a particular embodiment, the method includes receiving media content from a content source at a set-top box device. The media content includes video data having a first playback rate and audio data having the first playback rate. The method further includes transforming the audio data via a non-linear transformation to produce modified audio data having a second playback rate, modifying the video data to produce modified video data having the second playback rate, and synchronizing the modified audio data and the modified video data to produce modified media content having the second playback rate. A network-based media content storage device and associated logic to provide adjusted rate audio content are also disclosed.

    System and method of word lattice augmentation using a pre/post vocalic consonant distinction
    9.
    Granted patent
    Legal status: in force

    Publication No.: US08024191B2

    Publication date: 2011-09-20

    Application No.: US11930999

    Filing date: 2007-10-31

    IPC classification: G10L15/04

    CPC classification: G10L25/78 G10L15/02

    Abstract: Systems and methods are provided for recognizing speech in a spoken dialogue system. The method includes receiving input speech having a pre-vocalic consonant or a post-vocalic consonant, generating at least one output lattice that calculates a first score by comparing the input speech to a training model to provide a result and distinguishing between the pre-vocalic consonant and the post-vocalic consonant in the input speech. A second score is calculated by measuring a similarity between the pre-vocalic consonant or the post-vocalic consonant in the input speech and the first score. At least one category is determined for the pre-vocalic match or mismatch or the post-vocalic match or mismatch by using the second score, and the results of an automated speech recognition (ASR) system are refined by using the at least one category for the pre-vocalic match or mismatch or the post-vocalic match or mismatch.

    MULTI-STATE BARGE-IN MODELS FOR SPOKEN DIALOG SYSTEMS
    10.
    Patent application
    Legal status: in force

    Publication No.: US20090112599A1

    Publication date: 2009-04-30

    Application No.: US11930619

    Filing date: 2007-10-31

    Applicant: Andrej Ljolje

    Inventor: Andrej Ljolje

    IPC classification: G10L11/00 G10L15/00

    Abstract: Disclosed are systems, methods and computer readable media for applying a multi-state barge-in acoustic model in a spoken dialogue system comprising the steps of (1) presenting a prompt to a user from the spoken dialog system, (2) receiving an audio speech input from the user during the presentation of the prompt, (3) accumulating the audio speech input from the user, (4) applying a non-speech component having at least two one-state Hidden Markov Models (HMMs) to the audio speech input from the user, (5) applying a speech component having at least five three-state HMMs to the audio speech input from the user, in which each of the five three-state HMMs represents a different phonetic category, (6) determining whether the audio speech input is a barge-in-speech input from the user, and (7) if the audio speech input is determined to be the barge-in-speech input from the user, terminating the presentation of the prompt.
