Patent search: ap:("Vincent Goffin" OR "Andrej Ljolje" OR "Murat Saraclar") AND inv:"Andrej Ljolje" (page 1)

1.

Granted invention patent
Low latency real-time vocal tract length normalization (status: in force)

Publication number: US08909527B2

Publication date: 2014-12-09

Application number: US12490634

Filing date: 2009-06-24

Applicants: Vincent Goffin, Andrej Ljolje, Murat Saraclar

Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar

IPC classification: G10L15/12, G10L15/06, G10L15/02

CPC classification: G10L15/063, G10L15/10, G10L15/12, G10L17/04, G10L17/08

Abstract: A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker specific segments, and for each speaker specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively performing the steps of selecting another warping factor and generating another warped spectral data, comparing the other warped spectral data with the speech model, and if the other warping factor produces a closer match to the speech model, saving the other warping factor as the best warping factor for the speaker specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method.

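The iterative search described in this abstract can be sketched as a simple grid search over candidate warping factors. This is a minimal illustration under stated assumptions, not the patented implementation: the linear frequency warp, the squared-distance stand-in for the speech-model comparison, and all function names (`warp_spectrum`, `match_score`, `best_warping_factor`, `segment_factors`) are hypothetical.

```python
import numpy as np


def warp_spectrum(spectrum, alpha):
    """Linearly warp the frequency axis: output bin k reads the input at k * alpha.

    Assumption: a plain linear warp; np.interp clamps to the edge values
    when k * alpha falls outside the original bin range.
    """
    bins = np.arange(len(spectrum), dtype=float)
    return np.interp(bins * alpha, bins, spectrum)


def match_score(warped, model_spectrum):
    """Hypothetical stand-in for the speech-model comparison.

    Higher is better: the negative squared distance to a reference spectrum.
    """
    return -float(np.sum((warped - model_spectrum) ** 2))


def best_warping_factor(spectrum, model_spectrum, candidates):
    """Try each candidate factor and keep the one that matches the model best."""
    best_alpha = candidates[0]
    best_score = match_score(warp_spectrum(spectrum, best_alpha), model_spectrum)
    for alpha in candidates[1:]:
        score = match_score(warp_spectrum(spectrum, alpha), model_spectrum)
        if score > best_score:  # closer match: save this factor as the best so far
            best_alpha, best_score = alpha, score
    return best_alpha


def segment_factors(segments, model_spectrum, candidates):
    """Run the search once per speaker-specific segment."""
    return {
        speaker: best_warping_factor(spectrum, model_spectrum, candidates)
        for speaker, spectrum in segments.items()
    }
```

Calling `segment_factors` with a dictionary that maps speaker labels to spectra returns one best factor per speaker segment. A real system would score warped features against an acoustic model rather than a single reference spectrum.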

2.

Invention patent application
LOW LATENCY REAL-TIME VOCAL TRACT LENGTH NORMALIZATION (status: in force)

Publication number: US20090259465A1

Publication date: 2009-10-15

Application number: US12490634

Filing date: 2009-06-24

Applicants: Vincent Goffin, Andrej Ljolje, Murat Saraclar

Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar

IPC classification: G10L15/02

CPC classification: G10L15/063, G10L15/10, G10L15/12, G10L17/04, G10L17/08

Abstract: A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker specific segments, and for each speaker specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively performing the steps of selecting another warping factor and generating another warped spectral data, comparing the other warped spectral data with the speech model, and if the other warping factor produces a closer match to the speech model, saving the other warping factor as the best warping factor for the speaker specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method.


3.

Granted invention patent
Low latency real-time vocal tract length normalization (status: in force)

Publication number: US07567903B1

Publication date: 2009-07-28

Application number: US11034535

Filing date: 2005-01-12

Applicants: Vincent Goffin, Andrej Ljolje, Murat Saraclar

Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar

IPC classification: G10L15/06, G10L15/10, G10L17/00, G10L13/00, G10L19/14

CPC classification: G10L15/063, G10L15/10, G10L15/12, G10L17/04, G10L17/08

Abstract: A method and apparatus for performing speech recognition are provided. A Vocal Tract Length Normalized acoustic model for a speaker is generated from training data. Speech recognition is performed on a first recognition input to determine a first best hypothesis. A first Vocal Tract Length Normalization factor is estimated based on the first best hypothesis. Speech recognition is performed on a second recognition input using the Vocal Tract Length Normalized acoustic model to determine an other best hypothesis. An other Vocal Tract Length Normalization factor is estimated based on the other best hypothesis and at least one previous best hypothesis.

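One plausible reading of "estimated based on the other best hypothesis and at least one previous best hypothesis" is a running estimate that pools evidence across utterances. The sketch below assumes that each utterance yields one score per candidate factor (for example, a log-likelihood of the warped features given the best hypothesis). The class name `RunningWarpEstimator`, the additive pooling, and the score values are all hypothetical and not taken from the patent.

```python
class RunningWarpEstimator:
    """Accumulate per-candidate scores over utterances and re-pick the best factor."""

    def __init__(self, candidates):
        self.candidates = list(candidates)
        # One running total per candidate warping factor.
        self.totals = [0.0] * len(self.candidates)

    def update(self, utterance_scores):
        """Add one utterance's scores and return the current best factor.

        utterance_scores[i] is the (assumed) score of candidates[i] on the
        newest utterance; earlier utterances stay in the totals, so the new
        estimate depends on the current and all previous hypotheses.
        """
        if len(utterance_scores) != len(self.candidates):
            raise ValueError("need exactly one score per candidate factor")
        for i, score in enumerate(utterance_scores):
            self.totals[i] += score
        best_index = max(range(len(self.candidates)), key=self.totals.__getitem__)
        return self.candidates[best_index]
```

Because the totals persist between calls, the estimate is available after the first utterance and is refined as more speech arrives, which is the low-latency behaviour the title refers to.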

4.

Granted invention patent
System and method for optimizing speech recognition and natural language parameters with user feedback (status: in force)

Publication number: US08738375B2

Publication date: 2014-05-27

Application number: US13103665

Filing date: 2011-05-09

Applicants: Andrej Ljolje, Diamantino Antonio Caseiro, Mazin Gilbert, Vincent Goffin, Taniya Mishra

Inventors: Andrej Ljolje, Diamantino Antonio Caseiro, Mazin Gilbert, Vincent Goffin, Taniya Mishra

IPC classification: G10L15/26

CPC classification: G10L15/197, G10L15/063, G10L15/22

Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for assigning saliency weights to words of an ASR model. The saliency values assigned to words within an ASR model are based on human perception judgments of previous transcripts. These saliency values are applied as weights to modify an ASR model such that the results of the weighted ASR model in converting a spoken document to a transcript provide a more accurate and useful transcription to the user.


5.

Granted invention patent
System and method for adapting automatic speech recognition pronunciation by acoustic model restructuring (status: in force)

Publication number: US08548807B2

Publication date: 2013-10-01

Application number: US12480848

Filing date: 2009-06-09

Applicants: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal

Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal

IPC classification: G10L15/04

CPC classification: G10L17/14, G10L15/063, G10L15/07, G10L15/14, G10L15/187, G10L15/265, G10L15/30, G10L2015/025

Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
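The "weighted sum of acoustic models" idea can be illustrated with a toy mixture. This sketch assumes one-dimensional Gaussian models per native phoneme and weights derived from hypothetical lattice counts. The helper names (`gaussian_pdf`, `normalize`, `custom_likelihood`), the phoneme labels, and the numbers in the usage note are illustrative only. Real acoustic models are far richer than a single Gaussian.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian, used here as a toy acoustic model."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def normalize(counts):
    """Turn (hypothetical) lattice counts into weights that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must have a positive total")
    return {phoneme: count / total for phoneme, count in counts.items()}


def custom_likelihood(x, weights, native_models):
    """Likelihood of observation x under a weighted sum of native phoneme models.

    native_models maps a phoneme label to its (mean, std) pair; the
    pronouncing dictionary entry itself is untouched, only its acoustic
    model is replaced by this mixture.
    """
    return sum(
        weight * gaussian_pdf(x, *native_models[phoneme])
        for phoneme, weight in weights.items()
    )
```

For example, with `native_models = {"ih": (0.0, 1.0), "iy": (2.0, 1.0)}` and counts `{"ih": 1, "iy": 3}`, the dictionary phoneme "ih" for this speaker is scored mostly by the native "iy" model, so an observation near 2.0 is no longer penalized.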

6.

Granted invention patent
System and method for standardized speech recognition infrastructure (status: in force)

Publication number: US08374867B2

Publication date: 2013-02-12

Application number: US12618371

Filing date: 2009-11-13

Applicants: Andrej Ljolje, Bernard S. Renger, Steven Neil Tischer

Inventors: Andrej Ljolje, Bernard S. Renger, Steven Neil Tischer

IPC classification: G10L15/06

CPC classification: G10L15/075, G10L15/063, G10L15/065, G10L15/07, G10L15/08

Abstract: Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.
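The fallback order in this abstract (supervised, then unsupervised, then generic) amounts to a short lookup chain. The sketch below assumes the models live in plain dictionaries keyed by a user ID. The function name `select_model` and the storage layout are hypothetical, and the generic model is simplified to a single shared value.

```python
def select_model(user_id, supervised_models, unsupervised_models, generic_model):
    """Return the most specific speech model available for this user.

    Order of preference, following the abstract:
    1. user-specific supervised model,
    2. user-specific unsupervised model,
    3. generic model.
    """
    if user_id in supervised_models:
        return supervised_models[user_id]
    if user_id in unsupervised_models:
        return unsupervised_models[user_id]
    return generic_model
```

The recognizer then runs on the received speech with whichever model this lookup returns.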

7.

Granted invention patent
System and method for supplemental speech recognition by identified idle resources (status: in force)

Publication number: US08346549B2

Publication date: 2013-01-01

Application number: US12631131

Filing date: 2009-12-04

Applicants: Andrej Ljolje, Mazin Gilbert

Inventors: Andrej Ljolje, Mazin Gilbert

IPC classification: G10L15/00

CPC classification: G10L15/00, G10L15/285, G10L15/32

Abstract: Disclosed herein are systems, methods, and computer-readable storage media for improving automatic speech recognition performance. A system practicing the method identifies idle speech recognition resources and establishes a supplemental speech recognizer on the idle resources based on overall speech recognition demand. The supplemental speech recognizer can differ from a main speech recognizer, and, along with the main speech recognizer, can be associated with a particular speaker. The system performs speech recognition on speech received from the particular speaker in parallel with the main speech recognizer and the supplemental speech recognizer and combines results from the main and supplemental speech recognizer. The system recognizes the received speech based on the combined results. The system can use beam adjustment in place of or in combination with a supplemental speech recognizer. A scheduling algorithm can tailor a particular combination of speech recognition resources and release the supplemental speech recognizer based on increased demand.


8.

Granted invention patent
Systems and methods of providing modified media content (status: in force)

Publication number: US08312492B2

Publication date: 2012-11-13

Application number: US11725591

Filing date: 2007-03-19

Applicants: Andrej Ljolje, Ann Syrdal, Alistair Conkie

Inventors: Andrej Ljolje, Ann Syrdal, Alistair Conkie

IPC classification: H04N7/173, G06F15/00, G10L11/00

CPC classification: H04N21/6373, H04N5/4401, H04N5/765, H04N5/775, H04N5/783, H04N7/56, H04N9/8063, H04N21/2335, H04N21/234381, H04N21/2393, H04N21/4307, H04N21/4325, H04N21/47202, H04N21/6587

Abstract: A method and system of providing media content is disclosed. In a particular embodiment, the method includes receiving media content from a content source at a set-top box device. The media content includes video data having a first playback rate and audio data having the first playback rate. The method further includes transforming the audio data via a non-linear transformation to produce modified audio data having a second playback rate, modifying the video data to produce modified video data having the second playback rate, and synchronizing the modified audio data and the modified video data to produce modified media content having the second playback rate. A network-based media content storage device and associated logic to provide adjusted rate audio content are also disclosed.


9.

Granted invention patent
System and method of word lattice augmentation using a pre/post vocalic consonant distinction (status: in force)

Publication number: US08024191B2

Publication date: 2011-09-20

Application number: US11930999

Filing date: 2007-10-31

Applicants: Yeon-Jun Kim, Alistair Conkie, Andrej Ljolje, Ann K. Syrdal

Inventors: Yeon-Jun Kim, Alistair Conkie, Andrej Ljolje, Ann K. Syrdal

IPC classification: G10L15/04

CPC classification: G10L25/78, G10L15/02

Abstract: Systems and methods are provided for recognizing speech in a spoken dialogue system. The method includes receiving input speech having a pre-vocalic consonant or a post-vocalic consonant, generating at least one output lattice that calculates a first score by comparing the input speech to a training model to provide a result and distinguishing between the pre-vocalic consonant and the post-vocalic consonant in the input speech. A second score is calculated by measuring a similarity between the pre-vocalic consonant or the post-vocalic consonant in the input speech and the first score. At least one category is determined for the pre-vocalic match or mismatch or the post-vocalic match or mismatch by using the second score, and the results of an automated speech recognition (ASR) system are refined by using the at least one category for the pre-vocalic match or mismatch or the post-vocalic match or mismatch.


10.

Invention patent application
MULTI-STATE BARGE-IN MODELS FOR SPOKEN DIALOG SYSTEMS (status: in force)

Publication number: US20090112599A1

Publication date: 2009-04-30

Application number: US11930619

Filing date: 2007-10-31

Applicant: Andrej Ljolje

Inventor: Andrej Ljolje

IPC classification: G10L11/00, G10L15/00

CPC classification: G10L15/22, G10L15/142, G10L15/222

Abstract: Disclosed are systems, methods and computer readable media for applying a multi-state barge-in acoustic model in a spoken dialogue system comprising the steps of (1) presenting a prompt to a user from the spoken dialog system, (2) receiving an audio speech input from the user during the presentation of the prompt, (3) accumulating the audio speech input from the user, (4) applying a non-speech component having at least two one-state Hidden Markov Models (HMMs) to the audio speech input from the user, (5) applying a speech component having at least five three-state HMMs to the audio speech input from the user, in which each of the five three-state HMMs represents a different phonetic category, (6) determining whether the audio speech input is a barge-in-speech input from the user, and (7) if the audio speech input is determined to be the barge-in-speech input from the user, terminating the presentation of the prompt.

