Patent search: ap:("Alejandro Acero" OR "Hsiao-Wuen Hon" OR "Xuedong D. Huang") AND inv:"Xuedong D. Huang" (page 1)

1.

Granted patent
Text-to-speech using clustered context-dependent phoneme-based units (Expired)

Publication No.: US6163769A

Publication date: 2000-12-19

Application No.: US949138

Filing date: 1997-10-02

Applicants: Alejandro Acero; Hsiao-Wuen Hon; Xuedong D. Huang

Inventors: Alejandro Acero; Hsiao-Wuen Hon; Xuedong D. Huang

IPC classification: G10L13/06; G10L13/00

CPC classification: G10L13/07

Abstract: A text-to-speech system includes a storage device for storing a clustered set of context-dependent phoneme-based units of a target speaker. In one embodiment, decision trees are used wherein each decision tree based context-dependent phoneme-based unit is arranged based on context of at least one immediately preceding and succeeding phoneme. At least one of the context-dependent phoneme-based units represents other non-stored context-dependent phoneme units of similar sound due to similar contexts. A text analyzer obtains a string of phonetic symbols representative of text to be converted to speech. A concatenation module selects stored decision tree based context-dependent phoneme-based units from the set of decision tree based context-dependent phoneme-based units based on the context of the phonetic symbols and synthesizes the selected phoneme-based units to generate speech corresponding to the text.

2.

Granted patent
Method and system for dynamically adjusted training for speech recognition (Expired)

Publication No.: US5963903A

Publication date: 1999-10-05

Application No.: US673435

Filing date: 1996-06-28

Applicants: Hsiao-Wuen Hon; Xuedong D. Huang; Mei-Yuh Hwang; Li Jiang; Yun-Cheng Ju; Milind V. Mahajan; Michael J. Rozak

Inventors: Hsiao-Wuen Hon; Xuedong D. Huang; Mei-Yuh Hwang; Li Jiang; Yun-Cheng Ju; Milind V. Mahajan; Michael J. Rozak

IPC classification: G10L15/02; G10L15/06; G10L15/14; G10L5/04

CPC classification: G10L15/063; G10L2015/0635

Abstract: A method and system for dynamically selecting words for training a speech recognition system. The speech recognition system models each phoneme using a hidden Markov model and represents each word as a sequence of phonemes. The training system ranks each phoneme for each frame according to the probability that the corresponding codeword will be spoken as part of the phoneme. The training system collects spoken utterances for which the corresponding word is known. The training system then aligns the codewords of each utterance with the phoneme that it is recognized to be part of. The training system then calculates an average rank for each phoneme using the aligned codewords for the aligned frames. Finally, the training system selects words for training that contain phonemes with a low rank.

3.

Granted patent
Method and system for correcting misrecognized spoken words or phrases (Expired)

Publication No.: US5829000A

Publication date: 1998-10-27

Application No.: US741696

Filing date: 1996-10-31

Applicants: Xuedong D. Huang; Hsiao-Wuen Hon; Li Jiang

Inventors: Xuedong D. Huang; Hsiao-Wuen Hon; Li Jiang

IPC classification: G10L15/06; G10L15/22; G01L5/06

CPC classification: G10L15/22

Abstract: A method and system for editing words that have been misrecognized. The system allows a speaker to specify a number of alternative words to be displayed in a correction window by resizing the correction window. The system also displays the words in the correction window in alphabetical order. A preferred system eliminates the possibility, when a misrecognized word is respoken, that the respoken utterance will be again recognized as the same misrecognized word. This elimination occurs based on the probabilities of alternative words associated with both the misrecognized utterance and the respoken utterance. The system, when operating with a word processor, allows the speaker to specify the amount of speech that is buffered before transferring to the word processor. The system also uses a word correction metaphor or a phrase correction metaphor.

4.

Granted patent
Multi-sensory speech detection system (Expired)

Publication No.: US07383181B2

Publication date: 2008-06-03

Application No.: US10629278

Filing date: 2003-07-29

Applicants: Xuedong D. Huang; Zicheng Liu; Zhengyou Zhang; Michael J. Sinclair; Alejandro Acero

Inventors: Xuedong D. Huang; Zicheng Liu; Zhengyou Zhang; Michael J. Sinclair; Alejandro Acero

IPC classification: G10L15/00

CPC classification: H04R1/10; G10L15/20; G10L15/24; G10L25/78; H04R1/14; H04R25/606

Abstract: The present invention combines a conventional audio microphone with an additional speech sensor that provides a speech sensor signal based on an input. The speech sensor signal is generated based on an action undertaken by a speaker during speech, such as facial movement, bone vibration, throat vibration, throat impedance changes, etc. A speech detector component receives an input from the speech sensor and outputs a speech detection signal indicative of whether a user is speaking. The speech detector generates the speech detection signal based on the microphone signal and the speech sensor signal.

5.

Granted patent
Method and apparatus for multi-sensory speech enhancement (Active)

Publication No.: US07447630B2

Publication date: 2008-11-04

Application No.: US10724008

Filing date: 2003-11-26

Applicants: Zicheng Liu; Michael J. Sinclair; Alejandro Acero; Xuedong D. Huang; James G. Droppo; Li Deng; Zhengyou Zhang; Yanli Zheng

Inventors: Zicheng Liu; Michael J. Sinclair; Alejandro Acero; Xuedong D. Huang; James G. Droppo; Li Deng; Zhengyou Zhang; Yanli Zheng

IPC classification: G10L21/02

CPC classification: G10L21/0208; G10L2021/02165

Abstract: A method and system use an alternative sensor signal received from a sensor other than an air conduction microphone to estimate a clean speech value. The estimation uses either the alternative sensor signal alone, or in conjunction with the air conduction microphone signal. The clean speech value is estimated without using a model trained from noisy training data collected from an air conduction microphone. Under one embodiment, correction vectors are added to a vector formed from the alternative sensor signal in order to form a filter, which is applied to the air conductive microphone signal to produce the clean speech estimate. In other embodiments, the pitch of a speech signal is determined from the alternative sensor signal and is used to decompose an air conduction microphone signal. The decomposed signal is then used to determine a clean signal estimate.

6.

Granted patent
Method and system of runtime acoustic unit selection for speech synthesis (Expired)

Publication No.: US5913193A

Publication date: 1999-06-15

Application No.: US648808

Filing date: 1996-04-30

Applicants: Xuedong D. Huang; Michael D. Plumpe; Alejandro Acero; James L. Adcock

Inventors: Xuedong D. Huang; Michael D. Plumpe; Alejandro Acero; James L. Adcock

IPC classification: G06F3/16; G10L13/06; G10L13/08; G10L5/02; G10L9/00

CPC classification: G10L13/07

Abstract: The present invention pertains to a concatenative speech synthesis system and method which produces a more natural sounding speech. The system provides for multiple instances of each acoustic unit which can be used to generate a speech waveform representing a linguistic expression. The multiple instances are formed during an analysis or training phase of the synthesis process and are limited to a robust representation of the highest probability instances. The provision of multiple instances enables the synthesizer to select the instance which closely resembles the desired instance thereby eliminating the need to alter the stored instance to match the desired instance. This in essence minimizes the spectral distortion between the boundaries of adjacent instances thereby producing more natural sounding speech.

7.

Granted patent
Architecture for user- and context-specific prefetching and caching of information on portable devices (Active)

Publication No.: US08626136B2

Publication date: 2014-01-07

Application No.: US11427755

Filing date: 2006-06-29

Applicants: Raymond E. Ozzie; Eric J. Horvitz; William H. Gates, III; Joshua T. Goodman; Susan T. Dumais; Gary W. Flake; Trenholme J. Griffin; Xuedong D. Huang; Oliver Hurst-Hiller; Christopher A. Meek

Inventors: Raymond E. Ozzie; Eric J. Horvitz; William H. Gates, III; Joshua T. Goodman; Susan T. Dumais; Gary W. Flake; Trenholme J. Griffin; Xuedong D. Huang; Oliver Hurst-Hiller; Christopher A. Meek

IPC classification: H04M3/42; H04L29/06; H04W74/00; G06F15/16; G06F3/033

CPC classification: G06F17/30867; H04W4/02; H04W4/18

Abstract: Content management architecture for a portable wireless device. Caching and fetching techniques are provided to improve content handling for portable devices such as cellular telephones and portable computers. A search component automatically performs searches as a background process, and potentially desired content is received and cached by a content storing component to be available in the future when and if needed, mitigating latency associated with slow download speeds, refresh rates, and other system and/or network impediments. Content from background search results can be trickled into the device as part of the background process so as not to burden system resources for other processes. As part of memory management, aged and/or low priority or low interest content can be selectively removed or archived to increase available cache or memory space, as well as to maintain relevant content within the device. A presentation component facilitates presentation of the pre-stored content.

8.

Patent application
FORCE-FEEDBACK WITHIN TELEPRESENCE (Active)

Publication No.: US20100306647A1

Publication date: 2010-12-02

Application No.: US12472579

Filing date: 2009-05-27

Applicants: Zhengyou Zhang; Xuedong D. Huang; Jin Li; Rajesh Kutpadi Hegde; Kori Marie Quinn; Michel Pahud; Jayman Dalal

Inventors: Zhengyou Zhang; Xuedong D. Huang; Jin Li; Rajesh Kutpadi Hegde; Kori Marie Quinn; Michel Pahud; Jayman Dalal

IPC classification: G06F3/01; G06F3/048

CPC classification: G06F3/016

Abstract: The claimed subject matter provides a system and/or a method that facilitates replicating a telepresence session with a real world physical meeting. A telepresence session can be initiated within a communication framework that includes two or more virtually represented users that communicate therein. A trigger component can monitor the telepresence session in real time to identify a participant interaction with an object, wherein the object is at least one of a real world physical object or a virtually represented object within the telepresence session. A feedback component can implement a force feedback to at least one participant within the telepresence session based upon the identified participant interaction with the object, wherein the force feedback is employed via a device associated with at least one participant.

9.

Granted patent
Entity-specific search model (Active)

Publication No.: US07822762B2

Publication date: 2010-10-26

Application No.: US11427311

Filing date: 2006-06-28

Applicants: Christopher D. Payne; Eric J. Horvitz; Alexander G. Gounares; Susan T. Dumais; Kyle G. Peltonen; Gary W. Flake; Xuedong D. Huang; William H. Gates, III; John C. Platt; Oliver Hurst-Hiller; Joshua T. Goodman; Christopher A. Meek; Ramez Naam; Raymond E. Ozzie; Eric D. Brill

Inventors: Christopher D. Payne; Eric J. Horvitz; Alexander G. Gounares; Susan T. Dumais; Kyle G. Peltonen; Gary W. Flake; Xuedong D. Huang; William H. Gates, III; John C. Platt; Oliver Hurst-Hiller; Joshua T. Goodman; Christopher A. Meek; Ramez Naam; Raymond E. Ozzie; Eric D. Brill

IPC classification: G06F7/001

CPC classification: G06F17/30967; G06F17/30964

Abstract: A system that employs an explicitly and/or implicitly trained model in order to return entity-specific computer-based search results is provided. The innovation can provide for a customized search model that focuses search in connection with achieving information that is meaningful with respect to goals of an entity. The model can be used to modify a search query in accordance with a goal of the entity or to generate the search query thereby returning meaningful and/or targeted results to the user. The system can automatically gather entity-related data thereafter determining or inferring a goal as well as training the model. Moreover, the system can selectively configure (e.g., order, rank, filter) and render results to a user based upon the model.

10.

Granted patent
Use of a unified language model (Expired)

Publication No.: US07013265B2

Publication date: 2006-03-14

Application No.: US11003121

Filing date: 2004-12-03

Applicants: Xuedong D. Huang; Milind V. Mahajan; Ye-Yi Wang; Xiaolong Mou

Inventors: Xuedong D. Huang; Milind V. Mahajan; Ye-Yi Wang; Xiaolong Mou

IPC classification: G06F17/27; G10L15/18; G10L11/00

CPC classification: G10L15/193; G10L15/197

Abstract: A language processing system includes a unified language model. The unified language model comprises a plurality of context-free grammars having non-terminal tokens representing semantic or syntactic concepts and terminals, and an N-gram language model having non-terminal tokens. A language processing module capable of receiving an input signal indicative of language accesses the unified language model to recognize the language. The language processing module generates hypotheses for the received language as a function of words of the unified language model and/or provides an output signal indicative of the language and at least some of the semantic or syntactic concepts contained therein.
