Patent search: cpc:"G10L2015/0633", page 4

31.

Invention application
METHOD FOR BUILDING ACOUSTIC MODEL, SPEECH RECOGNITION METHOD AND ELECTRONIC APPARATUS (under examination, published)

Publication No.: US20150112674A1

Publication date: 2015-04-23

Application No.: US14490676

Filing date: 2014-09-19

Applicant: VIA Technologies, Inc.

Inventors: Guo-Feng Zhang, Yi-Fei Zhu

IPC classification: G10L15/06, G10L25/33, G10L15/00, G10L15/26

CPC classification: G10L15/063, G10L25/33, G10L2015/0633

Abstract: A method for building acoustic model, a speech recognition method and an electronic apparatus are provided. The speech recognition method includes the following steps. A plurality of phonetic transcriptions of a speech signal is obtained from an acoustic model. A plurality of vocabularies matching the phonetic transcriptions are obtained according to each phonetic transcription and a syllable acoustic lexicon, wherein the syllable acoustic lexicon includes the vocabularies corresponding to the phonetic transcription, and the vocabulary having at least one phonetic transcription includes a code corresponding to the phonetic transcription. A plurality of strings and a plurality of string probabilities are obtained from a language model according to the code of each of the vocabularies.

32.

Invention application
NAMED-ENTITY BASED SPEECH RECOGNITION (under examination, published)

Publication No.: US20150088511A1

Publication date: 2015-03-26

Application No.: US14035845

Filing date: 2013-09-24

Applicant: Verizon Patent and Licensing Inc.

Inventors: Sujeeth S. Bharadwaj, Suri B. Medapati

IPC classification: G10L15/187, G10L15/06, G10L15/22

CPC classification: G10L15/183, G06F17/278, G10L2015/0633

Abstract: In embodiments, apparatuses, methods and storage media are described that are associated with recognition of speech based on sequences of named entities. Language models may be trained as being associated with sequences of named entities. A language model may be selected for speech recognition after identification of one or more sequences of named entities by an initial language model. After identification of the one or more sequences of named entities, weights may be assigned to the one or more sequences of named entities. These weights may be utilized to select a language module and/or update the initial language model to one that is associated with the identified one or more sequences of named entities. In various embodiments, the language model may be repeatedly updated until the recognized speech converges sufficiently to satisfy a predetermined threshold. Other embodiments may be described and claimed.

33.

Granted invention patent
Object classification/recognition apparatus and method (in force)

Publication No.: US08873868B2

Publication date: 2014-10-28

Application No.: US13724220

Filing date: 2012-12-21

Applicant: Honda Motor Co., Ltd., National University Corporation Kobe University

Inventors: Mikio Nakano, Naoto Iwahashi, Yasuo Ariki, Yuko Ozasa, Takahiro Hori, Ryohei Nakatani

IPC classification: G06K9/62, G10L15/02

CPC classification: G06K9/6267, G06K9/6254, G06K9/6277, G06K9/6293, G10L15/01, G10L2015/025, G10L2015/0633

Abstract: An apparatus is provided for classifying targets into a known-object group and an unknown-object group. The apparatus includes a speech/image data storage unit configured to store a spoken sound of a name of an object and an image of the object; a unit configured to calculate a speech confidence level of a speech for the name of the object with reference to a spoken sound of a name of a known object; a unit configured to calculate an image confidence level of an image of an object with respect to an image of a known object; and a unit configured to compare an evaluation value, which is obtained by combining the speech confidence level and image confidence level, with a threshold value, and classify a target object into an object group determined according to whether the spoken sound of the name and the image are known or unknown.

34.

Invention application
METHOD AND SYSTEM FOR CONSTRUCTING A LANGUAGE MODEL (in force)

Publication No.: US20130179151A1

Publication date: 2013-07-11

Application No.: US13732445

Filing date: 2013-01-02

Applicant: Yactraq Online Inc.

Inventor: Lee Allan Iverson

IPC classification: G06F17/28

CPC classification: G10L15/063, G06F17/27, G06F17/28, G10L15/1815, G10L15/19, G10L15/265, G10L2015/0633, G10L2015/088

Abstract: Disclosed herein are various embodiments of methods and systems for constructing a first language model for use by a first Language Processing (LP) application of a plurality of LP applications. Each LP application of the plurality of LP applications receives one or more of a language based input, a derivative of the language based input, a response to the language based input and a derivative of the response. The method includes processing at least one input by a second LP application of the plurality of LP applications. Based on the processing of the second LP application, at least one output is generated. Subsequently, at least a portion of the first language model is constructed based on the at least one output.

35.

Invention application
SYSTEM AND METHOD FOR SPEECH-ENABLED ACCESS TO MEDIA CONTENT (in force)

Publication No.: US20110082696A1

Publication date: 2011-04-07

Application No.: US12573448

Filing date: 2009-10-05

Applicants: Michael JOHNSTON, Ebrahim KAZEMZADEH

Inventors: Michael JOHNSTON, Ebrahim KAZEMZADEH

IPC classification: G10L15/06, G06T11/20

CPC classification: G06F7/08, G06F17/30026, G06F17/30743, G06F17/30784, G10L15/06, G10L15/063, G10L15/08, G10L15/197, G10L15/265, G10L2015/0633, H04N5/445, H04N21/42203, H04N21/4662, H04N21/4668, H04N21/47214, H04N21/4828, H04N21/84

Abstract: Disclosed herein are systems, methods, and computer-readable storage media for generating a speech recognition model for a media content retrieval system. The method causes a computing device to retrieve information describing media available in a media content retrieval system, construct a graph that models how the media are interconnected based on the retrieved information, rank the information describing the media based on the graph, and generate a speech recognition model based on the ranked information. The information can be a list of actors, directors, composers, titles, and/or locations. The graph that models how the media are interconnected can further model pieces of common information between two or more media. The method can further cause the computing device to weight the graph based on the retrieved information. The graph can further model relative popularity information in the list. The method can rank information based on a PageRank algorithm.
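Note on the ranking step: the abstract above describes building a graph of interconnected media and ranking the descriptive information with a PageRank algorithm. The sketch below is only a rough illustration of that idea, not the patented implementation. The catalog data, the helper names `build_graph` and `pagerank`, the undirected title-to-person edges, and the damping factor of 0.85 are all assumptions made for the example.

```python
from collections import defaultdict


def build_graph(catalog):
    """Link each title to the people credited on it (undirected edges)."""
    graph = defaultdict(set)
    for title, people in catalog.items():
        for person in people:
            graph[title].add(person)
            graph[person].add(title)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over an undirected adjacency map."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Each neighbour shares its current rank equally among its edges.
            incoming = sum(rank[m] / len(graph[m]) for m in graph[n])
            new_rank[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


# Hypothetical catalog: title -> credited people.
catalog = {
    "Movie A": ["Actor X", "Director Y"],
    "Movie B": ["Actor X"],
    "Movie C": ["Actor X", "Actor Z"],
}
ranks = pagerank(build_graph(catalog))
ranked_entries = sorted(ranks, key=ranks.get, reverse=True)
```

In a setup like this, the highest-ranked entries could be given more weight when the recognizer's vocabulary is built; how the ranked information actually feeds the speech recognition model is not stated in the abstract.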

36.

Invention application
Method for creating a data structure, in particular of phonetic transcriptions for a voice-controlled navigation system (under examination, published)

Publication No.: US20030125941A1

Publication date: 2003-07-03

Application No.: US10256396

Filing date: 2002-09-27

Inventors: Ulrich Gaertner, Katja Kunitz

IPC classification: G06F007/00

CPC classification: G10L15/30, G01C21/3608, G10L15/063, G10L2015/0633

Abstract: A method for recognizing a voice input, in particular of a spoken description, such as a place name, where, from a voice input, a voice signal is generated; from a total set of phonetic transcriptions, subsets are created, whose elements each fulfill one criterion; by intersecting the subsets, a cut set is created, whose element number does not exceed a predefined comparison value; the elements of this cut set are compared to the voice signal; and, given a phonetic similarity with one of the elements of the cut set, the voice signal is allocated thereto. Also described is a device for this purpose. The method and device described herein permit a voice input to be recognized and allocated to a geographic designation, without the need for any manual operation.
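Note on the cut-set step: the abstract above narrows a full set of phonetic transcriptions by intersecting criterion-based subsets until the result is no larger than a comparison value, then matches the remaining elements against the voice signal. A minimal sketch of that flow follows. The transcription table, the two criteria, the early-stopping rule, and the use of `difflib.SequenceMatcher` as a stand-in for phonetic similarity are assumptions for illustration; the abstract does not specify any of them.

```python
from difflib import SequenceMatcher

# Hypothetical transcriptions: place name -> rough phone string.
TRANSCRIPTIONS = {
    "Berlin": "b e r l i n",
    "Bern": "b e r n",
    "Bonn": "b o n",
    "Bremen": "b r e m e n",
    "Hamburg": "h a m b u r g",
}


def candidate_set(transcriptions, criteria, max_size):
    """Intersect the subsets selected by each criterion, stopping once
    the intersection holds no more than max_size elements."""
    candidates = set(transcriptions)
    for criterion in criteria:
        subset = {name for name, phones in transcriptions.items() if criterion(phones)}
        candidates &= subset
        if len(candidates) <= max_size:
            break
    return candidates


def best_match(heard, candidates, transcriptions, min_similarity=0.6):
    """Return the candidate most similar to the heard phone string, or None."""
    scored = [
        (SequenceMatcher(None, heard.split(), transcriptions[name].split()).ratio(), name)
        for name in candidates
    ]
    score, name = max(scored, default=(0.0, None))
    return name if score >= min_similarity else None


criteria = [
    lambda phones: phones.startswith("b"),      # first phone is "b"
    lambda phones: len(phones.split()) <= 4,    # at most four phones
]
candidates = candidate_set(TRANSCRIPTIONS, criteria, max_size=2)
match = best_match("b e r n", candidates, TRANSCRIPTIONS)
```

Restricting the expensive comparison to the small intersected set is the point of the approach: only the surviving candidates are scored against the input.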

37.

Granted invention patent
Verbal, fully automatic dictionary updates by end-users of speech synthesis and recognition systems (lapsed)

Publication No.: US6078885A

Publication date: 2000-06-20

Application No.: US75162

Filing date: 1998-05-08

Applicant: Mark C. Beutnagel

Inventor: Mark C. Beutnagel

IPC classification: G10L13/08

CPC classification: G10L15/063, G10L2015/025, G10L2015/0633

Abstract: A method and system that allows users, or maintainers, of a speech-based application to revise the phonetic transcription of words in a phonetic dictionary, or to add transcriptions for words not yet present in the dictionary. The application is assumed to communicate with the user or maintainer audibly by means of speech recognition and/or speech synthesis systems, both of which rely on a dictionary of phonetic transcriptions to accurately recognize speech and pronunciation of a given word. The method automatically determines the phonetic transcription based on the word's spelling and the recorded preferred pronunciation, and updates the dictionary accordingly. Moreover, both speech synthesis and recognition performance are improved through use of the updated dictionary.

38.

Invention publication
END-TO-END AUTOMATIC SPEECH RECOGNITION WITH TRANSFORMER (under examination, published)

Publication No.: US20240331685A1

Publication date: 2024-10-03

Application No.: US18129996

Filing date: 2023-04-03

Applicant: Deepgram, Inc.

Inventors: Andrew Nathan Seagraves, Deepak Subburam, Adam Joseph Sypniewski, Scott Ivan Stephenson, Jacob Edward Cutter, Michael Joseph Sypniewski, Daniel Lewis Shafer

IPC classification: G10L15/16, G10L15/02, G10L15/06

CPC classification: G10L15/16, G10L15/02, G10L15/063, G10L2015/025, G10L2015/0633

Abstract: An end-to-end automatic speech recognition (ASR) system can be constructed by fusing a first ASR model with a transformer. The input of the transformer is a learned layer generated by the first ASR model. The fused ASR model and transformer can be treated as a single end-to-end model and trained as a single model. In some embodiments, the end-to-end speech recognition system can be trained using a teacher-student training technique by selectively truncating portions of the first ASR model and/or the transformer components and selectively freezing various layers during the training passes.

39.

Granted invention patent
Lexicon learning-based heliumspeech unscrambling method in saturation diving (in force)

Publication No.: US12094482B2

Publication date: 2024-09-17

Application No.: US18427869

Filing date: 2024-01-31

Applicant: Nantong University

Inventors: Shibing Zhang, Jianrong Wu, Lili Guo, Ming Li, Zhihua Bao

IPC classification: G10L21/00, G10L15/06, G10L15/16, G10L21/0208, G10L25/51

CPC classification: G10L21/0208, G10L15/063, G10L15/16, G10L25/51, G10L2015/0633

Abstract: The present application relates to a lexicon learning-based heliumspeech unscrambling method in saturation diving. In a system including divers, a correction network, and an unscrambling network, a common working language lexicon for saturation diving operation is established and is read by the divers respectively in different environments, to generate supervision signals and vector signals of the correction network, and the correction network learns heliumspeeches of the different divers at different diving depths to obtain a correction network parameter, and corrects a heliumspeech of a diver to obtain a corrected speech; and the unscrambling network learns the corrected speech and completes unscrambling of the heliumspeech.

40.

Granted invention patent
Noise data augmentation for natural language processing (in force)

Publication No.: US11972755B2

Publication date: 2024-04-30

Application No.: US17993130

Filing date: 2022-11-23

Applicant: Oracle International Corporation

Inventors: Elias Luqman Jalaluddin, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong, Yu-Heng Hong, Balakota Srinivas Vinnakota

IPC classification: G10L15/22, G10L15/05, G10L15/06, G10L15/18, G10L15/26

CPC classification: G10L15/063, G10L15/05, G10L15/18, G10L15/22, G10L15/26, G10L2015/0633, G10L2015/0638, G10L2015/227

Abstract: Techniques for noise data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes receiving a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances; augmenting the training set of utterances with noise text to generate an augmented training set of utterances; and training the intent classifier using the augmented training set of utterances. The augmenting includes: obtaining the noise text from a list of words, a text corpus, a publication, a dictionary, or any combination thereof irrelevant of original text within the utterances of the training set of utterances, and incorporating the noise text within the utterances relative to the original text in the utterances of the training set of utterances at a predefined augmentation ratio to generate augmented utterances.
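Note on the augmentation step: the abstract above describes inserting unrelated noise text into training utterances at a predefined augmentation ratio. The sketch below shows one possible reading of that step and is not the patented method. The function name `augment_with_noise`, the word list, the random insertion positions, and the interpretation of the ratio as noise tokens per original token are assumptions made for the example.

```python
import random


def augment_with_noise(utterances, noise_words, ratio, seed=0):
    """Return the original utterances followed by one noisy copy of each.

    ratio is read here as noise tokens added per original token
    (an assumption; the abstract only says "a predefined augmentation ratio").
    """
    rng = random.Random(seed)
    augmented = list(utterances)
    for text in utterances:
        tokens = text.split()
        n_noise = max(1, round(len(tokens) * ratio))
        for _ in range(n_noise):
            # Insert a random noise word at a random position (ends included).
            position = rng.randint(0, len(tokens))
            tokens.insert(position, rng.choice(noise_words))
        augmented.append(" ".join(tokens))
    return augmented


# Hypothetical training utterances and an unrelated noise vocabulary.
utterances = ["check my account balance", "cancel my order"]
noise_words = ["lorem", "ipsum", "dolor"]
augmented = augment_with_noise(utterances, noise_words, ratio=0.5)
```

Each noisy copy keeps the original words in their original order, so it can reuse the intent label of the utterance it was derived from when the classifier is trained.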
