Patent search: cpc:"G10L2015/0633", page 1

1.

Granted invention patent
Automated domain-specific constrained decoding from speech inputs to structured resources (status: in force)

Publication (announcement) No.: US12094459B2

Publication (announcement) date: 2024-09-17

Application No.: US17568960

Filing date: 2022-01-05

Applicant: International Business Machines Corporation

Inventors: Ashish R Mittal, Samarth Bharadwaj, Shreya Khare, Karthik Sankaranarayanan

IPC classification: G10L15/06, G06F40/143, G06F40/174, G06N20/00, G10L15/187, G10L15/22, G10L15/30, G10L19/00, H04L67/10

CPC classification: G10L15/187, G06F40/143, G06F40/174, G06N20/00, G10L15/063, G10L15/22, G10L15/30, G10L19/00, H04L67/10, G10L2015/0633, G10L2015/0635, G10L2015/223

Abstract: Methods, systems, and computer program products for automated domain-specific constrained decoding from speech inputs to structured resources are provided herein. A computer-implemented method includes converting at least a portion of at least one user-provided speech utterance into text by processing the at least one user-provided speech utterance using an artificial intelligence-based automatic speech recognition model; automatically training an artificial intelligence-based decoding engine, wherein automatically training the artificial intelligence-based decoding engine comprising constraining the artificial intelligence-based decoding engine based at least in part on a domain-specific model and the artificial intelligence-based automatic speech recognition model; and generating at least one of one or more domain-specific text outputs related to one or more structured resources associated with the domain and one or more domain-specific action outputs related to the one or more structured resources associated with the domain by processing at least a portion of the text using the artificial intelligence-based decoding engine.

2.

Published invention application
DELTA MODELS FOR PROVIDING PRIVATIZED SPEECH-TO-TEXT DURING VIRTUAL MEETINGS (status: under examination, published)

Publication (announcement) No.: US20230352026A1

Publication (announcement) date: 2023-11-02

Application No.: US17732876

Filing date: 2022-04-29

Applicant: Zoom Video Communications, Inc.

Inventors: Shane Paul Springer, Alexander Waibel

IPC classification: G10L15/26, G10L15/30, G10L15/183, G10L15/06

CPC classification: G10L15/26, G10L15/30, G10L15/183, G10L15/063, G10L2015/0633

Abstract: Provided herein are systems and methods for delta models for providing privatized speech-to-text during virtual meetings. In one embodiment, a system may include a non-transitory computer-readable medium; a communications interface; and a processor. The processor may be configured to execute processor-executable instructions to: join a virtual meeting. Each participant in the virtual meeting may exchange audio streams with other participants in the virtual meeting. The instructions may include receiving, from a video conference provider, a local model for speech recognition. The local model may be a copy of a centralized model. The instructions may include performing speech recognition using the local model on the audio streams. Performing speech recognition may include identifying audio feature data within the one or more audio streams, identifying, based on a vocabulary database, user-specific vocabulary within the audio feature data, and generating, based on the user-specific vocabulary, a private transcription of the audio streams.

3.

Granted invention patent
Speech recognition method and apparatus (status: in force)

Publication (announcement) No.: US11664020B2

Publication (announcement) date: 2023-05-30

Application No.: US16908419

Filing date: 2020-06-22

Applicant: ALIBABA GROUP HOLDING LIMITED

Inventors: Xiaohui Li, Hongyan Li

IPC classification: G10L15/00, G10L15/187, G10L15/30, G10L15/26, G10L15/02, G10L15/06

CPC classification: G10L15/187, G10L15/02, G10L15/063, G10L15/26, G10L15/30, G10L2015/022, G10L2015/0633

Abstract: A speech recognition method comprises: generating, based on a preset speech knowledge source, a search space comprising preset client information and for decoding a speech signal; extracting a characteristic vector sequence of a to-be-recognized speech signal; calculating a probability at which the characteristic vector corresponds to each basic unit of the search space; and executing a decoding operation in the search space by using the probability as an input to obtain a word sequence corresponding to the characteristic vector sequence.

4.

Invention application
DYNAMIC LANGUAGE MODEL (status: under examination, published)

Publication (announcement) No.: US20190138539A1

Publication (announcement) date: 2019-05-09

Application No.: US16200531

Filing date: 2018-11-26

Applicant: Google, LLC

Inventors: Pedro J. Moreno Mengibar, Michael H. Cohen

IPC classification: G06F16/338, G06F16/33, G10L15/00, G06F16/21, G06F16/29, G10L15/14, G10L15/26, G10L15/197, G10L15/24

CPC classification: G06F16/338, G06F16/211, G06F16/29, G06F16/3344, G06F16/3346, G10L15/005, G10L15/14, G10L15/197, G10L15/24, G10L15/26, G10L15/265, G10L2015/0633, G10L2015/081, G10L2015/228

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for speech recognition. One of the methods includes receiving a base language model for speech recognition including a first word sequence having a base probability value; receiving a voice search query associated with a query context; determining that a customized language model is to be used when the query context satisfies one or more criteria associated with the customized language model; obtaining the customized language model, the customized language model including the first word sequence having an adjusted probability value being the base probability value adjusted according to the query context; and converting the voice search query to a text search query based on one or more probabilities, each of the probabilities corresponding to a word sequence in a group of one or more word sequences, the group including the first word sequence having the adjusted probability value.
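
Illustrative note: the context-dependent adjustment described in this abstract can be sketched roughly as follows. This is a minimal toy under stated assumptions, not the patented method: the function names, the multiplicative `boosts` table, the renormalization step, and the `location` criterion are all hypothetical and do not come from the patent record.

```python
from typing import Dict, List, Tuple

WordSeq = Tuple[str, ...]
LanguageModel = Dict[WordSeq, float]


def customize_language_model(base_lm: LanguageModel,
                             boosts: Dict[WordSeq, float]) -> LanguageModel:
    """Scale selected word sequences (assumed multiplicative boost), then renormalize."""
    adjusted = {seq: p * boosts.get(seq, 1.0) for seq, p in base_lm.items()}
    total = sum(adjusted.values())
    return {seq: p / total for seq, p in adjusted.items()}


def select_language_model(base_lm: LanguageModel,
                          query_context: Dict[str, str],
                          criteria: Dict[str, str],
                          boosts: Dict[WordSeq, float]) -> LanguageModel:
    """Use the customized model only when the query context satisfies every criterion."""
    if all(query_context.get(key) == value for key, value in criteria.items()):
        return customize_language_model(base_lm, boosts)
    return base_lm


def best_transcription(candidates: List[WordSeq], lm: LanguageModel) -> WordSeq:
    """Pick the candidate word sequence with the highest language-model probability."""
    return max(candidates, key=lambda seq: lm.get(seq, 0.0))


# Hypothetical data: two acoustically similar candidates for one voice query.
base_lm = {("paris", "hotels"): 0.7, ("perris", "hotels"): 0.3}
criteria = {"location": "California"}
boosts = {("perris", "hotels"): 5.0}
candidates = list(base_lm)

default_choice = best_transcription(
    candidates, select_language_model(base_lm, {"location": "France"}, criteria, boosts))
contextual_choice = best_transcription(
    candidates, select_language_model(base_lm, {"location": "California"}, criteria, boosts))
```

In this toy, the same pair of candidates resolves differently depending on whether the query context meets the criteria, which is the behavior the abstract attributes to the customized model.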

5.

Granted invention patent
Name recognition system (status: in force)

Publication (announcement) No.: US10079014B2

Publication (announcement) date: 2018-09-18

Application No.: US15643741

Filing date: 2017-07-07

Applicant: Apple Inc.

Inventors: Devang K. Naik

IPC classification: G10L15/00, G10L15/04, G10L15/26, G10L15/06, G10L15/18, G10L21/00, G10L25/00, G06F17/27, G06F17/21, G10L15/187, G10L15/30, G10L15/02

CPC classification: G10L15/187, G10L15/30, G10L2015/025, G10L2015/0633

Abstract: A speech recognition system uses, in one embodiment, an extended phonetic dictionary that is obtained by processing words in a user's set of databases, such as a user's contacts database, with a set of pronunciation guessers. The speech recognition system can use a conventional phonetic dictionary and the extended phonetic dictionary to recognize speech inputs that are user requests to use the contacts database, for example, to make a phone call, etc. The extended phonetic dictionary can be updated in response to changes in the contacts database, and the set of pronunciation guessers can include pronunciation guessers for a plurality of locales, each locale having its own pronunciation guesser.
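
Illustrative note: the extended phonetic dictionary idea in this abstract might be sketched as below. Everything here is an assumption made for illustration: the two toy "guessers" only spell names letter by letter (with one invented Spanish-style rule), and the locale keys, function names, and rebuild-on-change strategy are hypothetical rather than taken from the patent.

```python
from typing import Callable, Dict, Iterable, Set

Guesser = Callable[[str], str]


def guess_en(word: str) -> str:
    """Toy guesser: spell the word letter by letter."""
    return " ".join(word.lower())


def guess_es(word: str) -> str:
    """Toy guesser: same spelling, but read "j" as "h"."""
    return " ".join(word.lower().replace("j", "h"))


# One guesser per locale (hypothetical locale keys).
GUESSERS: Dict[str, Guesser] = {"en_US": guess_en, "es_ES": guess_es}


def build_extended_dictionary(contacts: Iterable[str],
                              guessers: Dict[str, Guesser] = GUESSERS) -> Dict[str, Set[str]]:
    """Map each word in the contact names to the pronunciations guessed for every locale."""
    extended: Dict[str, Set[str]] = {}
    for name in contacts:
        for word in name.split():
            pronunciations = extended.setdefault(word.lower(), set())
            pronunciations.update(guess(word) for guess in guessers.values())
    return extended


contacts = ["Jose Garcia"]
extended = build_extended_dictionary(contacts)

# When the contacts database changes, the dictionary is rebuilt (simplest possible update).
contacts.append("Ana")
extended = build_extended_dictionary(contacts)
```

A recognizer would consult this dictionary alongside its conventional one; a production system would presumably update incrementally instead of rebuilding, but the abstract does not say.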

6.

Invention application
LANGUAGE MODELING BASED ON SPOKEN AND UNSPEAKABLE CORPUSES (status: under examination, published)

Publication (announcement) No.: US20170270912A1

Publication (announcement) date: 2017-09-21

Application No.: US15614283

Filing date: 2017-06-05

Applicant: Microsoft Technology Licensing, LLC

Inventors: Michael Levit, Shuangyu Chang, Benoit Dumoulin

IPC classification: G10L15/06, G10L15/18, G10L15/19, G10L15/10, G10L15/14

CPC classification: G10L15/063, G10L15/10, G10L15/14, G10L15/18, G10L15/19, G10L2015/0633, G10L2015/0635

Abstract: A computer system for language modeling may collect training data from one or more information sources, generate a spoken corpus containing text of transcribed speech, and generate a typed corpus containing typed text. The computer system may derive feature vectors from the spoken corpus, analyze the typed corpus to determine feature vectors representing items of typed text, and generate an unspeakable corpus by filtering the typed corpus to remove each item of typed text represented by a feature vector that is within a similarity threshold of a feature vector derived from the spoken corpus. The computer system may derive feature vectors from the unspeakable corpus and train a classifier to perform discriminative data selection for language modeling based on the feature vectors derived from the spoken corpus and the feature vectors derived from the unspeakable corpus.
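
Illustrative note: the filtering step in this abstract (building the unspeakable corpus) can be approximated with a small sketch. The bag-of-words features, cosine similarity, the 0.5 threshold, and all names below are assumptions chosen for illustration; the abstract does not specify the feature vectors or the similarity measure, and the later classifier-training step is not shown.

```python
import math
from collections import Counter
from typing import List


def featurize(text: str) -> Counter:
    """Assumed feature vector: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def build_unspeakable_corpus(spoken: List[str], typed: List[str],
                             threshold: float = 0.5) -> List[str]:
    """Keep only typed items whose similarity to every spoken item is below the threshold."""
    spoken_vecs = [featurize(s) for s in spoken]
    return [item for item in typed
            if all(cosine(featurize(item), vec) < threshold for vec in spoken_vecs)]


# Hypothetical corpora.
spoken_corpus = ["call mom on her cell phone", "what is the weather today"]
typed_corpus = ["weather today", "www.example.com/login?id=42", "call mom"]

unspeakable_corpus = build_unspeakable_corpus(spoken_corpus, typed_corpus)
```

Here the two typed items that resemble transcribed speech are filtered out and only the URL-like string remains, which is the kind of text a person would type but rarely say.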

7.

Invention application
Cross-Language Speech Recognition and Translation (status: under examination, published)

Publication (announcement) No.: US20160336008A1

Publication (announcement) date: 2016-11-17

Application No.: US14714046

Filing date: 2015-05-15

Applicant: Microsoft Technology Licensing, LLC

Inventors: Arul A. Menezes, Hany M. Hassan Awadalla

IPC classification: G10L15/187, G10L15/06, G06F17/28, G10L15/19, G06F17/27, G10L13/08

CPC classification: G10L15/187, G06F17/278, G06F17/2818, G10L15/06, G10L15/19, G10L2015/0633

Abstract: Technologies are described herein for cross-language speech recognition and translation. An example method of speech recognition and translation includes receiving an input utterance in a first language, the input utterance having at least one name of a named entity included therein and being pronounced in a second language, utilizing a customized language model to process at least a portion of the input utterance, and identifying the at least one name of the named entity from the input utterance utilizing a phonetic representation of the at least one name of the named entity. The phonetic representation has a pronunciation of the at least one name in the second language.

8.

Granted invention patent
Machine learning dialect identification (status: in force)

Publication (announcement) No.: US09477652B2

Publication (announcement) date: 2016-10-25

Application No.: US14621921

Filing date: 2015-02-13

Applicant: Facebook, Inc.

Inventors: Fei Huang

IPC classification: G06F17/20, G06F17/27, G06F17/21, G06F17/28, G10L15/00, G10L15/26

CPC classification: G10L15/063, G06F17/274, G06F17/275, G06F17/279, G06F17/28, G10L15/005, G10L15/26, G10L2015/0633, G10L2015/0636

Abstract: Technology is disclosed for creating and tuning classifiers for language dialects and for generating dialect-specific language modules. A computing device can receive an initial training data set as a current training data set. The selection process for the initial training data set can be achieved by receiving one or more initial content items, establishing dialect parameters of each of the initial content items, and sorting each of the initial content items into one or more dialect groups based on the established dialect parameters. The computing device can generate, based on the initial training data set, a dialect classifier configured to detect language dialects of content items to be classified. The computing device can augment the current training data set with additional training data by applying the dialect classifier to candidate content items. The computing device can then update the dialect classifier based on the augmented current training data set.

9.

Granted invention patent
Automatic task classification based upon machine learning (status: in force)

Publication (announcement) No.: US09471887B2

Publication (announcement) date: 2016-10-18

Application No.: US14871595

Filing date: 2015-09-30

Applicant: NTT DOCOMO Inc.

Inventors: Hyung Sik Shin, Ronald Sujithan, Sayandev Mukherjee, Hongfeng Yin, Yang Sun, Yoshikazu Akinaga, Pero Subasic

IPC classification: G06N99/00, G06F15/18, G10L15/00, G10L15/18, G06F17/30, G06F9/48, G10L15/06

CPC classification: G06N99/005, G06F9/4881, G06F15/18, G06F17/30654, G10L15/00, G10L15/1822, G10L2015/0633

Abstract: A system and method is provided that processes a training database of human-generated requests in each of a plurality of task categories with a machine learning algorithm to develop a task classifier model that may be applied to subsequent user requests to determine the most likely one of the task categories for the subsequent user request.

10.

Invention application
Discriminative Training of Document Transcription System (status: in force)

Publication (announcement) No.: US20160078861A1

Publication (announcement) date: 2016-03-17

Application No.: US14942349

Filing date: 2015-11-16

Applicant: MModal IP LLC

Inventors: Lambert Mathias, Girija Yegnanarayanan, Juergen Fritsch

IPC classification: G10L15/06, G06F17/28, G10L15/02, G06F17/27

CPC classification: G10L15/063, G06F17/271, G06F17/2775, G06F17/28, G10L15/02, G10L15/183, G10L15/193, G10L15/26, G10L2015/0631, G10L2015/0633

Abstract: A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model using discriminative training techniques, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript.
