-
Publication No.: US12094459B2
Publication date: 2024-09-17
Application No.: US17568960
Filing date: 2022-01-05
IPC classification: G10L15/06, G06F40/143, G06F40/174, G06N20/00, G10L15/187, G10L15/22, G10L15/30, G10L19/00, H04L67/10
CPC classification: G10L15/187, G06F40/143, G06F40/174, G06N20/00, G10L15/063, G10L15/22, G10L15/30, G10L19/00, H04L67/10, G10L2015/0633, G10L2015/0635, G10L2015/223
Abstract: Methods, systems, and computer program products for automated domain-specific constrained decoding from speech inputs to structured resources are provided herein. A computer-implemented method includes converting at least a portion of at least one user-provided speech utterance into text by processing the at least one user-provided speech utterance using an artificial intelligence-based automatic speech recognition model; automatically training an artificial intelligence-based decoding engine, wherein automatically training the artificial intelligence-based decoding engine comprising constraining the artificial intelligence-based decoding engine based at least in part on a domain-specific model and the artificial intelligence-based automatic speech recognition model; and generating at least one of one or more domain-specific text outputs related to one or more structured resources associated with the domain and one or more domain-specific action outputs related to the one or more structured resources associated with the domain by processing at least a portion of the text using the artificial intelligence-based decoding engine.
-
Publication No.: US20230352026A1
Publication date: 2023-11-02
Application No.: US17732876
Filing date: 2022-04-29
IPC classification: G10L15/26, G10L15/30, G10L15/183, G10L15/06
CPC classification: G10L15/26, G10L15/30, G10L15/183, G10L15/063, G10L2015/0633
Abstract: Provided herein are systems and methods for delta models for providing privatized speech-to-text during virtual meetings. In one embodiment, a system may include a non-transitory computer-readable medium; a communications interface; and a processor. The processor may be configured to execute processor-executable instructions to: join a virtual meeting. Each participant in the virtual meeting may exchange audio streams with other participants in the virtual meeting. The instructions may include receiving, from a video conference provider, a local model for speech recognition. The local model may be a copy of a centralized model. The instructions may include performing speech recognition using the local model on the audio streams. Performing speech recognition may include identifying audio feature data within the one or more audio streams, identifying, based on a vocabulary database, user-specific vocabulary within the audio feature data, and generating, based on the user-specific vocabulary, a private transcription of the audio streams.
-
Publication No.: US11664020B2
Publication date: 2023-05-30
Application No.: US16908419
Filing date: 2020-06-22
Inventor(s): Xiaohui Li, Hongyan Li
CPC classification: G10L15/187, G10L15/02, G10L15/063, G10L15/26, G10L15/30, G10L2015/022, G10L2015/0633
Abstract: A speech recognition method comprises: generating, based on a preset speech knowledge source, a search space comprising preset client information and for decoding a speech signal; extracting a characteristic vector sequence of a to-be-recognized speech signal; calculating a probability at which the characteristic vector corresponds to each basic unit of the search space; and executing a decoding operation in the search space by using the probability as an input to obtain a word sequence corresponding to the characteristic vector sequence.
-
Publication No.: US20190138539A1
Publication date: 2019-05-09
Application No.: US16200531
Filing date: 2018-11-26
Applicant: Google, LLC
IPC classification: G06F16/338, G06F16/33, G10L15/00, G06F16/21, G06F16/29, G10L15/14, G10L15/26, G10L15/197, G10L15/24
CPC classification: G06F16/338, G06F16/211, G06F16/29, G06F16/3344, G06F16/3346, G10L15/005, G10L15/14, G10L15/197, G10L15/24, G10L15/26, G10L15/265, G10L2015/0633, G10L2015/081, G10L2015/228
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for speech recognition. One of the methods includes receiving a base language model for speech recognition including a first word sequence having a base probability value; receiving a voice search query associated with a query context; determining that a customized language model is to be used when the query context satisfies one or more criteria associated with the customized language model; obtaining the customized language model, the customized language model including the first word sequence having an adjusted probability value being the base probability value adjusted according to the query context; and converting the voice search query to a text search query based on one or more probabilities, each of the probabilities corresponding to a word sequence in a group of one or more word sequences, the group including the first word sequence having the adjusted probability value.
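Note: the context-dependent adjustment described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method claimed in the patent: the function names, the multiplicative boost, the renormalization step, and the toy probabilities are all hypothetical.

```python
from typing import Dict, List, Tuple

LanguageModel = Dict[str, float]  # word sequence -> probability


def context_matches(query_context: Dict[str, str], criteria: Dict[str, str]) -> bool:
    """True when every criterion is satisfied by the query context."""
    return all(query_context.get(key) == value for key, value in criteria.items())


def customize(base_lm: LanguageModel, boosts: Dict[str, float]) -> LanguageModel:
    """Scale selected base probabilities, then renormalize (an assumed scheme)."""
    adjusted = {seq: p * boosts.get(seq, 1.0) for seq, p in base_lm.items()}
    total = sum(adjusted.values())
    return {seq: p / total for seq, p in adjusted.items()}


def to_text_query(
    candidates: List[Tuple[str, float]],  # (word sequence, acoustic score)
    base_lm: LanguageModel,
    query_context: Dict[str, str],
    criteria: Dict[str, str],
    boosts: Dict[str, float],
) -> str:
    """Pick the candidate with the highest acoustic score x language model probability."""
    use_custom = context_matches(query_context, criteria)
    lm = customize(base_lm, boosts) if use_custom else base_lm
    return max(candidates, key=lambda c: c[1] * lm.get(c[0], 1e-9))[0]


# Hypothetical data: two acoustically identical hypotheses.
BASE_LM = {"pizza near me": 0.6, "pisa near me": 0.4}
CANDIDATES = [("pizza near me", 0.5), ("pisa near me", 0.5)]
CRITERIA = {"country": "IT"}
BOOSTS = {"pisa near me": 3.0}

print(to_text_query(CANDIDATES, BASE_LM, {"country": "IT"}, CRITERIA, BOOSTS))  # pisa near me
print(to_text_query(CANDIDATES, BASE_LM, {"country": "US"}, CRITERIA, BOOSTS))  # pizza near me
```

The base model is left untouched; the customized model is derived only when the query context meets the criteria, mirroring the conditional step in the abstract.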
-
Publication No.: US10079014B2
Publication date: 2018-09-18
Application No.: US15643741
Filing date: 2017-07-07
Applicant: Apple Inc.
Inventor(s): Devang K. Naik
IPC classification: G10L15/00, G10L15/04, G10L15/26, G10L15/06, G10L15/18, G10L21/00, G10L25/00, G06F17/27, G06F17/21, G10L15/187, G10L15/30, G10L15/02
CPC classification: G10L15/187, G10L15/30, G10L2015/025, G10L2015/0633
Abstract: A speech recognition system uses, in one embodiment, an extended phonetic dictionary that is obtained by processing words in a user's set of databases, such as a user's contacts database, with a set of pronunciation guessers. The speech recognition system can use a conventional phonetic dictionary and the extended phonetic dictionary to recognize speech inputs that are user requests to use the contacts database, for example, to make a phone call, etc. The extended phonetic dictionary can be updated in response to changes in the contacts database, and the set of pronunciation guessers can include pronunciation guessers for a plurality of locales, each locale having its own pronunciation guesser.
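Note: the idea of building an extended phonetic dictionary from contact names with one pronunciation guesser per locale can be sketched as below. This is a toy illustration only: the guessers are hypothetical letter-level stand-ins (real guessers would apply locale-specific grapheme-to-phoneme rules), and all names are assumptions, not taken from the patent.

```python
from typing import Callable, Dict, Iterable, Set

Guesser = Callable[[str], str]


def guess_en(word: str) -> str:
    """Toy 'English' guesser: one pseudo-phone per letter."""
    return " ".join(word.lower())


def guess_es(word: str) -> str:
    """Toy 'Spanish' guesser: like guess_en, but 'j' is read as 'h'."""
    return " ".join(word.lower().replace("j", "h"))


# Hypothetical registry: each locale has its own pronunciation guesser.
GUESSERS: Dict[str, Guesser] = {"en_US": guess_en, "es_ES": guess_es}


def build_extended_dictionary(
    words: Iterable[str], guessers: Dict[str, Guesser]
) -> Dict[str, Set[str]]:
    """Map each word to the set of pronunciations proposed by all guessers."""
    return {word: {guess(word) for guess in guessers.values()} for word in words}


def merged_dictionary(
    conventional: Dict[str, Set[str]], extended: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Combine the conventional and extended dictionaries for recognition."""
    merged = {word: set(prons) for word, prons in conventional.items()}
    for word, prons in extended.items():
        merged.setdefault(word, set()).update(prons)
    return merged


contacts = ["Juan"]
extended = build_extended_dictionary(contacts, GUESSERS)

# When the contacts database changes, the extended dictionary is rebuilt.
contacts.append("Jo")
extended = build_extended_dictionary(contacts, GUESSERS)
lexicon = merged_dictionary({"call": {"k ao l"}}, extended)
print(sorted(lexicon))
```

Rebuilding the whole extended dictionary on every change is the simplest choice for a sketch; an incremental update per added or removed contact would serve the same purpose.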
-
Publication No.: US20170270912A1
Publication date: 2017-09-21
Application No.: US15614283
Filing date: 2017-06-05
Inventor(s): Michael Levit, Shuangyu Chang, Benoit Dumoulin
CPC classification: G10L15/063, G10L15/10, G10L15/14, G10L15/18, G10L15/19, G10L2015/0633, G10L2015/0635
Abstract: A computer system for language modeling may collect training data from one or more information sources, generate a spoken corpus containing text of transcribed speech, and generate a typed corpus containing typed text. The computer system may derive feature vectors from the spoken corpus, analyze the typed corpus to determine feature vectors representing items of typed text, and generate an unspeakable corpus by filtering the typed corpus to remove each item of typed text represented by a feature vector that is within a similarity threshold of a feature vector derived from the spoken corpus. The computer system may derive feature vectors from the unspeakable corpus and train a classifier to perform discriminative data selection for language modeling based on the feature vectors derived from the spoken corpus and the feature vectors derived from the unspeakable corpus.
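Note: the filtering step that produces the "unspeakable corpus" can be sketched as follows. The abstract does not specify the feature vectors or the similarity measure, so this sketch assumes bag-of-words counts and cosine similarity; the function names, the 0.5 threshold, and the sample texts are hypothetical. The later classifier-training step is not shown.

```python
import math
from collections import Counter
from typing import List


def features(text: str) -> Counter:
    """Bag-of-words feature vector (an assumed representation)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def unspeakable_corpus(
    spoken: List[str], typed: List[str], threshold: float = 0.5
) -> List[str]:
    """Keep typed items that are not within the threshold of any spoken item."""
    spoken_vectors = [features(s) for s in spoken]
    return [
        item
        for item in typed
        if all(cosine(features(item), vec) < threshold for vec in spoken_vectors)
    ]


spoken = ["call my mom", "what is the weather today"]
typed = ["call my mom now", "http www example com", "lol brb"]
print(unspeakable_corpus(spoken, typed))  # ['http www example com', 'lol brb']
```

Here "call my mom now" is dropped because its cosine similarity to "call my mom" is about 0.87, above the assumed threshold, while the two remaining items share no tokens with the spoken corpus.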
-
Publication No.: US20160336008A1
Publication date: 2016-11-17
Application No.: US14714046
Filing date: 2015-05-15
CPC classification: G10L15/187, G06F17/278, G06F17/2818, G10L15/06, G10L15/19, G10L2015/0633
Abstract: Technologies are described herein for cross-language speech recognition and translation. An example method of speech recognition and translation includes receiving an input utterance in a first language, the input utterance having at least one name of a named entity included therein and being pronounced in a second language, utilizing a customized language model to process at least a portion of the input utterance, and identifying the at least one name of the named entity from the input utterance utilizing a phonetic representation of the at least one name of the named entity. The phonetic representation has a pronunciation of the at least one name in the second language.
-
Publication No.: US09477652B2
Publication date: 2016-10-25
Application No.: US14621921
Filing date: 2015-02-13
Applicant: Facebook, Inc.
Inventor(s): Fei Huang
CPC classification: G10L15/063, G06F17/274, G06F17/275, G06F17/279, G06F17/28, G10L15/005, G10L15/26, G10L2015/0633, G10L2015/0636
Abstract: Technology is disclosed for creating and tuning classifiers for language dialects and for generating dialect-specific language modules. A computing device can receive an initial training data set as a current training data set. The selection process for the initial training data set can be achieved by receiving one or more initial content items, establishing dialect parameters of each of the initial content items, and sorting each of the initial content items into one or more dialect groups based on the established dialect parameters. The computing device can generate, based on the initial training data set, a dialect classifier configured to detect language dialects of content items to be classified. The computing device can augment the current training data set with additional training data by applying the dialect classifier to candidate content items. The computing device can then update the dialect classifier based on the augmented current training data set.
-
Publication No.: US09471887B2
Publication date: 2016-10-18
Application No.: US14871595
Filing date: 2015-09-30
Applicant: NTT DOCOMO Inc.
Inventor(s): Hyung Sik Shin, Ronald Sujithan, Sayandev Mukherjee, Hongfeng Yin, Yang Sun, Yoshikazu Akinaga, Pero Subasic
CPC classification: G06N99/005, G06F9/4881, G06F15/18, G06F17/30654, G10L15/00, G10L15/1822, G10L2015/0633
Abstract: A system and method is provided that processes a training database of human-generated requests in each of a plurality of task categories with a machine learning algorithm to develop a task classifier model that may be applied to subsequent user requests to determine the most likely one of the task categories for the subsequent user request.
-
Publication No.: US20160078861A1
Publication date: 2016-03-17
Application No.: US14942349
Filing date: 2015-11-16
Applicant: MModal IP LLC
CPC classification: G10L15/063, G06F17/271, G06F17/2775, G06F17/28, G10L15/02, G10L15/183, G10L15/193, G10L15/26, G10L2015/0631, G10L2015/0633
Abstract: A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model using discriminative training techniques, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript.