-
Publication No.: US09460711B1
Publication date: 2016-10-04
Application No.: US13862541
Application date: 2013-04-15
Applicant: Google Inc.
Inventor: Vincent Olivier Vanhoucke , Jeffrey Adgate Dean , Georg Heigold , Marc'aurelio Ranzato , Matthieu Devin , Patrick An Phu Nguyen , Andrew William Senior
CPC classification number: G10L15/16 , G10L15/063 , G10L15/144
Abstract: Methods and systems for processing multilingual DNN acoustic models are described. An example method may include receiving training data that includes a respective training data set for each of two or more languages. A multilingual deep neural network (DNN) acoustic model may be processed based on the training data. The multilingual DNN acoustic model may include a feedforward neural network having multiple layers of one or more nodes. Each node of a given layer may connect with a respective weight to each node of a subsequent layer, and the multiple layers of one or more nodes may include one or more shared hidden layers of nodes and a language-specific output layer of nodes corresponding to each of the two or more languages. Additionally, weights associated with the multiple layers of one or more nodes of the processed multilingual DNN acoustic model may be stored in a database.
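The layer arrangement described in this abstract (hidden layers shared across languages, plus one output layer per language) can be illustrated with a minimal pure-Python sketch. It is not the patented implementation: the class name `MultilingualDNN`, the ReLU activation, the softmax output, the layer sizes, and the random initialization are all assumptions made for illustration, and training and database storage are omitted.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Weight matrix with n_out rows of n_in small random weights (assumed init)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def dense(weights, x):
    """Fully connected layer: each input node feeds each output node with its own weight."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def relu(x):
    # ReLU is an assumption; the abstract does not name an activation function.
    return [max(0.0, v) for v in x]

def softmax(x):
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

class MultilingualDNN:
    """Hypothetical feedforward net: shared hidden layers, one output layer per language."""

    def __init__(self, n_features, hidden_sizes, outputs_per_language, seed=0):
        rng = random.Random(seed)
        sizes = [n_features] + list(hidden_sizes)
        # Hidden layers shared by every language.
        self.shared = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
        # Language-specific output layers, keyed by language code.
        self.heads = {
            lang: make_layer(sizes[-1], n_out, rng)
            for lang, n_out in outputs_per_language.items()
        }

    def forward(self, features, lang):
        h = features
        for weights in self.shared:
            h = relu(dense(weights, h))
        # Only the output layer of the requested language is evaluated.
        return softmax(dense(self.heads[lang], h))

model = MultilingualDNN(n_features=4, hidden_sizes=[8, 8],
                        outputs_per_language={"en": 5, "fr": 3})
posteriors_en = model.forward([0.1, 0.2, 0.3, 0.4], "en")
```

In this sketch both languages reuse `model.shared`, so any update to those weights during training would affect every language, while each entry of `model.heads` serves a single language.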
-
Publication No.: US09378733B1
Publication date: 2016-06-28
Application No.: US13860982
Application date: 2013-04-11
Applicant: Google Inc.
Inventor: Vincent O. Vanhoucke , Oriol Vinyals , Patrick An Phu Nguyen , Maria Carolina Parada San Martin , Johan Schalkwyk
CPC classification number: G10L15/08 , G10L15/02 , G10L2015/088
Abstract: Embodiments pertain to automatic speech recognition in mobile devices to establish the presence of a keyword. An audio waveform is received at a mobile device. Front-end feature extraction is performed on the audio waveform, followed by acoustic modeling, high level feature extraction, and output classification to detect the keyword. Acoustic modeling may use a neural network or a vector quantization dictionary and high level feature extraction may use pooling.
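The last two stages of this pipeline (pooling as the high-level feature extraction, then output classification) can be sketched as follows. This is only an illustration under stated assumptions: the per-frame acoustic scores are taken as given, max pooling over time is one possible pooling choice, and the mean-score threshold classifier and the names `max_pool` and `detect_keyword` are hypothetical.

```python
def max_pool(frame_scores):
    """Pool per-frame acoustic scores over time, keeping the max of each score column."""
    return [max(column) for column in zip(*frame_scores)]

def detect_keyword(frame_scores, threshold=0.5):
    """Assumed classifier: keyword present if the mean pooled score reaches the threshold."""
    pooled = max_pool(frame_scores)
    score = sum(pooled) / len(pooled)
    return score >= threshold, score

# Three frames, two keyword sub-units per frame (made-up acoustic model outputs).
frame_scores = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.1, 0.1],
]
present, score = detect_keyword(frame_scores)  # pooled scores are [0.9, 0.8]
```

Pooling discards the exact frame at which each sub-unit fired, which keeps the classifier input at a fixed size regardless of the length of the audio.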
-
Publication No.: US20140278379A1
Publication date: 2014-09-18
Application No.: US13863505
Application date: 2013-04-16
Applicant: Google Inc.
Inventor: Noah B. Coccaro , Patrick An Phu Nguyen
IPC: G10L15/16
CPC classification number: G10L15/1815 , G10L15/1822 , G10L25/30
Abstract: In one implementation, a computer-implemented method includes receiving, at a computer system, a request to predict a next word in a dialog being uttered by a speaker; accessing, by the computer system, a neural network comprising i) an input layer, ii) one or more hidden layers, and iii) an output layer; identifying the local context for the dialog of the speaker; selecting, by the computer system and using a semantic model, at least one vector that represents the semantic context for the dialog; applying input to the input layer of the neural network, the input comprising i) the local context of the dialog and ii) the values for the at least one vector; generating probability values for at least a portion of the candidate words; and providing, by the computer system and based on the probability values, information that identifies one or more of the candidate words.
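The way the two inputs named in this abstract (local context and a semantic-context vector) might be combined can be sketched in pure Python. Everything concrete here is an assumption for illustration: the four-word `VOCAB`, the one-hot encoding of previous words, the single tanh hidden layer, and the random, untrained weights. The abstract itself only specifies an input layer, one or more hidden layers, and an output layer.

```python
import math
import random

VOCAB = ["play", "some", "music", "weather"]  # hypothetical candidate words

def one_hot(word):
    return [1.0 if w == word else 0.0 for w in VOCAB]

def build_input(previous_words, semantic_vector):
    """Concatenate the local context (one-hot previous words) with the semantic vector."""
    x = []
    for word in previous_words:
        x.extend(one_hot(word))
    x.extend(semantic_vector)
    return x

def layer(weights, x):
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_next(previous_words, semantic_vector, w_hidden, w_out):
    """Return (word, probability) pairs for the candidate words, most probable first."""
    x = build_input(previous_words, semantic_vector)
    hidden = [math.tanh(v) for v in layer(w_hidden, x)]
    probs = softmax(layer(w_out, hidden))
    return sorted(zip(VOCAB, probs), key=lambda pair: pair[1], reverse=True)

rng = random.Random(0)
n_in = 2 * len(VOCAB) + 3  # two context words plus a 3-dimensional semantic vector
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(6)]
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(6)] for _ in range(len(VOCAB))]
ranking = predict_next(["play", "some"], [0.2, 0.7, 0.1], w_hidden, w_out)
```

Because the weights are random, the ranking is meaningless; the sketch only shows the data flow from the two input sources to a probability per candidate word.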
-
Publication No.: US08775177B1
Publication date: 2014-07-08
Application No.: US13665245
Application date: 2012-10-31
Applicant: Google Inc.
Inventor: Georg Heigold , Patrick An Phu Nguyen , Mitchel Weintraub , Vincent O. Vanhoucke
IPC: G10L15/06
CPC classification number: G10L15/10 , G10L2015/085
Abstract: A speech recognition process may perform the following operations: performing a preliminary recognition process on first audio to identify candidates for the first audio; generating first templates corresponding to the first audio, where each first template includes a number of elements; selecting second templates corresponding to the candidates, where the second templates represent second audio, and where each second template includes elements that correspond to the elements in the first templates; comparing the first templates to the second templates, where the comparison includes similarity metrics between the first templates and corresponding second templates; applying weights to the similarity metrics to produce weighted similarity metrics, where the weights are associated with corresponding second templates; and using the weighted similarity metrics to determine whether the first audio corresponds to the second audio.
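The comparison and weighting steps in this abstract can be sketched as below. The sketch makes several assumptions that the abstract does not state: templates are equal-length lists of numbers, similarity is `1 / (1 + mean squared difference)`, the weighted metrics are combined by summing, and the decision is a fixed threshold. The function names are hypothetical.

```python
def similarity(first_template, second_template):
    """Assumed metric: 1 / (1 + mean squared difference) over corresponding elements."""
    diffs = [(a - b) ** 2 for a, b in zip(first_template, second_template)]
    return 1.0 / (1.0 + sum(diffs) / len(diffs))

def weighted_similarity(first_template, second_templates, weights):
    """Apply each second template's weight to its similarity metric, then sum."""
    return sum(
        w * similarity(first_template, t)
        for t, w in zip(second_templates, weights)
    )

def corresponds(first_template, second_templates, weights, threshold=0.8):
    """Assumed decision rule: match if the weighted similarity reaches the threshold."""
    return weighted_similarity(first_template, second_templates, weights) >= threshold

first = [1.0, 2.0, 3.0]                        # template from the incoming audio
seconds = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]   # stored templates for one candidate
weights = [0.75, 0.25]                         # one weight per stored template
score = weighted_similarity(first, seconds, weights)  # 0.75 * 1.0 + 0.25 * 0.5 = 0.875
```

Attaching the weight to the stored template, rather than to the incoming audio, lets a system trust some reference examples more than others.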
-
Publication No.: US20150279351A1
Publication date: 2015-10-01
Application No.: US13861020
Application date: 2013-04-11
Applicant: Google Inc.
IPC: G10L15/02
CPC classification number: G10L15/08 , G10L15/02 , G10L2015/088
Abstract: Embodiments pertain to automatic speech recognition in mobile devices to establish the presence of a keyword. An audio waveform is received at a mobile device. Front-end feature extraction is performed on the audio waveform, followed by acoustic modeling, high level feature extraction, and output classification to detect the keyword. Acoustic modeling may use a neural network or Gaussian mixture modeling, and high level feature extraction may be done by aligning the results of the acoustic modeling with expected event vectors that correspond to a keyword.
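One possible reading of "aligning the results of the acoustic modeling with expected event vectors" is a sliding comparison, sketched below. This interpretation is an assumption, not the method claimed in the application: the per-frame acoustic outputs are taken as given, the match at each offset is the mean dot product between frames and expected event vectors, and the name `best_alignment` is hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def best_alignment(frames, expected_events):
    """Slide the expected event sequence over the frames.

    Returns (offset, score) for the best-matching offset, or
    (None, -inf) if there are fewer frames than expected events.
    """
    n = len(expected_events)
    best_offset, best_score = None, float("-inf")
    for offset in range(len(frames) - n + 1):
        score = sum(dot(frames[offset + i], expected_events[i]) for i in range(n)) / n
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score

# Made-up acoustic outputs: four frames, two event classes per frame.
frames = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Expected events for a hypothetical keyword: class 0 followed by class 1.
expected = [[1.0, 0.0], [0.0, 1.0]]
offset, score = best_alignment(frames, expected)  # best match starts at frame 1
```

The best score could then serve as a high-level feature for the output classifier mentioned in the abstract.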