-
Publication number: US09401148B2
Publication date: 2016-07-26
Application number: US14228469
Application date: 2014-03-28
Applicant: Google Inc.
Inventor: Xin Lei, Erik McDermott, Ehsan Variani, Ignacio L. Moreno
CPC classification number: G10L17/18
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for inputting speech data that corresponds to a particular utterance to a neural network; determining an evaluation vector based on output at a hidden layer of the neural network; comparing the evaluation vector with a reference vector that corresponds to a past utterance of a particular speaker; and based on comparing the evaluation vector and the reference vector, determining whether the particular utterance was likely spoken by the particular speaker.
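A minimal Python sketch of the verification flow in this abstract. It rests on assumptions the record does not state: the evaluation vector is taken as the mean of one ReLU hidden layer's activations over the utterance's frames, the evaluation and reference vectors are compared with cosine similarity, and a fixed threshold of 0.8 decides the outcome. The toy weights and the helper names (hidden_activations, evaluation_vector, likely_same_speaker) are hypothetical and not taken from the patent.

```python
import math
from typing import List, Sequence

Vector = List[float]


def hidden_activations(frame: Sequence[float],
                       weights: Sequence[Sequence[float]],
                       bias: Sequence[float]) -> Vector:
    """Toy hidden layer (hypothetical): h = ReLU(W x + b)."""
    return [max(0.0, sum(w * x for w, x in zip(row, frame)) + b)
            for row, b in zip(weights, bias)]


def evaluation_vector(frames: Sequence[Sequence[float]],
                      weights: Sequence[Sequence[float]],
                      bias: Sequence[float]) -> Vector:
    """Assumption: average the hidden-layer outputs over all frames."""
    acts = [hidden_activations(f, weights, bias) for f in frames]
    return [sum(col) / len(acts) for col in zip(*acts)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no speaker information
    return dot / (norm_a * norm_b)


def likely_same_speaker(eval_vec: Sequence[float],
                        ref_vec: Sequence[float],
                        threshold: float = 0.8) -> bool:
    """Assumed decision rule: accept if similarity meets the threshold."""
    return cosine_similarity(eval_vec, ref_vec) >= threshold


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    b = [0.0, 0.0, 0.0]
    # Reference vector from a past (enrollment) utterance.
    ref = evaluation_vector([[1.0, 0.0], [1.0, 0.2]], W, b)
    print(likely_same_speaker(evaluation_vector([[0.9, 0.1]], W, b), ref))
    print(likely_same_speaker(evaluation_vector([[0.0, 1.0]], W, b), ref))
```

With these toy values the first test utterance scores a cosine similarity of about 0.9999 against the reference and is accepted, while the second scores about 0.57 and is rejected. Cosine similarity is a common choice because it ignores the overall scale of the hidden activations, but the record itself only says the two vectors are "compared".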
-
Publication number: US20180068675A1
Publication date: 2018-03-08
Application number: US15350293
Application date: 2016-11-14
Applicant: Google Inc.
Inventor: Ehsan Variani, Kevin William Wilson, Ron J. Weiss, Tara N. Sainath, Arun Narayanan
IPC: G10L25/30, G10L21/028, G10L21/0388
CPC classification number: G10L25/30, G10L15/16, G10L15/20, G10L19/008, G10L21/028, G10L21/0388, G10L2021/02087, G10L2021/02166
Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
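A minimal Python sketch of the two front-end layers named in this abstract, under stated assumptions. The spatial filtering layer is modeled as filter-and-sum (each raw channel passes through its own FIR filter and the outputs are summed). The spectral filtering layer is modeled as a fixed set of weights applied to the power spectrum of the spatially filtered signal, followed by log compression. Both forms, the helper names, and the filter values are illustrative assumptions. In the described system these filters would be learned layers of the neural network, and the additional layers that predict sub-word units are omitted here.

```python
import cmath
import math
from typing import List, Sequence


def spatial_filter(channels: Sequence[Sequence[float]],
                   taps: Sequence[Sequence[float]]) -> List[float]:
    """Assumed filter-and-sum: y[n] = sum_c sum_k taps[c][k] * x_c[n - k].

    Samples before the start of the signal are treated as zero.
    """
    n_samples = len(channels[0])
    out = [0.0] * n_samples
    for x, h in zip(channels, taps):
        for n in range(n_samples):
            for k, hk in enumerate(h):
                if n - k >= 0:
                    out[n] += hk * x[n - k]
    return out


def dft(signal: Sequence[float]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform (fine for a toy example)."""
    size = len(signal)
    return [sum(signal[n] * cmath.exp(-2j * math.pi * k * n / size)
                for n in range(size))
            for k in range(size)]


def spectral_filter(signal: Sequence[float],
                    filterbank: Sequence[Sequence[float]]) -> List[float]:
    """Assumed frequency-domain layer: weight the one-sided power spectrum
    with each filterbank row, then apply log compression."""
    spectrum = dft(signal)
    power = [abs(X) ** 2 for X in spectrum[: len(signal) // 2 + 1]]
    return [math.log(sum(w * p for w, p in zip(row, power)) + 1e-6)
            for row in filterbank]


if __name__ == "__main__":
    # Two channels recording the same period of time; channel 1 lags by one sample.
    print(spatial_filter([[1.0, 2.0, 3.0], [0.0, 1.0, 2.0]],
                         [[0.0, 0.5], [0.5]]))
    # A sine at DFT bin 2 of an 8-sample frame, seen identically by two mics.
    tone = [math.sin(2 * math.pi * 2 * n / 8) for n in range(8)]
    mixed = spatial_filter([tone, tone], [[0.5], [0.5]])
    bank = [[0, 1, 0, 0, 0], [0, 0, 1, 0, 0]]  # rows pick bins 1 and 2
    print(spectral_filter(mixed, bank))
```

In the first call, the one-sample delay tap on channel 0 aligns it with channel 1, so the filtered output is [0.0, 1.0, 2.0]. In the second, the filterbank row covering bin 2 yields roughly log(16) ≈ 2.77 while the row covering the empty bin 1 falls to about log(1e-6). Working on the power spectrum is one plausible reading of "processing frequency-domain data"; the abstract does not specify the exact transform or nonlinearity.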
-
Publication number: US20150127336A1
Publication date: 2015-05-07
Application number: US14228469
Application date: 2014-03-28
Applicant: Google Inc.
Inventor: Xin Lei, Erik McDermott, Ehsan Variani, Ignacio L. Moreno
IPC: G10L17/18
CPC classification number: G10L17/18
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for inputting speech data that corresponds to a particular utterance to a neural network; determining an evaluation vector based on output at a hidden layer of the neural network; comparing the evaluation vector with a reference vector that corresponds to a past utterance of a particular speaker; and based on comparing the evaluation vector and the reference vector, determining whether the particular utterance was likely spoken by the particular speaker.
-