LOW-RANK HIDDEN INPUT LAYER FOR SPEECH RECOGNITION NEURAL NETWORK
    1.
    Invention application
    Status: in force

    Publication number: US20160092766A1

    Publication date: 2016-03-31

    Application number: US14616881

    Filing date: 2015-02-09

    Applicant: Google Inc.

    CPC classification number: G10L25/30 G06N3/0454 G06N3/0481 G10L15/063

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a deep neural network. One of the methods for training a deep neural network that includes a low rank hidden input layer and an adjoining hidden layer, the low rank hidden input layer including a first matrix A and a second matrix B with dimensions i×m and m×o, respectively, to identify a keyword includes receiving a feature vector including i values that represent features of an audio signal encoding an utterance, determining, using the low rank hidden input layer, an output vector including o values using the feature vector, determining, using the adjoining hidden layer, another vector using the output vector, determining a confidence score that indicates whether the utterance includes the keyword using the other vector, and adjusting weights for the low rank hidden input layer using the confidence score.
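
The low-rank input layer in the abstract above can be illustrated with a minimal sketch. It assumes the two factors are applied as plain linear maps, (x A) B, with no bias or nonlinearity between them; the abstract does not state this. The sizes i, m, o and all function names are hypothetical.

```python
import random

def matvec(x, M):
    """Multiply row vector x (length r) by matrix M (r rows, c columns)."""
    return [sum(x[r] * M[r][c] for r in range(len(M))) for c in range(len(M[0]))]

def make_matrix(rows, cols, rng):
    """Small random weights; the initialization scheme is an assumption."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def low_rank_forward(x, A, B):
    """Map i input features to o outputs through an m-wide bottleneck: (x A) B."""
    return matvec(matvec(x, A), B)

def param_count(i, m, o):
    """Weights in the factored layer versus one dense i-by-o matrix."""
    return i * m + m * o, i * o

rng = random.Random(0)
i, m, o = 40, 8, 128            # hypothetical sizes, not taken from the patent
A = make_matrix(i, m, rng)      # first matrix, i x m
B = make_matrix(m, o, rng)      # second matrix, m x o
x = [rng.random() for _ in range(i)]   # stand-in for an audio feature vector
y = low_rank_forward(x, A, B)
low_rank, dense = param_count(i, m, o)
```

With these sizes the factored layer holds 1344 weights against 5120 for a dense layer, which is the usual motivation for choosing m much smaller than i and o.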

    Transfer learning for deep neural network based hotword detection

    Publication number: US09715660B2

    Publication date: 2017-07-25

    Application number: US14230225

    Filing date: 2014-03-31

    Applicant: Google Inc.

    CPC classification number: G06N7/005 G06N3/0454 G10L15/16 G10L2015/088

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a deep neural network. One of the methods includes training a deep neural network with a first training set by adjusting values for each of a plurality of weights included in the neural network, and training the deep neural network to determine a probability that data received by the deep neural network has features similar to key features of one or more keywords or key phrases, the training comprising providing the deep neural network with a second training set and adjusting the values for a first subset of the plurality of weights, wherein the second training set includes data representing the key features of the one or more keywords or key phrases.
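
The two training phases in the abstract above (adjust every weight on a first set, then adjust only a subset on a keyword-specific second set) can be sketched as follows. A tiny linear model stands in for the deep neural network; the data, learning rate, and choice of subset are all hypothetical.

```python
def sgd_epoch(weights, data, lr, trainable):
    """One SGD pass on squared error for y = w . x, updating only `trainable` indices."""
    for x, target in data:
        pred = sum(w * xi for w, xi in zip(weights, x))
        err = pred - target
        for k in trainable:
            weights[k] -= lr * err * x[k]
    return weights

# Hypothetical first (general) and second (keyword-specific) training sets.
general_set = [([1.0, 0.0, 1.0], 1.0), ([0.0, 1.0, 1.0], 0.0), ([1.0, 1.0, 0.0], 1.0)]
keyword_set = [([1.0, 0.0, 1.0], 2.0), ([0.0, 1.0, 1.0], 1.0)]

weights = [0.0, 0.0, 0.0]

# Phase 1: every weight is adjusted on the first training set.
for _ in range(200):
    sgd_epoch(weights, general_set, lr=0.1, trainable=range(3))
after_phase1 = list(weights)

# Phase 2: only a subset of the weights is adjusted on the second training set.
subset = [2]    # which weights to retrain is an assumption of this sketch
for _ in range(200):
    sgd_epoch(weights, keyword_set, lr=0.1, trainable=subset)
```

Weights outside `subset` keep their phase 1 values, so whatever was learned from the larger first set is carried over unchanged.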

    SPEAKER RECOGNITION USING NEURAL NETWORKS
    4.
    Invention application
    Status: under examination (published)

    Publication number: US20160293167A1

    Publication date: 2016-10-06

    Application number: US15179717

    Filing date: 2016-06-10

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing speaker verification. In one aspect, a method includes accessing a neural network having an input layer that provides inputs to a first hidden layer whose nodes are respectively connected to only a proper subset of the inputs from the input layer. Speech data that corresponds to a particular utterance may be provided as input to the input layer of the neural network. A representation of activations that occur in response to the speech data at a particular layer of the neural network that was configured as a hidden layer during training of the neural network may be generated. A determination of whether the particular utterance was likely spoken by a particular speaker may be made based at least on the generated representation. An indication of whether the particular utterance was likely spoken by the particular speaker may be provided.
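
A minimal sketch of the two ideas in the abstract above: a first hidden layer whose nodes each see only a proper subset of the inputs, and a speaker decision based on a representation of hidden-layer activations. The ReLU activation, the cosine comparison, the threshold, and all names are assumptions; the abstract does not specify them.

```python
import math

def locally_connected(inputs, weights, patch):
    """Hidden node k sees only inputs[k*patch:(k+1)*patch], a proper subset of the input layer."""
    out = []
    for k, w in enumerate(weights):
        window = inputs[k * patch:(k + 1) * patch]
        # ReLU is an assumption; the abstract names no activation function.
        out.append(max(0.0, sum(wi * xi for wi, xi in zip(w, window))))
    return out

def cosine(u, v):
    """Cosine similarity between two activation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def same_speaker(representation, enrolled, threshold=0.8):
    """Hypothetical decision rule: accept when the similarity clears a threshold."""
    return cosine(representation, enrolled) >= threshold

weights = [[1.0, 0.5], [0.5, 1.0], [1.0, 1.0]]    # three hidden nodes, two inputs each
utterance = [0.2, 0.4, 0.1, 0.3, 0.5, 0.5]        # stand-in speech features
representation = locally_connected(utterance, weights, patch=2)
accepted = same_speaker(representation, enrolled=[0.4, 0.3, 1.0])
```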

    CONVOLUTIONAL NEURAL NETWORKS
    5.
    Invention application
    Status: under examination (published)

    Publication number: US20160283841A1

    Publication date: 2016-09-29

    Application number: US14805704

    Filing date: 2015-07-22

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for keyword spotting. One of the methods includes training, by a keyword detection system, a convolutional neural network for keyword detection by providing a two-dimensional set of input values to the convolutional neural network, the input values including a first dimension in time and a second dimension in frequency, and performing convolutional multiplication on the two-dimensional set of input values for a filter using a frequency stride greater than one to generate a feature map.
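
The effect of a frequency stride greater than one can be seen in a small sketch. It assumes a "valid" convolution computed as a cross-correlation (the usual convention in neural networks), a stride of one in time, and a single filter; the grid and kernel values are hypothetical.

```python
def conv2d_freq_stride(inputs, kernel, freq_stride):
    """Slide `kernel` over a time x frequency grid: stride 1 in time, `freq_stride` in frequency."""
    n_time, n_freq = len(inputs), len(inputs[0])
    k_time, k_freq = len(kernel), len(kernel[0])
    feature_map = []
    for t in range(n_time - k_time + 1):
        row = []
        for f in range(0, n_freq - k_freq + 1, freq_stride):
            row.append(sum(inputs[t + a][f + b] * kernel[a][b]
                           for a in range(k_time) for b in range(k_freq)))
        feature_map.append(row)
    return feature_map

# 4 time frames x 8 frequency bands of hypothetical input values.
inputs = [[float(t + f) for f in range(8)] for t in range(4)]
kernel = [[1.0, 1.0], [1.0, 1.0]]

dense_map = conv2d_freq_stride(inputs, kernel, freq_stride=1)     # 3 x 7 feature map
strided_map = conv2d_freq_stride(inputs, kernel, freq_stride=2)   # 3 x 4 feature map
```

Doubling the frequency stride roughly halves the width of the feature map, and with it the multiplications needed per filter.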

    Voice Activity Detection
    6.
    Invention application

    Publication number: US20170092297A1

    Publication date: 2017-03-30

    Application number: US14986985

    Filing date: 2016-01-04

    Applicant: Google Inc.

    CPC classification number: G10L25/78 G10L25/30

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for detecting voice activity. In one aspect, a method includes actions of receiving, by a neural network included in an automated voice activity detection system, a raw audio waveform, processing, by the neural network, the raw audio waveform to determine whether the audio waveform includes speech, and providing, by the neural network, a classification of the raw audio waveform indicating whether the raw audio waveform includes speech.

    Keyword detection without decoding
    7.
    Granted invention patent
    Status: in force

    Publication number: US09378733B1

    Publication date: 2016-06-28

    Application number: US13860982

    Filing date: 2013-04-11

    Applicant: Google Inc.

    CPC classification number: G10L15/08 G10L15/02 G10L2015/088

    Abstract: Embodiments pertain to automatic speech recognition in mobile devices to establish the presence of a keyword. An audio waveform is received at a mobile device. Front-end feature extraction is performed on the audio waveform, followed by acoustic modeling, high level feature extraction, and output classification to detect the keyword. Acoustic modeling may use a neural network or a vector quantization dictionary and high level feature extraction may use pooling.
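
The last two stages named in the abstract above, pooling as high-level feature extraction followed by output classification, might look like the sketch below. The per-frame scores stand in for acoustic-model output; max pooling over time and the linear classifier are assumptions, since the abstract only says that pooling may be used.

```python
def max_pool_over_time(frame_scores):
    """Collapse per-frame score vectors into one fixed-length vector (element-wise max)."""
    return [max(column) for column in zip(*frame_scores)]

def classify(pooled, weights, bias, threshold=0.0):
    """Hypothetical linear output classifier: True when the keyword is judged present."""
    return sum(w * p for w, p in zip(weights, pooled)) + bias > threshold

# Three frames of hypothetical acoustic-model scores, one column per acoustic unit.
frame_scores = [
    [0.1, 0.7, 0.2],
    [0.6, 0.2, 0.1],
    [0.2, 0.1, 0.9],
]
pooled = max_pool_over_time(frame_scores)
detected = classify(pooled, weights=[1.0, 1.0, 1.0], bias=-2.0)
```

Pooling removes the dependence on utterance length, so the classifier always receives a vector of the same size and no decoding step is needed.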

    DETERMINING HOTWORD SUITABILITY
    8.
    Invention application
    Status: in force

    Publication number: US20160133259A1

    Publication date: 2016-05-12

    Application number: US15002044

    Filing date: 2016-01-20

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining hotword suitability. In one aspect, a method includes receiving speech data that encodes a candidate hotword spoken by a user, evaluating the speech data or a transcription of the candidate hotword, using one or more predetermined criteria, generating a hotword suitability score for the candidate hotword based on evaluating the speech data or a transcription of the candidate hotword, using one or more predetermined criteria, and providing a representation of the hotword suitability score for display to the user.
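
As a rough illustration of scoring a candidate hotword against predetermined criteria, the sketch below evaluates a transcription only. The two criteria (letter count and an estimated syllable count), their scaling, and the averaging rule are invented for this example; the abstract does not say which criteria are used.

```python
import re

def syllable_estimate(text):
    """Crude syllable count: runs of vowels. A stand-in, not a phonetic model."""
    return len(re.findall(r"[aeiouy]+", text.lower()))

# Hypothetical criteria, each mapping a transcription to a score in [0, 1].
CRITERIA = [
    ("length", lambda t: min(len(t.replace(" ", "")) / 10.0, 1.0)),
    ("syllables", lambda t: min(syllable_estimate(t) / 4.0, 1.0)),
]

def suitability_score(transcription):
    """Average the per-criterion scores into one suitability score in [0, 1]."""
    scores = [criterion(transcription) for _, criterion in CRITERIA]
    return sum(scores) / len(scores)

short_score = suitability_score("hi")
long_score = suitability_score("ok computer")
```

Under these made-up criteria a longer, multi-syllable phrase scores higher than a short one; the resulting number is what would be shown to the user.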

    TRANSFER LEARNING FOR DEEP NEURAL NETWORK BASED HOTWORD DETECTION
    9.
    Invention application
    Status: in force

    Publication number: US20150127594A1

    Publication date: 2015-05-07

    Application number: US14230225

    Filing date: 2014-03-31

    Applicant: GOOGLE INC.

    CPC classification number: G06N7/005 G06N3/0454 G10L15/16 G10L2015/088

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a deep neural network. One of the methods includes training a deep neural network with a first training set by adjusting values for each of a plurality of weights included in the neural network, and training the deep neural network to determine a probability that data received by the deep neural network has features similar to key features of one or more keywords or key phrases, the training comprising providing the deep neural network with a second training set and adjusting the values for a first subset of the plurality of weights, wherein the second training set includes data representing the key features of the one or more keywords or key phrases.

    KEY PHRASE DETECTION
    10.
    Invention application
    Status: in force

    Publication number: US20150095027A1

    Publication date: 2015-04-02

    Application number: US14041131

    Filing date: 2013-09-30

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for key phrase detection. One of the methods includes receiving a plurality of audio frame vectors that each model an audio waveform during a different period of time, generating an output feature vector for each of the audio frame vectors, wherein each output feature vector includes a set of scores that characterize an acoustic match between the corresponding audio frame vector and a set of expected event vectors, each of the expected event vectors corresponding to one of the scores and defining acoustic properties of at least a portion of a keyword, and providing each of the output feature vectors to a posterior handling module.
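
The abstract above ends by handing the per-frame output feature vectors to a posterior handling module without saying what that module does. One plausible reading, sketched here purely as an assumption, is to smooth each score over a short window and combine the best smoothed score of each keyword part into a single confidence. The window size, the product rule, and the sample scores are hypothetical.

```python
def smooth(scores, window):
    """Moving average of each score over the current frame and up to window-1 preceding frames."""
    smoothed = []
    for j in range(len(scores)):
        start = max(0, j - window + 1)
        chunk = scores[start:j + 1]
        smoothed.append([sum(column) / len(chunk) for column in zip(*chunk)])
    return smoothed

def confidence(smoothed):
    """Multiply together the best smoothed score of each keyword part."""
    result = 1.0
    for column in zip(*smoothed):
        result *= max(column)
    return result

# Four frames of hypothetical output feature vectors, one score per keyword part.
output_vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
smoothed = smooth(output_vectors, window=2)
score = confidence(smoothed)
```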
