LATENCY CONSTRAINTS FOR ACOUSTIC MODELING
    31.
    Patent application

    Publication number: US20170103752A1

    Publication date: 2017-04-13

    Application number: US14879225

    Filing date: 2015-10-09

    Applicant: Google Inc.

    CPC classification number: G10L15/16 G06N3/0445 G06N3/0454

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for acoustic modeling of audio data. One method includes receiving audio data representing a portion of an utterance; providing the audio data to a trained recurrent neural network that has been trained to indicate the occurrence of a phone at any of multiple time frames within a maximum delay of receiving audio data corresponding to the phone; receiving, within the predetermined maximum delay of providing the audio data to the trained recurrent neural network, output of the trained neural network indicating a phone corresponding to the provided audio data; using output of the trained neural network to determine a transcription for the utterance; and providing the transcription for the utterance.
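
    As a rough illustration of the latency bound described in this abstract, the Python sketch below lists the frames at which a phone label may be emitted and checks a set of emission frames against the bound. It assumes, hypothetically, that the delay is counted in frames from the last frame of each phone's audio. The function names and this frame-based convention are illustrative and are not taken from the application.

```python
def allowed_frames(phone_end_frames, max_delay, num_frames):
    """For each phone, list the frames at which emitting its label meets the bound.

    Assumption: phone_end_frames[i] is the last frame of audio for phone i,
    and max_delay is expressed in frames.
    """
    windows = []
    for end in phone_end_frames:
        last = min(end + max_delay, num_frames - 1)  # clip to the utterance
        windows.append(list(range(end, last + 1)))
    return windows


def within_latency(emit_frames, phone_end_frames, max_delay):
    """Return True if every phone is emitted no later than max_delay frames
    after its audio ends (and not before its audio ends)."""
    return all(
        end <= emit <= end + max_delay
        for emit, end in zip(emit_frames, phone_end_frames)
    )


if __name__ == "__main__":
    ends = [2, 5]  # hypothetical alignment: phones end at frames 2 and 5
    print(allowed_frames(ends, max_delay=3, num_frames=7))
    print(within_latency([3, 6], ends, max_delay=3))
```

    During training, windows like these could mark which frames count as acceptable positions for each label; how the application actually enforces the constraint is not stated in the abstract.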

    FREQUENCY WARPING IN A SPEECH RECOGNITION SYSTEM
    32.
    Patent application
    Status: Granted

    Publication number: US20170032802A1

    Publication date: 2017-02-02

    Application number: US15221491

    Filing date: 2016-07-27

    Applicant: Google Inc.

    Inventor: Andrew W. Senior

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for receiving a sequence representing an utterance, the sequence comprising a plurality of audio frames; determining one or more warping factors for each audio frame in the sequence using a warping neural network; applying, for each audio frame, the one or more warping factors for the audio frame to the audio frame to generate a respective modified audio frame, wherein the applying comprises using at least one of the warping factors to scale a respective frequency of the audio frame to a new respective frequency in the respective modified audio frame; and decoding the modified audio frames using a decoding neural network, wherein the decoding neural network is configured to output a word sequence that is a transcription of the utterance.
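
    The frequency-scaling step in this abstract can be sketched as follows. This is a minimal Python illustration, assuming each audio frame is a list of spectral magnitudes indexed by frequency bin and that one warping factor per frame is already available; in the application the factor comes from a warping neural network, which is not modeled here. The function name `warp_frame` and the use of linear interpolation are assumptions made for the example.

```python
def warp_frame(spectrum, alpha):
    """Scale the frequency axis of one frame by the warping factor alpha.

    Energy at source bin f moves to bin alpha * f, so output bin k reads the
    source spectrum at position k / alpha. Values between bins are linearly
    interpolated; positions past the last bin are filled with 0.0.
    """
    n = len(spectrum)
    warped = []
    for k in range(n):
        src = k / alpha
        if src > n - 1:
            warped.append(0.0)  # no source energy available at this position
            continue
        lo = int(src)
        hi = min(lo + 1, n - 1)
        frac = src - lo
        warped.append((1.0 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped


def warp_sequence(frames, alphas):
    """Apply one warping factor per frame to a sequence of frames."""
    return [warp_frame(frame, alpha) for frame, alpha in zip(frames, alphas)]


if __name__ == "__main__":
    frame = [0.0, 1.0, 2.0, 3.0]
    print(warp_frame(frame, 2.0))  # stretches the spectrum toward higher bins
    print(warp_frame(frame, 0.5))  # compresses it toward lower bins
```

    The warped frames would then be passed to the decoding network; a factor of 1.0 leaves a frame unchanged.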

    CONTEXT-DEPENDENT MODELING OF PHONEMES
    33.
    Patent application
    Status: Granted

    Publication number: US20160372118A1

    Publication date: 2016-12-22

    Application number: US14877673

    Filing date: 2015-10-07

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for modeling phonemes. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps: processing the acoustic feature representation through each of one or more recurrent neural network layers to generate a recurrent output; processing the recurrent output using a softmax output layer to generate a set of scores, the set of scores comprising a respective score for each of a plurality of context dependent vocabulary phonemes, the score for each context dependent vocabulary phoneme representing a likelihood that the context dependent vocabulary phoneme represents the utterance at the time step; and determining, from the scores for the plurality of time steps, a context dependent phoneme representation of the sequence.
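
    The scoring step in this abstract can be illustrated with a short Python sketch. It assumes the recurrent layers have already produced one vector of unnormalized outputs per time step, applies a softmax to obtain a score for each context-dependent phoneme, and then picks the highest-scoring label at each step. The per-step argmax and the triphone-style label strings are assumptions for the example; the abstract does not specify how the final representation is derived from the scores.

```python
import math


def softmax(logits):
    """Convert unnormalized outputs into scores that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(recurrent_outputs, labels):
    """Pick the highest-scoring context-dependent phoneme at each time step."""
    sequence = []
    for logits in recurrent_outputs:
        scores = softmax(logits)
        best = max(range(len(scores)), key=scores.__getitem__)
        sequence.append(labels[best])
    return sequence


if __name__ == "__main__":
    # Hypothetical context-dependent labels in left-center+right notation.
    labels = ["sil-k+ae", "k-ae+t", "ae-t+sil"]
    outputs = [[2.0, 0.1, 0.1], [0.1, 3.0, 0.2], [0.0, 0.5, 2.5]]
    print(decode(outputs, labels))
```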

    CONVOLUTIONAL, LONG SHORT-TERM MEMORY, FULLY CONNECTED DEEP NEURAL NETWORKS
    34.
    Patent application
    Status: Under examination (published)

    Publication number: US20160099010A1

    Publication date: 2016-04-07

    Application number: US14847133

    Filing date: 2015-09-08

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying the language of a spoken utterance. One of the methods includes receiving input features of an utterance; and processing the input features using an acoustic model that comprises one or more convolutional neural network (CNN) layers, one or more long short-term memory network (LSTM) layers, and one or more fully connected neural network layers to generate a transcription for the utterance.
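
    The layer ordering named in this abstract (convolutional, then LSTM, then fully connected) can be sketched in plain Python. This is a toy forward pass under strong simplifying assumptions: one convolution filter along the frequency axis, a single-unit LSTM, and one fully connected layer, with all weights supplied by the caller. Every name and dimension here is hypothetical, and the sketch stops at per-frame output values rather than a transcription.

```python
import math


def conv1d(frame, kernel):
    """Valid 1-D convolution along the frequency axis, followed by ReLU."""
    k = len(kernel)
    return [
        max(0.0, sum(frame[i + j] * kernel[j] for j in range(k)))
        for i in range(len(frame) - k + 1)
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(feats, h, c, weights):
    """One step of a single-unit LSTM.

    weights maps each gate name to (input_weights, recurrent_weight, bias).
    """
    def pre(gate):
        w_x, w_h, bias = weights[gate]
        return sum(w * x for w, x in zip(w_x, feats)) + w_h * h + bias

    i = sigmoid(pre("input"))
    f = sigmoid(pre("forget"))
    o = sigmoid(pre("output"))
    g = math.tanh(pre("cell"))
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def dense(h, fc_weights, fc_biases):
    """Fully connected layer applied to the scalar LSTM output."""
    return [w * h + b for w, b in zip(fc_weights, fc_biases)]


def cldnn_forward(frames, kernel, lstm_weights, fc_weights, fc_biases):
    """Run each frame through CNN -> LSTM -> fully connected, in that order."""
    h, c = 0.0, 0.0
    outputs = []
    for frame in frames:
        feats = conv1d(frame, kernel)
        h, c = lstm_step(feats, h, c, lstm_weights)
        outputs.append(dense(h, fc_weights, fc_biases))
    return outputs


if __name__ == "__main__":
    gate = ([0.5, 0.5, 0.5], 0.1, 0.0)
    lstm_w = {"input": gate, "forget": gate, "output": gate, "cell": gate}
    frames = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]
    print(cldnn_forward(frames, [1.0, -1.0], lstm_w, [1.0, -1.0], [0.0, 0.0]))
```

    A practical model would use many filters and LSTM units and a framework with trained weights; the point here is only the order in which the three layer types are applied.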

    CLUSTER SPECIFIC SPEECH MODEL
    35.
    Patent application
    Status: Granted

    Publication number: US20150269931A1

    Publication date: 2015-09-24

    Application number: US14663610

    Filing date: 2015-03-20

    Applicant: Google Inc.

    CPC classification number: G10L15/063 G10L15/183 G10L2015/0631

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving data representing acoustic characteristics of a user's voice; selecting a cluster for the data from among a plurality of clusters, where each cluster includes a plurality of vectors, and where each cluster is associated with a speech model trained by a neural network using at least one or more vectors of the plurality of vectors in the respective cluster; and in response to receiving one or more utterances of the user, providing the speech model associated with the cluster for transcribing the one or more utterances.
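
    The cluster-selection step in this abstract can be sketched in Python as follows. The abstract does not say how a cluster is chosen, so this example assumes the nearest cluster centroid by Euclidean distance. The dictionary layout, the function names, and the model identifiers are hypothetical placeholders.

```python
import math


def centroid(vectors):
    """Component-wise mean of a cluster's vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_cluster(user_vector, clusters):
    """Return the name of the cluster whose centroid is closest to the user.

    Assumption: nearest-centroid selection; the application may use a
    different criterion.
    """
    return min(
        clusters,
        key=lambda name: distance(user_vector, centroid(clusters[name]["vectors"])),
    )


def model_for_user(user_vector, clusters):
    """Look up the speech model associated with the selected cluster."""
    return clusters[select_cluster(user_vector, clusters)]["model"]


if __name__ == "__main__":
    clusters = {
        "a": {"vectors": [[0.0, 0.0], [0.0, 2.0]], "model": "model_a"},
        "b": {"vectors": [[10.0, 10.0], [12.0, 10.0]], "model": "model_b"},
    }
    print(model_for_user([1.0, 1.0], clusters))
```

    The returned model identifier stands in for the cluster-specific speech model that would be provided for transcribing the user's utterances.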
