Patent search ap:("Google Inc.") AND inv:"Kanury Kanishka Rao" Page 1

1.

发明申请
GENERATING REPRESENTATIONS OF INPUT SEQUENCES USING NEURAL NETWORKS 审中-公开
Title translation: 使用神经网络生成输入序列的表示

公开(公告)号：US20150356075A1

公开(公告)日：2015-12-10

申请号：US14728875

申请日：2015-06-02

Applicant: Google Inc.

Inventor： Kanury Kanishka Rao , Fuchun Peng , Hasim Sak , Francoise Beaufays

IPC: G06F17/28

CPC classification number: G06N3/0445

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representations of input sequences. One of the methods includes receiving a grapheme sequence, the grapheme sequence comprising a plurality of graphemes arranged according to an input order; processing the sequence of graphemes using a long short-term memory (LSTM) neural network to generate an initial phoneme sequence from the grapheme sequence, the initial phoneme sequence comprising a plurality of phonemes arranged according to an output order; and generating a phoneme representation of the grapheme sequence from the initial phoneme sequence generated by the LSTM neural network, wherein generating the phoneme representation comprises removing, from the initial phoneme sequence, phonemes in one or more positions in the output order.

Abstract translation: 方法，系统和装置，包括在计算机存储介质上编码的计算机程序，用于产生输入序列的表示。所述方法之一包括接收字母序列，所述字符序列包括根据输入顺序排列的多个字形; 使用长的短期记忆（LSTM）神经网络处理字符序列以从图形序列生成初始音素序列，所述初始音素序列包括根据输出顺序排列的多个音素; 以及从由LSTM神经网络生成的初始音素序列生成字形序列的音素表示，其中产生音素表示包括从初始音素序列去除输出顺序中的一个或多个位置中的音素。

2.

发明申请
PRONUNCIATION VERIFICATION 有权
Title translation: 授权验证

公开(公告)号：US20150161985A1

公开(公告)日：2015-06-11

申请号：US14186400

申请日：2014-02-21

Applicant: Google Inc.

Inventor： Fuchun Peng , Kanury Kanishka Rao , Francoise Beaufays

IPC: G10L15/06 , G10L15/18 , G10L15/26

CPC classification number: G10L15/063 , G10L15/26

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for verifying pronunciations. In one aspect, a method includes obtaining a first transcription for an utterance. A second transcription for the utterance is obtained. The second transcription is different from the first transcription. One or more feature scores are determined based on the first transcription and the second transcription. The one or more feature scores are input to a trained classifier. An output of the classifier is received. The output indicates which of the first transcription and the second transcription is more likely to be a correct transcription of the utterance.

Abstract translation: 方法，系统和装置，包括在计算机存储介质上编码的用于验证发音的计算机程序。一方面，一种方法包括获得用于发音的第一转录。获得了用于说话的第二个转录。第二次转录与第一次转录不同。基于第一次转录和第二次转录确定一个或多个特征得分。将一个或多个特征得分输入到训练有素的分类器。接收分类器的输出。输出表明第一次转录和第二次转录中哪一个更可能是正确的发音转录。

3.

发明申请
LATENCY CONSTRAINTS FOR ACOUSTIC MODELING 审中-公开

公开(公告)号：US20170103752A1

公开(公告)日：2017-04-13

申请号：US14879225

申请日：2015-10-09

Applicant: Google Inc.

Inventor： Andrew W. Senior , Hasim Sak , Kanury Kanishka Rao

IPC: G10L15/16 , G06N3/04 , G06N3/08

CPC classification number: G10L15/16 , G06N3/0445 , G06N3/0454

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for acoustic modeling of audio data. One method includes receiving audio data representing a portion of an utterance, providing the audio data to a trained recurrent neural network that has been trained to indicate the occurrence of a phone at any of multiple time frames within a maximum delay of receiving audio data corresponding to the phone, receiving, within the predetermined maximum delay of providing the audio data to the trained recurrent neural network, output of the trained neural network indicating a phone corresponding to the provided audio data using output of the trained neural network to determine a transcription for the utterance, and providing the transcription for the utterance.

4.

发明申请
MULTI-ACCENT SPEECH RECOGNITION 审中-公开

公开(公告)号：US20180053500A1

公开(公告)日：2018-02-22

申请号：US15243838

申请日：2016-08-22

Applicant: Google Inc.

Inventor： Hasim Sak , Kanury Kanishka Rao

IPC: G10L15/16 , G10L15/02 , G10L15/06

CPC classification number: G10L15/16 , G10L15/02 , G10L15/063 , G10L15/187 , G10L25/30 , G10L2015/025

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for training a hierarchical recurrent neural network (HRNN) having a plurality of parameters on a plurality of training acoustic sequences to generate phoneme representations of received acoustic sequences. One method includes, for each of the received training acoustic sequences: processing the received acoustic sequence in accordance with current values of the parameters of the HRNN to generate a predicted grapheme representation of the received acoustic sequence; processing an intermediate output generated by an intermediate layer of the HRNN during the processing of the received acoustic sequence to generate one or more predicted phoneme representations of the received acoustic sequence; and adjusting the current values of the parameters of the HRNN based at (i) the predicted grapheme representation and (ii) the one or more predicted phoneme representations.

5.

发明申请
PREDICTING PRONUNCIATIONS WITH WORD STRESS 审中-公开

公开(公告)号：US20170358293A1

公开(公告)日：2017-12-14

申请号：US15178719

申请日：2016-06-10

Applicant: Google Inc.

Inventor： Mason Vijay Chua , Kanury Kanishka Rao , Daniel Jacobus Josef van Esch

IPC: G10L13/10 , G10L15/16 , G10L25/30 , G10L15/02 , G10L15/06 , G10L15/18

CPC classification number: G10L13/10 , G10L13/0335 , G10L13/047 , G10L13/08 , G10L15/02 , G10L15/063 , G10L15/16 , G10L15/1815 , G10L15/187 , G10L17/18 , G10L25/30 , G10L2015/027

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating word pronunciations. One of the methods includes determining, by one or more computers, spelling data that indicates the spelling of a word, providing the spelling data as input to a trained recurrent neural network, the trained recurrent neural network being trained to indicate characteristics of word pronunciations based at least on data indicating the spelling of words, receiving output indicating a stress pattern for pronunciation of the word generated by the trained recurrent neural network in response to providing the spelling data as input, using the output of the trained recurrent neural network to generate pronunciation data indicating the stress pattern for a pronunciation of the word, and providing, by the one or more computers, the pronunciation data to a text-to-speech system or an automatic speech recognition system.

6.

发明授权
Verification of mappings between phoneme sequences and words 有权

公开(公告)号：US09837070B2

公开(公告)日：2017-12-05

申请号：US14186400

申请日：2014-02-21

Applicant: Google Inc.

Inventor： Fuchun Peng , Kanury Kanishka Rao , Francoise Beaufays

IPC: G10L15/06 , G10L15/26

CPC classification number: G10L15/063 , G10L15/26

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for verifying pronunciations. In one aspect, a method includes obtaining a first transcription for an utterance. A second transcription for the utterance is obtained. The second transcription is different from the first transcription. One or more feature scores are determined based on the first transcription and the second transcription. The one or more feature scores are input to a trained classifier. An output of the classifier is received. The output indicates which of the first transcription and the second transcription is more likely to be a correct transcription of the utterance.

7.

发明授权
Generating acoustic models 有权

公开(公告)号：US09786270B2

公开(公告)日：2017-10-10

申请号：US15205263

申请日：2016-07-08

Applicant: Google Inc.

Inventor： Andrew W. Senior , Hasim Sak , Kanury Kanishka Rao

IPC: G10L15/06 , G10L15/16 , G10L15/187

CPC classification number: G10L15/063 , G10L15/16 , G10L15/187

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating acoustic models. In some implementations, a first neural network trained as an acoustic model using the connectionist temporal classification algorithm is obtained. Output distributions from the first neural network are obtained for an utterance. A second neural network is trained as an acoustic model using the output distributions produced by the first neural network as output targets for the second neural network. An automated speech recognizer configured to use the trained second neural network is provided.

8.

发明申请
GENERATING ACOUSTIC MODELS 有权
Title translation: 生成声学模型

公开(公告)号：US20170011738A1

公开(公告)日：2017-01-12

申请号：US15205263

申请日：2016-07-08

Applicant: Google Inc.

Inventor： Andrew W. Senior , Hasim Sak , Kanury Kanishka Rao

IPC: G10L15/06 , G10L21/06 , G10L15/34 , G10L15/16 , G10L15/26

CPC classification number: G10L15/063 , G10L15/16 , G10L15/187

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating acoustic models. In some implementations, a first neural network trained as an acoustic model using the connectionist temporal classification algorithm is obtained. Output distributions from the first neural network are obtained for an utterance. A second neural network is trained as an acoustic model using the output distributions produced by the first neural network as output targets for the second neural network. An automated speech recognizer configured to use the trained second neural network is provided.

Abstract translation: 方法，系统和装置，包括在计算机存储介质上编码的用于产生声学模型的计算机程序。在一些实现中，获得了使用连接时间分类算法训练为声学模型的第一神经网络。获得来自第一神经网络的输出分布用于发音。第二神经网络被训练为使用由第一神经网络产生的输出分布作为第二神经网络的输出目标的声学模型。提供了一种被配置为使用训练有素的第二神经网络的自动语音识别器。

9.

发明申请
LEARNING PRONUNCIATIONS FROM ACOUSTIC SEQUENCES 审中-公开
Title translation: 从声学序列学习发明

公开(公告)号：US20160351188A1

公开(公告)日：2016-12-01

申请号：US14811939

申请日：2015-07-29

Applicant: Google Inc.

Inventor： Kanury Kanishka Rao , Francoise Beaufays , Hasim Sak , Ouais Alsharif

IPC: G10L15/187 , G10L15/05 , G10L15/16 , G06F17/27

CPC classification number: G10L15/187 , G06N3/0445 , G06N3/084 , G10L15/063 , G10L15/16 , G10L2015/025

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for learning pronunciations from acoustic sequences. One method includes receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the time steps processing the acoustic feature representation through each of one or more recurrent neural network layers to generate a recurrent output; processing the recurrent output for the time step using a phoneme output layer to generate a phoneme representation for the acoustic feature representation for the time step; and processing the recurrent output for the time step using a grapheme output layer to generate a grapheme representation for the acoustic feature representation for the time step; and extracting, from the phoneme and grapheme representations for the acoustic feature representations at each time step, a respective pronunciation for each of one or more words.

Abstract translation: 方法，系统和装置，包括在计算机存储介质上编码的用于从声学序列学习发音的计算机程序。一种方法包括接收声学序列，所述声学序列包括在多个时间步长中的每一个处的相应声学特征表示; 对于通过一个或多个循环神经网络层中的每一个处理声学特征表示的每个时间步骤，以产生反复输出; 使用音素输出层处理时间步长的复现输出，以产生用于时间步长的声学特征表示的音素表示; 以及使用字形输出层处理所述时间步长的复现输出，以生成用于所述时间步长的声学特征表示的图形表示; 并且从每个时间步长处的声音特征表示的音素和图形表示中提取一个或多个单词中的每一个的相应发音。

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification