Patent search ap:("Google LLC") AND inv:"Om Thakkar"

1.

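Published invention application
Phrase Extraction for ASR Models (status: under examination, published)

Publication number: US20230178094A1

Publication date: 2023-06-08

Application number: US17643848

Application date: 2021-12-13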

Applicant: Google LLC

Inventor: Ehsan Amid, Om Thakkar, Rajiv Mathews, Francoise Beaufays

IPC: G10L21/0332, G10L21/10, G10L15/06, G10L15/08

CPC classification number: G10L21/0332, G10L21/10, G10L15/063, G10L15/08

Abstract: A method of phrase extraction for ASR models includes obtaining audio data characterizing an utterance and a corresponding ground-truth transcription of the utterance and modifying the audio data to obfuscate a particular phrase recited in the utterance. The method also includes processing, using a trained ASR model, the modified audio data to generate a predicted transcription of the utterance, and determining whether the predicted transcription includes the particular phrase by comparing the predicted transcription of the utterance to the ground-truth transcription of the utterance. When the predicted transcription includes the particular phrase, the method includes generating an output indicating that the trained ASR model leaked the particular phrase from a training data set used to train the ASR model.
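The check described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the abstract does not say how the audio is obfuscated, so replacing a sample span with silence is an assumption, and `asr_model`, `obfuscate_span`, `contains_phrase`, and `phrase_leaked` are hypothetical names introduced here.

```python
from typing import Callable, List, Sequence, Tuple


def obfuscate_span(audio: Sequence[float], start: int, end: int) -> List[float]:
    """Return a copy of the audio with samples in [start, end) set to silence.

    Silencing is an assumed obfuscation; the abstract does not specify one.
    """
    if not 0 <= start <= end <= len(audio):
        raise ValueError("span must lie within the audio")
    out = list(audio)
    out[start:end] = [0.0] * (end - start)
    return out


def contains_phrase(transcript: str, phrase: str) -> bool:
    """Case-insensitive check that the phrase occurs as a contiguous word run."""
    words = transcript.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return n > 0 and any(
        words[i:i + n] == target for i in range(len(words) - n + 1)
    )


def phrase_leaked(
    audio: Sequence[float],
    span: Tuple[int, int],
    ground_truth: str,
    phrase: str,
    asr_model: Callable[[List[float]], str],  # hypothetical: audio -> text
) -> bool:
    """True if the model still outputs the phrase after it was obfuscated."""
    if not contains_phrase(ground_truth, phrase):
        raise ValueError("phrase must occur in the ground-truth transcription")
    predicted = asr_model(obfuscate_span(audio, *span))
    return contains_phrase(predicted, phrase)
```

If the model reproduces a phrase whose audio it never heard, the phrase most plausibly came from its training data, which is the leak signal the abstract describes.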

2.

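Published invention application
Detecting Unintended Memorization in Language-Model-Fused ASR Systems (status: under examination, published)

Publication number: US20230335126A1

Publication date: 2023-10-19

Application number: US18303296

Application date: 2023-04-19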

Applicant: Google LLC

Inventor: Ronny Huang, Steve Chien, Om Thakkar, Rajiv Mathews

IPC: G10L15/197, G10L13/02, G10L15/01, G10L15/06, G10L15/16

CPC classification number: G10L15/197, G10L13/02, G10L15/01, G10L15/063, G10L15/16

Abstract: A method includes inserting a set of canary text samples into a corpus of training text samples and training an external language model on the corpus of training text samples and the set of canary text samples inserted into the corpus of training text samples. For each canary text sample, the method also includes generating a corresponding synthetic speech utterance and generating an initial transcription for the corresponding synthetic speech utterance. The method also includes rescoring the initial transcription generated for each corresponding synthetic speech utterance using the external language model. The method also includes determining a word error rate (WER) of the external language model based on the rescored initial transcriptions and the canary text samples and detecting memorization of the canary text samples by the external language model based on the WER of the external language model.
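The WER step in this abstract can be illustrated with a minimal sketch. The WER definition (word-level edit distance divided by reference word count) is standard. The decision rule is an assumption: the abstract says only that detection is "based on the WER", so comparing against a baseline WER with a margin is a hypothetical choice, as are the names `corpus_wer` and `memorization_suspected`.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """Total word errors divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must align one to one")
    total_words = sum(len(r.split()) for r in references)
    if total_words == 0:
        raise ValueError("references contain no words")
    errors = sum(
        edit_distance(r.split(), h.split())
        for r, h in zip(references, hypotheses)
    )
    return errors / total_words


def memorization_suspected(
    canary_wer: float, baseline_wer: float, margin: float = 0.05
) -> bool:
    """Assumed rule: flag when canary WER is clearly below a baseline WER."""
    return canary_wer < baseline_wer - margin
```

Here the canary text samples are the references and the rescored transcriptions are the hypotheses. An unusually low WER on the canaries would suggest that the external language model is steering rescoring toward text it memorized.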

3.

Granted invention patent
Phrase extraction for ASR models (status: in force)

Publication number: US11955134B2

Publication date: 2024-04-09

Application number: US17643848

Application date: 2021-12-13

Applicant: Google LLC

Inventor: Ehsan Amid, Om Thakkar, Rajiv Mathews, Francoise Beaufays

IPC: G10L21/0332, G10L15/06, G10L15/08, G10L21/10

CPC classification number: G10L21/0332, G10L15/063, G10L15/08, G10L21/10

Abstract: A method of phrase extraction for ASR models includes obtaining audio data characterizing an utterance and a corresponding ground-truth transcription of the utterance and modifying the audio data to obfuscate a particular phrase recited in the utterance. The method also includes processing, using a trained ASR model, the modified audio data to generate a predicted transcription of the utterance, and determining whether the predicted transcription includes the particular phrase by comparing the predicted transcription of the utterance to the ground-truth transcription of the utterance. When the predicted transcription includes the particular phrase, the method includes generating an output indicating that the trained ASR model leaked the particular phrase from a training data set used to train the ASR model.

4.

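Published invention application
SERVER EFFICIENT ENHANCEMENT OF PRIVACY IN FEDERATED LEARNING (status: under examination, published)

Publication number: US20230223028A1

Publication date: 2023-07-13

Application number: US18007656

Application date: 2020-10-16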

Applicant: GOOGLE LLC

Inventor: Om Thakkar, Abhradeep Guha Thakurta, Peter Kairouz, Borja de Balle Pigem, Brendan McMahan

IPC: G10L15/30, G10L15/06

CPC classification number: G10L15/30, G10L15/063

Abstract: Techniques are disclosed that enable training a global model using gradients provided to a remote system by a set of client devices during a reporting window, where each client device randomly determines a reporting time in the reporting window to provide the gradient to the remote system. Various implementations include each client device determining a corresponding gradient by processing data using a local model stored locally at the client device, where the local model corresponds to the global model.
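The reporting scheme in this abstract can be sketched as below, under several assumptions that the abstract does not confirm: the reporting time is drawn uniformly from the window, the remote system averages the gradients it receives, and the global model is updated with a plain gradient step. All function names and the learning rate are hypothetical.

```python
import random
from typing import List


def pick_reporting_time(
    window_start: float, window_end: float, rng: random.Random
) -> float:
    """Client side: draw a random reporting time inside the window.

    Uniform sampling is an assumption; the abstract says only 'randomly'.
    """
    if window_end < window_start:
        raise ValueError("window_end must not precede window_start")
    return rng.uniform(window_start, window_end)


def average_gradients(gradients: List[List[float]]) -> List[float]:
    """Server side: element-wise mean of the gradients received in the window."""
    if not gradients:
        raise ValueError("no gradients were reported")
    n = len(gradients)
    return [sum(column) / n for column in zip(*gradients)]


def apply_update(
    weights: List[float], avg_grad: List[float], lr: float = 0.1
) -> List[float]:
    """Assumed update rule: one plain gradient-descent step on the global model."""
    return [w - lr * g for w, g in zip(weights, avg_grad)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Each of three clients independently picks a time in a 60-second window.
    times = [pick_reporting_time(0.0, 60.0, rng) for _ in range(3)]
    print(sorted(times))
    print(apply_update([1.0, 1.0], average_gradients([[1.0, 2.0], [3.0, 4.0]])))
```

Because each client chooses its own reporting time, the remote system does not need to coordinate or sample the participating clients itself.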
