-
Publication number: US20240265215A1
Publication date: 2024-08-08
Application number: US18617428
Filing date: 2024-03-26
Applicant: Google LLC
Inventor: Dirk Ryan Padfield
IPC: G06F40/58, G10L15/00, G10L15/06, G10L15/197, G10L15/22
CPC classification number: G06F40/58, G10L15/005, G10L15/063, G10L15/197, G10L15/22
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, that facilitate generating stable real-time textual translations in a target language of an input audio data stream that is recorded in a source language. An audio stream that is recorded in a first language is obtained. A partial transcription of the audio can be generated at each time interval in a plurality of successive time intervals. Each partial transcription can be translated into a second language that is different from the first language. Each translated partial transcription can be input to a model that determines whether a portion of an input translated partial transcription is stable. Based on the input translated partial transcription, the model identifies a portion of the translated partial transcription that is predicted to be stable. This stable portion of the translated partial transcription is provided for display on a user device.
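The abstract does not say how the stability model works internally. As a rough illustration only, the sketch below swaps in a simple heuristic for that model: it treats as stable the words shared, as a prefix, by the most recent translated partial transcriptions. The function names, the `window` parameter, and the sample strings are all hypothetical.

```python
from typing import List


def common_prefix(a: List[str], b: List[str]) -> List[str]:
    """Return the longest shared word prefix of two token lists."""
    out: List[str] = []
    for x, y in zip(a, b):
        if x != y:
            break
        out.append(x)
    return out


def stable_portion(translated_partials: List[str], window: int = 2) -> str:
    """Hypothetical stand-in for the stability model in the abstract.

    Treats as stable the words that the last `window` translated partial
    transcriptions share as a common prefix.
    """
    recent = [p.split() for p in translated_partials[-window:]]
    if len(recent) < window:
        return ""  # not enough history to call anything stable
    prefix = recent[0]
    for tokens in recent[1:]:
        prefix = common_prefix(prefix, tokens)
    return " ".join(prefix)


# Translated partial transcriptions, one per successive time interval.
partials = [
    "the cat",
    "the cat sits",
    "the cat sat on",
    "the cat sat on the mat",
]

# What would be shown on the user device after each interval.
shown = [stable_portion(partials[: i + 1]) for i in range(len(partials))]
```

In this sketch the revised word ("sits" becoming "sat") is never displayed, because it is held back until two consecutive partials agree on it. A learned model, as the abstract describes, could make the same decision from richer signals.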
-
Publication number: US11972226B2
Publication date: 2024-04-30
Application number: US17269800
Filing date: 2020-03-23
Applicant: Google LLC
Inventor: Dirk Ryan Padfield
IPC: G06F40/58, G10L15/00, G10L15/06, G10L15/197, G10L15/22
CPC classification number: G06F40/58, G10L15/005, G10L15/063, G10L15/197, G10L15/22
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, that facilitate generating stable real-time textual translations in a target language of an input audio data stream that is recorded in a source language. An audio stream that is recorded in a first language is obtained. A partial transcription of the audio can be generated at each time interval in a plurality of successive time intervals. Each partial transcription can be translated into a second language that is different from the first language. Each translated partial transcription can be input to a model that determines whether a portion of an input translated partial transcription is stable. Based on the input translated partial transcription, the model identifies a portion of the translated partial transcription that is predicted to be stable. This stable portion of the translated partial transcription is provided for display on a user device.
-
Publication number: US20230360632A1
Publication date: 2023-11-09
Application number: US17661832
Filing date: 2022-05-03
Applicant: Google LLC
Inventor: Fadi Biadsy, Dirk Ryan Padfield, Victoria Zayats
Abstract: A method includes receiving a reference audio signal corresponding to reference speech spoken by a target speaker with atypical speech, and generating, by a speaker embedding network configured to receive the reference audio signal as input, a speaker embedding for the target speaker. The speaker embedding conveys speaker characteristics of the target speaker. The method also includes receiving a speech conversion request that includes input audio data corresponding to an utterance spoken by the target speaker associated with the atypical speech. The method also includes biasing, using the speaker embedding generated for the target speaker by the speaker embedding network, a speech conversion model to convert the input audio data corresponding to the utterance spoken by the target speaker associated with atypical speech into an output canonical representation of the utterance spoken by the target speaker.
-
Publication number: US20250140249A1
Publication date: 2025-05-01
Application number: US18564884
Filing date: 2022-11-09
Applicant: Google LLC
Inventor: Matthew Sharifi, Jyrki Antero Alakuijala, Dirk Ryan Padfield
IPC: G10L15/22, G10L15/183
Abstract: A method for recognizing a voice input includes receiving a first voice input including a plurality of terms, processing the first voice input based on the plurality of terms to obtain a first speech recognition result including one or more candidate terms corresponding to one or more terms from the plurality of terms, receiving a second voice input providing at least one of contextual information relating to the first voice input or confirmation information relating to the one or more candidate terms, and processing the second voice input based on the at least one of the contextual information or the confirmation information to obtain a second speech recognition result including at least one of the one or more candidate terms or one or more new candidate terms, as corresponding to the one or more terms from the plurality of terms.
-
Publication number: US20240420680A1
Publication date: 2024-12-19
Application number: US18337168
Filing date: 2023-06-19
Applicant: GOOGLE LLC
Inventor: Te I, Chris Kau, Jeffrey Robert Pitman, Robert Eric Genter, Qi Ge, Wolfgang Macherey, Dirk Ryan Padfield, Naveen Arivazhagan, Colin Cherry
Abstract: Implementations relate to a multimodal translation application that can provide an abridged version of a translation through an audio interface of a computing device, while simultaneously providing a verbatim textual translation at a display interface of the computing device. The application can provide these different versions of the translation in certain circumstances when, for example, the rate of speech of a person speaking to a user is relatively high compared to a preferred rate of speech of the user. For example, a comparison between phonemes of an original language speech and a translated language speech can be performed to determine whether the ratio satisfies a threshold for providing an audible abridged translation. A determination to provide the abridged translation can additionally or alternatively be based on a determined language of the speaker.
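The phoneme comparison mentioned in the abstract can be pictured as a simple ratio test. The sketch below is an assumption-laden illustration: the direction of the ratio (translated over original), the default threshold of 1.25, and the function name are all hypothetical, since the abstract gives none of them.

```python
def should_abridge_audio(original_phonemes: int,
                         translated_phonemes: int,
                         threshold: float = 1.25) -> bool:
    """Decide whether to speak an abridged translation.

    Hypothetical rule: abridge when the translated speech needs noticeably
    more phonemes than the original speech, so that spoken output would
    fall behind the speaker. The verbatim text translation is unaffected.
    """
    if original_phonemes <= 0:
        return False  # nothing to compare against
    ratio = translated_phonemes / original_phonemes
    return ratio > threshold


# 60 translated phonemes for 40 original ones gives a ratio of 1.5.
fast_speaker = should_abridge_audio(40, 60)
# 44 translated phonemes for 40 original ones gives a ratio of 1.1.
normal_speaker = should_abridge_audio(40, 44)
```

Per the abstract, the same decision may also take the speaker's determined language into account, which this sketch leaves out.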
-
Publication number: US20230021824A1
Publication date: 2023-01-26
Application number: US17859146
Filing date: 2022-07-07
Applicant: Google LLC
Inventor: Dirk Ryan Padfield, Colin Andrew Cherry
Abstract: The technology provides an approach to train translation models that are robust to transcription errors and punctuation errors. The approach includes introducing errors from actual automatic speech recognition and automatic punctuation systems into the source side of the machine translation training data. A method for training a machine translation model includes performing automatic speech recognition on input source audio to generate a system transcript. The method aligns a human transcript of the source audio to the system transcript, including projecting system segmentation onto the human transcript. Then the method performs segment robustness training of a machine translation model according to the aligned human and system transcripts, and performs system robustness training of the machine translation model, e.g., by injecting token errors into training data.
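The last step in the abstract, injecting token errors into training data, might look roughly like the sketch below. The abstract says the errors come from actual speech recognition and punctuation systems; here a small hand-written confusion table stands in for those observed errors. The table, the function name, and the `error_rate` parameter are hypothetical.

```python
import random
from typing import Dict, List


def inject_token_errors(source_tokens: List[str],
                        confusions: Dict[str, List[str]],
                        error_rate: float,
                        rng: random.Random) -> List[str]:
    """Replace confusable source tokens with plausible recognition errors.

    Only tokens present in `confusions` can be replaced, each with
    probability `error_rate`. The target side of the training pair is left
    untouched, so the model learns to translate through the noise.
    """
    noisy: List[str] = []
    for tok in source_tokens:
        if tok in confusions and rng.random() < error_rate:
            noisy.append(rng.choice(confusions[tok]))
        else:
            noisy.append(tok)
    return noisy


# Hypothetical confusion table standing in for errors observed from ASR.
confusions = {"their": ["there"], "two": ["to", "too"]}
source = ["their", "two", "cats"]

clean = inject_token_errors(source, confusions, 0.0, random.Random(0))
noisy = inject_token_errors(source, confusions, 1.0, random.Random(0))
```

With `error_rate` set to 0.0 the source passes through unchanged, and with 1.0 every confusable token is replaced. A training setup would presumably use a rate somewhere in between.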
-
Publication number: US20250037700A1
Publication date: 2025-01-30
Application number: US18919366
Filing date: 2024-10-17
Applicant: Google LLC
Inventor: Fadi Biadsy, Dirk Ryan Padfield, Victoria Zayats
Abstract: A method includes receiving a reference audio signal corresponding to reference speech spoken by a target speaker with atypical speech, and generating, by a speaker embedding network configured to receive the reference audio signal as input, a speaker embedding for the target speaker. The speaker embedding conveys speaker characteristics of the target speaker. The method also includes receiving a speech conversion request that includes input audio data corresponding to an utterance spoken by the target speaker associated with the atypical speech. The method also includes biasing, using the speaker embedding generated for the target speaker by the speaker embedding network, a speech conversion model to convert the input audio data corresponding to the utterance spoken by the target speaker associated with atypical speech into an output canonical representation of the utterance spoken by the target speaker.
-
Publication number: US12136410B2
Publication date: 2024-11-05
Application number: US17661832
Filing date: 2022-05-03
Applicant: Google LLC
Inventor: Fadi Biadsy, Dirk Ryan Padfield, Victoria Zayats
Abstract: A method includes receiving a reference audio signal corresponding to reference speech spoken by a target speaker with atypical speech, and generating, by a speaker embedding network configured to receive the reference audio signal as input, a speaker embedding for the target speaker. The speaker embedding conveys speaker characteristics of the target speaker. The method also includes receiving a speech conversion request that includes input audio data corresponding to an utterance spoken by the target speaker associated with the atypical speech. The method also includes biasing, using the speaker embedding generated for the target speaker by the speaker embedding network, a speech conversion model to convert the input audio data corresponding to the utterance spoken by the target speaker associated with atypical speech into an output canonical representation of the utterance spoken by the target speaker.
-
Publication number: US20220121827A1
Publication date: 2022-04-21
Application number: US17269800
Filing date: 2020-03-23
Applicant: Google LLC
Inventor: Dirk Ryan Padfield
IPC: G06F40/58, G10L15/00, G10L15/22, G10L15/06, G10L15/197
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, that facilitate generating stable real-time textual translations in a target language of an input audio data stream that is recorded in a source language. An audio stream that is recorded in a first language is obtained. A partial transcription of the audio can be generated at each time interval in a plurality of successive time intervals. Each partial transcription can be translated into a second language that is different from the first language. Each translated partial transcription can be input to a model that determines whether a portion of an input translated partial transcription is stable. Based on the input translated partial transcription, the model identifies a portion of the translated partial transcription that is predicted to be stable. This stable portion of the translated partial transcription is provided for display on a user device.
-
Publication number: US20250148365A1
Publication date: 2025-05-08
Application number: US18835869
Filing date: 2022-02-03
Applicant: Google LLC
Inventor: Dirk Ryan Padfield, Matthew Sharifi
IPC: G06N20/00
Abstract: Provided are systems and methods for continuous training of machine learning (ML) models on changing data. In particular, the present disclosure provides example approaches to model training that take advantage of constantly evolving data that may be available in various ancillary systems that contain large amounts of data, but which are not specific to or dedicated for model training.