-
Publication number: US12027154B2
Publication date: 2024-07-02
Application number: US18167050
Application date: 2023-02-09
Applicant: Google LLC
Inventor: Tara N. Sainath, Basilio Garcia Castillo, David Rybach, Trevor Strohman, Ruoming Pang
CPC classification number: G10L15/063, G10L25/30, G10L25/78
Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word, identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.
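The per-word steps in this abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the `<wb>` placeholder string, the `Word` type, and the fixed-size `toy_word_pieces` tokenizer are hypothetical and do not come from the patent. The sketch stops at producing the constrained alignments; how they are applied to the attention head of the second pass decoder is not described in the abstract and is not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple

PLACEHOLDER = "<wb>"  # hypothetical placeholder symbol, not named in the abstract


@dataclass
class Word:
    text: str         # assumed non-empty
    start_frame: int  # ground truth alignment for the beginning of the word
    end_frame: int    # ground truth alignment for the end of the word


def toy_word_pieces(text: str, size: int = 3) -> List[str]:
    """Hypothetical tokenizer: fixed-size chunks stand in for real word pieces."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def build_constrained_alignments(
    words: List[Word],
) -> Tuple[List[str], List[Tuple[int, str, int]]]:
    """Return the token sequence and (token index, kind, frame) constraints."""
    tokens: List[str] = []
    constraints: List[Tuple[int, str, int]] = []
    for word in words:
        tokens.append(PLACEHOLDER)       # placeholder inserted before the word
        begin_idx = len(tokens)          # index of the beginning word piece
        tokens.extend(toy_word_pieces(word.text))
        end_idx = len(tokens) - 1        # index of the ending word piece
        # First constraint: beginning piece tied to the word's start.
        constraints.append((begin_idx, "begin", word.start_frame))
        # Second constraint: ending piece tied to the word's end.
        constraints.append((end_idx, "end", word.end_frame))
    return tokens, constraints


example = [Word("hello", 10, 25), Word("hi", 30, 36)]
tokens, constraints = build_constrained_alignments(example)
```

For a word that maps to a single word piece, the beginning and ending pieces coincide, so both constraints point at the same token index.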
-
Publication number: US12118123B2
Publication date: 2024-10-15
Application number: US17755892
Application date: 2019-11-18
Applicant: Google LLC
Inventor: Oliver Siohan, Takaki Makino, Richard Rose, Otavio Braga, Hank Liao, Basilio Garcia Castillo
CPC classification number: G06F21/6254, G10L17/02, H04L12/1831
Abstract: A method for privacy-aware transcription includes receiving an audio-visual signal including audio data and image data for a speech environment and a privacy request from a participant in the speech environment, where the privacy request indicates a privacy condition of the participant. The method further includes segmenting the audio data into a plurality of segments. For each segment, the method includes determining an identity of a speaker of a corresponding segment of the audio data based on the image data and determining whether the identity of the speaker of the corresponding segment includes the participant associated with the privacy condition. When the identity of the speaker of the corresponding segment includes the participant, the method includes applying the privacy condition to the corresponding segment. The method also includes processing the plurality of segments of the audio data to determine a transcript for the audio data.
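The per-segment flow in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented method: the `Segment` and `PrivacyRequest` types, the `"omit"` and `"anonymize"` condition values, and the output format are all hypothetical. Speaker identification from image data and speech recognition are stubbed out, so each segment is assumed to already carry a `speaker` label and recognized `text`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    speaker: str  # assumed to be already determined from the image data
    text: str     # stand-in for the recognizer output of this segment


@dataclass
class PrivacyRequest:
    participant: str
    condition: str  # hypothetical values: "omit" or "anonymize"


def privacy_aware_transcript(
    segments: List[Segment], request: PrivacyRequest
) -> List[str]:
    """Build transcript lines, applying the condition to matching segments."""
    lines: List[str] = []
    for seg in segments:
        if seg.speaker == request.participant:
            if request.condition == "omit":
                continue  # leave this segment out of the transcript
            if request.condition == "anonymize":
                lines.append(f"Speaker: {seg.text}")  # hide the identity
                continue
        lines.append(f"{seg.speaker}: {seg.text}")
    return lines


segments = [
    Segment("alice", "hello"),
    Segment("bob", "my number is 5"),
    Segment("alice", "thanks"),
]
omitted = privacy_aware_transcript(segments, PrivacyRequest("bob", "omit"))
anonymized = privacy_aware_transcript(segments, PrivacyRequest("bob", "anonymize"))
```

Segments from speakers who made no privacy request pass through unchanged, which mirrors the abstract's conditional step.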
-
Publication number: US20240104247A1
Publication date: 2024-03-28
Application number: US18535214
Application date: 2023-12-11
Applicant: Google LLC
Inventor: Oliver Siohan, Takaki Makino, Richard Rose, Otavio Braga, Hank Liao, Basilio Garcia Castillo
CPC classification number: G06F21/6254, G10L17/02, H04L12/1831
Abstract: A method for privacy-aware transcription includes receiving an audio-visual signal including audio data and image data for a speech environment and a privacy request from a participant in the speech environment, where the privacy request indicates a privacy condition of the participant. The method further includes segmenting the audio data into a plurality of segments. For each segment, the method includes determining an identity of a speaker of a corresponding segment of the audio data based on the image data and determining whether the identity of the speaker of the corresponding segment includes the participant associated with the privacy condition. When the identity of the speaker of the corresponding segment includes the participant, the method includes applying the privacy condition to the corresponding segment. The method also includes processing the plurality of segments of the audio data to determine a transcript for the audio data.
-
Publication number: US20240321263A1
Publication date: 2024-09-26
Application number: US18680797
Application date: 2024-05-31
Applicant: Google LLC
Inventor: Tara N. Sainath, Basilio Garcia Castillo, David Rybach, Trevor Strohman, Ruoming Pang
CPC classification number: G10L15/063, G10L25/30, G10L25/78
Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word, identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.
-
Publication number: US20230206907A1
Publication date: 2023-06-29
Application number: US18167050
Application date: 2023-02-09
Applicant: Google LLC
Inventor: Tara N. Sainath, Basilio Garcia Castillo, David Rybach, Trevor Strohman, Ruoming Pang
CPC classification number: G10L15/063, G10L25/30, G10L25/78
Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word, identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.