Emitting word timings with end-to-end models

    Publication number: US12027154B2

    Publication date: 2024-07-02

    Application number: US18167050

    Application date: 2023-02-09

    Applicant: Google LLC

    CPC classification number: G10L15/063 G10L25/30 G10L25/78

    Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.

    Privacy-aware meeting room transcription from audio-visual stream

    Publication number: US12118123B2

    Publication date: 2024-10-15

    Application number: US17755892

    Application date: 2019-11-18

    Applicant: Google LLC

    CPC classification number: G06F21/6254 G10L17/02 H04L12/1831

    Abstract: A method for privacy-aware transcription includes receiving an audio-visual signal including audio data and image data for a speech environment and a privacy request from a participant in the speech environment, where the privacy request indicates a privacy condition of the participant. The method further includes segmenting the audio data into a plurality of segments. For each segment, the method includes determining an identity of a speaker of a corresponding segment of the audio data based on the image data and determining whether the identity of the speaker of the corresponding segment includes the participant associated with the privacy condition. When the identity of the speaker of the corresponding segment includes the participant, the method includes applying the privacy condition to the corresponding segment. The method also includes processing the plurality of segments of the audio data to determine a transcript for the audio data.

    PRIVACY-AWARE MEETING ROOM TRANSCRIPTION FROM AUDIO-VISUAL STREAM

    Publication number: US20240104247A1

    Publication date: 2024-03-28

    Application number: US18535214

    Application date: 2023-12-11

    Applicant: Google LLC

    CPC classification number: G06F21/6254 G10L17/02 H04L12/1831

    Abstract: A method for privacy-aware transcription includes receiving an audio-visual signal including audio data and image data for a speech environment and a privacy request from a participant in the speech environment, where the privacy request indicates a privacy condition of the participant. The method further includes segmenting the audio data into a plurality of segments. For each segment, the method includes determining an identity of a speaker of a corresponding segment of the audio data based on the image data and determining whether the identity of the speaker of the corresponding segment includes the participant associated with the privacy condition. When the identity of the speaker of the corresponding segment includes the participant, the method includes applying the privacy condition to the corresponding segment. The method also includes processing the plurality of segments of the audio data to determine a transcript for the audio data.

    Emitting Word Timings with End-to-End Models

    Publication number: US20240321263A1

    Publication date: 2024-09-26

    Application number: US18680797

    Application date: 2024-05-31

    Applicant: Google LLC

    CPC classification number: G10L15/063 G10L25/30 G10L25/78

    Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.

    Emitting Word Timings with End-to-End Models

    Publication number: US20230206907A1

    Publication date: 2023-06-29

    Application number: US18167050

    Application date: 2023-02-09

    Applicant: Google LLC

    CPC classification number: G10L15/063 G10L25/30 G10L25/78

    Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.
