EFFICIENT STREAMING NON-RECURRENT ON-DEVICE END-TO-END MODEL

    公开(公告)号:US20240371363A1

    公开(公告)日:2024-11-07

    申请号:US18772263

    申请日:2024-07-15

    Applicant: Google LLC

    Abstract: An ASR model includes a first encoder configured to receive a sequence of acoustic frames and generate a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The ASR model also includes a second encoder configured to receive the first higher order feature representation generated by the first encoder at each of the plurality of output steps and generate a second higher order feature representation for a corresponding first higher order feature frame. The ASR model also includes a decoder configured to receive the second higher order feature representation generated by the second encoder at each of the plurality of output steps and generate a first probability distribution over possible speech recognition hypothesis. The ASR model also includes a language model configured to receive the first probability distribution over possible speech hypothesis and generate a rescored probability distribution.

    STFT-based echo muter
    35.
    发明授权

    公开(公告)号:US12051434B2

    公开(公告)日:2024-07-30

    申请号:US17643825

    申请日:2021-12-11

    Applicant: Google LLC

    CPC classification number: G10L21/0224 G10L21/0232 G10L2021/02082

    Abstract: A method for Short-Time Fourier Transform-based echo muting includes receiving a microphone signal including acoustic echo captured by a microphone and corresponding to audio content from an acoustic speaker, and receiving a reference signal including a sequence of frames representing the audio content. For each frame in a sequence of frames, the method includes processing, using an acoustic echo canceler configured to receive a respective frame as input to generate a respective output signal frame that cancels the acoustic echo from the respective frame, and determining, using a Double-talk Detector (DTD), based on the respective frame and the respective output signal frame, whether the respective frame includes a double-talk frame or an echo-only frame. For each respective frame that includes the echo-only frame, muting the respective output signal frame, and performing speech processing on the respective output signal frame for each respective frame that includes the double-talk frame.

    Efficient Streaming Non-Recurrent On-Device End-to-End Model

    公开(公告)号:US20220310062A1

    公开(公告)日:2022-09-29

    申请号:US17316198

    申请日:2021-05-10

    Applicant: Google LLC

    Abstract: An ASR model includes a first encoder configured to receive a sequence of acoustic frames and generate a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The ASR model also includes a second encoder configured to receive the first higher order feature representation generated by the first encoder at each of the plurality of output steps and generate a second higher order feature representation for a corresponding first higher order feature frame. The ASR model also includes a decoder configured to receive the second higher order feature representation generated by the second encoder at each of the plurality of output steps and generate a first probability distribution over possible speech recognition hypothesis. The ASR model also includes a language model configured to receive the first probability distribution over possible speech hypothesis and generate a rescored probability distribution.

Patent Agency Ranking