-
Publication No.: US11699453B2
Publication Date: 2023-07-11
Application No.: US17005823
Filing Date: 2020-08-28
Applicant: Google LLC
Inventor: Joseph Caroselli , Arun Narayanan , Izhak Shafran , Richard Rose
IPC: G10L21/00 , G10L21/0208 , G10L15/20 , G10L15/22 , G10L15/065 , G06F3/16 , G06N3/02 , G06F17/14 , G10L15/06 , G10L21/0216
CPC classification number: G10L21/0208 , G06F3/167 , G06F17/142 , G06N3/02 , G10L15/063 , G10L15/065 , G10L15/20 , G10L15/22 , G10L2015/223 , G10L2021/02082 , G10L2021/02166
Abstract: Utilizing an adaptive multichannel technique to mitigate reverberation present in received audio signals, prior to providing corresponding audio data to one or more additional component(s), such as automatic speech recognition (ASR) components. Implementations disclosed herein are “adaptive”, in that they utilize a filter, in the reverberation mitigation, that is online, causal and varies depending on characteristics of the input. Implementations disclosed herein are “multichannel”, in that a corresponding audio signal is received from each of multiple audio transducers (also referred to herein as “microphones”) of a client device, and the multiple audio signals (e.g., frequency domain representations thereof) are utilized in updating of the filter—and dereverberation occurs for audio data corresponding to each of the audio signals (e.g., frequency domain representations thereof) prior to the audio data being provided to ASR component(s) and/or other component(s).
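The abstract does not state the filter's update rule. The code below is a minimal sketch that assumes a WPE-style delayed linear-prediction formulation with a normalized LMS (NLMS) update. Both the formulation and the update rule are swapped-in techniques that the abstract does not confirm. The sketch processes one STFT frequency bin online and causally. The stacked history of every microphone drives the prediction filter of every channel, which is the "multichannel" part. The prediction error is the dereverberated output for that channel. `online_dereverb`, `make_toy_echo`, and the parameter values (`delay`, `taps`, `mu`) are hypothetical.

```python
import random
from typing import List

Frame = List[complex]  # one STFT value per microphone for a single frequency bin


def online_dereverb(frames: List[Frame], delay: int = 2, taps: int = 4,
                    mu: float = 0.1, eps: float = 1e-6) -> List[Frame]:
    """Causal, frame-by-frame dereverberation of one frequency bin (sketch).

    Output channel m at frame t is the prediction error X_m(t) - w_m^H u(t),
    where u(t) stacks frames t-delay ... t-delay-taps+1 from all channels.
    """
    num_ch = len(frames[0])
    dim = num_ch * taps
    weights = [[0j] * dim for _ in range(num_ch)]  # one adaptive filter per channel
    out: List[Frame] = []
    for t, current in enumerate(frames):
        # Only past frames are used (causal); frames before t=0 count as zero.
        u: List[complex] = []
        for k in range(taps):
            idx = t - delay - k
            u.extend(frames[idx] if idx >= 0 else [0j] * num_ch)
        norm = sum(abs(v) ** 2 for v in u) + eps
        y: Frame = []
        for m in range(num_ch):
            w = weights[m]
            # Predicted late reverberation for channel m: w^H u.
            late = sum(wi.conjugate() * ui for wi, ui in zip(w, u))
            err = current[m] - late
            y.append(err)
            # NLMS step minimizing |err|^2: w <- w + mu * u * conj(err) / ||u||^2.
            step = mu * err.conjugate() / norm
            weights[m] = [wi + step * ui for wi, ui in zip(w, u)]
        out.append(y)
    return out


def make_toy_echo(num_frames: int, alphas: List[float], seed: int = 0) -> List[Frame]:
    """Hypothetical test signal: X_m(t) = s(t) + alphas[m] * s(t-2), random complex s."""
    rng = random.Random(seed)
    s = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(num_frames)]
    return [[s[t] + (a * s[t - 2] if t >= 2 else 0j) for a in alphas]
            for t in range(num_frames)]
```

For example, `online_dereverb(make_toy_echo(8000, [0.7, 0.5]))` should lower the power of each channel once the filters have adapted. The delayed echo is predictable from past frames, but the current-frame term s(t) is not. NLMS is used here only for brevity; an RLS update would adapt faster at higher cost. A full system would presumably run one such filter per frequency bin.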
-
Publication No.: US20220392439A1
Publication Date: 2022-12-08
Application No.: US17755972
Filing Date: 2019-11-18
Applicant: Google LLC
Inventor: Olivier Siohan , Takaki Makino , Richard Rose , Otavio Braga , Hank Liao , Basillo Garcia Castillo
IPC: G10L15/08 , G10L13/02 , G10L15/25 , G06V20/40 , G06V40/16 , G10L15/06 , G06V10/774 , G10L15/22 , G10L15/30 , G10L25/57
Abstract: A method (400) includes receiving audio data (112) corresponding to an utterance (101) spoken by a user (10), receiving video data (114) representing motion of lips of the user while the user was speaking the utterance, and obtaining multiple candidate transcriptions (135) for the utterance based on the audio data. For each candidate transcription of the multiple candidate transcriptions, the method also includes generating a synthesized speech representation (145) of the corresponding candidate transcription and determining an agreement score (155) indicating a likelihood that the synthesized speech representation matches the motion of the lips of the user while the user speaks the utterance. The method also includes selecting one of the multiple candidate transcriptions for the utterance as a speech recognition output (175) based on the agreement scores determined for the multiple candidate transcriptions for the utterance.
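The final step of this method scores every candidate and keeps the best one. The code below is a minimal sketch of that selection loop. It assumes that the synthesized speech representation and the lip-motion video data have already been reduced to per-frame feature vectors at the same rate. It also uses mean cosine similarity as a stand-in agreement score. The abstract does not say how the agreement score (155) is computed. `synthesize`, `agreement_score`, and `select_transcription` are hypothetical names.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[Sequence[float]]  # one feature vector per video frame


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def agreement_score(synth: Features, lips: Features) -> float:
    """Hypothetical agreement score: mean per-frame cosine similarity."""
    n = min(len(synth), len(lips))
    if n == 0:
        return 0.0
    return sum(_cosine(synth[i], lips[i]) for i in range(n)) / n


def select_transcription(candidates: List[str], lips: Features,
                         synthesize: Callable[[str], Features]
                         ) -> Tuple[str, Dict[str, float]]:
    """Score each candidate transcription against the lip motion; return the best."""
    scores = {c: agreement_score(synthesize(c), lips) for c in candidates}
    best = max(candidates, key=scores.__getitem__)
    return best, scores
```

In practice, `synthesize` would wrap a text-to-speech step plus a feature extractor. Neither is specified by the abstract, so any toy mapping from text to vectors can be used to exercise the loop.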
-
Publication No.: US12118123B2
Publication Date: 2024-10-15
Application No.: US17755892
Filing Date: 2019-11-18
Applicant: Google LLC
Inventor: Oliver Siohan , Takaki Makino , Richard Rose , Otavio Braga , Hank Liao , Basilio Garcia Castillo
CPC classification number: G06F21/6254 , G10L17/02 , H04L12/1831
Abstract: A method for a privacy-aware transcription includes receiving an audio-visual signal including audio data and image data for a speech environment and a privacy request from a participant in the speech environment where the privacy request indicates a privacy condition of the participant. The method further includes segmenting the audio data into a plurality of segments. For each segment, the method includes determining an identity of a speaker of a corresponding segment of the audio data based on the image data and determining whether the identity of the speaker of the corresponding segment includes the participant associated with the privacy condition. When the identity of the speaker of the corresponding segment includes the participant, the method includes applying the privacy condition to the corresponding segment. The method also includes processing the plurality of segments of the audio data to determine a transcript for the audio data.
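The per-segment logic described above can be sketched as a loop over segments. The code below is a minimal sketch under these assumptions:

- `identify_speakers` stands in for the image-based speaker identification and returns a set of identities, because a segment may contain more than one speaker.
- The privacy conditions `"exclude"`, `"redact"`, and `"anonymize"` are hypothetical examples. The abstract does not enumerate the conditions.
- `Segment`, `transcribe`, and `privacy_aware_transcript` are also hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Segment:
    start_s: float
    end_s: float
    audio_ref: str  # hypothetical handle to the segment's audio samples


def privacy_aware_transcript(
    segments: List[Segment],
    identify_speakers: Callable[[Segment], FrozenSet[str]],
    transcribe: Callable[[Segment], str],
    privacy_requests: Dict[str, str],
) -> List[str]:
    """Apply each participant's privacy condition to their segments, then transcribe."""
    transcript: List[str] = []
    for seg in segments:
        speakers = identify_speakers(seg)
        conditions = {privacy_requests[p] for p in speakers if p in privacy_requests}
        if "exclude" in conditions:
            continue  # drop the segment entirely
        # Redacted segments are never sent to the recognizer.
        text = "[redacted]" if "redact" in conditions else transcribe(seg)
        names = sorted("anonymous" if privacy_requests.get(p) == "anonymize" else p
                       for p in speakers)
        transcript.append(f"[{seg.start_s:.1f}-{seg.end_s:.1f}] {', '.join(names)}: {text}")
    return transcript
```

Applying the condition before transcription (rather than filtering the finished transcript) keeps a redacted participant's speech from ever reaching the recognizer. This is one reasonable reading of "applying the privacy condition to the corresponding segment".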
-
Publication No.: US20240104247A1
Publication Date: 2024-03-28
Application No.: US18535214
Filing Date: 2023-12-11
Applicant: Google LLC
Inventor: Oliver Siohan , Takaki Makino , Richard Rose , Otavio Braga , Hank Liao , Basilio Garcia Castillo
CPC classification number: G06F21/6254 , G10L17/02 , H04L12/1831
Abstract: A method for a privacy-aware transcription includes receiving an audio-visual signal including audio data and image data for a speech environment and a privacy request from a participant in the speech environment where the privacy request indicates a privacy condition of the participant. The method further includes segmenting the audio data into a plurality of segments. For each segment, the method includes determining an identity of a speaker of a corresponding segment of the audio data based on the image data and determining whether the identity of the speaker of the corresponding segment includes the participant associated with the privacy condition. When the identity of the speaker of the corresponding segment includes the participant, the method includes applying the privacy condition to the corresponding segment. The method also includes processing the plurality of segments of the audio data to determine a transcript for the audio data.
-
Publication No.: US20220382907A1
Publication Date: 2022-12-01
Application No.: US17755892
Filing Date: 2019-11-18
Applicant: Google LLC
Inventor: Oliver Siohan , Takaki Makino , Richard Rose , Otavio Braga , Hank Liao , Basilio Castillo
Abstract: A method for a privacy-aware transcription includes receiving an audio-visual signal including audio data and image data for a speech environment and a privacy request from a participant in the speech environment where the privacy request indicates a privacy condition of the participant. The method further includes segmenting the audio data into a plurality of segments. For each segment, the method includes determining an identity of a speaker of a corresponding segment of the audio data based on the image data and determining whether the identity of the speaker of the corresponding segment includes the participant associated with the privacy condition. When the identity of the speaker of the corresponding segment includes the participant, the method includes applying the privacy condition to the corresponding segment. The method also includes processing the plurality of segments of the audio data to determine a transcript for the audio data.
-
Publication No.: US10762914B2
Publication Date: 2020-09-01
Application No.: US16032996
Filing Date: 2018-07-11
Applicant: Google LLC
Inventor: Joseph Caroselli , Arun Narayanan , Izhak Shafran , Richard Rose
IPC: G10L21/00 , G10L21/0208 , G10L15/20 , G10L15/22 , G10L15/065 , G06F3/16 , G06N3/02 , G06F17/14 , G10L15/06 , G10L21/0216
Abstract: Utilizing an adaptive multichannel technique to mitigate reverberation present in received audio signals, prior to providing corresponding audio data to one or more additional component(s), such as automatic speech recognition (ASR) components. Implementations disclosed herein are “adaptive”, in that they utilize a filter, in the reverberation mitigation, that is online, causal and varies depending on characteristics of the input. Implementations disclosed herein are “multichannel”, in that a corresponding audio signal is received from each of multiple audio transducers (also referred to herein as “microphones”) of a client device, and the multiple audio signals (e.g., frequency domain representations thereof) are utilized in updating of the filter—and dereverberation occurs for audio data corresponding to each of the audio signals (e.g., frequency domain representations thereof) prior to the audio data being provided to ASR component(s) and/or other component(s).
-
Publication No.: US20190272840A1
Publication Date: 2019-09-05
Application No.: US16032996
Filing Date: 2018-07-11
Applicant: Google LLC
Inventor: Joseph Caroselli , Arun Narayanan , Izhak Shafran , Richard Rose
IPC: G10L21/0208 , G10L15/20 , G10L15/22 , G10L15/065 , G10L15/06 , G06F3/16 , G06N3/02 , G06F17/14
Abstract: Utilizing an adaptive multichannel technique to mitigate reverberation present in received audio signals, prior to providing corresponding audio data to one or more additional component(s), such as automatic speech recognition (ASR) components. Implementations disclosed herein are “adaptive”, in that they utilize a filter, in the reverberation mitigation, that is online, causal and varies depending on characteristics of the input. Implementations disclosed herein are “multichannel”, in that a corresponding audio signal is received from each of multiple audio transducers (also referred to herein as “microphones”) of a client device, and the multiple audio signals (e.g., frequency domain representations thereof) are utilized in updating of the filter—and dereverberation occurs for audio data corresponding to each of the audio signals (e.g., frequency domain representations thereof) prior to the audio data being provided to ASR component(s) and/or other component(s).