Patent search ap:("Google LLC") AND inv:"Richard Rose" Page 1

1.

Granted invention patent
Adaptive multichannel dereverberation for automatic speech recognition (in force)

Publication (announcement) number: US11699453B2

Publication (announcement) date: 2023-07-11

Application number: US17005823

Application date: 2020-08-28

Applicant: Google LLC

Inventor: Joseph Caroselli , Arun Narayanan , Izhak Shafran , Richard Rose

IPC: G10L21/00 , G10L21/0208 , G10L15/20 , G10L15/22 , G10L15/065 , G06F3/16 , G06N3/02 , G06F17/14 , G10L15/06 , G10L21/0216

CPC classification number: G10L21/0208 , G06F3/167 , G06F17/142 , G06N3/02 , G10L15/063 , G10L15/065 , G10L15/20 , G10L15/22 , G10L2015/223 , G10L2021/02082 , G10L2021/02166

Abstract: Utilizing an adaptive multichannel technique to mitigate reverberation present in received audio signals, prior to providing corresponding audio data to one or more additional component(s), such as automatic speech recognition (ASR) components. Implementations disclosed herein are “adaptive”, in that they utilize a filter, in the reverberation mitigation, that is online, causal and varies depending on characteristics of the input. Implementations disclosed herein are “multichannel”, in that a corresponding audio signal is received from each of multiple audio transducers (also referred to herein as “microphones”) of a client device, and the multiple audio signals (e.g., frequency domain representations thereof) are utilized in updating of the filter—and dereverberation occurs for audio data corresponding to each of the audio signals (e.g., frequency domain representations thereof) prior to the audio data being provided to ASR component(s) and/or other component(s).

2.

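Invention application
Rescoring Automatic Speech Recognition Hypotheses Using Audio-Visual Matching (in force)

Publication (announcement) number: US20220392439A1

Publication (announcement) date: 2022-12-08

Application number: US17755972

Application date: 2019-11-18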

Applicant: Google LLC

Inventor: Olivier Siohan , Takaki Makino , Richard Rose , Otavio Braga , Hank Liao , Basilio Garcia Castillo

IPC: G10L15/08 , G10L13/02 , G10L15/25 , G06V20/40 , G06V40/16 , G10L15/06 , G06V10/774 , G10L15/22 , G10L15/30 , G10L25/57

Abstract: A method (400) includes receiving audio data (112) corresponding to an utterance (101) spoken by a user (10), receiving video data (114) representing motion of lips of the user while the user was speaking the utterance, and obtaining multiple candidate transcriptions (135) for the utterance based on the audio data. For each candidate transcription of the multiple candidate transcriptions, the method also includes generating a synthesized speech representation (145) of the corresponding candidate transcription and determining an agreement score (155) indicating a likelihood that the synthesized speech representation matches the motion of the lips of the user while the user speaks the utterance. The method also includes selecting one of the multiple candidate transcriptions for the utterance as a speech recognition output (175) based on the agreement scores determined for the multiple candidate transcriptions for the utterance.

3.

Granted invention patent
Privacy-aware meeting room transcription from audio-visual stream (in force)

Publication (announcement) number: US12118123B2

Publication (announcement) date: 2024-10-15

Application number: US17755892

Application date: 2019-11-18

Applicant: Google LLC

Inventor: Oliver Siohan , Takaki Makino , Richard Rose , Otavio Braga , Hank Liao , Basilio Garcia Castillo

IPC: G06F21/62 , G10L17/02 , H04L12/18

CPC classification number: G06F21/6254 , G10L17/02 , H04L12/1831

Abstract: A method for a privacy-aware transcription includes receiving audio-visual signal including audio data and image data for a speech environment and a privacy request from a participant in the speech environment where the privacy request indicates a privacy condition of the participant. The method further includes segmenting the audio data into a plurality of segments. For each segment, the method includes determining an identity of a speaker of a corresponding segment of the audio data based on the image data and determining whether the identity of the speaker of the corresponding segment includes the participant associated with the privacy condition. When the identity of the speaker of the corresponding segment includes the participant, the method includes applying the privacy condition to the corresponding segment. The method also includes processing the plurality of segments of the audio data to determine a transcript for the audio data.

4.

Invention publication
PRIVACY-AWARE MEETING ROOM TRANSCRIPTION FROM AUDIO-VISUAL STREAM (under examination, published)

Publication (announcement) number: US20240104247A1

Publication (announcement) date: 2024-03-28

Application number: US18535214

Application date: 2023-12-11

Applicant: Google LLC

Inventor: Oliver Siohan , Takaki Makino , Richard Rose , Otavio Braga , Hank Liao , Basilio Garcia Castillo

IPC: G06F21/62 , G10L17/02 , H04L12/18

CPC classification number: G06F21/6254 , G10L17/02 , H04L12/1831

Abstract: A method for a privacy-aware transcription includes receiving audio-visual signal including audio data and image data for a speech environment and a privacy request from a participant in the speech environment where the privacy request indicates a privacy condition of the participant. The method further includes segmenting the audio data into a plurality of segments. For each segment, the method includes determining an identity of a speaker of a corresponding segment of the audio data based on the image data and determining whether the identity of the speaker of the corresponding segment includes the participant associated with the privacy condition. When the identity of the speaker of the corresponding segment includes the participant, the method includes applying the privacy condition to the corresponding segment. The method also includes processing the plurality of segments of the audio data to determine a transcript for the audio data.

5.

Invention application
Privacy-Aware Meeting Room Transcription from Audio-Visual Stream (in force)

Publication (announcement) number: US20220382907A1

Publication (announcement) date: 2022-12-01

Application number: US17755892

Application date: 2019-11-18

Applicant: Google LLC

Inventor: Oliver Siohan , Takaki Makino , Richard Rose , Otavio Braga , Hank Liao , Basilio Castillo

IPC: G06F21/62 , G10L17/02 , H04L12/18

Abstract: A method for a privacy-aware transcription includes receiving audio-visual signal including audio data and image data for a speech environment and a privacy request from a participant in the speech environment where the privacy request indicates a privacy condition of the participant. The method further includes segmenting the audio data into a plurality of segments. For each segment, the method includes determining an identity of a speaker of a corresponding segment of the audio data based on the image data and determining whether the identity of the speaker of the corresponding segment includes the participant associated with the privacy condition. When the identity of the speaker of the corresponding segment includes the participant, the method includes applying the privacy condition to the corresponding segment. The method also includes processing the plurality of segments of the audio data to determine a transcript for the audio data.

6.

Granted invention patent
Adaptive multichannel dereverberation for automatic speech recognition (in force)

Publication (announcement) number: US10762914B2

Publication (announcement) date: 2020-09-01

Application number: US16032996

Application date: 2018-07-11

Applicant: Google LLC

Inventor: Joseph Caroselli , Arun Narayanan , Izhak Shafran , Richard Rose

IPC: G10L21/00 , G10L21/0208 , G10L15/20 , G10L15/22 , G10L15/065 , G06F3/16 , G06N3/02 , G06F17/14 , G10L15/06 , G10L21/0216

Abstract: Utilizing an adaptive multichannel technique to mitigate reverberation present in received audio signals, prior to providing corresponding audio data to one or more additional component(s), such as automatic speech recognition (ASR) components. Implementations disclosed herein are “adaptive”, in that they utilize a filter, in the reverberation mitigation, that is online, causal and varies depending on characteristics of the input. Implementations disclosed herein are “multichannel”, in that a corresponding audio signal is received from each of multiple audio transducers (also referred to herein as “microphones”) of a client device, and the multiple audio signals (e.g., frequency domain representations thereof) are utilized in updating of the filter—and dereverberation occurs for audio data corresponding to each of the audio signals (e.g., frequency domain representations thereof) prior to the audio data being provided to ASR component(s) and/or other component(s).

7.

Invention application
ADAPTIVE MULTICHANNEL DEREVERBERATION FOR AUTOMATIC SPEECH RECOGNITION (under examination, published)

Publication (announcement) number: US20190272840A1

Publication (announcement) date: 2019-09-05

Application number: US16032996

Application date: 2018-07-11

Applicant: Google LLC

Inventor: Joseph Caroselli , Arun Narayanan , Izhak Shafran , Richard Rose

IPC: G10L21/0208 , G10L15/20 , G10L15/22 , G10L15/065 , G10L15/06 , G06F3/16 , G06N3/02 , G06F17/14

Abstract: Utilizing an adaptive multichannel technique to mitigate reverberation present in received audio signals, prior to providing corresponding audio data to one or more additional component(s), such as automatic speech recognition (ASR) components. Implementations disclosed herein are “adaptive”, in that they utilize a filter, in the reverberation mitigation, that is online, causal and varies depending on characteristics of the input. Implementations disclosed herein are “multichannel”, in that a corresponding audio signal is received from each of multiple audio transducers (also referred to herein as “microphones”) of a client device, and the multiple audio signals (e.g., frequency domain representations thereof) are utilized in updating of the filter—and dereverberation occurs for audio data corresponding to each of the audio signals (e.g., frequency domain representations thereof) prior to the audio data being provided to ASR component(s) and/or other component(s).
