-
Publication No.: US10699714B2
Publication date: 2020-06-30
Application No.: US16041434
Filing date: 2018-07-20
Applicant: Google LLC
Inventor: Brian Strope , Francoise Beaufays , Olivier Siohan
Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRS's). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in a correctness of the speech result. The method also includes completing a portion of the speech recognition tasks including generating one or more recognition results and one or more confidence values for the one or more recognition results, determining whether the one or more confidence values meets a confidence threshold, aborting a remaining portion of the speech recognition tasks for SRS's that have not generated a recognition result, and outputting a final recognition result based on at least one of the generated one or more speech results.
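The flow in this abstract (start several recognizers, stop the unfinished ones once a confidence threshold is met) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the names Result, make_recognizer and recognize_with_early_abort are hypothetical, the recognizers are simulated with timed stand-ins, aborting is done cooperatively through a shared event, and picking the highest-confidence completed result is just one possible way to form the final output.

```python
import concurrent.futures as cf
import threading
from dataclasses import dataclass


@dataclass
class Result:
    text: str
    confidence: float


def make_recognizer(text, confidence, delay_s):
    """Hypothetical stand-in for one SRS: waits delay_s, then returns a fixed result."""
    def recognize(audio, cancel):
        # Event.wait returns True if the abort signal arrived before the delay elapsed.
        if cancel.wait(delay_s):
            return None  # aborted before producing a result
        return Result(text, confidence)
    return recognize


def recognize_with_early_abort(audio, recognizers, threshold):
    """Run all recognizers in parallel; abort the rest once one result meets the threshold."""
    cancel = threading.Event()
    completed = []
    with cf.ThreadPoolExecutor(max_workers=len(recognizers)) as pool:
        futures = [pool.submit(r, audio, cancel) for r in recognizers]
        for fut in cf.as_completed(futures):
            res = fut.result()
            if res is None:
                continue
            completed.append(res)
            if res.confidence >= threshold:
                cancel.set()  # signal unfinished recognizers to stop
                break
    # Assumption: the final result is the most confident of the completed results.
    return max(completed, key=lambda r: r.confidence) if completed else None


recognizers = [
    make_recognizer("low confidence guess", 0.40, 0.01),
    make_recognizer("confident guess", 0.90, 0.05),
    make_recognizer("slow guess", 0.99, 5.0),
]
final = recognize_with_early_abort(b"", recognizers, threshold=0.8)
print(final)
```

In this sketch the slow recognizer never returns a result, because the event is set as soon as the second recognizer reports a confidence at or above the threshold. A real system would need its own abort mechanism for each recognizer.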
-
Publication No.: US11527248B2
Publication date: 2022-12-13
Application No.: US16885116
Filing date: 2020-05-27
Applicant: Google LLC
Inventor: Brian Strope , Francoise Beaufays , Olivier Siohan
Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRS's). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in a correctness of the speech result. The method also includes completing a portion of the speech recognition tasks including generating one or more recognition results and one or more confidence values for the one or more recognition results, determining whether the one or more confidence values meets a confidence threshold, aborting a remaining portion of the speech recognition tasks for SRS's that have not generated a recognition result, and outputting a final recognition result based on at least one of the generated one or more speech results.
-
Publication No.: US20220392439A1
Publication date: 2022-12-08
Application No.: US17755972
Filing date: 2019-11-18
Applicant: Google LLC
Inventor: Olivier Siohan , Takaki Makino , Richard Rose , Otavio Braga , Hank Liao , Basilio Garcia Castillo
IPC: G10L15/08 , G10L13/02 , G10L15/25 , G06V20/40 , G06V40/16 , G10L15/06 , G06V10/774 , G10L15/22 , G10L15/30 , G10L25/57
Abstract: A method (400) includes receiving audio data (112) corresponding to an utterance (101) spoken by a user (10), receiving video data (114) representing motion of lips of the user while the user was speaking the utterance, and obtaining multiple candidate transcriptions (135) for the utterance based on the audio data. For each candidate transcription of the multiple candidate transcriptions, the method also includes generating a synthesized speech representation (145) of the corresponding candidate transcription and determining an agreement score (155) indicating a likelihood that the synthesized speech representation matches the motion of the lips of the user while the user speaks the utterance. The method also includes selecting one of the multiple candidate transcriptions for the utterance as a speech recognition output (175) based on the agreement scores determined for the multiple candidate transcriptions for the utterance.
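The selection step described in this abstract (score each candidate transcription by how well its synthesized speech agrees with the lip motion, then pick one based on the scores) can be sketched as follows. This is only a toy illustration under stated assumptions: select_transcription, toy_synthesize and toy_agreement are hypothetical names, the "synthesized representation" is reduced to a vowel count, the "video data" is reduced to a count of observed mouth openings, and choosing the highest score is one possible selection rule. The patent does not specify any of these.

```python
def select_transcription(candidates, video, synthesize, agreement):
    """Return (best_candidate, best_score) by agreement between synthesized speech and video."""
    scored = [(agreement(synthesize(c), video), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def toy_synthesize(text):
    # Toy stand-in for speech synthesis: the number of vowels in the text.
    return sum(ch in "aeiou" for ch in text.lower())


def toy_agreement(synth, video):
    # Toy stand-in for an agreement model: closer counts give higher scores.
    return -abs(synth - video)


candidates = ["ice cream", "I scream"]
observed_mouth_openings = 3  # hypothetical feature extracted from the video
best, score = select_transcription(
    candidates, observed_mouth_openings, toy_synthesize, toy_agreement
)
print(best, score)
```

Passing the synthesizer and the agreement function as arguments keeps the selection logic independent of how those two components are actually built.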
-
Publication No.: US10204619B2
Publication date: 2019-02-12
Application No.: US15049892
Filing date: 2016-02-22
Applicant: Google LLC
Inventor: Olivier Siohan , Pedro J. Moreno Mengibar
IPC: G10L15/26 , G10L21/0308 , G10L15/02 , G10L15/10 , G10L15/20
Abstract: Methods, systems, and apparatus are described that receive audio data for an utterance. Association data is accessed that indicates associations between data corresponding to uncorrupted audio segments, and data corresponding to corrupted versions of the uncorrupted audio segments, where the associations are determined before receiving the audio data for the utterance. Using the association data and the received audio data for the utterance, data corresponding to at least one uncorrupted audio segment is selected. A transcription of the utterance is determined based on the selected data corresponding to the at least one uncorrupted audio segment.
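One way to picture the association data described in this abstract is as a table, built before any utterance arrives, that pairs features of corrupted audio segments with data for the corresponding uncorrupted segments. The sketch below is an assumption-laden illustration, not the patented method: nearest_clean_segment is a hypothetical name, the feature vectors are made-up two-dimensional points, the lookup is a plain nearest-neighbor search by Euclidean distance, and the uncorrupted data is represented simply as a text label.

```python
import math


def nearest_clean_segment(features, associations):
    """Return the uncorrupted-segment data whose corrupted version is closest to `features`.

    associations: list of (corrupted_features, clean_data) pairs prepared ahead of time.
    """
    _, clean = min(associations, key=lambda pair: math.dist(features, pair[0]))
    return clean


# Hypothetical association data, determined before the utterance is received.
associations = [
    ((0.0, 0.0), "hello"),
    ((1.0, 1.0), "world"),
]

# Hypothetical features computed from the received (noisy) audio.
print(nearest_clean_segment((0.9, 1.2), associations))
```

A transcription step would then operate on the selected uncorrupted data rather than on the noisy input directly.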
-
Publication No.: US20230267922A1
Publication date: 2023-08-24
Application No.: US17678657
Filing date: 2022-02-23
Applicant: GOOGLE LLC
Inventor: Olivier Siohan , Takaki Makino , Joshua Maynez , Ryan Mcdonald , Benyah Shaparenko , Joseph Nelson , Kishan Sachdeva , Basilio Garcia
IPC: G10L15/18 , G06F40/279 , H04L12/18
CPC classification number: G10L15/1815 , G06F40/279 , H04L12/1831 , H04L12/1818
Abstract: Implementations relate to an application that can bias automatic speech recognition for meetings using data that may be associated with the meeting and/or meeting participants. A transcription of inputs provided during a meeting can additionally and/or alternatively be processed to determine whether the inputs should be incorporated into a meeting document, which can provide a summary for the meeting. In some instances, entries into a meeting document can be designated as action items, and those action items can optionally have conditions for reminding meeting participants about the action items and/or for determining whether an action item has been fulfilled. In this way, various tasks that may typically be manually performed by meeting participants, such as creating a meeting summary, can be automated in a more accurate manner. This can preserve resources that may otherwise be wasted during video conferences, in-person meetings, and/or other gatherings.
-
Publication No.: US10049672B2
Publication date: 2018-08-14
Application No.: US15171374
Filing date: 2016-06-02
Applicant: Google LLC
Inventor: Brian Patrick Strope , Francoise Beaufays , Olivier Siohan
Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRS's). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in a correctness of the speech result. The method also includes completing a portion of the speech recognition tasks including generating one or more recognition results and one or more confidence values for the one or more recognition results, determining whether the one or more confidence values meets a confidence threshold, aborting a remaining portion of the speech recognition tasks for SRS's that have not generated a recognition result, and outputting a final recognition result based on at least one of the generated one or more speech results.
-
Publication No.: US12199783B2
Publication date: 2025-01-14
Application No.: US17678657
Filing date: 2022-02-23
Applicant: GOOGLE LLC
Inventor: Olivier Siohan , Takaki Makino , Joshua Maynez , Ryan Mcdonald , Benyah Shaparenko , Joseph Nelson , Kishan Sachdeva , Basilio Garcia
IPC: H04L12/18 , G06F40/279 , G10L15/18
Abstract: Implementations relate to an application that can bias automatic speech recognition for meetings using data that may be associated with the meeting and/or meeting participants. A transcription of inputs provided during a meeting can additionally and/or alternatively be processed to determine whether the inputs should be incorporated into a meeting document, which can provide a summary for the meeting. In some instances, entries into a meeting document can be designated as action items, and those action items can optionally have conditions for reminding meeting participants about the action items and/or for determining whether an action item has been fulfilled. In this way, various tasks that may typically be manually performed by meeting participants, such as creating a meeting summary, can be automated in a more accurate manner. This can preserve resources that may otherwise be wasted during video conferences, in-person meetings, and/or other gatherings.
-
Publication No.: US20180330735A1
Publication date: 2018-11-15
Application No.: US16041434
Filing date: 2018-07-20
Applicant: Google LLC
Inventor: Brian Strope , Francoise Beaufays , Olivier Siohan
Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRS's). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in a correctness of the speech result. The method also includes completing a portion of the speech recognition tasks including generating one or more recognition results and one or more confidence values for the one or more recognition results, determining whether the one or more confidence values meets a confidence threshold, aborting a remaining portion of the speech recognition tasks for SRS's that have not generated a recognition result, and outputting a final recognition result based on at least one of the generated one or more speech results.