-
公开(公告)号:US11967323B2
公开(公告)日:2024-04-23
申请号:US17849253
申请日:2022-06-24
Applicant: GOOGLE LLC
Inventor: Alexander H. Gruenstein , Taral Pradeep Joglekar , Vijayaditya Peddinti , Michiel A. U. Bacchiani
CPC classification number: G10L15/22 , G10L15/063 , G10L15/08 , G10L15/30 , G10L17/00 , G10L17/22 , G10L25/51 , G10L2015/088
Abstract: A method includes adding, by a first computing device, a first audio watermark to first speech data corresponding to playback of a first utterance including a hotword used to invoke an attention of a second computing device. The method includes outputting, by the first computing device, the playback of the first utterance corresponding to the watermarked first speech data. The second computing device is configured to receive the watermarked first speech data and determine to cease processing of the watermarked first speech data.
-
公开(公告)号:US10692496B2
公开(公告)日:2020-06-23
申请号:US16418415
申请日:2019-05-21
Applicant: Google LLC
Inventor: Alexander H. Gruenstein , Taral Pradeep Joglekar , Vijayaditya Peddinti , Michiel A. U. Bacchiani
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for suppressing hotwords are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to playback of an utterance. The actions further include providing the audio data as an input to a model (i) that is configured to determine whether a given audio data sample includes an audio watermark and (ii) that was trained using watermarked audio data samples that each include an audio watermark sample and non-watermarked audio data samples that do not each include an audio watermark sample. The actions further include receiving, from the model, data indicating whether the audio data includes the audio watermark. The actions further include, based on the data indicating whether the audio data includes the audio watermark, determining to continue or cease processing of the audio data.
-
公开(公告)号:US11238848B2
公开(公告)日:2022-02-01
申请号:US16709132
申请日:2019-12-10
Applicant: Google LLC
Inventor: Meltem Oktem , Taral Pradeep Joglekar , Fnu Heryandi , Pu-sen Chao , Ignacio Lopez Moreno , Salil Rajadhyaksha , Alexander H. Gruenstein , Diego Melendo Casado
IPC: G10L15/08 , G06F21/32 , G10L17/06 , G06F16/635 , G10L15/22 , G10L17/00 , G06K9/00 , G10L15/07 , G10L15/26
Abstract: In some implementations, authentication tokens corresponding to known users of a device are stored on the device. An utterance from a speaker is received. The speaker of the utterance is classified as not a known user of the device. A query that includes the authentication tokens that correspond to known users of the device, a representation of the utterance, and an indication that the speaker was classified as not a known user of the device is provided to the server. A response to the query is received at the device and from the server based on the query.
-
公开(公告)号:US20180308491A1
公开(公告)日:2018-10-25
申请号:US15956350
申请日:2018-04-18
Applicant: Google LLC
Inventor: Meltem Oktem , Taral Pradeep Joglekar , Fnu Heryandi , Pu-sen Chao , Ignacio Lopez Moreno , Salil Rajadhyaksha , Alexander H. Gruenstein , Diego Melendo Casado
Abstract: In some implementations, authentication tokens corresponding to known users of a device are stored on the device. An utterance from a speaker is received. The utterance is classified as spoken by a particular known user of the known users. A query that includes a representation of the utterance and an indication of the particular known user as the speaker is provided using the authentication token of the particular known user.
-
公开(公告)号:US20230335116A1
公开(公告)日:2023-10-19
申请号:US18210963
申请日:2023-06-16
Applicant: GOOGLE LLC
Inventor: Meltem Oktem , Taral Pradeep Joglekar , Fnu Heryandi , Pu-sen Chao , Ignacio Lopez Moreno , Salil Rajadhyaksha , Alexander H. Gruenstein , Diego Melendo Casado
CPC classification number: G10L15/08 , G06F16/636 , G06F21/32 , G06V40/10 , G10L15/07 , G10L15/22 , G10L17/00 , G10L17/06 , G10L2015/088 , G10L15/26
Abstract: In some implementations, processor(s) can receive an utterance from a speaker, and determine whether the speaker is a known user of a user device or not a known user of the user device. The user device can be shared by a plurality of known users. Further, the processor(s) can determine whether the utterance corresponds to a personal request or non-personal request. Moreover, and in response to determining that the speaker not a known user of the user device and in response to determining that the utterance corresponds to a non-personal request, the processor(s) can cause a response to the utterance to be provided for presentation to the speaker at the user device response to the utterance, or can cause an action to be performed by the user device responsive to the utterance.
-
公开(公告)号:US11373652B2
公开(公告)日:2022-06-28
申请号:US16874646
申请日:2020-05-14
Applicant: Google LLC
Inventor: Alexander H. Gruenstein , Taral Pradeep Joglekar , Vijayaditya Peddinti , Michiel A. u. Bacchiani
Abstract: A method includes obtaining, by data processing hardware, a plurality of non-watermarked speech samples. Each non-watermarked speech does not include an audio watermark sample. The method includes, from each non-watermarked speech sample of the plurality of non-watermarked speech samples, generating one or more corresponding watermarked speech samples that each include at least one audio watermark. The method includes training, using the plurality of non-watermarked speech samples and corresponding watermarked speech samples, a model to determine whether a given audio data sample includes an audio watermark, and after training the model, transmitting the trained model to a user computing device.
-
公开(公告)号:US11721326B2
公开(公告)日:2023-08-08
申请号:US17584866
申请日:2022-01-26
Applicant: GOOGLE LLC
Inventor: Meltem Oktem , Taral Pradeep Joglekar , Fnu Heryandi , Pu-sen Chao , Ignacio Lopez Moreno , Salil Rajadhyaksha , Alexander H. Gruenstein , Diego Melendo Casado
IPC: G10L15/08 , G06F21/32 , G10L17/06 , G06F16/635 , G10L15/22 , G10L17/00 , G06V40/10 , G10L15/07 , G10L15/26
CPC classification number: G10L15/08 , G06F16/636 , G06F21/32 , G06V40/10 , G10L15/07 , G10L15/22 , G10L17/00 , G10L17/06 , G10L15/26 , G10L2015/088
Abstract: In some implementations, processor(s) can receive an utterance from a speaker, and determine whether the speaker is a known user of a user device or not a known user of the user device. The user device can be shared by a plurality of known users. Further, the processor(s) can determine whether the utterance corresponds to a personal request or non-personal request. Moreover, and in response to determining that the speaker is not a known user of the user device and in response to determining that the utterance corresponds to a non-personal request, the processor(s) can cause a response to the utterance to be provided for presentation to the speaker at the user device response to the utterance, or can cause an action to be performed by the user device responsive to the utterance.
-
公开(公告)号:US20190362719A1
公开(公告)日:2019-11-28
申请号:US16418415
申请日:2019-05-21
Applicant: Google LLC
Inventor: Alexander H. Gruenstein , Taral Pradeep Joglekar , Vijayaditya Peddinti , Michiel A.U. Bacchiani
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for suppressing hotwords are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to playback of an utterance. The actions further include providing the audio data as an input to a model (i) that is configured to determine whether a given audio data sample includes an audio watermark and (ii) that was trained using watermarked audio data samples that each include an audio watermark sample and non-watermarked audio data samples that do not each include an audio watermark sample. The actions further include receiving, from the model, data indicating whether the audio data includes the audio watermark. The actions further include, based on the data indicating whether the audio data includes the audio watermark, determining to continue or cease processing of the audio data.
-
公开(公告)号:US20220148577A1
公开(公告)日:2022-05-12
申请号:US17584866
申请日:2022-01-26
Applicant: GOOGLE LLC
Inventor: Meltem Oktem , Taral Pradeep Joglekar , Fnu Heryandi , Pu-sen Chao , Ignacio Lopez Moreno , Salil Rajadhyaksha , Alexander H. Gruenstein , Diego Melendo Casado
Abstract: In some implementations, authentication tokens corresponding to known users of a device are stored on the device. An utterance from a speaker is received. The speaker of the utterance is classified as not a known user of the device. A query that includes the authentication tokens that correspond to known users of the device, a representation of the utterance and an indication that the speaker was classified as not a known user of the device is provided to the server. A response to the query is received at the device and from the server based on the query.
-
公开(公告)号:US20200279562A1
公开(公告)日:2020-09-03
申请号:US16874646
申请日:2020-05-14
Applicant: Google LLC
Inventor: Alexander H. Gruenstein , Taral Pradeep Joglekar , Vijayaditya Peddinti , Michiel A.u. Bacchiani
Abstract: A method includes obtaining, by data processing hardware, a plurality of non-watermarked speech samples. Each non-watermarked speech does not include an audio watermark sample. The method includes, from each non-watermarked speech sample of the plurality of non-watermarked speech samples, generating one or more corresponding watermarked speech samples that each include at least one audio watermark. The method includes training, using the plurality of non-watermarked speech samples and corresponding watermarked speech samples, a model to determine whether a given audio data sample includes an audio watermark, and after training the model, transmitting the trained model to a user computing device.
-
-
-
-
-
-
-
-
-