-
Publication No.: US11600282B2
Publication date: 2023-03-07
Application No.: US17856856
Filing date: 2022-07-01
Applicant: Google LLC
Inventor: Neil Zeghidour, Marco Tagliasacchi, Dominik Roblek
IPC: G10L19/038, G10L25/30, G10L19/00, G06N3/08, G06N3/04
Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.
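The abstract above describes coding each feature vector with one code vector from each of several codebooks. A minimal sketch of one way this could work follows, assuming residual quantization (each quantizer codes what the previous ones left over) and summation at decode time; the abstract does not state either detail. The names `nearest`, `encode`, and `decode` and the toy codebooks are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest(codebook: Sequence[Vector], v: Vector) -> int:
    """Index of the code vector closest to v (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)),
    )


def encode(codebooks: Sequence[Sequence[Vector]], feature: Vector) -> List[int]:
    """Pick one code index per codebook (assumed residual scheme)."""
    residual = list(feature)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        # Subtract the chosen code vector; the next quantizer codes the remainder.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def decode(codebooks: Sequence[Sequence[Vector]], indices: Sequence[int]) -> List[float]:
    """Rebuild the quantized feature vector as the sum of the chosen code vectors."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, indices):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Toy codebooks (hypothetical values): a coarse one and a fine one.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, -0.25]],
]
indices = encode(codebooks, [1.2, 0.8])
quantized = decode(codebooks, indices)
```

In this sketch the list of indices plays the role of the coded representation; a further compression step such as entropy coding of the indices is not shown.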
-
Publication No.: US20220383112A1
Publication date: 2022-12-01
Application No.: US17764005
Filing date: 2020-09-23
Applicant: Google LLC
Inventor: Marco Tagliasacchi, Félix de Chaumont Quitry, Dominik Roblek
Abstract: A system including a multi-task adapter neural network for performing multiple machine learning tasks is described. The adapter neural network is configured to receive a shared input for the machine learning tasks, and process the shared input to generate, for each of the machine learning tasks, a respective predicted output. The adapter neural network includes (i) a shared encoder configured to receive the shared input and to process the shared input to extract shared feature representations for the machine learning tasks, and (ii) multiple task-adapter encoders, each of the task-adapter encoders being associated with a respective machine learning task in the machine learning tasks and configured to: receive the shared input, receive the shared feature representations from the shared encoder, and process the shared input and the shared feature representations to generate the respective predicted output for the respective machine learning task.
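The data flow in the abstract above (one shared encoder, then one task-adapter encoder per task that receives both the shared input and the shared features) can be sketched as follows. The arithmetic inside each function is a toy stand-in for a neural network, and the names `shared_encoder`, `make_task_adapter`, and `run_multi_task` are hypothetical.

```python
from typing import Callable, Dict, List

Vector = List[float]
Adapter = Callable[[Vector, Vector], float]


def shared_encoder(x: Vector) -> Vector:
    """Toy stand-in for the shared encoder: returns [mean, max] of the input."""
    return [sum(x) / len(x), max(x)]


def make_task_adapter(weight: float) -> Adapter:
    """Build a toy task adapter that consumes the input and the shared features."""
    def adapter(x: Vector, shared: Vector) -> float:
        return weight * sum(x) + sum(shared)
    return adapter


def run_multi_task(x: Vector, adapters: Dict[str, Adapter]) -> Dict[str, float]:
    """Compute the shared features once, then one predicted output per task."""
    shared = shared_encoder(x)
    return {task: adapter(x, shared) for task, adapter in adapters.items()}


outputs = run_multi_task(
    [1.0, 2.0, 3.0],
    {"task_a": make_task_adapter(1.0), "task_b": make_task_adapter(0.0)},
)
```

The point of the sketch is only the wiring: the shared features are computed once and reused by every task adapter.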
-
Publication No.: US20220277773A1
Publication date: 2022-09-01
Application No.: US17745252
Filing date: 2022-05-16
Applicant: Google LLC
Inventor: Yossi Matias, Matthew Sharifi, Thomas Bugnon, Dominik Roblek, Annie Chen
IPC: G11B27/031, G11B27/034, G11B27/10, G11B27/28, G06V20/40, G06F16/44, G11B27/30, H04N5/232
Abstract: Systems and methods for media aggregation are disclosed herein. The system includes a media system that can transform media items into one aggregated media item. A synchronization component synchronizes media items with respect to time. The synchronized media items can be analyzed and transformed into an aggregated media item for storage and/or display. In one implementation, the aggregated media item is capable of being displayed in multiple ways to create an enhanced and customizable viewing and/or listening experience.
-
Publication No.: US20210295850A1
Publication date: 2021-09-23
Application No.: US17303928
Filing date: 2021-06-10
Applicant: Google LLC
Inventor: Dominik Roblek, Matthew Sharifi
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for verifying an identity of a user. The methods, systems, and apparatus include actions of receiving a request for a verification phrase for verifying an identity of a user. Additional actions include, in response to receiving the request for the verification phrase for verifying the identity of the user, identifying subwords to be included in the verification phrase and in response to identifying the subwords to be included in the verification phrase, obtaining a candidate phrase that includes at least some of the identified subwords as the verification phrase. Further actions include providing the verification phrase as a response to the request for the verification phrase for verifying the identity of the user.
-
Publication No.: US20210183367A1
Publication date: 2021-06-17
Application No.: US16717518
Filing date: 2019-12-17
Applicant: Google LLC
Inventor: Matthew Sharifi, Kevin Kilgour, Dominik Roblek, James Lin
Abstract: A method of training a custom hotword model includes receiving a first set of training audio samples. The method also includes generating, using a speech embedding model configured to receive the first set of training audio samples as input, a corresponding hotword embedding representative of a custom hotword for each training audio sample of the first set of training audio samples. The speech embedding model is pre-trained on a different set of training audio samples with a greater number of training audio samples than the first set of training audio samples. The method further includes training the custom hotword model to detect a presence of the custom hotword in audio data. The custom hotword model is configured to receive, as input, each corresponding hotword embedding and to classify, as output, each corresponding hotword embedding as corresponding to the custom hotword.
-
Publication No.: US20210056980A1
Publication date: 2021-02-25
Application No.: US16548146
Filing date: 2019-08-22
Applicant: Google LLC
Inventor: Beat Gfeller, Dominik Roblek, Félix de Chaumont Quitry, Marco Tagliasacchi
IPC: G10L19/035, G10L25/18, G10L19/038, G06N20/00
Abstract: Systems and methods for training a machine-learned model are provided. A method can include obtaining an unlabeled audio signal, sampling the unlabeled audio signal to select one or more sampled slices, inputting the one or more sampled slices into a machine-learned model, receiving, as an output of the machine-learned model, one or more determined characteristics associated with the audio signal, determining a loss function for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal, and training the machine-learned model from end to end based at least in part on the loss function. The one or more determined characteristics can include one or more reconstructed portions of the audio signal temporally adjacent to the one or more sampled slices or an estimated distance between two sampled slices.
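For the "estimated distance between two sampled slices" variant in the abstract above, the ground truth comes for free from where the slices were cut. The sketch below shows only that sampling step, assuming non-overlapping, equal-length slices and a gap measured in samples; the function name `sample_slice_pair` and these choices are assumptions for illustration, and the model and loss are not shown.

```python
import random
from typing import List, Tuple


def sample_slice_pair(
    signal: List[float], slice_len: int, rng: random.Random
) -> Tuple[List[float], List[float], int]:
    """Cut two non-overlapping slices and return them with the true gap between them."""
    if len(signal) < 2 * slice_len:
        raise ValueError("signal too short for two non-overlapping slices")
    max_start = len(signal) - slice_len
    # Leave room after the first slice for the second one.
    first = rng.randint(0, max_start - slice_len)
    second = rng.randint(first + slice_len, max_start)
    gap = second - (first + slice_len)  # ground-truth target, in samples
    return (
        signal[first:first + slice_len],
        signal[second:second + slice_len],
        gap,
    )


# Toy "audio" whose sample values equal their indices, so the cut points are visible.
signal = [float(i) for i in range(100)]
slice_a, slice_b, gap = sample_slice_pair(signal, 10, random.Random(0))
```

A model trained on such pairs would take `slice_a` and `slice_b` as input and be penalized for the difference between its predicted gap and `gap`.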
-
Publication No.: US10467493B2
Publication date: 2019-11-05
Application No.: US15650790
Filing date: 2017-07-14
Applicant: Google LLC
Inventor: Dominik Roblek, Christian Szegedy, Jacek Slawosz Jurewicz
Abstract: Systems, methods, and apparatus, including computer programs encoded on a computer storage medium. In one aspect, a system includes initial neural network layers configured to: receive an input image, and process the input image to generate a plurality of first feature maps that characterize the input image; a location generating convolutional neural network layer configured to perform a convolution on the representation of the first plurality of feature maps to generate data defining a respective location of each of a predetermined number of bounding boxes in the input image, wherein each bounding box identifies a respective first region of the input image; and a confidence score generating convolutional neural network layer configured to perform a convolution on the representation of the first plurality of feature maps to generate a confidence score for each of the predetermined number of bounding boxes in the input image.
-
Publication No.: US10460747B2
Publication date: 2019-10-29
Application No.: US15151362
Filing date: 2016-05-10
Applicant: Google LLC
Inventor: Dominik Roblek, Matthew Sharifi
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for frequency based audio analysis using neural networks. One of the methods includes training a neural network that includes a plurality of neural network layers on training data, wherein the neural network is configured to receive frequency domain features of an audio sample and to process the frequency domain features to generate a neural network output for the audio sample, wherein the neural network comprises (i) a convolutional layer that is configured to map frequency domain features to logarithmic scaled frequency domain features, wherein the convolutional layer comprises one or more convolutional layer filters, and (ii) one or more other neural network layers having respective layer parameters that are configured to process the logarithmic scaled frequency domain features to generate the neural network output.
-
Publication No.: US10424321B1
Publication date: 2019-09-24
Application No.: US13932158
Filing date: 2013-07-01
Applicant: Google LLC
Inventor: Matthew Sharifi, Dominik Roblek
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for analyzing an audio sample to determine whether the audio sample includes music audio data. One or more detectors, including a spectral fluctuation detector, a peak repetition detector, and a beat pitch detector, may analyze the audio sample and generate a score that represents whether the audio sample includes music audio data. One or more of the scores may be combined to determine whether the audio sample includes music audio data or non-music audio data.
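The abstract above says detector scores may be combined but does not say how. One simple possibility is a weighted average compared against a threshold, sketched below. The combination rule, the weights, the threshold, and the name `is_music` are all assumptions for illustration.

```python
from typing import Dict


def is_music(
    scores: Dict[str, float], weights: Dict[str, float], threshold: float = 0.5
) -> bool:
    """Combine per-detector scores by weighted average and apply a threshold."""
    total_weight = sum(weights[name] for name in scores)
    combined = sum(scores[name] * weights[name] for name in scores) / total_weight
    return combined >= threshold


# Hypothetical equal weights for the three detectors named in the abstract.
weights = {"spectral_fluctuation": 1.0, "peak_repetition": 1.0, "beat_pitch": 1.0}

music_like = is_music(
    {"spectral_fluctuation": 0.9, "peak_repetition": 0.8, "beat_pitch": 0.7}, weights
)
speech_like = is_music(
    {"spectral_fluctuation": 0.2, "peak_repetition": 0.1, "beat_pitch": 0.3}, weights
)
```

Because the average runs only over the detectors present in `scores`, the same function works when just one or two detectors are used.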
-
Publication No.: US10147197B2
Publication date: 2018-12-04
Application No.: US15839797
Filing date: 2017-12-12
Applicant: Google LLC
Inventor: Dominik Roblek, David Petrou, Matthew Sharifi
IPC: G06T7/30, G06F17/30, G06T7/90, G06F3/0484, G06F3/0488
Abstract: Methods and apparatus directed to segmenting content displayed on a computing device into regions. The segmenting of content displayed on the computing device into regions is accomplished via analysis of pixels of a “screenshot image” that captures at least a portion of (e.g., all of) the displayed content. Individual pixels of the screenshot image may be analyzed to determine one or more regions of the screenshot image and to optionally assign a corresponding semantic type to each of the regions. Some implementations are further directed to generating, based on one or more of the regions, interactive content to provide for presentation to the user via the computing device.