Patent search ap:("GOOGLE LLC") AND inv:"Marco Tagliasacchi" Page 1

1.

发明授权
Self-supervised audio representation learning for mobile devices 有权

公开(公告)号：US12165663B2

公开(公告)日：2024-12-10

申请号：US17986477

申请日：2022-11-14

Applicant: Google LLC

Inventor： Beat Gfeller , Dominik Roblek , Félix de Chaumont Quitry , Marco Tagliasacchi

IPC: G10L19/035 , G06N20/00 , G10L19/038 , G10L25/18

Abstract: Systems and methods for training a machine-learned model are provided. A method can include can include obtaining an unlabeled audio signal, sampling the unlabeled audio signal to select one or more sampled slices, inputting the one or more sampled slices into a machine-learned model, receiving, as an output of the machine-learned model, one or more determined characteristics associated with the audio signal, determining a loss function for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal, and training the machine-learned model from end to end based at least in part on the loss function. The one or more determined characteristics can include one or more reconstructed portions of the audio signal temporally adjacent to the one or more sampled slices or an estimated distance between two sampled slices.

2.

发明公开
GENERATING CODED DATA REPRESENTATIONS USING NEURAL NETWORKS AND VECTOR QUANTIZERS 审中-公开

公开(公告)号：US20240185870A1

公开(公告)日：2024-06-06

申请号：US18400992

申请日：2023-12-29

Applicant: Google LLC

Inventor： Neil Zeghidour , Marco Tagliasacchi , Dominik Roblek

IPC: G10L19/038 , G06N3/045 , G06N3/08 , G10L25/30

CPC classification number: G10L19/038 , G06N3/045 , G06N3/08 , G10L25/30 , G10L2019/0002

Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. According to one aspect, there is provided a method comprising: receiving a new input; processing the new input using an encoder neural network to generate a feature vector representing the new input; and generating a coded representation of the feature vector using a sequence of vector quantizers that are each associated with a respective codebook of code vectors, wherein the coded representation of the feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector.

3.

发明授权
Self-supervised audio representation learning for mobile devices 有权

公开(公告)号：US11501787B2

公开(公告)日：2022-11-15

申请号：US16548146

申请日：2019-08-22

Applicant: Google LLC

Inventor： Beat Gfeller , Dominik Roblek , Félix de Chaumont Quitry , Marco Tagliasacchi

IPC: G10L19/035 , G06N20/00 , G10L19/038 , G10L25/18

Abstract: Systems and methods for training a machine-learned model are provided. A method can include can include obtaining an unlabeled audio signal, sampling the unlabeled audio signal to select one or more sampled slices, inputting the one or more sampled slices into a machine-learned model, receiving, as an output of the machine-learned model, one or more determined characteristics associated with the audio signal, determining a loss function for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal, and training the machine-learned model from end to end based at least in part on the loss function. The one or more determined characteristics can include one or more reconstructed portions of the audio signal temporally adjacent to the one or more sampled slices or an estimated distance between two sampled slices.

4.

发明申请
USING MACHINE LEARNING AND DISCRETE TOKENS TO ESTIMATE DIFFERENT SOUND SOURCES FROM AUDIO MIXTURES 有权

公开(公告)号：US20250054500A1

公开(公告)日：2025-02-13

申请号：US18233323

申请日：2023-08-13

Applicant: Google LLC

Inventor： Hakan Erdogan , Scott Thomas Wisdom , John Hershey , Zalán Borsos , Marco Tagliasacchi , Neil Zeghidour , Xuankai Chang

IPC: G10L17/20 , G10L17/02 , G10L17/04 , G10L17/06 , G10L17/18

Abstract: A system and method are disclosed. Audio input comprising the mixed audio signals is received by one or more client devices. The audio input is converted into a plurality of discrete tokens. A plurality of sound sources, each corresponding to a subset of discrete tokens of a plurality of subsets of discrete tokens, is determined using a trained machine learning model.

5.

发明授权
Spatial audio recording from home assistant devices 有权

公开(公告)号：US12200465B2

公开(公告)日：2025-01-14

申请号：US17748356

申请日：2022-05-19

Applicant: Google LLC

Inventor： Rajeev Conrad Nongpiur , Qian Zhang , Andrew James Sutter , Kung-Wei Liu , Jihan Li , Hélène Bahu , Leonardo Kusumo , Sze Chie Lim , Marco Tagliasacchi , Neil Zeghidour , Michael Takezo Chinen

IPC: H04S3/00 , G06N20/00 , G10L19/008 , H04R3/00 , H04R5/027 , H04S7/00

Abstract: The technology generally relates to spatial audio communication between devices. For example, a first device and a second device may be connected via a communication link. The first device may capture audio signals in an environment through two or more microphones. The first device may encode the captured audio with spatial configuration data. The first device may transmit the encoded audio via the communication link to the second device. The second device may decode the encoded audio into binaural or ambisonic audio to be output by one or more speakers of the second device. The binaural or ambisonic audio may be converted into spatial audio to be output. The second device may output the binaural or spatial audio to create an immersive listening experience.

6.

发明申请
AUDIO-FOCUS FOR AMBIENT NOISE CANCELLATION 有权

公开(公告)号：US20240428818A1

公开(公告)日：2024-12-26

申请号：US18751015

申请日：2024-06-21

Applicant: GOOGLE LLC

Inventor： Rajeev Nongpiur , Neil Zeghidour , Marco Tagliasacchi

IPC: G10L21/0364 , G10L15/06 , G10L21/0208 , G10L21/0232 , G10L21/034 , G10L25/30

Abstract: A method including identifying an audio capture device and a target direction associated with the audio capture device, detecting first audio associated with the target direction, enhancing the first audio using a machine learning model configured to detect audio associated with the target direction, optionally, detecting second audio associated with a direction different from the target direction, and optionally, diminishing the second audio using the machine learning model.

7.

发明授权
Self-supervised pitch estimation 有权

公开(公告)号：US11756530B2

公开(公告)日：2023-09-12

申请号：US17640579

申请日：2020-09-25

Applicant: GOOGLE LLC

Inventor： Marco Tagliasacchi , Mihajlo Velimirovic , Matthew Sharifi , Dominik Roblek , Christian Frank , Beat Gfeller

IPC: G10L15/06 , G10L21/013 , G10L25/30 , G10L25/90

CPC classification number: G10L15/063 , G10L21/013 , G10L25/30 , G10L25/90

Abstract: Example embodiments relate to techniques for training artificial neural networks or oilier machine-learning encoders to accurately predict the pitch of input audio samples in a semitone or otherwise logarithmically-scaled pitch space. An example method may include generating, from a sample of audio data, two training samples by applying two different pitch shifts to the sample of audio training data. This can be done by converting the sample of audio data into the frequency domain and then shifting the transformed data. These known shifts are then compared to the predicted pitches generated by applying the two training samples to the encoder. The encoder is then updated based on the comparison, such that the relative pitch output by the encoder is improved with respect to accuracy. One or more audio samples, labeled with absolute pitch values, can then be used to calibrate the relative pitch values generated by the trained encoder.

8.

发明授权
Compressing audio waveforms using neural networks and vector quantizers 有权

公开(公告)号：US11600282B2

公开(公告)日：2023-03-07

申请号：US17856856

申请日：2022-07-01

Applicant: Google LLC

Inventor： Neil Zeghidour , Marco Tagliasacchi , Dominik Roblek

IPC: G10L19/038 , G10L25/30 , G10L19/00 , G06N3/08 , G06N3/04

Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.

9.

发明申请
MULTI-TASK ADAPTER NEURAL NETWORKS 有权

公开(公告)号：US20220383112A1

公开(公告)日：2022-12-01

申请号：US17764005

申请日：2020-09-23

Applicant: Google LLC

Inventor： Marco Tagliasacchi , Félix de Chaumont Quitry , Dominik Roblek

IPC: G06N3/08 , G06N3/04 , G10L25/30

Abstract: A system including a multi-task adapter neural network for performing multiple machine learning tasks is described. The adapter neural network is configured to receive a shared input for the machine learning tasks, and process the shared input to generate, for each of the machine learning tasks, a respective predicted output. The adapter neural network includes (i) a shared encoder configured to receive the shared input and to process the shared input to extract shared feature representations for the machine learning tasks, and (ii) multiple task-adapter encoders, each of the task-adapter encoders being associated with a respective machine learning task in the machine learning tasks and configured to: receive the shared input, receive the shared feature representations from the shared encoder, and process the shared input and the shared feature representations to generate the respective predicted output for the respective machine learning task.

10.

发明申请
Self-Supervised Audio Representation Learning for Mobile Devices 有权

公开(公告)号：US20210056980A1

公开(公告)日：2021-02-25

申请号：US16548146

申请日：2019-08-22

Applicant: Google LLC

Inventor： Beat Gfeller , Dominik Roblek , Félix de Chaumont Quitry , Marco Tagliasacchi

IPC: G10L19/035 , G10L25/18 , G10L19/038 , G06N20/00

Abstract: Systems and methods for training a machine-learned model are provided. A method can include can include obtaining an unlabeled audio signal, sampling the unlabeled audio signal to select one or more sampled slices, inputting the one or more sampled slices into a machine-learned model, receiving, as an output of the machine-learned model, one or more determined characteristics associated with the audio signal, determining a loss function for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal, and training the machine-learned model from end to end based at least in part on the loss function. The one or more determined characteristics can include one or more reconstructed portions of the audio signal temporally adjacent to the one or more sampled slices or an estimated distance between two sampled slices.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification