Patent search ap:("Google LLC") AND inv:"Scott Thomas Wisdom"

1.

Invention application
USING MACHINE LEARNING AND DISCRETE TOKENS TO ESTIMATE DIFFERENT SOUND SOURCES FROM AUDIO MIXTURES (status: granted)

Publication (announcement) number: US20250054500A1

Publication (announcement) date: 2025-02-13

Application number: US18233323

Filing date: 2023-08-13

Applicant: Google LLC

Inventor: Hakan Erdogan, Scott Thomas Wisdom, John Hershey, Zalán Borsos, Marco Tagliasacchi, Neil Zeghidour, Xuankai Chang

IPC: G10L17/20, G10L17/02, G10L17/04, G10L17/06, G10L17/18

Abstract: A system and method are disclosed. Audio input comprising mixed audio signals is received by one or more client devices. The audio input is converted into a plurality of discrete tokens. A plurality of sound sources, each corresponding to a subset of discrete tokens of a plurality of subsets of discrete tokens, is determined using a trained machine learning model.

2.

Invention publication
Conditioned Separation of Arbitrary Sounds based on Machine Learning Models (status: under examination, published)

Publication (announcement) number: US20230419989A1

Publication (announcement) date: 2023-12-28

Application number: US17808653

Filing date: 2022-06-24

Applicant: Google LLC

Inventor: Beat Gfeller, Kevin Ian Kilgour, Marco Tagliasacchi, Aren Jansen, Scott Thomas Wisdom, Qingqing Huang

IPC: G10L25/84, G10L15/16, G10L15/06, G06N3/04

CPC classification number: G10L25/84, G10L15/16, G10L15/063, G06N3/0454

Abstract: Example methods include receiving training data comprising a plurality of audio clips and a plurality of textual descriptions of audio. The methods include generating a shared representation comprising a joint embedding, in which an audio embedding of a given audio clip is within a threshold distance of a text embedding of a textual description of the given audio clip. The methods include generating, based on the joint embedding, a conditioning vector, and training, based on the conditioning vector, a neural network to: receive (i) an input audio waveform and (ii) an input comprising one or more of an input textual description of a target audio source in the input audio waveform or an audio sample of the target audio source; separate audio corresponding to the target audio source from the input audio waveform; and output the separated audio corresponding to the target audio source in response to the receiving of the input.
