Conditioned Separation of Arbitrary Sounds based on Machine Learning Models

Invention Publication

US20230419989A1 Conditioned Separation of Arbitrary Sounds based on Machine Learning Models (Pending, Published)


Patent Title: Conditioned Separation of Arbitrary Sounds based on Machine Learning Models
Application No.: US17808653

Application Date: 2022-06-24
Publication No.: US20230419989A1

Publication Date: 2023-12-28
Inventor: Beat Gfeller, Kevin Ian Kilgour, Marco Tagliasacchi, Aren Jansen, Scott Thomas Wisdom, Qingqing Huang
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Main IPC: G10L25/84
IPC: G10L25/84; G10L15/16; G10L15/06; G06N3/04


Abstract:

Example methods include receiving training data comprising a plurality of audio clips and a plurality of textual descriptions of audio. The methods include generating a shared representation comprising a joint embedding, in which an audio embedding of a given audio clip is within a threshold distance of a text embedding of a textual description of that audio clip. The methods further include generating a conditioning vector based on the joint embedding, and training a neural network based on the conditioning vector to: (a) receive (i) an input audio waveform and (ii) an input comprising one or more of an input textual description of a target audio source in the input audio waveform or an audio sample of the target audio source; (b) separate audio corresponding to the target audio source from the input audio waveform; and (c) output the separated audio corresponding to the target audio source in response to receiving the input.
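The abstract describes two mechanisms in words only. The first is a joint embedding in which a clip's audio embedding must lie within a threshold distance of its caption's text embedding. The second is a conditioning vector drawn from that shared space, so that either a text query or an audio-sample query can identify the target source. The Python sketch below illustrates both under stated assumptions. The Euclidean metric, the hinge-style penalty, the averaging rule, and every function name (`is_aligned`, `alignment_penalty`, `conditioning_vector`) are hypothetical and are not taken from the patent. The abstract also does not specify how the separation network consumes the vector.

```python
import math
from typing import List, Optional, Sequence

Embedding = Sequence[float]


def is_aligned(audio_emb: Embedding, text_emb: Embedding, threshold: float) -> bool:
    """True if a clip's audio embedding lies within `threshold` of its caption's text embedding.

    The abstract does not name a distance metric; Euclidean distance (math.dist) is an assumption.
    """
    return math.dist(audio_emb, text_emb) <= threshold


def alignment_penalty(audio_emb: Embedding, text_emb: Embedding, threshold: float) -> float:
    """Hypothetical hinge-style training penalty: zero once the pair is within the threshold."""
    return max(0.0, math.dist(audio_emb, text_emb) - threshold)


def conditioning_vector(text_emb: Optional[Embedding] = None,
                        sample_emb: Optional[Embedding] = None) -> List[float]:
    """Builds a conditioning vector from a text query, an audio-sample query, or both.

    Both queries live in the same joint space, so either one can describe the target source.
    Averaging them when both are supplied is an assumption, not the patent's stated method.
    """
    queries = [q for q in (text_emb, sample_emb) if q is not None]
    if not queries:
        raise ValueError("need a textual description or an audio sample of the target source")
    dim = len(queries[0])
    if any(len(q) != dim for q in queries):
        raise ValueError("query embeddings must share the joint-embedding dimension")
    return [sum(q[i] for q in queries) / len(queries) for i in range(dim)]


if __name__ == "__main__":
    # Toy 3-D embeddings; real ones would come from trained audio and text encoders.
    dog_clip = [0.9, 0.1, 0.0]
    dog_caption = [1.0, 0.0, 0.0]
    print(is_aligned(dog_clip, dog_caption, threshold=0.2))  # True: distance is about 0.141
    print(conditioning_vector(text_emb=dog_caption, sample_emb=dog_clip))
```

Because text and audio-sample embeddings share one space, a single separation network can accept either kind of query. Averaging is only one plausible way to combine them when both are supplied.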



IPC Classification:

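G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L25/00	Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00 (muting semiconductor-based amplifiers when some special characteristics of a signal are sensed by a speech detector, e.g. sensing the absence of a signal, H03G 3/34)
G10L25/78	. Detection of presence or absence of voice signals (switching of the direction of transmission by voice frequency in two-way loud-speaking telephone systems, H04M 9/10)
G10L25/84	.. for discriminating voice from noise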