METHOD AND APPARATUS FOR SPEECH SOURCE SEPARATION BASED ON A CONVOLUTIONAL NEURAL NETWORK

Patent Application

US20220223144A1 METHOD AND APPARATUS FOR SPEECH SOURCE SEPARATION BASED ON A CONVOLUTIONAL NEURAL NETWORK (status: in force)

Patent title: METHOD AND APPARATUS FOR SPEECH SOURCE SEPARATION BASED ON A CONVOLUTIONAL NEURAL NETWORK
Application number: US17611121

Filing date: 2020-05-13
Publication number: US20220223144A1

Publication date: 2022-07-14
Inventors: Jundai SUN, Zhiwei SHUANG, Lie LU, Shaofan YANG, Jia DAI
Applicant: Dolby Laboratories Licensing Corporation
Applicant address: US CA San Francisco
Patentee: Dolby Laboratories Licensing Corporation
Current patentee: Dolby Laboratories Licensing Corporation
Current patentee address: US CA San Francisco
Priority: CN PCT/CN2019/086769 20190514, EP19188010.3 20190724
International application: PCT/US2020/032762 WO 20200513
Main classification: G10L15/20
IPC classification: G10L15/20; G10L15/16; G10L15/22; G10L21/0308; G10L25/18; G06N3/08

Abstract:

Described herein is a method for Convolutional Neural Network (CNN) based speech source separation, wherein the method includes the steps of: (a) providing multiple frames of a time-frequency transform of an original noisy speech signal; (b) inputting the time-frequency transform of said multiple frames into an aggregated multi-scale CNN having a plurality of parallel convolution paths; (c) extracting and outputting, by each parallel convolution path, features from the input time-frequency transform of said multiple frames; (d) obtaining an aggregated output of the outputs of the parallel convolution paths; and (e) generating an output mask for extracting speech from the original noisy speech signal based on the aggregated output. Also described herein are an apparatus for CNN-based speech source separation and a corresponding computer program product comprising a computer-readable storage medium with instructions adapted to carry out said method when executed by a device having processing capability.
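
Steps (a) to (e) of the abstract can be illustrated with a minimal NumPy sketch. It is not the patented implementation. The kernel sizes (1, 3, 5), the tanh activation, the mean as the aggregation rule, the sigmoid mask, and the helper names `conv2d_same` and `separation_mask` are all assumptions introduced for illustration. The weights are random and untrained, and a random array stands in for the time-frequency transform of a noisy speech signal.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Naive 2-D 'same' cross-correlation of x (frames, bins) with an odd-sized kernel."""
    kh, kw = kernel.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def separation_mask(spec, kernels):
    """Hypothetical single-layer stand-in for an aggregated multi-scale CNN."""
    # (b), (c): each parallel path filters the same input at its own scale
    # (tanh activation is an assumption).
    features = [np.tanh(conv2d_same(spec, k)) for k in kernels]
    # (d): aggregate the path outputs (mean is an assumption; the abstract
    # does not say how the outputs are combined).
    aggregated = np.mean(features, axis=0)
    # (e): squash to (0, 1) to obtain a mask (sigmoid is an assumption).
    return 1.0 / (1.0 + np.exp(-aggregated))

rng = np.random.default_rng(0)

# (a): 8 frames x 16 frequency bins of non-negative magnitudes, standing in
# for the time-frequency transform of a noisy speech signal.
noisy_spec = np.abs(rng.standard_normal((8, 16)))

# One random, untrained kernel per parallel path; the sizes are assumptions.
kernels = [0.1 * rng.standard_normal((size, size)) for size in (1, 3, 5)]

mask = separation_mask(noisy_spec, kernels)
speech_estimate = mask * noisy_spec  # apply the mask to extract speech

print(mask.shape, speech_estimate.shape)
```

In this sketch the mask has the same shape as the input, so applying it is an element-wise multiplication with the noisy magnitudes. A trained network would learn the kernels so that the mask suppresses the time-frequency cells dominated by noise.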

Published/granted documents

US12073828B2 Method and apparatus for speech source separation based on a convolutional neural network. Publication/grant date: 2024-08-27

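IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/20	. Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence)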