-
Publication No.: US20220223144A1
Publication Date: 2022-07-14
Application No.: US17611121
Filing Date: 2020-05-13
Applicant: Dolby Laboratories Licensing Corporation
Inventor: Jundai SUN, Zhiwei SHUANG, Lie LU, Shaofan YANG, Jia DAI
Abstract: Described herein is a method for Convolutional Neural Network (CNN) based speech source separation, wherein the method includes the steps of: (a) providing multiple frames of a time-frequency transform of an original noisy speech signal; (b) inputting the time-frequency transform of said multiple frames into an aggregated multi-scale CNN having a plurality of parallel convolution paths; (c) extracting and outputting, by each parallel convolution path, features from the input time-frequency transform of said multiple frames; (d) obtaining an aggregated output of the outputs of the parallel convolution paths; and (e) generating an output mask for extracting speech from the original noisy speech signal based on the aggregated output. Also described herein are an apparatus for CNN based speech source separation and a corresponding computer program product comprising a computer-readable storage medium with instructions adapted to carry out said method when executed by a device having processing capability.
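Steps (b) through (e) of the abstract above can be illustrated with a minimal sketch. It is not the patented network. It assumes a magnitude spectrogram stored as a list of frames, uses fixed averaging kernels of sizes 1, 3 and 5 as stand-ins for trained convolution paths, aggregates by element-wise mean, and maps the result to a mask with a sigmoid. The names `conv2d_same`, `box_kernel`, `aggregated_mask` and `apply_mask` are hypothetical.

```python
import math

def conv2d_same(x, kernel):
    """2-D cross-correlation with zero padding; output has the same shape as x.
    Assumes an odd-sized kernel."""
    rows, cols = len(x), len(x[0])
    kr, kc = len(kernel), len(kernel[0])
    pr, pc = kr // 2, kc // 2
    out = [[0.0] * cols for _ in range(rows)]
    for t in range(rows):
        for f in range(cols):
            acc = 0.0
            for i in range(kr):
                for j in range(kc):
                    tt, ff = t + i - pr, f + j - pc
                    if 0 <= tt < rows and 0 <= ff < cols:
                        acc += kernel[i][j] * x[tt][ff]
            out[t][f] = acc
    return out

def box_kernel(size):
    """Averaging kernel; an untrained placeholder for learned weights."""
    w = 1.0 / (size * size)
    return [[w] * size for _ in range(size)]

def aggregated_mask(spec, kernel_sizes=(1, 3, 5)):
    """spec: list of frames, each a list of non-negative magnitudes."""
    # (c) each parallel path extracts features at its own scale
    paths = [conv2d_same(spec, box_kernel(k)) for k in kernel_sizes]
    # (d) aggregate the path outputs (element-wise mean is an assumption)
    rows, cols = len(spec), len(spec[0])
    agg = [[sum(p[t][f] for p in paths) / len(paths) for f in range(cols)]
           for t in range(rows)]
    # (e) squash to (0, 1) to obtain a mask
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in agg]

def apply_mask(spec, mask):
    """Element-wise product of the noisy spectrogram and the mask."""
    return [[s * m for s, m in zip(sr, mr)] for sr, mr in zip(spec, mask)]
```

Because the kernels here are untrained, the mask carries no real separation ability; the sketch only shows how parallel paths at different scales feed a single aggregated mask.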
-
Publication No.: US20240363131A1
Publication Date: 2024-10-31
Application No.: US18577597
Filing Date: 2022-07-12
Applicant: Dolby Laboratories Licensing Corporation
Inventor: Jia DAI, Kai LI, Xiaoyu LIU, Richard J. CARTWRIGHT, Shaofan YANG
IPC: G10L21/0208, G10L25/27
CPC classification number: G10L21/0208, G10L25/27, G10L2021/02082
Abstract: A method for dereverberating audio signals is provided. In some implementations, the method involves obtaining a real acoustic impulse response (AIR); identifying a first portion of the real AIR corresponding to early reflections of a direct sound and a second portion of the real AIR corresponding to late reflections of the direct sound; generating one or more synthesized AIRs by modifying the first portion of the real AIR and/or the second portion of the real AIR; and using the real AIR and the one or more synthesized AIRs to generate a plurality of training samples, each training sample comprising an input audio signal and a reverberated audio signal, wherein the reverberated audio signal is generated based on the input audio signal and one of the real AIR or one of the one or more synthesized AIRs, and wherein the plurality of training samples is used to train a machine learning model.
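The augmentation flow in this abstract can be sketched as follows. The abstract does not say how the early and late portions are identified or modified, so the sketch makes its own assumptions: the boundary is a fixed number of samples after the direct-sound peak, early reflections are scaled by a gain, and the late tail gets an extra exponential decay. All function and parameter names are hypothetical.

```python
def split_air(air, direct_index, early_len):
    """Split an AIR into (early, late). The early part runs through the direct
    sound plus early_len further samples; this boundary rule is an assumption."""
    boundary = direct_index + 1 + early_len
    return air[:boundary], air[boundary:]

def synthesize_air(air, direct_index, early_len, early_gain=1.0, late_decay=1.0):
    """Create a synthesized AIR by modifying the early and/or late portion."""
    early, late = split_air(air, direct_index, early_len)
    # Scale only the reflections after the direct sound; keep the direct sound.
    new_early = [a * early_gain if n > direct_index else a
                 for n, a in enumerate(early)]
    # Apply an additional exponential decay to the late tail.
    new_late = [a * late_decay ** k for k, a in enumerate(late)]
    return new_early + new_late

def convolve(signal, air):
    """Full linear convolution of a signal with an impulse response."""
    out = [0.0] * (len(signal) + len(air) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(air):
            out[i + j] += s * h
    return out

def make_training_samples(clean, airs):
    """Pair the input signal with one reverberated version per AIR."""
    return [(clean, convolve(clean, air)) for air in airs]
```

A typical use would pass the real AIR together with a few synthesized variants, for example `make_training_samples(clean, [air, synthesize_air(air, 1, 2, early_gain=0.5)])`, so that one measured response yields several distinct training pairs.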
-
Publication No.: US20240177726A1
Publication Date: 2024-05-30
Application No.: US18577586
Filing Date: 2022-07-12
Applicant: Dolby Laboratories Licensing Corporation
Inventor: Jia DAI, Kai LI, Xiaoyu LIU, Richard J. CARTWRIGHT
IPC: G10L21/0208, G06N3/08, G10L21/0232
CPC classification number: G10L21/0208, G06N3/08, G10L21/0232, G10L2021/02082
Abstract: A method for enhancing audio signals is provided. In some implementations, the method involves (a) obtaining a training set comprising a plurality of training samples, each training sample comprising a distorted audio signal and a clean audio signal. In some implementations, the method involves (b), for a training sample of the plurality of training samples: obtaining a frequency-domain representation of the distorted audio signal; providing the frequency-domain representation to a convolutional neural network (CNN) comprising a plurality of convolutional layers and to a recurrent element, wherein an output of the recurrent element is provided to a subset of the plurality of convolutional layers; generating a predicted enhancement mask, wherein the CNN generates the predicted enhancement mask; generating a predicted enhanced audio signal based on the predicted enhancement mask; and updating weights associated with the CNN and the recurrent element based on the predicted enhanced audio signal.
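The training step in (b) can be illustrated with a deliberately tiny model. This is a sketch under stated assumptions, not the claimed architecture: two 1-D convolutions along frequency stand in for the CNN, a scalar state carried across frames stands in for the recurrent element, its output is injected only into the second convolution (the "subset"), and central finite differences replace backpropagation to keep the example dependency-free. All names and parameter shapes are hypothetical.

```python
import math

def conv1d_same(x, w):
    """1-D cross-correlation along frequency, zero padded (odd-length w)."""
    half = len(w) // 2
    return [sum(w[j] * x[i + j - half] for j in range(len(w))
                if 0 <= i + j - half < len(x))
            for i in range(len(x))]

def forward(params, distorted):
    """distorted: list of frames of magnitudes. Returns predicted enhanced frames."""
    w1, w2 = params["w1"], params["w2"]
    r, u = params["r"][0], params["u"][0]
    state, enhanced = 0.0, []
    for frame in distorted:
        h1 = [math.tanh(v) for v in conv1d_same(frame, w1)]  # conv layer 1
        # Recurrent element: scalar state summarizing the frames seen so far.
        state = math.tanh(r * state + sum(frame) / len(frame))
        # Recurrent output is provided to conv layer 2 only.
        h2 = conv1d_same([v + u * state for v in h1], w2)
        mask = [1.0 / (1.0 + math.exp(-v)) for v in h2]  # predicted mask
        enhanced.append([m * d for m, d in zip(mask, frame)])
    return enhanced

def loss(params, distorted, clean):
    """Mean squared error between predicted enhanced and clean frames."""
    pred = forward(params, distorted)
    n = sum(len(f) for f in clean)
    return sum((p - c) ** 2 for pf, cf in zip(pred, clean)
               for p, c in zip(pf, cf)) / n

def train_step(params, distorted, clean, lr=0.1, eps=1e-5):
    """One update of all weights (CNN and recurrent) from the enhanced-signal loss.
    Gradients use central finite differences as a stand-in for backpropagation."""
    grads = {}
    for name, values in params.items():
        grads[name] = []
        for i in range(len(values)):
            orig = values[i]
            values[i] = orig + eps
            up = loss(params, distorted, clean)
            values[i] = orig - eps
            down = loss(params, distorted, clean)
            values[i] = orig
            grads[name].append((up - down) / (2 * eps))
    for name, values in params.items():
        for i in range(len(values)):
            values[i] -= lr * grads[name][i]
    return loss(params, distorted, clean)
```

Note that the loss is computed on the predicted enhanced signal rather than on the mask itself, which matches the abstract's wording that weights are updated "based on the predicted enhanced audio signal".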
-