1. METHOD AND APPARATUS FOR SPEECH SOURCE SEPARATION BASED ON A CONVOLUTIONAL NEURAL NETWORK

    Publication (Announcement) No.: US20220223144A1

    Publication (Announcement) Date: 2022-07-14

    Application No.: US17611121

    Filing Date: 2020-05-13

    Abstract: Described herein is a method for Convolutional Neural Network (CNN) based speech source separation, wherein the method includes the steps of: (a) providing multiple frames of a time-frequency transform of an original noisy speech signal; (b) inputting the time-frequency transform of said multiple frames into an aggregated multi-scale CNN having a plurality of parallel convolution paths; (c) extracting and outputting, by each parallel convolution path, features from the input time-frequency transform of said multiple frames; (d) obtaining an aggregated output of the outputs of the parallel convolution paths; and (e) generating an output mask for extracting speech from the original noisy speech signal based on the aggregated output. Further described herein are an apparatus for CNN-based speech source separation and a corresponding computer program product comprising a computer-readable storage medium with instructions adapted to carry out said method when executed by a device having processing capability.
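Steps (b) through (e) of the abstract can be sketched as follows. This is a minimal illustrative toy, not the patented architecture: each "parallel convolution path" is stood in for by a single fixed moving-average convolution at a different kernel size, and the aggregation and mask functions (mean, sigmoid) are assumptions for illustration.

```python
import numpy as np

def multiscale_mask(spectrogram, kernel_sizes=(3, 5, 7)):
    """Toy sketch of parallel multi-scale convolution paths over a
    time-frequency transform, aggregated into a speech mask.
    kernel_sizes and the aggregation scheme are illustrative choices."""
    path_outputs = []
    for k in kernel_sizes:
        # One "path": a moving-average convolution along the time axis,
        # standing in for a learned multi-layer convolution path.
        kernel = np.ones(k) / k
        out = np.apply_along_axis(
            lambda row: np.convolve(row, kernel, mode="same"),
            1, spectrogram)
        path_outputs.append(out)
    # (d) aggregate the outputs of the parallel paths (here: simple mean)
    aggregated = np.mean(path_outputs, axis=0)
    # (e) squash to (0, 1) so the result can act as a time-frequency mask
    return 1.0 / (1.0 + np.exp(-aggregated))
```

Applying the returned mask elementwise to the noisy time-frequency transform would then extract the speech estimate, as in step (e).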

2. SPEECH ENHANCEMENT (Invention Publication; Under Examination, Published)

    Publication (Announcement) No.: US20240363131A1

    Publication (Announcement) Date: 2024-10-31

    Application No.: US18577597

    Filing Date: 2022-07-12

    CPC classification number: G10L21/0208 G10L25/27 G10L2021/02082

    Abstract: A method for dereverberating audio signals is provided. In some implementations, the method involves obtaining a real acoustic impulse response (AIR); identifying a first portion of the real AIR corresponding to early reflections of a direct sound and a second portion of the real AIR corresponding to late reflections of the direct sound; generating one or more synthesized AIRs by modifying the first portion of the real AIR and/or the second portion of the real AIR; and using the real AIR and the one or more synthesized AIRs to generate a plurality of training samples, each training sample comprising an input audio signal and a reverberated audio signal, wherein the reverberated audio signal is generated based on the input audio signal and either the real AIR or one of the one or more synthesized AIRs; the plurality of training samples is then used to train a machine learning model.
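The AIR-augmentation idea above can be sketched in a few lines. This is a hedged illustration, not the claimed method: the 50 ms early/late split point, the tail-gain modification, and the truncating convolution are all assumptions chosen for the example.

```python
import numpy as np

def synthesize_air(real_air, sample_rate=16000, split_ms=50.0, late_gain=0.5):
    """Illustrative sketch: split a real AIR into an early-reflection part
    and a late-reflection tail, then modify the tail to synthesize a new
    AIR. The split point and gain factor are assumed values."""
    split = int(sample_rate * split_ms / 1000.0)
    early = real_air[:split]
    late = real_air[split:] * late_gain  # attenuate the late reverberation
    return np.concatenate([early, late])

def make_training_sample(clean, air):
    """Pair a clean input signal with its reverberated version,
    obtained by convolving the input with the (real or synthesized) AIR."""
    reverberated = np.convolve(clean, air)[:len(clean)]
    return clean, reverberated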

3. SPEECH ENHANCEMENT (Invention Publication; Under Examination, Published)

    Publication (Announcement) No.: US20240177726A1

    Publication (Announcement) Date: 2024-05-30

    Application No.: US18577586

    Filing Date: 2022-07-12

    CPC classification number: G10L21/0208 G06N3/08 G10L21/0232 G10L2021/02082

    Abstract: A method for enhancing audio signals is provided. In some implementations, the method involves (a) obtaining a training set comprising a plurality of training samples, each training sample comprising a distorted audio signal and a clean audio signal. In some implementations, the method involves (b), for a training sample of the plurality of training samples: obtaining a frequency-domain representation of the distorted audio signal; providing the frequency-domain representation to a convolutional neural network (CNN) comprising a plurality of convolutional layers and to a recurrent element, wherein an output of the recurrent element is provided to a subset of the plurality of convolutional layers; generating a predicted enhancement mask, wherein the CNN generates the predicted enhancement mask; generating a predicted enhanced audio signal based on the predicted enhancement mask; and updating weights associated with the CNN and the recurrent element based on the predicted enhanced audio signal.
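The training step in part (b) can be sketched as a single-weight toy. This is only a schematic stand-in under loud assumptions: the CNN and the recurrent element are each replaced by one scalar parameter, and the gradient update is derived by hand for that simplified model, so it illustrates the mask-predict-and-update loop rather than the actual network.

```python
import numpy as np

def enhance_step(distorted_spec, clean_spec, conv_w, rnn_w, state, lr=1e-3):
    """Toy one-step sketch: a recurrent state feeds into a 'convolutional'
    stage, which predicts an enhancement mask; the mask yields an enhanced
    signal, and the weight is updated from the resulting loss.
    All parameters here are scalars standing in for layer weights."""
    # recurrent element: state update driven by a frame-level summary
    state = np.tanh(rnn_w * distorted_spec.mean() + state)
    # "convolutional layer" receiving the recurrent output alongside its input
    logits = conv_w * distorted_spec + state
    mask = 1.0 / (1.0 + np.exp(-logits))     # predicted enhancement mask
    enhanced = mask * distorted_spec         # predicted enhanced signal
    loss = np.mean((enhanced - clean_spec) ** 2)
    # hand-derived gradient of the loss w.r.t. the single "conv" weight
    grad_w = np.mean(2 * (enhanced - clean_spec)
                     * mask * (1 - mask) * distorted_spec ** 2)
    conv_w -= lr * grad_w                    # weight update from the loss
    return conv_w, state, mask, loss
```

In the abstract's formulation, the recurrent element's output is routed only to a subset of the convolutional layers; the scalar `state` entering `logits` plays that role here.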
