1. METHOD AND APPARATUS FOR SPEECH SOURCE SEPARATION BASED ON A CONVOLUTIONAL NEURAL NETWORK

    Publication (Announcement) No.: US20220223144A1

    Publication (Announcement) Date: 2022-07-14

    Application No.: US17611121

    Filing Date: 2020-05-13

    Abstract: Described herein is a method for Convolutional Neural Network (CNN) based speech source separation, wherein the method includes the steps of: (a) providing multiple frames of a time-frequency transform of an original noisy speech signal; (b) inputting the time-frequency transform of said multiple frames into an aggregated multi-scale CNN having a plurality of parallel convolution paths; (c) extracting and outputting, by each parallel convolution path, features from the input time-frequency transform of said multiple frames; (d) obtaining an aggregated output of the outputs of the parallel convolution paths; and (e) generating an output mask for extracting speech from the original noisy speech signal based on the aggregated output. Further described herein are an apparatus for CNN-based speech source separation and a corresponding computer program product comprising a computer-readable storage medium with instructions adapted to carry out said method when executed by a device having processing capability.
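Steps (b) through (e) of the abstract can be sketched as follows. This is a minimal illustrative toy, not the patented architecture: each "parallel convolution path" is stood in for by a single fixed moving-average convolution at a different kernel size, and the aggregation and mask functions (mean, sigmoid) are assumptions for illustration.

```python
import numpy as np

def multiscale_mask(spectrogram, kernel_sizes=(3, 5, 7)):
    """Toy sketch of parallel multi-scale convolution paths over a
    time-frequency transform, aggregated into a speech mask.
    kernel_sizes and the aggregation scheme are illustrative choices."""
    path_outputs = []
    for k in kernel_sizes:
        # One "path": a moving-average convolution along the time axis,
        # standing in for a learned multi-layer convolution path.
        kernel = np.ones(k) / k
        out = np.apply_along_axis(
            lambda row: np.convolve(row, kernel, mode="same"),
            1, spectrogram)
        path_outputs.append(out)
    # (d) aggregate the outputs of the parallel paths (here: simple mean)
    aggregated = np.mean(path_outputs, axis=0)
    # (e) squash to (0, 1) so the result can act as a time-frequency mask
    return 1.0 / (1.0 + np.exp(-aggregated))
```

Applying the returned mask elementwise to the noisy time-frequency transform would then extract the speech estimate, as in step (e).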

2. SPEECH ENHANCEMENT (Invention Publication; Under Examination, Published)

    Publication (Announcement) No.: US20240363131A1

    Publication (Announcement) Date: 2024-10-31

    Application No.: US18577597

    Filing Date: 2022-07-12

    CPC classification number: G10L21/0208 G10L25/27 G10L2021/02082

    Abstract: A method for dereverberating audio signals is provided. In some implementations, the method involves obtaining a real acoustic impulse response (AIR); identifying a first portion of the real AIR corresponding to early reflections of a direct sound and a second portion of the real AIR corresponding to late reflections of the direct sound; generating one or more synthesized AIRs by modifying the first portion of the real AIR and/or the second portion of the real AIR; and using the real AIR and the one or more synthesized AIRs to generate a plurality of training samples, each training sample comprising an input audio signal and a reverberated audio signal, wherein the reverberated audio signal is generated based on the input audio signal and either the real AIR or one of the one or more synthesized AIRs; the plurality of training samples is then used to train a machine learning model.
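The AIR-augmentation idea above can be sketched in a few lines. This is a hedged illustration, not the claimed method: the 50 ms early/late split point, the tail-gain modification, and the truncating convolution are all assumptions chosen for the example.

```python
import numpy as np

def synthesize_air(real_air, sample_rate=16000, split_ms=50.0, late_gain=0.5):
    """Illustrative sketch: split a real AIR into an early-reflection part
    and a late-reflection tail, then modify the tail to synthesize a new
    AIR. The split point and gain factor are assumed values."""
    split = int(sample_rate * split_ms / 1000.0)
    early = real_air[:split]
    late = real_air[split:] * late_gain  # attenuate the late reverberation
    return np.concatenate([early, late])

def make_training_sample(clean, air):
    """Pair a clean input signal with its reverberated version,
    obtained by convolving the input with the (real or synthesized) AIR."""
    reverberated = np.convolve(clean, air)[:len(clean)]
    return clean, reverberated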

3. SPEECH ENHANCEMENT (Invention Publication; Under Examination, Published)

    Publication (Announcement) No.: US20240177726A1

    Publication (Announcement) Date: 2024-05-30

    Application No.: US18577586

    Filing Date: 2022-07-12

    CPC classification number: G10L21/0208 G06N3/08 G10L21/0232 G10L2021/02082

    Abstract: A method for enhancing audio signals is provided. In some implementations, the method involves (a) obtaining a training set comprising a plurality of training samples, each training sample comprising a distorted audio signal and a clean audio signal. In some implementations, the method involves (b), for a training sample of the plurality of training samples: obtaining a frequency-domain representation of the distorted audio signal; providing the frequency-domain representation to a convolutional neural network (CNN) comprising a plurality of convolutional layers and to a recurrent element, wherein an output of the recurrent element is provided to a subset of the plurality of convolutional layers; generating a predicted enhancement mask, wherein the CNN generates the predicted enhancement mask; generating a predicted enhanced audio signal based on the predicted enhancement mask; and updating weights associated with the CNN and the recurrent element based on the predicted enhanced audio signal.
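The training step in part (b) can be sketched as a single-weight toy. This is only a schematic stand-in under loud assumptions: the CNN and the recurrent element are each replaced by one scalar parameter, and the gradient update is derived by hand for that simplified model, so it illustrates the mask-predict-and-update loop rather than the actual network.

```python
import numpy as np

def enhance_step(distorted_spec, clean_spec, conv_w, rnn_w, state, lr=1e-3):
    """Toy one-step sketch: a recurrent state feeds into a 'convolutional'
    stage, which predicts an enhancement mask; the mask yields an enhanced
    signal, and the weight is updated from the resulting loss.
    All parameters here are scalars standing in for layer weights."""
    # recurrent element: state update driven by a frame-level summary
    state = np.tanh(rnn_w * distorted_spec.mean() + state)
    # "convolutional layer" receiving the recurrent output alongside its input
    logits = conv_w * distorted_spec + state
    mask = 1.0 / (1.0 + np.exp(-logits))     # predicted enhancement mask
    enhanced = mask * distorted_spec         # predicted enhanced signal
    loss = np.mean((enhanced - clean_spec) ** 2)
    # hand-derived gradient of the loss w.r.t. the single "conv" weight
    grad_w = np.mean(2 * (enhanced - clean_spec)
                     * mask * (1 - mask) * distorted_spec ** 2)
    conv_w -= lr * grad_w                    # weight update from the loss
    return conv_w, state, mask, loss
```

In the abstract's formulation, the recurrent element's output is routed only to a subset of the convolutional layers; the scalar `state` entering `logits` plays that role here.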
