SYSTEM AND METHOD FOR MASK-BASED NEURAL BEAMFORMING FOR MULTI-CHANNEL SPEECH ENHANCEMENT

    公开(公告)号:US20240331715A1

    公开(公告)日:2024-10-03

    申请号:US18457921

    申请日:2023-08-29

    IPC分类号: G10L21/0224

    摘要: A method includes receiving, during a first time window, a set of noisy audio signals from a plurality of audio input devices. The method also includes generating a noisy time-frequency representation based on the set of noisy audio signals. The method further includes providing the noisy time-frequency representation as an input to a mask estimation model trained to output a mask used to predict a clean time-frequency representation of clean speech audio from the noisy time-frequency representation. The method also includes determining beamforming filter weights based on the mask. The method further includes applying the beamforming filter weights to the noisy time-frequency representation to isolate the clean speech audio from the set of noisy audio signals. In addition, the method includes outputting the clean speech audio.