Voice filtering other speakers from calls and audio messages

    Publication No.: US12087297B2

    Publication date: 2024-09-10

    Application No.: US17930822

    Filing date: 2022-09-09

    Applicant: Google LLC

    Abstract: A method includes receiving a first instance of raw audio data corresponding to a voice-based command and receiving a second instance of the raw audio data corresponding to an utterance of audible contents for an audio-based communication spoken by a user. When a voice filtering recognition routine determines to activate voice filtering for at least the voice of the user, the method also includes obtaining a respective speaker embedding of the user and processing, using the respective speaker embedding, the second instance of the raw audio data to generate enhanced audio data for the audio-based communication that isolates the utterance of the audible contents spoken by the user and excludes at least a portion of the one or more additional sounds that are not spoken by the user. The method also includes executing …
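
As a rough illustration of how a speaker embedding can condition the enhancement step, the sketch below gates audio frames by how closely a per-frame embedding matches the user's enrolled speaker embedding. It is a deliberately simplified stand-in, not the patented method: a practical voice filter would typically feed the speaker embedding into a neural network that predicts a spectrogram mask. The function names, the 0.7 cosine-similarity threshold, the `activate` flag standing in for the recognition routine's decision, and the assumption that per-frame embeddings are already available are all hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def voice_filter(
    frames: Sequence[Sequence[float]],
    frame_embeddings: Sequence[Sequence[float]],
    speaker_embedding: Sequence[float],
    activate: bool,
    threshold: float = 0.7,
) -> List[List[float]]:
    """Hypothetical frame-gating voice filter.

    If filtering is not activated, the audio passes through unchanged. Otherwise
    each frame is kept (gain 1) when its embedding is close enough to the user's
    speaker embedding and zeroed (gain 0) when it is not, so frames dominated by
    other speakers or background sounds are excluded from the enhanced audio.
    """
    if not activate:
        return [list(f) for f in frames]
    enhanced = []
    for frame, emb in zip(frames, frame_embeddings):
        gain = 1.0 if cosine_similarity(emb, speaker_embedding) >= threshold else 0.0
        enhanced.append([gain * s for s in frame])
    return enhanced


# Usage: the middle frame's embedding is orthogonal to the user's, so it is zeroed.
user = [1.0, 0.0]
frames = [[0.5, 0.5], [0.2, -0.2], [0.9, 0.1]]
embeddings = [[0.9, 0.1], [0.0, 1.0], [1.0, 0.0]]
enhanced = voice_filter(frames, embeddings, user, activate=True)
```

A hard 0/1 gain keeps the example easy to follow; a soft mask in [0, 1] per time-frequency bin is the more common choice when a network produces the mask.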

    SOUND SOURCE SEPARATION USING ANGULAR LOCATION

    Publication No.: US20240274148A1

    Publication date: 2024-08-15

    Application No.: US18645793

    Filing date: 2024-04-25

    Applicant: Intel Corporation

    IPC classification: G10L21/0272 G10L25/30

    CPC classification: G10L21/0272 G10L25/30

    Abstract: Systems and methods for audio source separation. A deep-learning-based system uses an azimuth angle location to separate an audio signal originating from a selected location from other sounds. Techniques are disclosed for steering a virtual direction of a microphone toward a selected speaker. A deep-learning-based audio regression method, which can be implemented as a neural network, learns to separate out various speakers by leveraging the spectral and spatial characteristics of all sources. The neural network can focus on multiple sources in multiple respective target directions and cancel out other sounds. A user can choose which source to listen to. The network can use both the time-domain signal and a frequency-domain signal to separate out the target signal and generate a separated audio output. The direction of the selected speaker relative to the microphone array can be input to the system as a vector.
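
The notion of steering a virtual microphone direction toward a selected azimuth, supplied as a direction vector, can be illustrated with classical delay-and-sum beamforming. This is a swapped-in technique, not the neural network the abstract describes. The sketch assumes a far-field source, microphone positions in metres in the array plane, a speed of sound of 343 m/s, and integer-sample delays; all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def direction_vector(azimuth_deg: float) -> Tuple[float, float]:
    """Unit vector in the array plane pointing toward the given azimuth."""
    theta = math.radians(azimuth_deg)
    return (math.cos(theta), math.sin(theta))


def steering_delays(
    mic_positions: Sequence[Tuple[float, float]], azimuth_deg: float, sample_rate: int
) -> List[int]:
    """Per-channel delays (samples) that align a far-field wave arriving from azimuth_deg."""
    ux, uy = direction_vector(azimuth_deg)
    # A mic further along u hears the wavefront earlier by (p . u) / c seconds,
    # so that channel must be delayed by the same amount to line up with the others.
    raw = [round((px * ux + py * uy) / SPEED_OF_SOUND * sample_rate) for px, py in mic_positions]
    base = min(raw)
    return [d - base for d in raw]  # shift so every delay is non-negative


def delay_and_sum(channels: Sequence[Sequence[float]], delays: Sequence[int]) -> List[float]:
    """Average the channels after delaying channel m by delays[m] samples (zero-padded)."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for t in range(d, n):
            out[t] += ch[t - d]
    return [v / len(channels) for v in out]


# Usage: two mics 0.343 m apart at 1 kHz, so a wave along the array axis
# reaches them exactly one sample apart.
mics = [(0.0, 0.0), (0.343, 0.0)]
ch0 = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]  # impulse as heard at the origin mic
ch1 = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # same impulse, one sample earlier
toward_source = delay_and_sum([ch0, ch1], steering_delays(mics, 0.0, 1000))
away_from_source = delay_and_sum([ch0, ch1], steering_delays(mics, 180.0, 1000))
```

Steering toward the source adds the two channels coherently, while steering the opposite way spreads the energy across two samples; that directional selectivity is what a learned separator conditioned on the direction vector can exploit far more aggressively.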

    Method, apparatus, device, and storage medium for speaker change point detection

    Publication No.: US12039981B2

    Publication date: 2024-07-16

    Application No.: US18394143

    Filing date: 2023-12-22

    IPC classification: G10L21/0272 G10L17/04

    CPC classification: G10L17/04

    Abstract: A method, apparatus, device, and storage medium for speaker change point detection. The method includes: acquiring target voice data to be detected; extracting, from the target voice data, an acoustic feature characterizing the acoustic information of the target voice data; encoding the acoustic feature to obtain frame-level speaker characterization vectors for the target voice data; integrating and firing the frame-level speaker characterization vectors based on a continuous integrate-and-fire (CIF) mechanism to obtain a sequence of speaker characterizations bounded by speaker change points in the target voice data; and determining timestamps corresponding to the speaker change points according to the sequence of speaker characterizations bounded by those change points.
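
The integrate-and-fire step can be sketched as follows, assuming the encoder already supplies frame-level vectors and a per-frame weight (alpha) in [0, 1], a firing threshold of 1.0, and a hypothetical 10 ms frame shift for converting firing frames into timestamps. It follows the general CIF formulation; how the patent derives its weights and handles a trailing segment that never reaches the threshold is not stated in the abstract, so leftover weight at the end is simply dropped here. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def cif(
    frames: Sequence[Sequence[float]], alphas: Sequence[float], threshold: float = 1.0
) -> Tuple[List[List[float]], List[int]]:
    """Continuous integrate-and-fire over frame-level vectors.

    Returns (fired, fire_frames): each fired vector is the alpha-weighted sum of
    the frames integrated since the previous firing, and fire_frames holds the
    index of the frame on which each firing happened. Assumes 0 <= alpha <= 1.
    """
    if not frames:
        return [], []
    acc = 0.0
    state = [0.0] * len(frames[0])
    fired: List[List[float]] = []
    fire_frames: List[int] = []
    for t, (h, a) in enumerate(zip(frames, alphas)):
        if acc + a < threshold:
            # Keep integrating: accumulate the weight and the weighted frame vector.
            acc += a
            state = [s + a * x for s, x in zip(state, h)]
        else:
            # Fire: spend just enough of this frame's weight to reach the threshold...
            used = threshold - acc
            fired.append([s + used * x for s, x in zip(state, h)])
            fire_frames.append(t)
            # ...and carry the leftover weight into the next integration.
            acc = a - used
            state = [acc * x for x in h]
    return fired, fire_frames


def fire_timestamps_ms(fire_frames: Sequence[int], frame_shift_ms: int = 10) -> List[int]:
    """End time (ms) of each firing frame, i.e. where one integrated segment closes."""
    return [(t + 1) * frame_shift_ms for t in fire_frames]


# Usage: three frames of "speaker A" followed by three frames of "speaker B".
frames = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
alphas = [0.5, 0.25, 0.25, 0.5, 0.25, 0.25]
segments, fire_frames = cif(frames, alphas)
boundaries_ms = fire_timestamps_ms(fire_frames)
```

Each fired vector summarizes one speaker segment. In this toy input the first firing (at 30 ms) falls where speaker A hands over to speaker B, so it marks a speaker change point, while the final firing only closes the last segment at the end of the audio.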