Omni-directional speech separation

    Publication No.: US11107492B1

    Publication date: 2021-08-31

    Application No.: US16574852

    Application date: 2019-09-18

    Inventor: Wai Chung Chu

    Abstract: A system configured to perform directional speech separation using three or more microphones. The system may dynamically associate directions of arrival with one or more audio sources in order to generate output audio data that separates each of the audio sources. Using three or more microphones, the system may separate audio sources covering the full 360 degrees surrounding the microphone array, whereas a two-microphone implementation is limited to 180 degrees. The system identifies a target direction for each audio source, dynamically determines directions that are correlated with the target direction, and generates output signals for each audio source. The system may associate individual frequency bands with specific directions based on a phase difference detected by two or more microphones.
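
    The band-to-direction step in the abstract can be illustrated with a minimal sketch, assuming a far-field source and a single microphone pair. The function name `phase_to_angle`, the 343 m/s speed of sound, and the geometry are illustrative assumptions, not details taken from the patent.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def phase_to_angle(phase_diff_rad, freq_hz, mic_spacing_m):
    """Far-field arrival angle, in degrees from broadside, for one frequency band."""
    # Time delay implied by the inter-microphone phase difference at this frequency.
    delay_s = phase_diff_rad / (2.0 * math.pi * freq_hz)
    # Path-length difference as a fraction of the spacing, clamped to asin's domain.
    ratio = max(-1.0, min(1.0, SPEED_OF_SOUND * delay_s / mic_spacing_m))
    return math.degrees(math.asin(ratio))
```

    A single pair cannot tell a source in front of the pair's axis from its mirror image behind it, which is consistent with the 180-degree limit the abstract attributes to two microphones.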

    Sound source localization with reflection detection

    Publication No.: US12143783B1

    Publication date: 2024-11-12

    Application No.: US17981705

    Application date: 2022-11-07

    Inventor: Wai Chung Chu

    Abstract: A system configured to perform sound source localization (SSL) using reflection detection is provided. A device processes audio data from multiple microphones to determine timing information corresponding to sound sources. For example, the device may determine cross-correlation data for each microphone pair, determine autocorrelation data for each microphone, and then use the autocorrelation data and the cross-correlation data to calculate quality factors. The device may determine the direction of potential sound source(s) by generating Steered Response Power (SRP) data using the cross-correlation data. To perform reflection detection to distinguish between direct sounds and acoustic reflections, the device may generate modified SRP data using the quality factors. For example, the device may process the SRP data to detect two potential sound sources and then process the modified SRP data to determine that a first potential sound source corresponds to direct sound and a second potential sound source corresponds to acoustic reflections.
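
    The SRP step alone can be sketched as follows, assuming time-domain cross-correlation and a precomputed table of expected lags per candidate direction. The names `cross_correlation`, `srp`, and `steering_lags` are hypothetical, and the quality factors and modified SRP data are not modeled because the abstract does not specify how they are computed.

```python
def cross_correlation(x, y, max_lag):
    """Return {lag: sum over n of x[n] * y[n + lag]} for lags in [-max_lag, max_lag]."""
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for n in range(len(x)):
            m = n + lag
            if 0 <= m < len(y):
                total += x[n] * y[m]
        out[lag] = total
    return out


def srp(xcorrs, steering_lags):
    """Steered response power for each candidate direction.

    xcorrs:        {pair: {lag: cross-correlation value}}
    steering_lags: {direction: {pair: lag expected for that direction}}
    """
    return {
        direction: sum(xcorrs[pair][lag] for pair, lag in lags.items())
        for direction, lags in steering_lags.items()
    }
```

    The direction with the largest summed value is the strongest candidate; a reflection would typically show up as a second, weaker peak.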

    Dereverberation and noise reduction

    Publication No.: US11386911B1

    Publication date: 2022-07-12

    Application No.: US16915037

    Application date: 2020-06-29

    Abstract: A system configured to improve audio processing by performing dereverberation and noise reduction during a communication session. The system may apply a two-channel dereverberation algorithm by calculating coherence-to-diffuse ratio (CDR) values and calculating dereverberation (DER) gain values based on the CDR values. While the DER gain values may be calculated at a first stage within the pipeline, the device may apply the DER gain values at a second stage within the pipeline. For example, the device may calculate the DER gain values prior to performing residual echo suppression (RES) processing but may apply the DER gain values after performing RES processing, in order to avoid excessive attenuation of the local speech. In addition to removing reverberation, the DER gain values also remove diffuse noise components, reducing an amount of noise reduction required. Thus, the device may soften noise reduction when the DER gain values are applied.
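
    The two-stage ordering described in the abstract can be sketched as follows. The Wiener-style mapping from CDR to gain, the gain floor, and the names `der_gain` and `process_frame` are assumptions chosen for illustration; the abstract does not give the actual gain formula.

```python
def der_gain(cdr, floor=0.1):
    """Assumed Wiener-style gain from a linear CDR value (cdr >= 0)."""
    return max(floor, cdr / (cdr + 1.0))


def process_frame(bands, cdr_values, res):
    """Compute DER gains before RES, but apply them only after RES.

    bands:      per-band magnitudes for one frame
    cdr_values: per-band CDR estimates for the same frame
    res:        callable standing in for residual echo suppression
    """
    gains = [der_gain(c) for c in cdr_values]          # first stage: calculate
    suppressed = res(bands)                            # RES sees ungained bands
    return [g * b for g, b in zip(gains, suppressed)]  # second stage: apply
```

    Low-CDR bands, which are dominated by reverberation or diffuse noise, are attenuated most, so a later noise-reduction stage would have less to remove.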

    Direction finding of sound sources

    Publication No.: US11950062B1

    Publication date: 2024-04-02

    Application No.: US17709563

    Application date: 2022-03-31

    CPC classification number: H04R3/005 G10L25/21 H04R1/406

    Abstract: A system configured to improve sound source localization (SSL) processing by reducing a number of direction vectors and grouping the direction vectors into direction cells is provided. The system performs clustering to generate a smaller set of direction vectors included in a delay-direction codebook, reducing a size of the codebook to the number of unique delay vectors. In addition, the system groups the direction vectors into direction cells having a regular structure (e.g., predetermined uniformity and/or symmetry), which simplifies SSL processing and results in a substantial reduction in computational cost. The system may also select between multiple codebooks and/or dynamically adjust the codebook to compensate for changes to the microphone array. For example, a device with a microphone array fixed to a display that can tilt may adjust the codebook based on a tilt angle of the display to improve accuracy.
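
    The codebook reduction can be illustrated with a minimal sketch, assuming a planar array, far-field sources, and delays rounded to whole samples. The names `delay_vector` and `build_codebook` and the 343 m/s constant are hypothetical, and the grouping into regular direction cells is not modeled.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def delay_vector(azimuth_deg, mics, sample_rate):
    """Whole-sample delays of each microphone relative to microphone 0."""
    a = math.radians(azimuth_deg)
    ux, uy = math.cos(a), math.sin(a)
    # A larger projection onto the source direction means an earlier arrival.
    proj = [x * ux + y * uy for x, y in mics]
    return tuple(
        round((proj[0] - p) * sample_rate / SPEED_OF_SOUND) for p in proj[1:]
    )


def build_codebook(azimuths, mics, sample_rate):
    """Map each unique delay vector to the candidate directions that produce it."""
    codebook = {}
    for az in azimuths:
        codebook.setdefault(delay_vector(az, mics, sample_rate), []).append(az)
    return codebook
```

    Directions that map to the same delay vector cannot be distinguished by the array at that sample rate, so one codebook entry per unique vector is enough.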

    Sound source localization using reflection classification

    Publication No.: US11545172B1

    Publication date: 2023-01-03

    Application No.: US17195904

    Application date: 2021-03-09

    Inventor: Wai Chung Chu

    Abstract: A system configured to perform sound source localization (SSL) using reflection classification is provided. A device processes audio data representing sounds from multiple sound sources to generate sound track data that includes an individual sound track for each of the sound sources. To detect reflections, the device determines whether a pair of sound tracks are strongly correlated. For example, the device may calculate a correlation value for each pairwise combination of the sound tracks and determine whether the correlation value exceeds a threshold value. When the correlation value exceeds the threshold, the device invokes a reflection classifier trained to distinguish between direct sound sources and reflected sound sources. For example, the device extracts feature data from the pair of sound tracks and processes the feature data using a trained model to determine which of the sound tracks corresponds to the direct sound source.
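
    The pairwise screening step can be sketched as follows, assuming each sound track is a list of per-frame values (for example, energies) and that Pearson correlation is the correlation measure. The names `pearson` and `correlated_pairs` and the 0.8 threshold are hypothetical, and the trained reflection classifier is not modeled.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def correlated_pairs(tracks, threshold=0.8):
    """Return the track pairs whose correlation exceeds the threshold.

    tracks: {track name: list of per-frame values}
    """
    return [
        (p, q)
        for p, q in combinations(sorted(tracks), 2)
        if pearson(tracks[p], tracks[q]) > threshold
    ]
```

    Only the pairs returned here would be passed on for classification; uncorrelated tracks are treated as separate direct sources.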

    Directional speech separation
    Granted invention patent

    Publication No.: US10755727B1

    Publication date: 2020-08-25

    Application No.: US16141375

    Application date: 2018-09-25

    Inventor: Wai Chung Chu

    Abstract: A system configured to perform directional speech separation. The system may dynamically associate directions of arrival with one or more audio sources in order to generate output audio data that separates each of the audio sources. The system identifies a target direction for each audio source, dynamically determines directions that are correlated with the target direction, and generates output signals for each audio source. The system may associate individual frequency bands with specific directions based on a time delay detected by two or more microphones. The system may determine a cross-correlation between each direction and the target direction and select directions with strong correlation. The system may generate time-frequency mask data indicating frequency bands corresponding to the directions associated with a particular audio source. Using the mask data, the system generates output audio data specific to the audio source, resulting in directional speech separation between different audio sources.
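
    The masking step can be illustrated with a minimal sketch, assuming each time-frequency bin has already been assigned a direction index and that the mask is binary. The names `build_mask` and `apply_mask` are hypothetical, and the correlation-based selection of directions is taken as given.

```python
def build_mask(band_directions, source_directions):
    """Binary time-frequency mask for one audio source.

    band_directions:   per frame, the direction index assigned to each band
    source_directions: set of direction indices associated with the source
    """
    return [
        [1.0 if d in source_directions else 0.0 for d in frame]
        for frame in band_directions
    ]


def apply_mask(spectrum, mask):
    """Keep only the bins that the mask attributes to the source."""
    return [
        [m * v for m, v in zip(mask_row, spec_row)]
        for mask_row, spec_row in zip(mask, spectrum)
    ]
```

    Building one mask per source and applying each to the same input spectrum yields one separated output per source.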

    Analyzing audio signals for device selection

    Publication No.: US12047756B1

    Publication date: 2024-07-23

    Application No.: US17722680

    Application date: 2022-04-18

    CPC classification number: H04R3/005 H04R1/406 H04R2201/403 H04R2430/20

    Abstract: A system efficiently selects at least one device from multiple devices based on received audio signals. In some instances, the system receives audio signals from devices that each comprise at least one microphone. Each of the audio signals includes a representation of a sound originating from a location. The system then determines a device to be used to respond to the sound. In some instances, the system analyzes the times at which the received audio signals that represent the sound are generated and/or the volumes of the sound as represented by the received audio signals. The system can then select the device based on the analysis.
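
    One possible selection rule can be sketched as follows, assuming the earliest arrival wins and volume breaks ties. The name `select_device`, the data layout, and this particular rule are assumptions; the abstract only says the selection is based on times and/or volumes.

```python
def select_device(observations):
    """Pick the device that heard the sound first, preferring the louder one on ties.

    observations: {device id: (arrival time in seconds, volume)}
    """
    return min(
        observations,
        key=lambda dev: (observations[dev][0], -observations[dev][1]),
    )
```

    The earliest arrival and the highest volume both suggest the device closest to the sound, which is why they are combined here.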
