Dereverberation and noise reduction

    Publication No.: US11259117B1

    Publication date: 2022-02-22

    Application No.: US17036807

    Filing date: 2020-09-29

    Abstract: A system configured to improve audio processing by performing dereverberation and noise reduction during a communication session. The system may apply a two-channel dereverberation algorithm by calculating coherence-to-diffuse ratio (CDR) values and calculating dereverberation (DER) gain values based on the CDR values. While the device calculates the DER gain values prior to performing acoustic echo cancellation (AEC) processing, the device applies the DER gain values after performing residual echo suppression (RES) processing in order to avoid excessive attenuation of the local speech. To improve output speech quality, the device does not apply the DER gain values for nonreverberant signals, when a signal-to-noise ratio (SNR) value is too low, and/or when far-end talk (e.g., remote speech) is present. Dereverberation processing is further improved by using frequency dependent parameters to calculate the DER gain values and by adjusting other gain values when the DER gain values are applied.
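The bypass conditions named in this abstract can be sketched as a per-band gain function. This is only an illustration under stated assumptions: the abstract does not give the CDR-to-gain mapping, so the Wiener-like `cdr / (cdr + 1)` form, the gain floor, the thresholds, and every name below are hypothetical.

```python
def der_gain(cdr, snr_db, far_end_active,
             cdr_nonreverb=10.0, snr_min_db=5.0, gain_floor=0.1):
    """Return a dereverberation gain for one frequency band.

    cdr is a linear (not dB) CDR value for the band. The thresholds
    and the mapping are hypothetical placeholders, not patent values.
    """
    # Bypass cases from the abstract: the signal is nonreverberant,
    # the SNR is too low, or far-end (remote) speech is present.
    if cdr >= cdr_nonreverb or snr_db < snr_min_db or far_end_active:
        return 1.0
    # Assumed Wiener-like mapping: more diffuse energy, more attenuation.
    gain = cdr / (cdr + 1.0)
    # The floor limits how strongly any band can be attenuated.
    return max(gain_floor, gain)
```

For example, `der_gain(1.0, 20.0, False)` attenuates the band by half, while the same band with far-end talk active is passed through with a gain of 1.0.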

    Methods for voice enhancement
    12.
    Granted patent

    Publication No.: US10600432B1

    Publication date: 2020-03-24

    Application No.: US15471629

    Filing date: 2017-03-28

    Abstract: A system configured to perform power normalization for voice enhancement. The system may identify active intervals corresponding to voice activity and may selectively amplify the active intervals in order to generate output audio data at a near uniform loudness. The system may determine a variable gain for each of the active intervals based on a desired output loudness and a flatness value, which indicates how much a signal envelope is to be modified. For example, a low flatness value corresponds to no modification, with peak active interval values corresponding to the desired output loudness and lower active intervals being lower than the desired output loudness. In contrast, a high flatness value corresponds to extensive modification, with peak active interval values and lower active interval values both corresponding to the desired output loudness. Thus, individual words may share the same peak power level.
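One way to read the flatness behavior described in this abstract is as an interpolation, in decibels, between a single shared gain and a separate gain per interval. The sketch below assumes a flatness value in the range 0 to 1 and a linear interpolation. Neither is stated in the abstract, and the function and parameter names are hypothetical.

```python
def interval_gains_db(interval_levels_db, target_db, flatness):
    """Return one gain (in dB) per active interval.

    flatness = 0: one shared gain, so only the loudest interval
                  reaches target_db and the envelope is preserved.
    flatness = 1: every interval is brought up to target_db.
    The 0..1 range and linear interpolation are assumptions.
    """
    if not 0.0 <= flatness <= 1.0:
        raise ValueError("flatness must be in [0, 1]")
    peak_db = max(interval_levels_db)
    return [
        # Shared gain, plus a flatness-scaled boost for quieter intervals.
        (target_db - peak_db) + flatness * (peak_db - level_db)
        for level_db in interval_levels_db
    ]
```

With interval levels of -20 dB and -30 dB and a target of -10 dB, a flatness of 0 yields gains of 10 dB and 10 dB, while a flatness of 1 yields 10 dB and 20 dB, so both intervals end at the target.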

    Efficient dereverberation in networked audio systems
    14.
    Granted patent (status: in force)

    Publication No.: US09390723B1

    Publication date: 2016-07-12

    Application No.: US14568033

    Filing date: 2014-12-11

    CPC classification number: G10K11/175 G10L21/0208 G10L21/0232 G10L2021/02082

    Abstract: Features are disclosed for performing efficient dereverberation of speech signals captured with single- and multi-channel sensors in networked audio systems. Such features could be used in applications requiring automatic recognition of speech captured with sensors. Dereverberation is performed in the sub-band domain, and hence provides improved dereverberation performance in terms of signal quality, algorithmic delay, computational efficiency, and speed of convergence.

    Sound source localization
    15.
    Granted patent

    Publication No.: US11915698B1

    Publication date: 2024-02-27

    Application No.: US17489223

    Filing date: 2021-09-29

    CPC classification number: G10L15/22 G10L15/10

    Abstract: A system configured to improve track selection while performing audio type detection using sound source localization (SSL) data is provided. A device processes audio data representing sounds from multiple sound sources to determine SSL data that distinguishes between each of the sound sources. The system detects an acoustic event and performs SSL track selection to select the sound source that corresponds to the acoustic event based on input features. To improve SSL track selection, the system detects current conditions of the environment and determines adaptive weight values that vary based on the current conditions, such as a noise level of the environment, whether playback is detected, whether the device is located near one or more walls, etc. By adjusting the adaptive weight values, the system improves an accuracy of the SSL track selection by prioritizing the input features that are most predictive during the current conditions.
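The idea of condition-dependent weighting can be illustrated with a small scoring sketch. Everything concrete here is an assumption made for illustration: the abstract does not name the input features, so `energy` and `onset_alignment`, the 60 dB noise threshold, and the weight values are all hypothetical.

```python
def adaptive_weights(noise_db, playback_detected, near_wall):
    """Choose feature weights from the current acoustic conditions.

    Feature names, thresholds, and weight values are hypothetical.
    """
    weights = {"energy": 1.0, "onset_alignment": 1.0}
    # Assume energy is less predictive in loud rooms or during playback.
    if noise_db > 60.0 or playback_detected:
        weights["energy"] = 0.5
    # Assume reflections near walls make onset timing less reliable.
    if near_wall:
        weights["onset_alignment"] = 0.5
    return weights


def select_track(track_features, weights):
    """Return the id of the track with the highest weighted score."""
    def score(features):
        return sum(w * features.get(name, 0.0) for name, w in weights.items())
    return max(track_features, key=lambda tid: score(track_features[tid]))
```

The same set of candidate tracks can therefore produce a different selection once playback is detected, because the energy feature then contributes less to each score.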

    Directional speech separation
    16.
    Granted patent

    Publication No.: US11749294B2

    Publication date: 2023-09-05

    Application No.: US16999233

    Filing date: 2020-08-21

    Inventor: Wai Chung Chu

    Abstract: A system configured to perform directional speech separation. The system may dynamically associate direction-of-arrivals with one or more audio sources in order to generate output audio data that separates each of the audio sources. The system identifies a target direction for each audio source, dynamically determines directions that are correlated with the target direction, and generates output signals for each audio source. The system may associate individual frequency bands with specific directions based on a time delay detected by two or more microphones. The system may determine a cross-correlation between each direction and the target direction and select directions with strong correlation. The system may generate time-frequency mask data indicating frequency bands corresponding to the directions associated with a particular audio source. Using the mask data, the system generates output audio data specific to the audio source, resulting in directional speech separation between different audio sources.
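The last two steps in this abstract, selecting directions that correlate with the target and building a time-frequency mask, can be sketched as follows. The sketch assumes that each time-frequency bin has already been assigned a direction index from the inter-microphone time delay. It uses a Pearson correlation over per-frame activity counts, which is one plausible reading rather than the measure the patent defines. All names and the 0.5 threshold are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlated_directions(activity, target, threshold=0.5):
    """Select directions whose activity tracks the target direction.

    activity[d] holds, per frame, how many bins pointed at direction d.
    """
    return {
        d for d, seq in activity.items()
        if d == target or pearson(seq, activity[target]) >= threshold
    }


def time_frequency_mask(bin_directions, selected):
    """bin_directions[t][f] is the direction index assigned to each bin."""
    return [[1 if d in selected else 0 for d in frame]
            for frame in bin_directions]
```

Multiplying a spectrogram elementwise by the resulting mask keeps only the bins attributed to the chosen source.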

    Analyzing audio signals for device selection

    Publication No.: US11317201B1

    Publication date: 2022-04-26

    Application No.: US15418973

    Filing date: 2017-01-30

    Abstract: A system efficiently selects at least one device from multiple devices based on received audio signals. In some instances, the system receives audio signals from devices that each comprise at least one microphone. A respective audio signal of the audio signals includes a representation of a sound originating from a location. The system then determines a device to be used to respond to the sound. In some instances, the system analyzes times in which the received audio signals that represent the sound are generated and/or volumes of the sound as represented by the received audio signals. The system can then select the device based on the analysis.
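A selection rule that combines the two cues mentioned in this abstract, signal timing and volume, might look like the sketch below. The specific rule (prefer the earliest signal and break near-ties by volume), the 50 ms tolerance, and the `AudioReport` type are assumptions for illustration. The abstract only says the analysis uses times and/or volumes.

```python
from dataclasses import dataclass


@dataclass
class AudioReport:
    """Hypothetical per-device summary of one captured sound."""
    device_id: str
    arrival_time_s: float  # when the device generated its audio signal
    volume_db: float       # level of the sound in that signal


def select_device(reports, time_tolerance_s=0.05):
    """Pick the device that should respond to the sound.

    Devices within time_tolerance_s of the earliest signal are treated
    as tied on timing; the loudest of those is selected.
    """
    if not reports:
        raise ValueError("at least one report is required")
    earliest = min(r.arrival_time_s for r in reports)
    candidates = [r for r in reports
                  if r.arrival_time_s - earliest <= time_tolerance_s]
    return max(candidates, key=lambda r: r.volume_db).device_id
```

A device that hears the sound much later is excluded even if its signal is louder, which guards against selecting a device that is merely more sensitive.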

    DIRECTIONAL SPEECH SEPARATION
    18.
    Patent application

    Publication No.: US20200381002A1

    Publication date: 2020-12-03

    Application No.: US16999233

    Filing date: 2020-08-21

    Inventor: Wai Chung Chu

    Abstract: A system configured to perform directional speech separation. The system may dynamically associate direction-of-arrivals with one or more audio sources in order to generate output audio data that separates each of the audio sources. The system identifies a target direction for each audio source, dynamically determines directions that are correlated with the target direction, and generates output signals for each audio source. The system may associate individual frequency bands with specific directions based on a time delay detected by two or more microphones. The system may determine a cross-correlation between each direction and the target direction and select directions with strong correlation. The system may generate time-frequency mask data indicating frequency bands corresponding to the directions associated with a particular audio source. Using the mask data, the system generates output audio data specific to the audio source, resulting in directional speech separation between different audio sources.

    Methods for suppressing residual echo

    Publication No.: US10115411B1

    Publication date: 2018-10-30

    Application No.: US15823050

    Filing date: 2017-11-27

    Abstract: A system configured to improve speech quality by performing residual echo suppression (RES). The system may detect when double-talk conditions are present in individual frequency bands during a voice conversation and may determine gain values for the individual frequency bands. The system may determine whether double-talk conditions are present based on a normalized cross power spectral density function in a frequency domain. If double-talk conditions are present in a frequency band or far end energy is low, the system may determine a gain value that passes audio data in the frequency band, whereas if double-talk conditions are not present, the system may determine a gain value that attenuates audio data in the frequency band. The system may determine binary gain values using a decision threshold value or continuous gain values using a mapping function. The system may control an amount of suppression by selecting different mapping functions and/or parameters.
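The binary and continuous gain options described in this abstract can be contrasted in a short sketch. Several details are assumed rather than taken from the patent: that the normalized cross power spectral density is summarized as a coherence value between 0 and 1 for each band, that a low value indicates double-talk, and that the continuous mapping has the form `1 - coherence ** exponent`. All names and default values are hypothetical.

```python
def res_gain_binary(coherence, far_end_energy,
                    threshold=0.6, energy_floor=1e-6, attenuation=0.1):
    """Binary gain for one band using a decision threshold.

    coherence: assumed normalized cross-PSD magnitude in [0, 1];
    low values are treated here as a sign of double-talk.
    """
    # Pass the band when the far end is quiet or double-talk is detected.
    if far_end_energy < energy_floor or coherence < threshold:
        return 1.0
    # Otherwise treat the band as residual echo and attenuate it.
    return attenuation


def res_gain_continuous(coherence, exponent=2.0, min_gain=0.1):
    """Continuous gain from an assumed mapping function.

    A larger exponent keeps the gain near 1.0 over more of the range,
    which gives milder suppression.
    """
    gain = 1.0 - coherence ** exponent
    return max(min_gain, min(1.0, gain))
```

Changing `exponent` or `min_gain` corresponds to the abstract's point that the amount of suppression is controlled by the choice of mapping function and its parameters.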
