-
Publication number: US11259117B1
Publication date: 2022-02-22
Application number: US17036807
Application date: 2020-09-29
Applicant: Amazon Technologies, Inc.
Inventor: Kanthasamy Chelliah, Wai Chung Chu, Andreas Schwarz, Carlos Renato Nakagawa, Berkant Tacer, Carlo Murgia
Abstract: A system configured to improve audio processing by performing dereverberation and noise reduction during a communication session. The system may apply a two-channel dereverberation algorithm by calculating coherence-to-diffuse ratio (CDR) values and calculating dereverberation (DER) gain values based on the CDR values. While the device calculates the DER gain values prior to performing acoustic echo cancellation (AEC) processing, the device applies the DER gain values after performing residual echo suppression (RES) processing in order to avoid excessive attenuation of the local speech. To improve output speech quality, the device does not apply the DER gain values for nonreverberant signals, when a signal-to-noise ratio (SNR) value is too low, and/or when far-end talk (e.g., remote speech) is present. Dereverberation processing is further improved by using frequency dependent parameters to calculate the DER gain values and by adjusting other gain values when the DER gain values are applied.
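The gating described in this abstract can be sketched roughly as follows: a CDR value is mapped to a DER gain, and the gain is bypassed for nonreverberant signals, low SNR, or far-end talk. The Wiener-style mapping CDR / (CDR + 1), the gain floor, the SNR threshold, and all function and parameter names are assumptions for illustration only; the abstract does not specify them.

```python
def der_gain(cdr, min_gain=0.1):
    """Map a linear CDR value (>= 0) to a gain in [min_gain, 1).

    Assumption: a Wiener-style mapping cdr / (cdr + 1). The abstract does
    not state the actual mapping used in the patent.
    """
    if cdr < 0:
        raise ValueError("cdr must be non-negative")
    return max(min_gain, cdr / (cdr + 1.0))


def applied_der_gain(cdr, snr_db, far_end_active, reverberant,
                     snr_floor_db=10.0, min_gain=0.1):
    """Return the DER gain to apply, or 1.0 (bypass) in the cases the
    abstract lists: nonreverberant signal, low SNR, or far-end talk.

    snr_floor_db is a hypothetical threshold chosen for this sketch.
    """
    if not reverberant or snr_db < snr_floor_db or far_end_active:
        return 1.0
    return der_gain(cdr, min_gain)
```

A gain of 1.0 leaves the band untouched, so the bypass cases reduce to "do not apply dereverberation" without a separate code path.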
-
Publication number: US10600432B1
Publication date: 2020-03-24
Application number: US15471629
Application date: 2017-03-28
Applicant: Amazon Technologies, Inc.
Inventor: Wai Chung Chu, Carlo Murgia, Hyeong Cheol Kim
IPC: G10L21/034, G10L25/84, G10L21/02, G10L25/21
Abstract: A system configured to perform power normalization for voice enhancement. The system may identify active intervals corresponding to voice activity and may selectively amplify the active intervals in order to generate output audio data at a near uniform loudness. The system may determine a variable gain for each of the active intervals based on a desired output loudness and a flatness value, which indicates how much a signal envelope is to be modified. For example, a low flatness value corresponds to no modification, with peak active interval values corresponding to the desired output loudness and lower active intervals being lower than the desired output loudness. In contrast, a high flatness value corresponds to extensive modification, with peak active interval values and lower active interval values both corresponding to the desired output loudness. Thus, individual words may share the same peak power level.
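One way to picture the role of the flatness value is a per-interval gain in decibels that interpolates between a single shared gain (flatness 0) and a fully equalizing gain (flatness 1). The linear interpolation, the 0 to 1 range, and the function name below are assumptions made for this sketch, not details taken from the patent.

```python
def interval_gains_db(levels_db, target_db, flatness):
    """Return one gain (in dB) per active interval.

    levels_db: measured loudness of each active interval, in dB.
    target_db: desired output loudness, in dB.
    flatness:  0.0 keeps the envelope (only the peak reaches the target);
               1.0 brings every interval to the target.
    The linear interpolation is an assumption for illustration.
    """
    if not 0.0 <= flatness <= 1.0:
        raise ValueError("flatness must be between 0 and 1")
    peak_db = max(levels_db)
    shared_gain = target_db - peak_db  # gain that lifts the peak to the target
    return [shared_gain + flatness * (peak_db - level) for level in levels_db]
```

With levels of -30, -20, and -25 dB and a target of -18 dB, flatness 0 yields the same 2 dB gain for every interval, while flatness 1 yields 12, 2, and 7 dB so that all three intervals land on the target.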
-
Publication number: US09818425B1
Publication date: 2017-11-14
Application number: US15185799
Application date: 2016-06-17
Applicant: Amazon Technologies, Inc.
Inventor: Robert Ayrapetian, Philip Ryan Hilmes, Wai Chung Chu, Hyeong Cheol Kim, Yuwen Su
IPC: G10L21/0224, G10L15/30, G10L25/84, G10L21/0208, G10L21/0216, G10L15/22
CPC classification number: G10L21/0224, G10L15/30, G10L2015/223, G10L2021/02082, G10L2021/02166
Abstract: An echo cancellation system that generates multiple output paths, enabling Automatic Speech Recognition (ASR) processing in parallel with voice communication. For single direction AEC (e.g., ASR processing), the system prioritizes speech from a single user and ignores other speech by selecting a single directional output from a plurality of directional outputs as a first output path. For multi-directional AEC (e.g., voice communication), the system includes all speech by combining the plurality of directional outputs as a second output path. The system may use a weighted sum technique, such that each directional output is represented in the combined output based on a corresponding signal metric, or an equal weighting technique, such that a first group of directional outputs having a higher signal metric may be equally weighted using a first weight while a second group of directional outputs having a lower signal metric may be equally weighted using a second weight.
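The two combining techniques named in this abstract can be sketched as below. Normalizing the weights by the metric sum, splitting the groups with a single threshold, and all names are assumptions for illustration; the abstract does not say how the weights or groups are derived.

```python
def weighted_sum(outputs, metrics):
    """Combine directional outputs, weighting each by its signal metric.

    outputs: list of equal-length sample lists, one per direction.
    metrics: one non-negative signal metric per direction.
    Assumption: weights are the metrics normalized to sum to 1.
    """
    total = sum(metrics)
    if total <= 0:
        raise ValueError("metrics must have a positive sum")
    weights = [m / total for m in metrics]
    length = len(outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(length)]


def grouped_equal_weights(metrics, threshold, high_weight, low_weight):
    """Give every direction at or above the threshold one shared weight
    and every other direction a second shared weight.

    The threshold-based grouping is a hypothetical choice for this sketch.
    """
    return [high_weight if m >= threshold else low_weight for m in metrics]
```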
-
Publication number: US09390723B1
Publication date: 2016-07-12
Application number: US14568033
Application date: 2014-12-11
Applicant: Amazon Technologies, Inc.
Inventor: John Walter McDonough, Jr., Wai Chung Chu, Amit Singh Chhetri, Robert Ayrapetian
IPC: H04R3/00, G10L21/02, G10K11/175
CPC classification number: G10K11/175, G10L21/0208, G10L21/0232, G10L2021/02082
Abstract: Features are disclosed for performing efficient dereverberation of speech signals captured with single- and multi-channel sensors in networked audio systems. Such features could be used in applications requiring automatic recognition of speech captured with sensors. Dereverberation is performed in the sub-band domain, and hence provides improved dereverberation performance in terms of signal quality, algorithmic delay, computational efficiency, and speed of convergence.
-
Publication number: US11915698B1
Publication date: 2024-02-27
Application number: US17489223
Application date: 2021-09-29
Applicant: Amazon Technologies, Inc.
Inventor: Borham Lee, Wai Chung Chu
Abstract: A system configured to improve track selection while performing audio type detection using sound source localization (SSL) data is provided. A device processes audio data representing sounds from multiple sound sources to determine SSL data that distinguishes between each of the sound sources. The system detects an acoustic event and performs SSL track selection to select the sound source that corresponds to the acoustic event based on input features. To improve SSL track selection, the system detects current conditions of the environment and determines adaptive weight values that vary based on the current conditions, such as a noise level of the environment, whether playback is detected, whether the device is located near one or more walls, etc. By adjusting the adaptive weight values, the system improves an accuracy of the SSL track selection by prioritizing the input features that are most predictive during the current conditions.
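A toy version of condition-dependent track scoring might look like the sketch below. The feature names (`energy`, `onset`, `direction`), the specific weight values, the 60 dB noise threshold, and the linear scoring rule are all hypothetical; the abstract only states that the weights vary with conditions such as noise level, playback, and nearby walls.

```python
def adaptive_weights(noise_level_db, playback_detected, near_wall):
    """Return per-feature weights for the current conditions.

    All features, thresholds, and values here are illustrative assumptions.
    """
    weights = {"energy": 1.0, "onset": 1.0, "direction": 1.0}
    if noise_level_db > 60.0:
        weights["energy"] = 0.5     # energy is less predictive in loud rooms
    if playback_detected:
        weights["onset"] = 0.5      # playback can mask onsets
    if near_wall:
        weights["direction"] = 0.5  # reflections blur direction estimates
    return weights


def select_track(tracks, weights):
    """Pick the track id with the highest weighted feature sum.

    tracks: dict mapping track id to a dict of feature scores in [0, 1].
    """
    def score(track_id):
        return sum(weights[name] * tracks[track_id][name] for name in weights)
    return max(tracks, key=score)
```

With one track that is loud but has a weak onset and another that is quiet with a strong onset, lowering the energy weight in a noisy room can flip the selection from the first track to the second.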
-
Publication number: US11749294B2
Publication date: 2023-09-05
Application number: US16999233
Application date: 2020-08-21
Applicant: Amazon Technologies, Inc.
Inventor: Wai Chung Chu
IPC: G10L21/028, H04R1/40, G10L25/78, G10L21/0216
CPC classification number: G10L21/028, G10L25/78, H04R1/406, G10L2021/02166, H04R2430/20
Abstract: A system configured to perform directional speech separation. The system may dynamically associate direction-of-arrivals with one or more audio sources in order to generate output audio data that separates each of the audio sources. The system identifies a target direction for each audio source, dynamically determines directions that are correlated with the target direction, and generates output signals for each audio source. The system may associate individual frequency bands with specific directions based on a time delay detected by two or more microphones. The system may determine a cross-correlation between each direction and the target direction and select directions with strong correlation. The system may generate time-frequency mask data indicating frequency bands corresponding to the directions associated with a particular audio source. Using the mask data, the system generates output audio data specific to the audio source, resulting in directional speech separation between different audio sources.
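The masking step at the end of this abstract can be illustrated with a minimal sketch for a single frame. It assumes each frequency band has already been assigned a direction index and that the set of directions correlated with the target is known; the binary 0/1 mask and the function names are assumptions for illustration.

```python
def direction_mask(band_directions, selected_directions):
    """Build a binary mask over frequency bands for one frame.

    band_directions:     direction index assigned to each frequency band.
    selected_directions: directions associated with one audio source.
    A band gets 1.0 if its direction belongs to the source, else 0.0.
    """
    selected = set(selected_directions)
    return [1.0 if d in selected else 0.0 for d in band_directions]


def apply_mask(band_values, mask):
    """Scale each band value by its mask entry to isolate the source."""
    return [value * m for value, m in zip(band_values, mask)]
```

For example, with bands assigned to directions 0, 2, 1, 2 and a source associated with direction 2, only the second and fourth bands pass through.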
-
Publication number: US11317201B1
Publication date: 2022-04-26
Application number: US15418973
Application date: 2017-01-30
Applicant: Amazon Technologies, Inc.
Inventor: Samuel Henry Chang, Wai Chung Chu
Abstract: A system efficiently selects at least one device from multiple devices based on received audio signals. In some instances, the system receives audio signals from devices that each comprise at least one microphone. A respective audio signal of the audio signals includes a representation of a sound originating from a location. The system then determines a device to be used to respond to the sound. In some instances, the system analyzes times in which the received audio signals that represent the sound are generated and/or volumes of the sound as represented by the received audio signals. The system can then select the device based on the analysis.
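A minimal sketch of such a selection is shown below, assuming each device reports a capture time and a volume for the same sound. Preferring the earliest capture time and breaking ties by the highest volume is one plausible rule chosen for illustration; the abstract does not state how time and volume are combined.

```python
def select_device(observations):
    """Pick the device that should respond to a sound.

    observations: list of (device_id, capture_time_s, volume) tuples.
    Assumed rule: earliest capture time wins; ties go to the loudest device.
    """
    if not observations:
        raise ValueError("no observations")
    best = min(observations, key=lambda obs: (obs[1], -obs[2]))
    return best[0]
```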
-
Publication number: US20200381002A1
Publication date: 2020-12-03
Application number: US16999233
Application date: 2020-08-21
Applicant: Amazon Technologies, Inc.
Inventor: Wai Chung Chu
IPC: G10L21/028, H04R1/40, G10L25/78
Abstract: A system configured to perform directional speech separation. The system may dynamically associate direction-of-arrivals with one or more audio sources in order to generate output audio data that separates each of the audio sources. The system identifies a target direction for each audio source, dynamically determines directions that are correlated with the target direction, and generates output signals for each audio source. The system may associate individual frequency bands with specific directions based on a time delay detected by two or more microphones. The system may determine a cross-correlation between each direction and the target direction and select directions with strong correlation. The system may generate time-frequency mask data indicating frequency bands corresponding to the directions associated with a particular audio source. Using the mask data, the system generates output audio data specific to the audio source, resulting in directional speech separation between different audio sources.
-
Publication number: US10115411B1
Publication date: 2018-10-30
Application number: US15823050
Application date: 2017-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Wai Chung Chu, Carlo Murgia, Hyeong Cheol Kim
IPC: H04B1/38, G10L21/0224, G10L21/0232, G10L21/02, G10L21/0208, G10L21/0216
Abstract: A system configured to improve speech quality by performing residual echo suppression (RES). The system may detect when double-talk conditions are present in individual frequency bands during a voice conversation and may determine gain values for the individual frequency bands. The system may determine whether double-talk conditions are present based on a normalized cross power spectral density function in a frequency domain. If double-talk conditions are present in a frequency band or far end energy is low, the system may determine a gain value that passes audio data in the frequency band, whereas if double-talk conditions are not present, the system may determine a gain value that attenuates audio data in the frequency band. The system may determine binary gain values using a decision threshold value or continuous gain values using a mapping function. The system may control an amount of suppression by selecting different mapping functions and/or parameters.
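The per-band decision in this abstract can be sketched as follows. The sketch assumes the normalized cross power spectral density is taken between the far-end reference and the microphone signal, so that a low value suggests near-end speech (double-talk) and a high value suggests echo only. That pairing of signals, the thresholds, the attenuation level, and the power-law mapping are all assumptions for illustration.

```python
def normalized_cross_psd(s_xy, s_xx, s_yy, eps=1e-12):
    """Magnitude-squared cross PSD normalized by both auto PSDs (0 to 1)."""
    return abs(s_xy) ** 2 / (s_xx * s_yy + eps)


def binary_res_gain(coherence, far_end_energy, threshold=0.5,
                    energy_floor=1e-6, attenuation=0.1):
    """Pass the band (gain 1.0) when double-talk is assumed present or the
    far-end energy is low; otherwise attenuate it.

    Assumption: coherence below the threshold indicates double-talk.
    """
    if far_end_energy < energy_floor or coherence < threshold:
        return 1.0
    return attenuation


def continuous_res_gain(coherence, far_end_energy, energy_floor=1e-6,
                        min_gain=0.1, exponent=2.0):
    """Continuous alternative using a hypothetical mapping function,
    gain = max(min_gain, 1 - coherence ** exponent)."""
    if far_end_energy < energy_floor:
        return 1.0
    clipped = min(max(coherence, 0.0), 1.0)
    return max(min_gain, 1.0 - clipped ** exponent)
```

Changing `exponent` or `min_gain` changes how aggressively the continuous variant suppresses, which mirrors the abstract's point that the amount of suppression is controlled by the choice of mapping function and parameters.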