-
Publication No.: US11217235B1
Publication Date: 2022-01-04
Application No.: US16686808
Filing Date: 2019-11-18
Applicant: Amazon Technologies, Inc.
Inventor: Wai Chung Chu , Anshuman Ganguly , Carlo Murgia
IPC: G10L15/20 , G10L25/21 , H04R1/40 , H04R3/00 , G10L21/0232 , G10L15/22 , G05D1/00 , G10L25/84 , G10L21/0208
Abstract: A device capable of autonomous motion may move in response to a user speaking an utterance, such as a command. Before moving, the device processes audio data received from a microphone array to identify different audio signals arriving at the device from different directions. Based on properties of the audio signals, the device determines which of the audio signals are merely reflections of other audio.
-
Publication No.: US11107492B1
Publication Date: 2021-08-31
Application No.: US16574852
Filing Date: 2019-09-18
Applicant: Amazon Technologies, Inc.
Inventor: Wai Chung Chu
IPC: G10L21/028 , H04R1/40 , H04R3/00 , G10L25/21
Abstract: A system configured to perform directional speech separation using three or more microphones. The system may dynamically associate direction-of-arrivals with one or more audio sources in order to generate output audio data that separates each of the audio sources. Using three or more microphones, the system may separate audio sources covering 360 degrees surrounding the microphone array, whereas a two-microphone implementation is limited to 180 degrees. The system identifies a target direction for each audio source, dynamically determines directions that are correlated with the target direction, and generates output signals for each audio source. The system may associate individual frequency bands with specific directions based on a phase difference detected by two or more microphones.
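A minimal sketch of how one frequency band can be tied to a direction from the phase difference between two microphones, assuming a far-field plane wave and a speed of sound of 343 m/s. The function names `bin_direction_deg` and `assign_bins` are hypothetical and not taken from the patent. For a single pair the arcsine only returns angles in [-90, 90] degrees, which illustrates why a two-microphone layout covers 180 degrees and cannot tell front from back.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def bin_direction_deg(phase_diff_rad: float, freq_hz: float, mic_spacing_m: float) -> float:
    """Map the inter-microphone phase difference of one frequency bin to an
    angle (degrees from broadside) for a two-microphone pair.

    A plane wave from angle theta arrives with delay tau = d*sin(theta)/c,
    which shows up as a phase difference of 2*pi*f*tau at frequency f.
    """
    # Wrap the phase into (-pi, pi] before inverting the model.
    wrapped = math.atan2(math.sin(phase_diff_rad), math.cos(phase_diff_rad))
    tau = wrapped / (2.0 * math.pi * freq_hz)
    s = SPEED_OF_SOUND * tau / mic_spacing_m
    # Noise or spatial aliasing can push |s| above 1; clamp to the asin domain.
    s = max(-1.0, min(1.0, s))
    return math.degrees(math.asin(s))


def assign_bins(bin_angles, source_angles):
    """Hypothetical helper: label each bin with the index of the nearest source direction."""
    return [
        min(range(len(source_angles)), key=lambda k: abs(a - source_angles[k]))
        for a in bin_angles
    ]
```

With three or more non-collinear microphones, combining several pairs resolves the front/back ambiguity; that extension is not shown here.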
-
Publication No.: US09753119B1
Publication Date: 2017-09-05
Application No.: US14167813
Filing Date: 2014-01-29
Applicant: Amazon Technologies, Inc.
Inventor: Kavitha Velusamy , Ning Yao , Wai Chung Chu , Sowmya Gopalan , Qiang Liu , Rahul Agrawal , Manika Puri
IPC: G01S5/20
CPC classification number: G01S5/20 , G01S5/18 , G01S5/22 , G01S17/023 , G01S17/36 , G01S17/42 , G01S17/46 , G01S17/89
Abstract: A system may utilize sound localization techniques, such as time-difference-of-arrival techniques, to estimate an audio-based sound source position from which a sound originates. An optical image or depth map of an area containing the sound source location may then be captured and analyzed to detect an object that is known or expected to have produced the sound. The position of the object may also be determined based on the analysis of the optical image or depth map. The position of the sound source may then be determined based at least in part on the position of the detected object or on a combination of the audio-based sound source position and the determined position of the object.
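A minimal sketch of the time-difference-of-arrival step for one microphone pair, using a plain time-domain cross-correlation peak search (production systems often use a weighted variant such as GCC-PHAT instead). The `fuse_positions` helper and its fixed weight are hypothetical illustrations of combining the audio-based position with the detected object's position; the patent abstract does not specify a weighting rule.

```python
def estimate_delay_samples(x, y, max_lag):
    """Return the lag (in samples) maximizing sum_n x[n] * y[n + lag].

    A positive result means the sound reached microphone x before microphone y.
    """
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for n in range(len(x)):
            m = n + lag
            if 0 <= m < len(y):
                acc += x[n] * y[m]
        if acc > best_val:
            best_lag, best_val = lag, acc
    return best_lag


def tdoa_seconds(x, y, max_lag, sample_rate_hz):
    """Convert the sample-domain delay to seconds."""
    return estimate_delay_samples(x, y, max_lag) / sample_rate_hz


def fuse_positions(audio_pos, object_pos, audio_weight=0.3):
    """Hypothetical fusion: weighted average of the audio-based estimate and
    the position of the object detected in the image or depth map."""
    return tuple(audio_weight * a + (1.0 - audio_weight) * o
                 for a, o in zip(audio_pos, object_pos))
```

Choosing `max_lag` from the microphone spacing (spacing / speed of sound × sample rate) keeps the search limited to physically possible delays.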
-
Publication No.: US12143783B1
Publication Date: 2024-11-12
Application No.: US17981705
Filing Date: 2022-11-07
Applicant: Amazon Technologies, Inc.
Inventor: Wai Chung Chu
Abstract: A system configured to perform sound source localization (SSL) using reflection detection is provided. A device processes audio data from multiple microphones to determine timing information corresponding to sound sources. For example, the device may determine cross-correlation data for each microphone pair, determine autocorrelation data for each microphone, and then use the autocorrelation data and the cross-correlation data to calculate quality factors. The device may determine the direction of potential sound source(s) by generating Steered Response Power (SRP) data using the cross-correlation data. To perform reflection detection to distinguish between direct sounds and acoustic reflections, the device may generate modified SRP data using the quality factors. For example, the device may process the SRP data to detect two potential sound sources and then process the modified SRP data to determine that a first potential sound source corresponds to direct sound and a second potential sound source corresponds to acoustic reflections.
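A minimal sketch of generating Steered Response Power (SRP) data from pairwise cross-correlations: for each candidate azimuth, the expected inter-microphone lag is computed from the array geometry and the cross-correlation values at those lags are summed. It assumes a 2-D array, a far-field source, and 343 m/s for the speed of sound. The quality-factor weighting that produces the modified SRP data is not shown, since the abstract does not define how the quality factors are computed.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def cross_corr(x, y, lag):
    """Cross-correlation sum_n x[n] * y[n + lag] at a single lag."""
    return sum(x[n] * y[n + lag] for n in range(len(x)) if 0 <= n + lag < len(y))


def srp_map(signals, mic_xy, azimuths_deg, fs):
    """Return one SRP value per candidate azimuth (degrees)."""
    pairs = [(i, j) for i in range(len(signals)) for j in range(i + 1, len(signals))]
    powers = []
    for az in azimuths_deg:
        # Unit vector pointing from the array toward the candidate source.
        ux, uy = math.cos(math.radians(az)), math.sin(math.radians(az))
        total = 0.0
        for i, j in pairs:
            # Arrival time of mic j minus mic i is (p_i - p_j) . u / c.
            dx = mic_xy[i][0] - mic_xy[j][0]
            dy = mic_xy[i][1] - mic_xy[j][1]
            lag = round((dx * ux + dy * uy) * fs / SPEED_OF_SOUND)
            total += cross_corr(signals[i], signals[j], lag)
        powers.append(total)
    return powers
```

Peaks in the returned list are the potential sound sources; a direct sound and its reflection would typically appear as two separate peaks, which is what the reflection-detection step then has to disambiguate.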
-
Publication No.: US11386911B1
Publication Date: 2022-07-12
Application No.: US16915037
Filing Date: 2020-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Kanthasamy Chelliah , Wai Chung Chu , Andreas Schwarz , Berkant Tacer , Carlo Murgia
IPC: G10L21/0232 , H04R3/04 , H04R5/04 , H04R3/00 , G10L21/0208
Abstract: A system configured to improve audio processing by performing dereverberation and noise reduction during a communication session. The system may apply a two-channel dereverberation algorithm by calculating coherence-to-diffuse ratio (CDR) values and calculating dereverberation (DER) gain values based on the CDR values. While the DER gain values may be calculated at a first stage within the pipeline, the device may apply the DER gain values at a second stage within the pipeline. For example, the device may calculate the DER gain values prior to performing residual echo suppression (RES) processing but may apply the DER gain values after performing RES processing, in order to avoid excessive attenuation of the local speech. In addition to removing reverberation, the DER gain values also remove diffuse noise components, reducing an amount of noise reduction required. Thus, the device may soften noise reduction when the DER gain values are applied.
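A minimal sketch of the pipeline ordering described above: dereverberation (DER) gains are computed from coherent-to-diffuse ratio (CDR) values before residual echo suppression (RES), stored, and applied only after RES. The Wiener-style rule G = CDR / (1 + CDR) with a gain floor is an assumption chosen for illustration, not the patent's formula, and `residual_echo_suppressor` is a hypothetical stand-in for the RES stage.

```python
def der_gain(cdr, min_gain=0.1):
    """Hypothetical DER gain from a CDR value (linear scale).

    Bins dominated by the coherent (direct) component keep a gain near 1;
    diffuse (reverberant or noisy) bins are attenuated down to min_gain.
    """
    return max(min_gain, cdr / (1.0 + cdr))


def process_frame(spectrum, cdr_per_bin, residual_echo_suppressor):
    # Stage 1: compute DER gains early, from the CDR estimates of this frame.
    gains = [der_gain(c) for c in cdr_per_bin]
    # Stage 2: run RES on the spectrum without the DER gains applied.
    after_res = residual_echo_suppressor(spectrum)
    # Stage 3: apply the stored DER gains only after RES.
    return [g * s for g, s in zip(gains, after_res)]
```

Deferring the multiplication keeps RES from seeing an already-attenuated signal, which matches the abstract's stated goal of avoiding excessive attenuation of local speech.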
-
Publication No.: US09621984B1
Publication Date: 2017-04-11
Application No.: US14883166
Filing Date: 2015-10-14
Applicant: Amazon Technologies, Inc.
Inventor: Wai Chung Chu
CPC classification number: H04R1/406 , G01S3/80 , G06F3/165 , G10L15/22 , G10L25/21 , G10L25/48 , G10L2015/223 , G10L2015/227 , G10L2021/02166 , H04R1/403 , H04R3/005 , H04R2201/403 , H04R2203/12 , H04R2430/23
Abstract: Devices, systems, and methods provide direction finding of an acoustic signal source with respect to a voice-controlled device. The direction can be found without using elevation data, instead determining the horizontal location based on power values of the received signal. A large number of candidate vectors having values for azimuth, elevation, and power may be generated by a steered response power algorithm. The large number of vectors is reduced to a small number of reference azimuths spanning an azimuth range by associating the vectors with the closest reference azimuth and then calculating an average and/or maximum power of the associated vectors at each reference azimuth. The reference azimuth with the highest average (or maximum) power may be set as the direction of the signal source. Alternatively, each reference azimuth having an average (or maximum) power exceeding a threshold may be considered a direction of one of multiple sources.
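The reduction step described above can be sketched as follows: each (azimuth, elevation, power) candidate is assigned to the nearest of a fixed set of evenly spaced reference azimuths, elevation is ignored, and each reference azimuth is scored by the average (or maximum) power of its candidates. The 10-degree spacing and the function name are assumptions for illustration.

```python
def direction_from_candidates(candidates, num_refs=36, use_max=False, threshold=None):
    """candidates: iterable of (azimuth_deg, elevation_deg, power) from SRP.

    Returns a list of reference azimuths: the single best one when threshold
    is None, otherwise every reference azimuth whose score exceeds threshold.
    """
    step = 360.0 / num_refs
    buckets = [[] for _ in range(num_refs)]
    for az, _elev, power in candidates:
        # Nearest reference azimuth, wrapping 355 degrees around to 0.
        idx = round((az % 360.0) / step) % num_refs
        buckets[idx].append(power)
    scores = [
        (max(b) if use_max else sum(b) / len(b)) if b else float("-inf")
        for b in buckets
    ]
    refs = [i * step for i in range(num_refs)]
    if threshold is None:
        best = max(range(num_refs), key=lambda i: scores[i])
        return [refs[best]]
    return [refs[i] for i in range(num_refs) if scores[i] > threshold]
```

The thresholded form corresponds to the multi-source alternative in the abstract, where every sufficiently strong reference azimuth is reported as a separate source.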
-
Publication No.: US11950062B1
Publication Date: 2024-04-02
Application No.: US17709563
Filing Date: 2022-03-31
Applicant: Amazon Technologies, Inc.
Inventor: Wai Chung Chu , Carlo Murgia
Abstract: A system configured to improve sound source localization (SSL) processing by reducing a number of direction vectors and grouping the direction vectors into direction cells is provided. The system performs clustering to generate a smaller set of direction vectors included in a delay-direction codebook, reducing a size of the codebook to the number of unique delay vectors. In addition, the system groups the direction vectors into direction cells having a regular structure (e.g., predetermined uniformity and/or symmetry), which simplifies SSL processing and results in a substantial reduction in computational cost. The system may also select between multiple codebooks and/or dynamically adjust the codebook to compensate for changes to the microphone array. For example, a device with a microphone array fixed to a display that can tilt may adjust the codebook based on a tilt angle of the display to improve accuracy.
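A minimal sketch of shrinking a delay-direction codebook to its unique delay vectors: each candidate direction is mapped to the integer-sample delays it implies at every microphone relative to a reference microphone, and directions producing the same delay vector are grouped under one codebook entry. This exact-match grouping is an assumed simplification of the clustering the abstract mentions; the direction-cell structure is not modeled. The geometry assumes a 2-D array, a far-field source, and 343 m/s.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def delay_vector(mic_xy, azimuth_deg, fs):
    """Integer sample delays of mics 1..M-1 relative to mic 0 for one direction."""
    ux, uy = math.cos(math.radians(azimuth_deg)), math.sin(math.radians(azimuth_deg))
    x0, y0 = mic_xy[0]
    # Arrival time of mic k minus mic 0 is (p_0 - p_k) . u / c.
    return tuple(
        round(((x0 - x) * ux + (y0 - y) * uy) * fs / SPEED_OF_SOUND)
        for x, y in mic_xy[1:]
    )


def build_codebook(mic_xy, azimuths_deg, fs):
    """Map each unique delay vector to the list of directions that produce it."""
    codebook = {}
    for az in azimuths_deg:
        codebook.setdefault(delay_vector(mic_xy, az, fs), []).append(az)
    return codebook
```

For the tilting-display case, one option under these assumptions is to rotate `mic_xy` by the tilt angle and rebuild the codebook, rather than storing one fixed table.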
-
Publication No.: US11545172B1
Publication Date: 2023-01-03
Application No.: US17195904
Filing Date: 2021-03-09
Applicant: Amazon Technologies, Inc.
Inventor: Wai Chung Chu
Abstract: A system configured to perform sound source localization (SSL) using reflection classification is provided. A device processes audio data representing sounds from multiple sound sources to generate sound track data that includes an individual sound track for each of the sound sources. To detect reflections, the device determines whether a pair of sound tracks are strongly correlated. For example, the device may calculate a correlation value for each pairwise combination of the sound tracks and determine whether the correlation value exceeds a threshold value. When the correlation value exceeds the threshold, the device invokes a reflection classifier trained to distinguish between direct sound sources and reflected sound sources. For example, the device extracts feature data from the pair of sound tracks and processes the feature data using a trained model to determine which of the sound tracks corresponds to the direct sound source.
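A minimal sketch of the gating step: compute a normalized (Pearson) correlation for every pairwise combination of sound tracks and flag pairs above a threshold as candidate direct/reflection pairs. The tracks are assumed to be per-frame energy envelopes, and the 0.8 threshold is an illustrative value. `placeholder_classifier` is a hypothetical stand-in for the trained reflection classifier; it simply picks the higher-energy track and is not the patent's model.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Normalized correlation of two equal-rate tracks (truncated to the shorter)."""
    n = min(len(a), len(b))
    ma, mb = sum(a[:n]) / n, sum(b[:n]) / n
    num = sum((a[i] - ma) * (b[i] - mb) for i in range(n))
    den = math.sqrt(sum((a[i] - ma) ** 2 for i in range(n))
                    * sum((b[i] - mb) ** 2 for i in range(n)))
    return num / den if den > 0 else 0.0


def find_reflection_pairs(tracks, threshold=0.8):
    """Return index pairs (i, j) whose tracks are strongly correlated."""
    return [(i, j) for i, j in combinations(range(len(tracks)), 2)
            if pearson(tracks[i], tracks[j]) > threshold]


def placeholder_classifier(track_a, track_b):
    """Hypothetical stand-in: 0 if track_a looks direct, 1 if track_b does."""
    energy = lambda t: sum(v * v for v in t)
    return 0 if energy(track_a) >= energy(track_b) else 1
```

Only pairs returned by `find_reflection_pairs` would be passed to the classifier, so the more expensive model runs only when a reflection is plausible.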
-
Publication No.: US10755727B1
Publication Date: 2020-08-25
Application No.: US16141375
Filing Date: 2018-09-25
Applicant: Amazon Technologies, Inc.
Inventor: Wai Chung Chu
IPC: G10L21/028 , H04R1/40 , G10L25/78 , G10L21/0216
Abstract: A system configured to perform directional speech separation. The system may dynamically associate direction-of-arrivals with one or more audio sources in order to generate output audio data that separates each of the audio sources. The system identifies a target direction for each audio source, dynamically determines directions that are correlated with the target direction, and generates output signals for each audio source. The system may associate individual frequency bands with specific directions based on a time delay detected by two or more microphones. The system may determine a cross-correlation between each direction and the target direction and select directions with strong correlation. The system may generate time-frequency mask data indicating frequency bands corresponding to the directions associated with a particular audio source. Using the mask data, the system generates output audio data specific to the audio source, resulting in directional speech separation between different audio sources.
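A minimal sketch of turning correlated directions into time-frequency mask data: directions whose energy envelopes correlate with the target direction's envelope above a threshold are grouped with the target, and every time-frequency bin already assigned to one of those directions gets a mask value of 1. The per-bin direction labels and per-direction envelopes are assumed inputs, and the 0.7 threshold is illustrative.

```python
import math


def pearson(a, b):
    """Normalized correlation of two envelopes (truncated to the shorter)."""
    n = min(len(a), len(b))
    ma, mb = sum(a[:n]) / n, sum(b[:n]) / n
    num = sum((a[i] - ma) * (b[i] - mb) for i in range(n))
    den = math.sqrt(sum((a[i] - ma) ** 2 for i in range(n))
                    * sum((b[i] - mb) ** 2 for i in range(n)))
    return num / den if den > 0 else 0.0


def source_mask(bin_direction, direction_envelopes, target, threshold=0.7):
    """bin_direction[t][f]: direction index assigned to frame t, bin f.
    direction_envelopes[d]: energy over time attributed to direction d.
    Returns a binary mask with the same shape as bin_direction.
    """
    target_env = direction_envelopes[target]
    selected = {
        d for d, env in enumerate(direction_envelopes)
        if d == target or pearson(env, target_env) > threshold
    }
    return [[1.0 if d in selected else 0.0 for d in frame] for frame in bin_direction]
```

Multiplying the mixture's STFT by this mask and inverting it would give output audio for the one source; repeating with each target direction separates the others.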
-
Publication No.: US12047756B1
Publication Date: 2024-07-23
Application No.: US17722680
Filing Date: 2022-04-18
Applicant: Amazon Technologies, Inc.
Inventor: Samuel Henry Chang , Wai Chung Chu
CPC classification number: H04R3/005 , H04R1/406 , H04R2201/403 , H04R2430/20
Abstract: A system efficiently selects at least one device from multiple devices based on received audio signals. In some instances, the system receives audio signals from devices that each comprise at least one microphone. A respective audio signal of the audio signals includes a representation of a sound originating from a location. The system then determines a device to be used to respond to the sound. In some instances, the system analyzes times in which the received audio signals that represent the sound are generated and/or volumes of the sound as represented by the received audio signals. The system can then select the device based on the analysis.
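A minimal sketch of selecting one device from arrival times and volumes: the devices that heard the sound within a small tolerance of the earliest arrival are kept, and the loudest of those is chosen. It assumes the devices' clocks are synchronized; the `Detection` record, the 5 ms tolerance, and the tie-breaking order are all hypothetical choices, not details from the patent.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    device_id: str
    arrival_time_s: float  # when the device's audio first represents the sound
    level_db: float        # volume of the sound in that device's audio


def select_device(detections, time_tolerance_s=0.005):
    """Pick the loudest device among those that heard the sound (nearly) first."""
    earliest = min(d.arrival_time_s for d in detections)
    contenders = [d for d in detections
                  if d.arrival_time_s - earliest <= time_tolerance_s]
    return max(contenders, key=lambda d: d.level_db).device_id
```

Using arrival time first favors the device physically closest to the talker, while the volume check breaks near-ties between devices at similar distances.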
-