-
公开(公告)号:US10796716B1
公开(公告)日:2020-10-06
申请号:US16157319
申请日:2018-10-11
Applicant: Amazon Technologies, Inc.
Inventor: Shiva Kumar Sundaram , Chao Wang , Shiv Naga Prasad Vitaladevuni , Spyridon Matsoukas , Arindam Mandal
Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present nearby the device, even if no wakeword is spoken. Audio such as speech, human originating sounds (e.g., coughing, sneezing), or other human related noises (e.g., footsteps, doors closing) can be used to detect audio. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.
-
公开(公告)号:US20240071408A1
公开(公告)日:2024-02-29
申请号:US18243804
申请日:2023-09-08
Applicant: Amazon Technologies, Inc.
Inventor: Qingming Tang , Chieh-Chi Kao , Qin Zhang , Ming Sun , Chao Wang , Sumit Garg , Rong Chen , James Garnet Droppo , Chia-Jung Chang
Abstract: A system may include a first acoustic event detection (AED) component configured to detect a predetermined set of acoustic events, and include a second AED component configured to detect custom acoustic events that a user configures a device to detect. The first and second AED components are configured to perform task-specific processing, and may receive as input the same acoustic feature data corresponding to audio data that potentially represents occurrence of one or more events. Based on processing by the first and second AED components, a device may output data indicating that one or more acoustic events occurred, where the acoustic events may be a predetermined acoustic event and/or a custom acoustic event.
-
公开(公告)号:US20240071407A1
公开(公告)日:2024-02-29
申请号:US18243800
申请日:2023-09-08
Applicant: Amazon Technologies, Inc.
Inventor: Harshavardhan Sundar , Sheetal Laad , Jialiang Bao , Ming Sun , Chao Wang , Chungnam Chan , Cengiz Erbas , Mathias Jourdain , Nipul Bharani , Aaron David Wirshba
CPC classification number: G10L25/51 , G10L15/063 , G10L15/22 , G10L25/78 , G10L2015/0635
Abstract: Techniques for detecting certain acoustic events from audio data are described. A system may perform event aggregation for certain types of events before sending an output to a device representing the event is detected. The system may bypass the event aggregation process for certain types of events that the system may detect with a high level of confidence. In such cases, the system may send an output to the device when the event is detected. The system may be used to detect acoustic events representing presence of a person or other harmful circumstances (such as, fire, smoke, etc.) in a home, an office, a store, or other types of indoor settings.
-
公开(公告)号:US11854538B1
公开(公告)日:2023-12-26
申请号:US16277328
申请日:2019-02-15
Applicant: Amazon Technologies, Inc.
Inventor: Viktor Rozgic , Chao Wang , Ming Sun , Srinivas Parthasarathy
CPC classification number: G10L15/1815 , G10L15/02 , G10L15/063 , G10L15/07 , G10L15/16
Abstract: Described herein is a system for sentiment detection in audio data. The system processes audio frame level features of input audio data using a machine learning algorithm to classify the input audio data into a particular sentiment category. The machine learning algorithm may be a neural network trained using an encoder-decoder method. The training of the machine learning algorithm may include normalization techniques to avoid potential bias in the training data that may occur when the training data is annotated for a perceived sentiment of the speaker.
-
公开(公告)号:US11790932B2
公开(公告)日:2023-10-17
申请号:US17547644
申请日:2021-12-10
Applicant: Amazon Technologies, Inc.
Inventor: Qingming Tang , Chieh-Chi Kao , Qin Zhang , Ming Sun , Chao Wang , Sumit Garg , Rong Chen , James Garnet Droppo , Chia-Jung Chang
CPC classification number: G10L25/51 , G06N3/045 , G06N3/08 , G10L25/21 , G10L25/30 , G10L15/08 , G10L15/22 , G10L2015/088 , G10L2015/223
Abstract: A system may include a first acoustic event detection (AED) component configured to detect a predetermined set of acoustic events, and include a second AED component configured to detect custom acoustic events that a user configures a device to detect. The first and second AED components are configured to perform task-specific processing, and may receive as input the same acoustic feature data corresponding to audio data that potentially represents occurrence of one or more events. Based on processing by the first and second AED components, a device may output data indicating that one or more acoustic events occurred, where the acoustic events may be a predetermined acoustic event and/or a custom acoustic event.
-
公开(公告)号:US11132990B1
公开(公告)日:2021-09-28
申请号:US16453063
申请日:2019-06-26
Applicant: Amazon Technologies, Inc.
Inventor: Ming Sun , Thibaud Senechal , Yixin Gao , Anish N. Shah , Spyridon Matsoukas , Chao Wang , Shiv Naga Prasad Vitaladevuni
Abstract: A system processes audio data to detect when it includes a representation of a wakeword or of an acoustic event. The system may receive or determine acoustic features for the audio data, such as log-filterbank energy (LFBE). The acoustic features may be used by a first, wakeword-detection model to detect the wakeword; the output of this model may be further processed using a softmax function, to smooth it, and to detect spikes. The same acoustic features may be also be used by a second, acoustic-event-detection model to detect the acoustic event; the output of this model may be further processed using a sigmoid function and a classifier. Another model may be used to extract additional features from the LFBE data; these additional features may be used by the other models.
-
公开(公告)号:US10418957B1
公开(公告)日:2019-09-17
申请号:US16023990
申请日:2018-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Weiran Wang , Chao Wang , Chieh-Chi Kao
Abstract: An audio event detection system that subsamples input audio data using a series of recurrent neural networks to create data of a coarser time scale than the audio data. Data frames corresponding to the coarser time scale may then be upsampled to data frames that match the finer time scale of the original audio data frames. The resulting data frames are then scored with a classifier to determine a likelihood that the individual frames correspond to an audio event. Each frame is then weighted by its score and a composite weighted frame is created by summing the weighted frames and dividing by the cumulative score. The composite weighted frame is then scored by the classifier. The resulting score is taken as an overall score indicating a likelihood that the input audio data includes an audio event.
-
公开(公告)号:US10121494B1
公开(公告)日:2018-11-06
申请号:US15474603
申请日:2017-03-30
Applicant: Amazon Technologies, Inc.
Inventor: Shiva Kumar Sundaram , Chao Wang , Shiv Naga Prasad Vitaladevuni , Spyridon Matsoukas , Arindam Mandal
Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present nearby the device, even if no wakeword is spoken. Audio such as speech, human originating sounds (e.g., coughing, sneezing), or other human related noises (e.g., footsteps, doors closing) can be used to detect audio. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.
-
公开(公告)号:US12087320B1
公开(公告)日:2024-09-10
申请号:US17671194
申请日:2022-02-14
Applicant: Amazon Technologies, Inc.
Inventor: Qin Zhang , Qingming Tang , Ming Sun , Chao Wang , Steve Mark Lorusso , Andrew Thomas Bydlon , James Garnet Droppo , Viktor Rozgic , Sripal Mehta , Yang Liu
CPC classification number: G10L25/51 , G10L15/1815 , G10L15/22 , G10L15/30
Abstract: A system may be configured to detect custom acoustic events, where the system generates an acoustic event profile for the custom acoustic event based on a natural language description provided by a user and using an audio sample of the described acoustic event. For example, the user may describe the custom acoustic event as “dog bark.” The system may ask the user questions to refine the description (e.g., dog breed, dog gender, age, etc.). Using an audio sample of the refined description, the system may then determine that audio captured in the user's environment is a potential sample of the custom acoustic event. Such captured audio may be presented to the user for confirmation, and then may be used to detect future occurrences of the custom acoustic event in the user's environment.
-
公开(公告)号:US11657832B2
公开(公告)日:2023-05-23
申请号:US17022197
申请日:2020-09-16
Applicant: Amazon Technologies, Inc.
Inventor: Shiva Kumar Sundaram , Chao Wang , Shiv Naga Prasad Vitaladevuni , Spyridon Matsoukas , Arindam Mandal
IPC: G10L15/00 , G10L25/30 , G10L25/51 , G10L15/02 , G10L15/16 , G10L15/22 , G10L15/30 , G10L25/78 , G10L15/08
CPC classification number: G10L25/30 , G10L15/02 , G10L15/16 , G10L15/22 , G10L15/30 , G10L25/51 , G10L25/78 , G10L2015/088 , G10L2025/783
Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present nearby the device, even if no wakeword is spoken. Audio such as speech, human originating sounds (e.g., coughing, sneezing), or other human related noises (e.g., footsteps, doors closing) can be used to detect audio. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.
-
-
-
-
-
-
-
-
-