-
Publication number: US11854538B1
Publication date: 2023-12-26
Application number: US16277328
Filing date: 2019-02-15
Applicant: Amazon Technologies, Inc.
Inventor: Viktor Rozgic , Chao Wang , Ming Sun , Srinivas Parthasarathy
CPC classification number: G10L15/1815 , G10L15/02 , G10L15/063 , G10L15/07 , G10L15/16
Abstract: Described herein is a system for sentiment detection in audio data. The system processes audio frame level features of input audio data using a machine learning algorithm to classify the input audio data into a particular sentiment category. The machine learning algorithm may be a neural network trained using an encoder-decoder method. The training of the machine learning algorithm may include normalization techniques to avoid potential bias in the training data that may occur when the training data is annotated for a perceived sentiment of the speaker.
-
Publication number: US11790919B2
Publication date: 2023-10-17
Application number: US17740910
Filing date: 2022-05-10
Applicant: Amazon Technologies, Inc.
Inventor: Gustavo Alfonso Aguilar Alas , Viktor Rozgic , Chao Wang
IPC: G10L15/26 , G06F40/284 , G10L15/06 , G10L15/16
CPC classification number: G10L15/26 , G06F40/284 , G10L15/063 , G10L15/16
Abstract: Described herein is a system for sentiment detection in audio data. The system is trained using acoustic information and lexical information to determine a sentiment corresponding to an utterance. In some cases when lexical information is not available, the system (trained on acoustic and lexical information) is configured to determine a sentiment using only acoustic information.
-
Publication number: US11545174B2
Publication date: 2023-01-03
Application number: US17178844
Filing date: 2021-02-18
Applicant: Amazon Technologies, Inc.
Inventor: Daniel Kenneth Bone , Chao Wang , Viktor Rozgic
Abstract: Described herein is a system for emotion detection in audio data using a speaker's baseline. The baseline may represent a user's speaking style in a neutral emotional state. The system is configured to compare the user's baseline with input audio representing speech from the user to determine an emotion of the user. The system may store multiple baselines for the user, each associated with a different context (e.g., environment, activity, etc.), and select one of the baselines to compare with the input audio based on the contextual situation.
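The baseline comparison described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the feature choice (pitch, energy, speaking rate), the context names, the distance measure, and the fixed threshold are all hypothetical, and the patented system presumably uses a trained model rather than a threshold.

```python
import math

# Hypothetical per-context baselines: feature vectors describing neutral speech.
# The features (pitch in Hz, energy, speaking rate) are an illustrative assumption.
BASELINES = {
    "home": [120.0, 0.50, 4.0],
    "driving": [135.0, 0.70, 4.5],
}


def select_baseline(context, baselines, default="home"):
    """Pick the baseline stored for the current context, falling back to a default."""
    return baselines.get(context, baselines[default])


def deviation(features, baseline):
    """Euclidean norm of the relative change of each feature from the baseline."""
    return math.sqrt(sum(((f - b) / b) ** 2 for f, b in zip(features, baseline)))


def detect_emotion(features, context, baselines=BASELINES, threshold=0.2):
    """Label speech 'neutral' or 'non-neutral' by its distance from the selected baseline."""
    baseline = select_baseline(context, baselines)
    return "non-neutral" if deviation(features, baseline) > threshold else "neutral"
```

The same input features can be judged differently depending on which baseline is selected, which is the point of storing one baseline per context: speech that matches the "driving" baseline looks neutral in that context but deviates noticeably from the "home" baseline.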
-
Publication number: US20210249035A1
Publication date: 2021-08-12
Application number: US17178844
Filing date: 2021-02-18
Applicant: Amazon Technologies, Inc.
Inventor: Daniel Kenneth Bone , Chao Wang , Viktor Rozgic
Abstract: Described herein is a system for emotion detection in audio data using a speaker's baseline. The baseline may represent a user's speaking style in a neutral emotional state. The system is configured to compare the user's baseline with input audio representing speech from the user to determine an emotion of the user. The system may store multiple baselines for the user, each associated with a different context (e.g., environment, activity, etc.), and select one of the baselines to compare with the input audio based on the contextual situation.
-
Publication number: US11869535B1
Publication date: 2024-01-09
Application number: US16711883
Filing date: 2019-12-12
Applicant: Amazon Technologies, Inc.
Inventor: Mohammad Taha Bahadori , Viktor Rozgic , Alexander Jonathan Pinkus , Chao Wang , David Heckerman
CPC classification number: G10L25/63 , G06N3/044 , G06N3/08 , G10L15/063 , G10L15/16 , G10L15/1815 , G10L15/22 , G10L2015/223
Abstract: Described is a system and method that determines character sequences from speech, without determining the words of the speech, and processes the character sequences to determine sentiment data indicative of emotional state of a user that output the speech. The emotional state may then be presented or provided as an output to the user.
-
Publication number: US20230027828A1
Publication date: 2023-01-26
Application number: US17740910
Filing date: 2022-05-10
Applicant: Amazon Technologies, Inc.
Inventor: Gustavo Alfonso Aguilar Alas , Viktor Rozgic , Chao Wang
IPC: G10L15/26 , G06F40/284 , G10L15/06 , G10L15/16
Abstract: Described herein is a system for sentiment detection in audio data. The system is trained using acoustic information and lexical information to determine a sentiment corresponding to an utterance. In some cases when lexical information is not available, the system (trained on acoustic and lexical information) is configured to determine a sentiment using only acoustic information.
-
Publication number: US11335347B2
Publication date: 2022-05-17
Application number: US16429689
Filing date: 2019-06-03
Applicant: Amazon Technologies, Inc.
Inventor: Gustavo Alfonso Aguilar Alas , Viktor Rozgic , Chao Wang
IPC: G10L15/26 , G06F40/284 , G10L15/06 , G10L15/16
Abstract: Described herein is a system for sentiment detection in audio data. The system is trained using acoustic information and lexical information to determine a sentiment corresponding to an utterance. In some cases when lexical information is not available, the system (trained on acoustic and lexical information) is configured to determine a sentiment using only acoustic information.
-
Publication number: US12039998B1
Publication date: 2024-07-16
Application number: US17665129
Filing date: 2022-02-04
Applicant: Amazon Technologies, Inc.
Inventor: Chieh-Chi Kao , Qingming Tang , Ming Sun , Viktor Rozgic , Spyridon Matsoukas , Chao Wang
Abstract: An acoustic event detection system may employ self-supervised federated learning to update encoder and/or classifier machine learning models. In an example operation, an encoder may be pre-trained to extract audio feature data from an audio signal. A decoder may be pre-trained to predict a subsequent portion of audio data (e.g., a subsequent frame of audio data represented by log filterbank energies). The encoder and decoder may be trained using self-supervised learning to improve the decoder's predictions and, by extension, the quality of the audio feature data generated by the encoder. The system may apply federated learning to share encoder updates across user devices. The system may fine-tune the classifier to improve inferences based on the improved audio feature data. The system may distribute classifier updates to the user device(s) to update the on-device classifier.
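The step of sharing encoder updates across user devices can be sketched with federated averaging, one common aggregation rule. The abstract does not say which aggregation rule the system uses, so the weighted mean below, the function name, and the flat list representation of encoder weights are all assumptions made for illustration.

```python
def federated_average(client_weights, client_sizes):
    """Combine per-device encoder weights into one global weight vector.

    client_weights: list of equal-length weight lists, one per device (hypothetical layout).
    client_sizes: number of local training examples on each device, used as the weight.
    """
    if len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one size per client")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    # Each global parameter is the example-count-weighted mean of the device values.
    return [
        sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]
```

Under this scheme a device that trained on more local audio pulls the global encoder further toward its own update, and only weights, not raw audio, leave the device.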
-
Publication number: US11069352B1
Publication date: 2021-07-20
Application number: US16278440
Filing date: 2019-02-18
Applicant: Amazon Technologies, Inc.
Inventor: Qingming Tang , Ming Sun , Chieh-Chi Kao , Chao Wang , Viktor Rozgic
Abstract: Described herein is a system for media presence detection in audio. The system analyzes audio data to recognize whether a given audio segment contains sounds from a media source as a way of differentiating recorded media source sounds from other live sounds. In exemplary embodiments, the system includes a hierarchical model architecture for processing audio data segments, where individual audio data segments are processed by a trained machine learning model operating locally, and another trained machine learning model provides historical and contextual information to determine a score indicating the likelihood that the audio data segment contains sounds from a media source.
-
Publication number: US12087320B1
Publication date: 2024-09-10
Application number: US17671194
Filing date: 2022-02-14
Applicant: Amazon Technologies, Inc.
Inventor: Qin Zhang , Qingming Tang , Ming Sun , Chao Wang , Steve Mark Lorusso , Andrew Thomas Bydlon , James Garnet Droppo , Viktor Rozgic , Sripal Mehta , Yang Liu
CPC classification number: G10L25/51 , G10L15/1815 , G10L15/22 , G10L15/30
Abstract: A system may be configured to detect custom acoustic events, where the system generates an acoustic event profile for the custom acoustic event based on a natural language description provided by a user and using an audio sample of the described acoustic event. For example, the user may describe the custom acoustic event as “dog bark.” The system may ask the user questions to refine the description (e.g., dog breed, dog gender, age, etc.). Using an audio sample of the refined description, the system may then determine that audio captured in the user's environment is a potential sample of the custom acoustic event. Such captured audio may be presented to the user for confirmation, and then may be used to detect future occurrences of the custom acoustic event in the user's environment.