-
公开(公告)号:US11355102B1
公开(公告)日:2022-06-07
申请号:US16712539
申请日:2019-12-12
Applicant: Amazon Technologies, Inc.
Inventor: Yuriy Mishchenko , Thibaud Senechal , Anish N. Shah , Shiv Naga Prasad Vitaladevuni
Abstract: A neural network model of a user device is trained to map different words represented in audio data to different points in an N-dimensional embedding space. When the user device determines that a mapped point corresponds to a wakeword, it causes further audio processing, such as automatic speech recognition or natural-language understanding, to be performed on the audio data. The user device may first create the wakeword by first processing audio data representing the wakeword to determine the mapped point in the embedding space.
-
公开(公告)号:US20220093093A1
公开(公告)日:2022-03-24
申请号:US17112227
申请日:2020-12-04
Applicant: Amazon Technologies, Inc.
Inventor: Prakash Krishnan , Arindam Mandal , Nikko Strom , Pradeep Natarajan , Ariya Rastrow , Shiv Naga Prasad Vitaladevuni , David Chi-Wai Tang , Aaron Challenner , Xu Zhang , Krishna Anisetty , Josey Diego Sandoval , Rohit Prasad , Premkumar Natarajan
Abstract: A system can operate a speech-controlled device in a mode where the speech-controlled device determines that an utterance is directed at the speech-controlled device using image data showing the user speaking the utterance. If the user is directing the user's gaze at the speech-controlled device while speaking, the system may determine the utterance is system directed and thus may perform further speech processing based on the utterance. If the user's gaze is directed elsewhere, the system may determine the utterance is not system directed (for example directed at another user) and thus the system may not perform further speech processing based on the utterance and may take other actions, for example discarding audio data of the utterance.
-
公开(公告)号:US20210027798A1
公开(公告)日:2021-01-28
申请号:US17022197
申请日:2020-09-16
Applicant: Amazon Technologies, Inc.
Inventor: Shiva Kumar Sundaram , Chao Wang , Shiv Naga Prasad Vitaladevuni , Spyridon Matsoukas , Arindam Mandal
Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present nearby the device, even if no wakeword is spoken. Audio such as speech, human originating sounds (e.g., coughing, sneezing), or other human related noises (e.g., footsteps, doors closing) can be used to detect audio. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.
-
公开(公告)号:US10796716B1
公开(公告)日:2020-10-06
申请号:US16157319
申请日:2018-10-11
Applicant: Amazon Technologies, Inc.
Inventor: Shiva Kumar Sundaram , Chao Wang , Shiv Naga Prasad Vitaladevuni , Spyridon Matsoukas , Arindam Mandal
Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present nearby the device, even if no wakeword is spoken. Audio such as speech, human originating sounds (e.g., coughing, sneezing), or other human related noises (e.g., footsteps, doors closing) can be used to detect audio. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.
-
公开(公告)号:US10304440B1
公开(公告)日:2019-05-28
申请号:US15198578
申请日:2016-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Sankaran Panchapagesan , Bjorn Hoffmeister , Arindam Mandal , Aparna Khare , Shiv Naga Prasad Vitaladevuni , Spyridon Matsoukas , Ming Sun
Abstract: An approach to keyword spotting makes use of acoustic parameters that are trained on a keyword spotting task as well as on a second speech recognition task, for example, a large vocabulary continuous speech recognition task. The parameters may be optimized according to a weighted measure that weighs the keyword spotting task more highly than the other task, and that weighs utterances of a keyword more highly than utterances of other speech. In some applications, a keyword spotter configured with the acoustic parameters is used for trigger or wake word detection.
-
公开(公告)号:US09589560B1
公开(公告)日:2017-03-07
申请号:US14135309
申请日:2013-12-19
Applicant: Amazon Technologies, Inc.
Inventor: Shiv Naga Prasad Vitaladevuni , Bjorn Hoffmeister , Rohit Prasad
IPC: G10L15/01
CPC classification number: G10L15/01 , G06K9/6277
Abstract: Features are disclosed for estimating a false rejection rate in a detection system. The false rejection rate can be estimated by fitting a model to a distribution of detection confidence scores. An estimated false rejection rate can then be computed for confidence scores that fall below a threshold. The false rejection rate and model can be verified once the detection system has been deployed by obtaining additional data with confidence scores falling below the threshold. Adjustments to the model or other operational parameters can be implemented based on the verified false rejection rate, model, or additional data.
Abstract translation: 公开了用于估计检测系统中的假拒绝率的特征。 可以通过将模型拟合到检测置信度分数的分布来估计错误拒绝率。 然后可以计算低于阈值的置信度分数的估计的错误拒绝率。 一旦检测系统被部署,可以通过获得低于阈值的置信度分数的附加数据来验证错误拒绝率和模型。 可以基于验证的假拒绝率,模型或附加数据来实现对模型或其他操作参数的调整。
-
公开(公告)号:US12039975B2
公开(公告)日:2024-07-16
申请号:US17112512
申请日:2020-12-04
Applicant: Amazon Technologies, Inc.
Inventor: Prakash Krishnan , Arindam Mandal , Siddhartha Reddy Jonnalagadda , Nikko Strom , Ariya Rastrow , Shiv Naga Prasad Vitaladevuni , Angeliki Metallinou , Vincent Auvray , Minmin Shen , Josey Diego Sandoval , Rohit Prasad , Thomas Taylor , Amotz Maimon
IPC: G10L15/22 , G06F3/16 , G06F18/24 , G06V10/40 , G06V40/10 , G06V40/20 , G10L13/08 , G10L15/02 , G10L15/06 , G10L15/08 , G10L15/20 , G10L15/24
CPC classification number: G10L15/22 , G06F3/167 , G06F18/24 , G06V10/40 , G06V40/10 , G06V40/20 , G10L13/08 , G10L15/02 , G10L15/063 , G10L15/08 , G10L15/20 , G10L15/222 , G10L15/24 , G10L2015/0635 , G10L2015/088 , G10L2015/223 , G10L2015/227
Abstract: A natural language system may be configured to act as a participant in a conversation between two users. The system may determine when a user expression such as speech, a gesture, or the like is directed from one user to the other. The system may processing input data related the expression (such as audio data, input data, language processing result data, conversation context data, etc.) to determine if the system should interject a response to the user-to-user expression. If so, the system may process the input data to determine a response and output it. The system may track that response as part of the data related to the ongoing conversation.
-
公开(公告)号:US11769496B1
公开(公告)日:2023-09-26
申请号:US16711967
申请日:2019-12-12
Applicant: Amazon Technologies, Inc.
Inventor: Rohit Prasad , Shiv Naga Prasad Vitaladevuni , Prem Natarajan
CPC classification number: G10L15/22 , G10L15/063 , G10L15/1815 , H04L67/306 , G06F21/6245 , G10L15/07 , G10L2015/088 , G10L2015/223 , G10L2015/227
Abstract: Described are techniques for predicting when data associated with a user input is likely to be selected for deletion. The system may use a trained model to assist with such predictions. The trained model can be configured based on deletions associated with a user profile. An example process can including receiving user input data corresponding to the user profile, and processing the user input data to determine a user command. Based on characteristic data of the user command, the trained model can be used to determine that data corresponding to the user command is likely to be selected for deletion. The trained model can be iteratively updated based on additional user commands, including previously received user commands to delete user input data.
-
公开(公告)号:US20230162728A1
公开(公告)日:2023-05-25
申请号:US18070830
申请日:2022-11-29
Applicant: Amazon Technologies, Inc.
Inventor: Christin Jose , Yuriy Mishchenko , Anish N. Shah , Alex Escott , Parind Shah , Shiv Naga Prasad Vitaladevuni , Thibaud Senechal
CPC classification number: G10L15/16 , G06F17/15 , G10L15/063 , G10L2015/088
Abstract: A system and method performs wakeword detection using a feedforward neural network model. A first output of the model indicates when the wakeword appears on a right side of a first window of input audio data. A second output of the model indicates when the wakeword appears in the center of a second window of input audio data. A third output of the model indicates when the wakeword appears on a left side of a third window of input audio data. Using these outputs, the system and method determine a beginpoint and endpoint of the wakeword.
-
公开(公告)号:US11043218B1
公开(公告)日:2021-06-22
申请号:US16452964
申请日:2019-06-26
Applicant: Amazon Technologies, Inc.
Inventor: Ming Sun , Thibaud Senechal , Yixin Gao , Anish N. Shah , Spyridon Matsoukas , Chao Wang , Shiv Naga Prasad Vitaladevuni
Abstract: A system processes audio data to detect when it includes a representation of a wakeword or of an acoustic event. The system may receive or determine acoustic features for the audio data, such as log-filterbank energy (LFBE). The acoustic features may be used by a first, wakeword-detection model to detect the wakeword; the output of this model may be further processed using a softmax function, to smooth it, and to detect spikes. The same acoustic features may be also be used by a second, acoustic-event-detection model to detect the acoustic event; the output of this model may be further processed using a sigmoid function and a classifier. Another model may be used to extract additional features from the LFBE data; these additional features may be used by the other models.
-
-
-
-
-
-
-
-
-