-
Publication number: US10147442B1
Publication date: 2018-12-04
Application number: US14869803
Application date: 2015-09-29
Applicant: Amazon Technologies, Inc.
Inventor: Sankaran Panchapagesan, Shiva Kumar Sundaram, Arindam Mandal
Abstract: A neural network acoustic model is trained to be robust and produce accurate output when used to process speech signals having acoustic interference. The neural network acoustic model can be trained using a source-separation process by which, in addition to producing the main acoustic model output for a given input, the neural network generates predictions of the separate speech and interference portions of the input. The parameters of the neural network can be adjusted to jointly optimize all three outputs (e.g., the main acoustic model output, the speech signal prediction, and the interference signal prediction), rather than only optimizing the main acoustic model output. Once trained, output layers for the speech and interference signal predictions can be removed from the neural network or otherwise disabled.
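The joint optimization of the three outputs described in this abstract can be illustrated with a combined loss. This is only a sketch under stated assumptions: the abstract does not name the loss functions or any weighting, so the cross-entropy term, the mean-squared-error terms, and the `speech_weight` and `interference_weight` parameters are hypothetical choices made for illustration.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the correct class under a probability vector."""
    return -math.log(max(probs[target_index], 1e-12))


def mean_squared_error(predicted, reference):
    """Mean squared error between two equal-length feature vectors."""
    return sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)


def joint_loss(am_probs, target_state,
               speech_pred, speech_ref,
               interference_pred, interference_ref,
               speech_weight=0.5, interference_weight=0.5):
    """Combine the main acoustic-model loss with the two auxiliary losses.

    The loss types and weights are assumptions, not taken from the patent.
    """
    main = cross_entropy(am_probs, target_state)
    speech = mean_squared_error(speech_pred, speech_ref)
    interference = mean_squared_error(interference_pred, interference_ref)
    return main + speech_weight * speech + interference_weight * interference
```

After training, only the main output is needed, which corresponds to dropping the two auxiliary terms (and the layers that produce them) at inference time.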
-
Publication number: US20230410833A1
Publication date: 2023-12-21
Application number: US18131531
Application date: 2023-04-06
Applicant: Amazon Technologies, Inc.
Inventor: Shiva Kumar Sundaram, Chao Wang, Shiv Naga Prasad Vitaladevuni, Spyridon Matsoukas, Arindam Mandal
CPC classification number: G10L25/30, G10L25/51, G10L15/02, G10L15/16, G10L15/22, G10L15/30, G10L25/78, G10L2015/088
Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present nearby the device, even if no wakeword is spoken. Audio such as speech, human originating sounds (e.g., coughing, sneezing), or other human related noises (e.g., footsteps, doors closing) can be used to detect audio. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.
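The per-frame scoring, smoothing, and periodic reporting described in this abstract might look roughly like the following. It is a minimal sketch: the moving-average smoother, the 0.5 threshold, and the function names are assumptions, since the abstract does not specify how scores are smoothed or how the heartbeat is assembled.

```python
def smooth_scores(scores, half_window=2):
    """Average each frame score with its neighbours (assumed smoothing method)."""
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half_window)
        hi = min(len(scores), i + half_window + 1)
        window = scores[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def presence_decisions(scores, threshold=0.5, half_window=2):
    """Turn smoothed per-frame scores into per-frame presence decisions."""
    return [s >= threshold for s in smooth_scores(scores, half_window)]


def heartbeat(decisions, frames_per_report):
    """Emit one presence flag per reporting period of `frames_per_report` frames."""
    return [
        any(decisions[i:i + frames_per_report])
        for i in range(0, len(decisions), frames_per_report)
    ]
```

Smoothing suppresses isolated high-scoring frames, so a single spurious frame does not by itself flip the reported presence state.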
-
Publication number: US10679621B1
Publication date: 2020-06-09
Application number: US15927764
Application date: 2018-03-21
Applicant: Amazon Technologies, Inc.
Inventor: Shiva Kumar Sundaram, Minhua Wu, Anirudh Raju, Spyridon Matsoukas, Arindam Mandal, Kenichi Kumatani
IPC: G10L15/22, G10L15/187, G10L15/26, G10L15/30, H04R3/00, G10L21/0208, G06F40/40, H04W4/02, G10L21/0216, G10L15/08
Abstract: Systems and methods for utilizing microphone array information for acoustic modeling are disclosed. Audio data may be received from a device having a microphone array configuration. Microphone configuration data may also be received that indicates the configuration of the microphone array. The microphone configuration data may be utilized as an input vector to an acoustic model, along with the audio data, to generate phoneme data. Additionally, the microphone configuration data may be utilized to train and/or generate acoustic models, select an acoustic model to perform speech recognition with, and/or to improve trigger sound detection.
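One way to picture microphone configuration data being used as an input vector alongside the audio data is to encode the configuration numerically and append it to each frame's features. The encoding below (microphone count, spacing, one-hot geometry) and both function names are hypothetical; the abstract does not say how the configuration is represented.

```python
def encode_mic_config(num_mics, spacing_mm, geometry,
                      known_geometries=("linear", "circular")):
    """Encode an array configuration as a flat numeric vector (assumed layout)."""
    one_hot = [1.0 if geometry == g else 0.0 for g in known_geometries]
    return [float(num_mics), float(spacing_mm)] + one_hot


def build_model_input(audio_features, mic_config_vector):
    """Append the configuration vector to one frame of audio features."""
    return list(audio_features) + list(mic_config_vector)
```

The same configuration vector could also serve as a key for choosing among several acoustic models, which is the model-selection use the abstract mentions.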
-
Publication number: US09484030B1
Publication date: 2016-11-01
Application number: US14956874
Application date: 2015-12-02
Applicant: Amazon Technologies, Inc.
Inventor: Michael Patrick Meaney, Shiva Kumar Sundaram
CPC classification number: G10L15/22, G06F3/167, G10L13/00, G10L2015/223, G10L2015/226, H04R3/005, H04R27/00, H04R2227/005
Abstract: A system is configured to execute audio-initiated commands. The system detects audio and determines if a first sound is included in the audio. The system then processes further incoming audio to detect a second sound. If the second sound is not detected within a time threshold, the system executes a command. The command may include delivering a message, outputting audio corresponding to synthesized speech, or some other executable command.
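The trigger logic in this abstract (a first sound, then a command if no second sound follows within a time threshold) can be sketched over a log of detected sound events. The event representation, the labels, and the function name are assumptions for illustration, and the sketch assumes the log covers the whole waiting window.

```python
def should_execute(events, first_sound, second_sound, time_threshold_s):
    """Decide whether the command should run.

    `events` is a time-sorted list of (timestamp_seconds, label) pairs
    (a hypothetical format). Returns True when `first_sound` occurs and
    `second_sound` does not follow within `time_threshold_s` seconds.
    """
    first_time = None
    for timestamp, label in events:
        if first_time is None:
            if label == first_sound:
                first_time = timestamp
        elif timestamp - first_time > time_threshold_s:
            break  # the waiting window has expired
        elif label == second_sound:
            return False  # second sound arrived in time, so no command
    return first_time is not None
```

For example, with a doorbell as the first sound and a door opening as the second, the command (such as delivering a message) would run only when nobody opens the door in time.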
-
Publication number: US11935525B1
Publication date: 2024-03-19
Application number: US16895377
Application date: 2020-06-08
Applicant: Amazon Technologies, Inc.
Inventor: Shiva Kumar Sundaram, Minhua Wu, Anirudh Raju, Spyridon Matsoukas, Arindam Mandal, Kenichi Kumatani
IPC: G10L15/22, G06F40/40, G10L15/187, G10L15/26, G10L15/30, G10L21/0208, H04R3/00, G10L15/08, G10L21/0216, H04W4/02
CPC classification number: G10L15/22, G06F40/40, G10L15/187, G10L15/26, G10L15/30, G10L21/0208, H04R3/005, G10L2015/088, G10L2015/223, G10L2021/02166, H04W4/025
Abstract: Systems and methods for utilizing microphone array information for acoustic modeling are disclosed. Audio data may be received from a device having a microphone array configuration. Microphone configuration data may also be received that indicates the configuration of the microphone array. The microphone configuration data may be utilized as an input vector to an acoustic model, along with the audio data, to generate phoneme data. Additionally, the microphone configuration data may be utilized to train and/or generate acoustic models, select an acoustic model to perform speech recognition with, and/or to improve trigger sound detection.
-
Publication number: US11657832B2
Publication date: 2023-05-23
Application number: US17022197
Application date: 2020-09-16
Applicant: Amazon Technologies, Inc.
Inventor: Shiva Kumar Sundaram, Chao Wang, Shiv Naga Prasad Vitaladevuni, Spyridon Matsoukas, Arindam Mandal
IPC: G10L15/00, G10L25/30, G10L25/51, G10L15/02, G10L15/16, G10L15/22, G10L15/30, G10L25/78, G10L15/08
CPC classification number: G10L25/30, G10L15/02, G10L15/16, G10L15/22, G10L15/30, G10L25/51, G10L25/78, G10L2015/088, G10L2025/783
Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present nearby the device, even if no wakeword is spoken. Audio such as speech, human originating sounds (e.g., coughing, sneezing), or other human related noises (e.g., footsteps, doors closing) can be used to detect audio. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.
-
Publication number: US09972339B1
Publication date: 2018-05-15
Application number: US15228617
Application date: 2016-08-04
Applicant: Amazon Technologies, Inc.
Inventor: Shiva Kumar Sundaram
IPC: G10L15/00, G10L25/30, G10L17/04, G10L21/028, G10L17/08, G10L21/0216
CPC classification number: G10L25/30, G01S3/80, G06N3/0445, G06N3/0454, G10L25/78, G10L2021/02166
Abstract: A neural network model, such as a deep neural network (DNN), is trained using many speech examples to perform beam selection in a microphone array-based speech processing system. The DNN is trained using many different speech examples that are labeled with position or direction information relative to a training microphone array. The DNN may then be trained to recognize a direction of incoming speech so that at runtime the trained DNN may process input audio data from a microphone array and may output to a beam selector an indicator of the desired beam that may be selected for further processing. The DNN may be configured to output a beam index and/or coordinates (or other position data) corresponding to an estimated location of the detected speech. The DNN may also be configured to output acoustic unit data corresponding to speech units (for example corresponding to phonemes, senons, etc. such as those of a detected wakeword or other word).
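The step in which the trained DNN outputs a beam index to the beam selector can be illustrated as taking the most probable entry of a per-beam output layer. This is a sketch only: the softmax output layer and the `select_beam` helper are assumptions, and the network that would produce `beam_logits` is not shown.

```python
import math


def softmax(logits):
    """Convert raw per-beam scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_beam(beam_logits):
    """Return (beam_index, probability) for the most likely beam.

    `beam_logits` stands in for the DNN's per-beam outputs for one
    window of microphone-array audio (a hypothetical interface).
    """
    probs = softmax(beam_logits)
    index = max(range(len(probs)), key=probs.__getitem__)
    return index, probs[index]
```

The returned probability could be used as a confidence value, for instance to keep the previously selected beam when no beam is clearly preferred.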
-
Publication number: US10134421B1
Publication date: 2018-11-20
Application number: US15967185
Application date: 2018-04-30
Applicant: Amazon Technologies, Inc.
Inventor: Shiva Kumar Sundaram
IPC: G10L15/00, G10L25/30, G10L17/08, G10L21/028, G10L17/04, G10L21/0216
Abstract: A neural network model, such as a deep neural network (DNN), is trained using many speech examples to perform beam selection in a microphone array-based speech processing system. The DNN is trained using many different speech examples that are labeled with position or direction information relative to a training microphone array. The DNN may then be trained to recognize a direction of incoming speech so that at runtime the trained DNN may process input audio data from a microphone array and may output to a beam selector an indicator of the desired beam that may be selected for further processing. The DNN may be configured to output a beam index and/or coordinates (or other position data) corresponding to an estimated location of the detected speech. The DNN may also be configured to output acoustic unit data corresponding to speech units (for example corresponding to phonemes, senons, etc. such as those of a detected wakeword or other word).
-
Publication number: US10121494B1
Publication date: 2018-11-06
Application number: US15474603
Application date: 2017-03-30
Applicant: Amazon Technologies, Inc.
Inventor: Shiva Kumar Sundaram, Chao Wang, Shiv Naga Prasad Vitaladevuni, Spyridon Matsoukas, Arindam Mandal
Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present nearby the device, even if no wakeword is spoken. Audio such as speech, human originating sounds (e.g., coughing, sneezing), or other human related noises (e.g., footsteps, doors closing) can be used to detect audio. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.
-
Publication number: US20170076720A1
Publication date: 2017-03-16
Application number: US14852022
Application date: 2015-09-11
Applicant: Amazon Technologies, Inc.
Inventor: Ramya Gopalan, Shiva Kumar Sundaram
CPC classification number: G10L15/22, G06F3/167, G10L2015/223, G10L2021/02166
Abstract: Architectures and techniques for selecting a voice-enabled device to handle audio input that is detected by multiple voice-enabled devices are described herein. In some instances, multiple voice-enabled devices may detect audio input from a user at substantially the same time, due to the voice-enabled devices being located within proximity to the user. The architectures and techniques may analyze a variety of audio signal metric values for the voice-enabled devices to designate a voice-enabled device to handle the audio input.
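Designating one device from several that heard the same utterance, based on audio signal metric values, could be sketched as a weighted comparison. The metric names (`snr_db`, `energy`), the weights, and the linear scoring rule are all hypothetical; the abstract says only that a variety of metric values may be analyzed.

```python
def select_device(metrics_by_device, weights):
    """Pick the device whose weighted metric score is highest.

    `metrics_by_device` maps a device name to a dict of metric values;
    `weights` maps metric names to their relative importance. Both
    structures are assumptions made for this illustration.
    """
    def score(metrics):
        return sum(weights[name] * metrics.get(name, 0.0) for name in weights)

    return max(metrics_by_device, key=lambda device: score(metrics_by_device[device]))
```

A usage example: `select_device({"kitchen": {"snr_db": 12, "energy": 0.4}, "living_room": {"snr_db": 20, "energy": 0.3}}, {"snr_db": 1.0, "energy": 10.0})` compares the two devices and designates the one with the higher combined score to handle the audio input.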