-
Publication No.: US11900948B1
Publication Date: 2024-02-13
Application No.: US17571127
Filing Date: 2022-01-07
Applicant: Amazon Technologies, Inc.
Inventor: Hugh Evan Secker-Walker , Baiyang Liu , Frederick Victor Weber
IPC: G10L15/22 , G10L15/00 , G10L17/00 , G10L17/06 , G10L17/12 , G10L17/02 , G10L17/16 , G10L15/18 , G10L17/22 , G10L15/20 , G10L15/26 , G10L15/02 , G10L15/08
CPC classification number: G10L17/06 , G10L15/18 , G10L17/02 , G10L17/12 , G10L17/16 , G10L17/22 , G10L15/20 , G10L15/26 , G10L2015/025 , G10L2015/088
Abstract: Features are disclosed for automatically identifying a speaker. Artifacts of automatic speech recognition (“ASR”) and/or other automatically determined information may be processed against individual user profiles or models. Scores may be determined reflecting the likelihood that individual users made an utterance. The scores can be based on, e.g., individual components of Gaussian mixture models (“GMMs”) that score best for frames of audio data of an utterance. A user associated with the highest likelihood score for a particular utterance can be identified as the speaker of the utterance. Information regarding the identified user can be provided to components of a spoken language processing system, separate applications, etc.
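The scoring idea in this abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not give the scoring formula, so the diagonal-covariance components, the per-frame "best component" rule, and the names `Component`, `utterance_score`, and `identify_speaker` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Component:
    """One diagonal-covariance Gaussian of a user's GMM (hypothetical layout)."""
    log_weight: float
    mean: List[float]
    var: List[float]


def component_log_score(c: Component, frame: List[float]) -> float:
    """Log of (mixture weight * Gaussian density) for a single audio frame."""
    score = c.log_weight
    for x, m, v in zip(frame, c.mean, c.var):
        score += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return score


def utterance_score(gmm: List[Component], frames: List[List[float]]) -> float:
    """Sum, over frames, of the best-scoring component for each frame (assumed rule)."""
    return sum(max(component_log_score(c, f) for c in gmm) for f in frames)


def identify_speaker(
    profiles: Dict[str, List[Component]], frames: List[List[float]]
) -> Tuple[str, Dict[str, float]]:
    """Return the user with the highest score, plus all per-user scores."""
    scores = {user: utterance_score(gmm, frames) for user, gmm in profiles.items()}
    return max(scores, key=scores.get), scores
```

Given one GMM per enrolled user, `identify_speaker(profiles, frames)` returns the user whose model scores highest on the utterance, along with the scores for every user.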
-
Publication No.: US10970774B1
Publication Date: 2021-04-06
Application No.: US14492638
Filing Date: 2014-09-22
Applicant: Amazon Technologies, Inc.
Inventor: Alborz Geramifard , Hugh Evan Secker-Walker
Abstract: Provided are systems and methods for receiving a plurality of item submissions from a plurality of mobile user devices (each item submission of the plurality of item submissions including: item identifier data indicative of an item; and item location data indicative of a location of the item), determining a determined location for the item (using the respective item location data for each of the plurality of item submissions), and storing the determined location for the item in an item location database. The determined location for the item is stored in association with an item identifier corresponding to the item, and the item location database stores determined locations for a plurality of items.
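As a rough sketch of the aggregation step in this abstract: many submissions for the same item are combined into one determined location, which is then stored under the item identifier. The abstract does not say how the location is determined; the coordinate-wise median used here, the `(item_id, (x, y))` submission shape, and the function name `determine_locations` are assumptions.

```python
from collections import defaultdict
from statistics import median


def determine_locations(submissions):
    """Build an item-location table from (item_id, (x, y)) submissions.

    The coordinate-wise median is an assumed aggregation rule, chosen because
    it tolerates a few outlying submissions.
    """
    by_item = defaultdict(list)
    for item_id, location in submissions:
        by_item[item_id].append(location)

    # The returned dict stands in for the item location database.
    return {
        item_id: (median(p[0] for p in points), median(p[1] for p in points))
        for item_id, points in by_item.items()
    }
```

A single stray submission, for example one device reporting an item far from the others, does not move the stored location under this rule.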
-
Publication No.: US20200349957A1
Publication Date: 2020-11-05
Application No.: US15929795
Filing Date: 2020-05-21
Applicant: Amazon Technologies, Inc.
Inventor: Hugh Evan Secker-Walker , Baiyang Liu , Frederick Victor Weber
Abstract: Features are disclosed for automatically identifying a speaker. Artifacts of automatic speech recognition (“ASR”) and/or other automatically determined information may be processed against individual user profiles or models. Scores may be determined reflecting the likelihood that individual users made an utterance. The scores can be based on, e.g., individual components of Gaussian mixture models (“GMMs”) that score best for frames of audio data of an utterance. A user associated with the highest likelihood score for a particular utterance can be identified as the speaker of the utterance. Information regarding the identified user can be provided to components of a spoken language processing system, separate applications, etc.
-
Publication No.: US20200043499A1
Publication Date: 2020-02-06
Application No.: US16443160
Filing Date: 2019-06-17
Applicant: Amazon Technologies, Inc.
Inventor: Kenneth John Basye , Hugh Evan Secker-Walker , Tony David , Reinhard Kneser , Jeffrey Penrod Adams , Stan Weidner Salvador , Mahesh Krishnamoorthy
IPC: G10L15/28
Abstract: Power consumption for a computing device may be managed by one or more keywords. For example, if an audio input obtained by the computing device includes a keyword, a network interface module and/or an application processing module of the computing device may be activated. The audio input may then be transmitted via the network interface module to a remote computing device, such as a speech recognition server. Alternately, the computing device may be provided with a speech recognition engine configured to process the audio input for on-device speech recognition.
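The keyword-gated activation described in this abstract might be sketched as follows. This is a toy model under stated assumptions: audio is represented as a list of already-recognized tokens rather than a waveform, the `Device` class and its fields are hypothetical, and keeping the network module active after the first keyword is a choice the abstract does not specify.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Device:
    """Toy device whose network module stays off until a keyword is heard."""
    keywords: Set[str]
    network_active: bool = False
    transmitted: List[List[str]] = field(default_factory=list)

    def on_audio(self, tokens: List[str]) -> bool:
        """Process one audio input; return True if it was transmitted."""
        # Keyword spotting is modeled as a simple token match (assumption).
        if not self.network_active and self.keywords & set(tokens):
            self.network_active = True  # power up the network interface module

        if self.network_active:
            # Stand-in for sending audio to a remote speech recognition server.
            self.transmitted.append(tokens)
            return True
        return False
```

Audio that arrives before any keyword is dropped without waking the network module, which is where the power saving comes from.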
-
Publication No.: US10325598B2
Publication Date: 2019-06-18
Application No.: US15645918
Filing Date: 2017-07-10
Applicant: Amazon Technologies, Inc.
Inventor: Kenneth John Basye , Hugh Evan Secker-Walker , Tony David , Reinhard Kneser , Jeffrey Penrod Adams , Stan Weidner Salvador , Mahesh Krishnamoorthy
Abstract: Power consumption for a computing device may be managed by one or more keywords. For example, if an audio input obtained by the computing device includes a keyword, a network interface module and/or an application processing module of the computing device may be activated. The audio input may then be transmitted via the network interface module to a remote computing device, such as a speech recognition server. Alternately, the computing device may be provided with a speech recognition engine configured to process the audio input for on-device speech recognition.
-
Publication No.: US10152973B2
Publication Date: 2018-12-11
Application No.: US14942551
Filing Date: 2015-11-16
Applicant: Amazon Technologies, Inc.
Abstract: Features are disclosed for managing the use of speech recognition models and data in automated speech recognition systems. Models and data may be retrieved asynchronously and used as they are received or after an utterance is initially processed with more general or different models. Once received, the models and statistics can be cached. Statistics needed to update models and data may also be retrieved asynchronously so that it may be used to update the models and data as it becomes available. The updated models and data may be immediately used to re-process an utterance, or saved for use in processing subsequently received utterances. User interactions with the automated speech recognition system may be tracked in order to predict when a user is likely to utilize the system. Models and data may be pre-cached based on such predictions.
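One way to picture the asynchronous retrieval and caching in this abstract is the `asyncio` sketch below. It shows a first pass with a general model while a user-specific model is fetched in the background, then a second pass once that model arrives. The names `ModelCache`, `fake_fetch`, `recognize`, and `process` are hypothetical, the "models" are placeholder strings, and the predictive pre-caching mentioned in the abstract is not modeled.

```python
import asyncio


class ModelCache:
    """Caches user-specific models so each one is fetched at most once."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}
        self.fetch_count = 0

    async def get(self, user_id):
        if user_id not in self._cache:
            self.fetch_count += 1
            self._cache[user_id] = await self._fetch(user_id)
        return self._cache[user_id]


async def fake_fetch(user_id):
    """Stand-in for a slow remote model store (assumption)."""
    await asyncio.sleep(0.01)
    return f"model-for-{user_id}"


def recognize(utterance, model):
    """Placeholder recognizer: records which model was used."""
    return f"{utterance} [decoded with {model}]"


async def process(utterance, user_id, cache):
    # Start retrieving the user-specific model without blocking.
    fetch_task = asyncio.create_task(cache.get(user_id))
    # Initial pass with a more general model.
    first_pass = recognize(utterance, "general-model")
    # Re-process once the specific model is available.
    user_model = await fetch_task
    second_pass = recognize(utterance, user_model)
    return first_pass, second_pass
```

A later utterance from the same user is served from the cache, so `fetch_count` does not grow on repeated calls with the same `user_id`.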
-
Publication No.: US09922650B1
Publication Date: 2018-03-20
Application No.: US14137567
Filing Date: 2013-12-20
Applicant: Amazon Technologies, Inc.
Inventor: Hugh Evan Secker-Walker , Aaron Lee Mathers Challenner , Ariya Rastrow
CPC classification number: G10L15/083 , G10L15/1822
Abstract: Features are disclosed for generating intent-specific results in an automatic speech recognition system. The results can be generated by utilizing a decoding graph containing tags that identify portions of the graph corresponding to a given intent. The tags can also identify high-information content slots and low-information carrier phrases for a given intent. The automatic speech recognition system may utilize these tags to provide a semantic representation based on a plurality of different tokens for the content slot portions and low information for the carrier portions. A user can be presented with a user interface containing top intent results with corresponding intent-specific top content slot values.
-
Publication No.: US09818407B1
Publication Date: 2017-11-14
Application No.: US13761812
Filing Date: 2013-02-07
Applicant: Amazon Technologies, Inc.
Inventor: Hugh Evan Secker-Walker , Kenneth John Basye , Nikko Strom , Ryan Paul Thomas
CPC classification number: G10L25/78 , G10L15/04 , G10L15/142 , G10L15/30 , G10L15/32 , G10L25/18 , G10L25/24 , G10L25/87
Abstract: An efficient audio streaming method and apparatus includes a client process implemented on a client or local device and a server process implemented on a remote server or server(s). The client process and server process each have speech recognition components and communicate over a network, and together efficiently manage the detection of speech in an audio signal streamed by the local device to the server for speech recognition and potentially further processing at the server. The client process monitors audio input and in a first detection stage, implements endpointing on the local device to determine when speech is detected. The client process may further determine if a “wakeword” is detected, and then the client process opens a connection and begins streaming audio to the server process via the network. The server process receives the speech audio stream and monitors the audio, implementing endpointing in the server process, to determine when to tell the client process to close the connection and stop streaming audio. The client process continues streaming audio to the server until the server process determines disconnect criteria have been met and tells the client process to stop streaming audio.
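The two-stage handshake in this abstract can be sketched with a small simulation. Frames are modeled as energy values, local endpointing is a simple energy threshold, and the server's disconnect criterion is a run of consecutive quiet frames. These rules, the thresholds, and the names `Server` and `run_client` are assumptions for illustration; the optional wakeword check is omitted.

```python
class Server:
    """Server-side endpointing: asks the client to stop after a run of quiet frames."""

    def __init__(self, silence_threshold=0.1, max_silent_frames=3):
        self.silence_threshold = silence_threshold
        self.max_silent_frames = max_silent_frames
        self.received = []
        self._silent_run = 0

    def receive(self, frame):
        """Accept one frame; return False to tell the client to stop streaming."""
        self.received.append(frame)
        if frame < self.silence_threshold:
            self._silent_run += 1
        else:
            self._silent_run = 0
        if self._silent_run >= self.max_silent_frames:
            self._silent_run = 0  # reset for the next connection
            return False
        return True


def run_client(frames, server, speech_threshold=0.1):
    """Client-side loop: stream only after local endpointing detects speech.

    Returns the number of connections opened.
    """
    streaming = False
    connections = 0
    for frame in frames:
        if not streaming:
            if frame < speech_threshold:
                continue  # no speech yet: nothing leaves the device
            streaming = True  # speech detected: open a connection
            connections += 1
        if not server.receive(frame):
            streaming = False  # server signalled that disconnect criteria were met
    return connections
```

In this sketch the client decides when to start streaming and the server decides when to stop, which mirrors the division of labor the abstract describes.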
-
Publication No.: US11222639B2
Publication Date: 2022-01-11
Application No.: US15929795
Filing Date: 2020-05-21
Applicant: Amazon Technologies, Inc.
Inventor: Hugh Evan Secker-Walker , Baiyang Liu , Frederick Victor Weber
IPC: G10L15/22 , G10L15/26 , G10L15/30 , G10L17/06 , G10L17/12 , G10L17/02 , G10L17/16 , G10L15/18 , G10L17/22 , G10L15/20 , G10L15/02 , G10L15/08
Abstract: Features are disclosed for automatically identifying a speaker. Artifacts of automatic speech recognition (“ASR”) and/or other automatically determined information may be processed against individual user profiles or models. Scores may be determined reflecting the likelihood that individual users made an utterance. The scores can be based on, e.g., individual components of Gaussian mixture models (“GMMs”) that score best for frames of audio data of an utterance. A user associated with the highest likelihood score for a particular utterance can be identified as the speaker of the utterance. Information regarding the identified user can be provided to components of a spoken language processing system, separate applications, etc.
-
Publication No.: US10665245B2
Publication Date: 2020-05-26
Application No.: US16448788
Filing Date: 2019-06-21
Applicant: Amazon Technologies, Inc.
Inventor: Hugh Evan Secker-Walker , Baiyang Liu , Frederick Victor Weber
IPC: G10L15/22 , G10L15/26 , G10L15/30 , G10L17/06 , G10L17/12 , G10L17/02 , G10L17/16 , G10L15/18 , G10L17/22 , G10L15/20 , G10L15/02 , G10L15/08
Abstract: Features are disclosed for automatically identifying a speaker. Artifacts of automatic speech recognition (“ASR”) and/or other automatically determined information may be processed against individual user profiles or models. Scores may be determined reflecting the likelihood that individual users made an utterance. The scores can be based on, e.g., individual components of Gaussian mixture models (“GMMs”) that score best for frames of audio data of an utterance. A user associated with the highest likelihood score for a particular utterance can be identified as the speaker of the utterance. Information regarding the identified user can be provided to components of a spoken language processing system, separate applications, etc.