-
Publication number: US11842727B2
Publication date: 2023-12-12
Application number: US17659612
Application date: 2022-04-18
Applicant: Amazon Technologies, Inc.
Inventor: Angeliki Metallinou , Rahul Goel , Vishal Ishwar
IPC: G10L15/16 , G10L15/183 , G10L15/14 , G10L15/197 , G06F3/16 , G10L15/02 , G10L15/26
CPC classification number: G10L15/16 , G06F3/167 , G10L15/02 , G10L15/144 , G10L15/197 , G10L15/26 , G10L2015/025
Abstract: Multi-modal natural language processing systems are provided. Some systems are context-aware systems that use multi-modal data to improve the accuracy of natural language understanding as it is applied to spoken language input. Machine learning architectures are provided that jointly model spoken language input (“utterances”) and information displayed on a visual display (“on-screen information”). Such machine learning architectures can improve upon, and solve problems inherent in, existing spoken language understanding systems that operate in multi-modal contexts.
-
Publication number: US11908468B2
Publication date: 2024-02-20
Application number: US17112520
Application date: 2020-12-04
Applicant: Amazon Technologies, Inc.
Inventor: Prakash Krishnan , Arindam Mandal , Siddhartha Reddy Jonnalagadda , Nikko Strom , Ariya Rastrow , Ying Shi , David Chi-Wai Tang , Nishtha Gupta , Aaron Challenner , Bonan Zheng , Angeliki Metallinou , Vincent Auvray , Minmin Shen
IPC: G10L25/78 , G10L15/22 , G10L15/24 , G10L15/08 , G10L15/06 , G06V40/20 , G06F3/16 , G10L13/08 , G10L15/20 , G06V40/10 , G06V10/40 , G10L15/02 , G06F18/24
CPC classification number: G10L15/22 , G06F3/167 , G06F18/24 , G06V10/40 , G06V40/10 , G06V40/20 , G10L13/08 , G10L15/02 , G10L15/063 , G10L15/08 , G10L15/20 , G10L15/222 , G10L15/24 , G10L2015/0635 , G10L2015/088 , G10L2015/223 , G10L2015/227
Abstract: A system that is capable of resolving anaphora using timing data received by a local device. A local device outputs audio representing a list of entries. The audio may represent synthesized speech of the list of entries. A user can interrupt the device to select an entry in the list, such as by saying “that one.” The local device can determine an offset time representing the time between when audio playback began and when the user interrupted. The local device sends the offset time and audio data representing the utterance to a speech processing system which can then use the offset time and stored data to identify which entry on the list was most recently output by the local device when the user interrupted. The system can then resolve anaphora to match that entry and can perform additional processing based on the referred to item.
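The offset-time lookup described in this abstract can be illustrated with a minimal sketch. It assumes the stored data is a per-entry playback duration in seconds; the function name `entry_at_offset`, its parameters, and the sample list are hypothetical and do not come from the patent.

```python
from bisect import bisect_right
from itertools import accumulate


def entry_at_offset(entries, durations_s, offset_s):
    """Return the entry most recently output when the user interrupted.

    entries      -- list entries in playback order (hypothetical data)
    durations_s  -- assumed stored playback duration of each entry, in seconds
    offset_s     -- time between start of playback and the interruption
    """
    if not entries or len(entries) != len(durations_s):
        raise ValueError("entries and durations_s must be non-empty and equal in length")
    # Start time of each entry relative to the beginning of playback.
    starts = [0.0] + list(accumulate(durations_s))[:-1]
    # Index of the last entry whose start time is <= offset_s.
    idx = bisect_right(starts, offset_s) - 1
    # A negative offset is clamped to the first entry.
    return entries[max(idx, 0)]


options = ["pizza place", "sushi bar", "taco stand"]
durations = [2.0, 1.5, 1.8]
selected = entry_at_offset(options, durations, offset_s=2.7)
```

With these sample values the second entry starts at 2.0 s and the third at 3.5 s, so an interruption at 2.7 s resolves "that one" to the second entry. A real system would presumably also account for network and detection latency, which this sketch ignores.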
-
Publication number: US10304444B2
Publication date: 2019-05-28
Application number: US15196540
Application date: 2016-06-29
Applicant: AMAZON TECHNOLOGIES, INC.
Inventor: Lambert Mathias , Thomas Kollar , Arindam Mandal , Angeliki Metallinou
IPC: G06F17/20 , G10L15/22 , G10L15/26 , G10L15/02 , G10L15/18 , G10L15/14 , G06F16/35 , G06F16/332 , G06F17/27
Abstract: A system capable of performing natural language understanding (NLU) without the concept of a domain that influences NLU results. The present system uses a hierarchical organization of intents/commands and entity types, and trained models associated with those hierarchies, so that commands and entity types may be determined for incoming text queries without necessarily determining a domain for the incoming text. The system thus operates in a domain-agnostic manner, in a departure from multi-domain architecture NLU processing where a system determines NLU results for multiple domains simultaneously and then ranks them to determine which to select as the result.
-
Publication number: US12039975B2
Publication date: 2024-07-16
Application number: US17112512
Application date: 2020-12-04
Applicant: Amazon Technologies, Inc.
Inventor: Prakash Krishnan , Arindam Mandal , Siddhartha Reddy Jonnalagadda , Nikko Strom , Ariya Rastrow , Shiv Naga Prasad Vitaladevuni , Angeliki Metallinou , Vincent Auvray , Minmin Shen , Josey Diego Sandoval , Rohit Prasad , Thomas Taylor , Amotz Maimon
IPC: G10L15/22 , G06F3/16 , G06F18/24 , G06V10/40 , G06V40/10 , G06V40/20 , G10L13/08 , G10L15/02 , G10L15/06 , G10L15/08 , G10L15/20 , G10L15/24
CPC classification number: G10L15/22 , G06F3/167 , G06F18/24 , G06V10/40 , G06V40/10 , G06V40/20 , G10L13/08 , G10L15/02 , G10L15/063 , G10L15/08 , G10L15/20 , G10L15/222 , G10L15/24 , G10L2015/0635 , G10L2015/088 , G10L2015/223 , G10L2015/227
Abstract: A natural language system may be configured to act as a participant in a conversation between two users. The system may determine when a user expression such as speech, a gesture, or the like is directed from one user to the other. The system may process input data related to the expression (such as audio data, input data, language processing result data, conversation context data, etc.) to determine if the system should interject a response to the user-to-user expression. If so, the system may process the input data to determine a response and output it. The system may track that response as part of the data related to the ongoing conversation.
-
Publication number: US20200251098A1
Publication date: 2020-08-06
Application number: US16723762
Application date: 2019-12-20
Applicant: Amazon Technologies, Inc.
Inventor: Angeliki Metallinou , Rahul Goel , Vishal Ishwar
Abstract: Multi-modal natural language processing systems are provided. Some systems are context-aware systems that use multi-modal data to improve the accuracy of natural language understanding as it is applied to spoken language input. Machine learning architectures are provided that jointly model spoken language input (“utterances”) and information displayed on a visual display (“on-screen information”). Such machine learning architectures can improve upon, and solve problems inherent in, existing spoken language understanding systems that operate in multi-modal contexts.
-
Publication number: US10515625B1
Publication date: 2019-12-24
Application number: US15828174
Application date: 2017-11-30
Applicant: Amazon Technologies, Inc.
Inventor: Angeliki Metallinou , Rahul Goel , Vishal Ishwar
Abstract: Multi-modal natural language processing systems are provided. Some systems are context-aware systems that use multi-modal data to improve the accuracy of natural language understanding as it is applied to spoken language input. Machine learning architectures are provided that jointly model spoken language input (“utterances”) and information displayed on a visual display (“on-screen information”). Such machine learning architectures can improve upon, and solve problems inherent in, existing spoken language understanding systems that operate in multi-modal contexts.
-
Publication number: US20250006196A1
Publication date: 2025-01-02
Application number: US18345455
Application date: 2023-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Hann Wang , Angeliki Metallinou , Melanie C B Gens , Arijit Biswas , Ying Shi
Abstract: Techniques for generating a prompt for a language model to determine an action responsive to a user input are described. In some embodiments, the system receives a user input, determines one or more application programming interfaces (APIs) configured to perform actions that are relevant to the user input, and determines exemplars representing examples of using the APIs with respect to user inputs similar to the current user input. The system further determines device states of devices that are determined to be related to the user input and also determines other contextual information (e.g., weather information, time of day, geographic location, etc.). The system generates a prompt including the user input, the APIs, the exemplars, the device states, and the other contextual information. A language model processes the prompt to determine an action responsive to the user input and the system causes performance of the action.
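The prompt assembly described in this abstract might look roughly like the sketch below. The section titles, the plain-text layout, the function name `build_prompt`, and all sample values are assumptions made for illustration; the abstract states only which five kinds of information the prompt includes.

```python
def build_prompt(user_input, apis, exemplars, device_states, context):
    """Combine the five inputs named in the abstract into one prompt string.

    The layout used here (titled sections with "- " bullets) is hypothetical.
    """
    sections = [
        ("Available APIs", apis),
        ("Examples", exemplars),
        ("Device states", [f"{name}: {state}" for name, state in device_states.items()]),
        ("Context", [f"{key}: {value}" for key, value in context.items()]),
    ]
    lines = []
    for title, items in sections:
        lines.append(f"{title}:")
        lines.extend(f"- {item}" for item in items)
    # The user input goes last so the model reads it after the supporting data.
    lines.append(f"User input: {user_input}")
    return "\n".join(lines)


prompt = build_prompt(
    user_input="turn on the lights",
    apis=["set_power(device, on)"],
    exemplars=["'lamp off' -> set_power('lamp', False)"],
    device_states={"lamp": "off"},
    context={"time of day": "evening"},
)
```

The resulting string would then be passed to a language model, and the action it returns would be executed; both of those steps are outside this sketch.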
-
Publication number: US20220246139A1
Publication date: 2022-08-04
Application number: US17659612
Application date: 2022-04-18
Applicant: Amazon Technologies, Inc.
Inventor: Angeliki Metallinou , Rahul Goel , Vishal Ishwar
Abstract: Multi-modal natural language processing systems are provided. Some systems are context-aware systems that use multi-modal data to improve the accuracy of natural language understanding as it is applied to spoken language input. Machine learning architectures are provided that jointly model spoken language input (“utterances”) and information displayed on a visual display (“on-screen information”). Such machine learning architectures can improve upon, and solve problems inherent in, existing spoken language understanding systems that operate in multi-modal contexts.
-
Publication number: US20220093094A1
Publication date: 2022-03-24
Application number: US17112512
Application date: 2020-12-04
Applicant: Amazon Technologies, Inc.
Inventor: Prakash Krishnan , Arindam Mandal , Siddhartha Reddy Jonnalagadda , Nikko Strom , Ariya Rastrow , Shiv Naga Prasad Vitaladevuni , Angeliki Metallinou , Vincent Auvray , Minmin Shen , Josey Diego Sandoval , Rohit Prasad , Thomas Taylor , Amotz Maimon
Abstract: A natural language system may be configured to act as a participant in a conversation between two users. The system may determine when a user expression such as speech, a gesture, or the like is directed from one user to the other. The system may process input data related to the expression (such as audio data, input data, language processing result data, conversation context data, etc.) to determine if the system should interject a response to the user-to-user expression. If so, the system may process the input data to determine a response and output it. The system may track that response as part of the data related to the ongoing conversation.
-
Publication number: US20240153499A1
Publication date: 2024-05-09
Application number: US18532969
Application date: 2023-12-07
Applicant: Amazon Technologies, Inc.
Inventor: Angeliki Metallinou , Rahul Goel , Vishal Ishwar
CPC classification number: G10L15/16 , G06F3/167 , G10L15/02 , G10L15/144 , G10L15/197 , G10L15/26 , G10L2015/025
Abstract: Multi-modal natural language processing systems are provided. Some systems are context-aware systems that use multi-modal data to improve the accuracy of natural language understanding as it is applied to spoken language input. Machine learning architectures are provided that jointly model spoken language input (“utterances”) and information displayed on a visual display (“on-screen information”). Such machine learning architectures can improve upon, and solve problems inherent in, existing spoken language understanding systems that operate in multi-modal contexts.