-
Publication No.: US20240428797A1
Publication Date: 2024-12-26
Application No.: US18823198
Filing Date: 2024-09-03
Applicant: Amazon Technologies, Inc.
Inventor: Beiye Liu , Wael Hamza , Liwei Cai , Konstantine Arkoudas , Chengwei Su , Subendhu Rongali
Abstract: Techniques for performing spoken language understanding (SLU) processing are described. An SLU component may include an audio encoder configured to perform an audio-to-text processing task and an audio-to-NLU processing task. The SLU component may also include a joint decoder configured to perform the audio-to-text processing task, the audio-to-NLU processing task and a text-to-NLU processing task. Input audio data, representing a spoken input, is processed by the audio encoder and the joint decoder to determine NLU data corresponding to the spoken input.
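The data flow in this abstract can be sketched as three task routes that share one decoder. This is only an illustrative wiring diagram in code, not the patented implementation: the class name `JointSLU`, the task tags `"text"` and `"nlu"`, the separate `text_encoder` used for the text-to-NLU route, and all three toy functions are assumptions that do not appear in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class JointSLU:
    """Hypothetical wiring of one audio encoder and one joint decoder."""
    audio_encoder: Callable[[Sequence[float]], List[Vector]]
    # Assumption: text input needs its own embedding step before the decoder.
    text_encoder: Callable[[str], List[Vector]]
    # The joint decoder receives encoded states plus a task tag.
    joint_decoder: Callable[[List[Vector], str], str]

    def audio_to_text(self, audio: Sequence[float]) -> str:
        return self.joint_decoder(self.audio_encoder(audio), "text")

    def audio_to_nlu(self, audio: Sequence[float]) -> str:
        return self.joint_decoder(self.audio_encoder(audio), "nlu")

    def text_to_nlu(self, text: str) -> str:
        return self.joint_decoder(self.text_encoder(text), "nlu")


# Toy stand-ins so the sketch runs; real components would be trained models.
def toy_audio_encoder(audio: Sequence[float]) -> List[Vector]:
    return [[sample] for sample in audio]


def toy_text_encoder(text: str) -> List[Vector]:
    return [[float(len(word))] for word in text.split()]


def toy_joint_decoder(states: List[Vector], task: str) -> str:
    return f"<{task}> {len(states)} states"


slu = JointSLU(toy_audio_encoder, toy_text_encoder, toy_joint_decoder)
```

The point of the sketch is that all three routes end in the same `joint_decoder` callable, so any training signal from one task reaches the parameters used by the others.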
-
Publication No.: US11869490B1
Publication Date: 2024-01-09
Application No.: US16993482
Filing Date: 2020-08-14
Applicant: Amazon Technologies, Inc.
Inventor: Rahul Gupta , Jwala Dhamala , Melanie C B Gens , Sachin Midha , Jennifer Yuen , Dewan Muhammed Ibtesham , Wael Hamza , Xinhong Zhang , Md Humayun Arafat
IPC: G10L15/183 , G10L15/06 , G06N3/08 , G06N20/00
CPC classification number: G10L15/183 , G06N3/08 , G06N20/00 , G10L15/063
Abstract: Techniques for tuning parameters for machine learning models are described. Different values for a parameter are tested to determine the value that results in an optimized model. A parameter value may be selected for testing using a search algorithm based on how the model performs with respect to other values for the parameter. Different values may be tested until a stopping criterion (such as time for testing, number of trials, amount of enhancement in performance, etc.) is met. In some embodiments, the techniques may be used to determine parameter values for natural language processing models.
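The tuning loop described in this abstract can be illustrated with a minimal sketch. It uses a simple local random search, which is a stand-in and not necessarily the search algorithm the patent uses; the function name `tune_parameter`, its arguments, and the default limits are all hypothetical. Each new candidate is proposed near the best value seen so far, and the loop stops on any of three criteria named in the abstract: number of trials, elapsed time, or too little improvement.

```python
import random
import time
from typing import Callable, Tuple


def tune_parameter(
    evaluate: Callable[[float], float],
    low: float,
    high: float,
    max_trials: int = 20,
    max_seconds: float = 5.0,
    min_gain: float = 1e-3,
    patience: int = 5,
    seed: int = 0,
) -> Tuple[float, float, int]:
    """Search [low, high] for the value that maximizes evaluate(value)."""
    rng = random.Random(seed)
    start = time.monotonic()

    best_value = rng.uniform(low, high)
    best_score = evaluate(best_value)
    trials = 1
    stale = 0                  # consecutive trials without a meaningful gain
    width = (high - low) / 2   # size of the neighbourhood to sample from

    while (
        trials < max_trials
        and time.monotonic() - start < max_seconds
        and stale < patience
    ):
        # Choose the next value based on how earlier values performed:
        # sample around the current best and clamp to the allowed range.
        candidate = best_value + rng.uniform(-width, width)
        candidate = min(high, max(low, candidate))
        score = evaluate(candidate)
        trials += 1

        if score > best_score + min_gain:
            best_value, best_score = candidate, score
            stale = 0
        else:
            stale += 1
            width *= 0.7       # narrow the search after a miss

    return best_value, best_score, trials


def toy_objective(x: float) -> float:
    """Stand-in for training and scoring a model with parameter value x."""
    return -(x - 0.3) ** 2


value, score, used = tune_parameter(toy_objective, 0.0, 1.0)
```

In practice `evaluate` would train and score a model, so the trial and time budgets matter far more than they do for the toy objective here.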
-
Publication No.: US20230317066A1
Publication Date: 2023-10-05
Application No.: US17690609
Filing Date: 2022-03-09
Applicant: Amazon Technologies, Inc.
Inventor: Jonathan Jakob Hueser , Fabian Triefenbach , Chandana Satya Prakash , Jin Cao , Wael Hamza , Mariusz Momotko
IPC: G10L15/18 , G10L15/22 , G10L15/06 , G06F40/295 , G06F40/30
CPC classification number: G10L15/1815 , G06F40/295 , G06F40/30 , G10L15/063 , G10L15/22 , G10L15/30
Abstract: Techniques for using a shared encoder and multiple different decoders for natural language understanding (NLU) tasks are described. The individual decoders are configured to perform different tasks using the output from one shared encoder. The decoders can process with respect to different domains and different languages. Using the shared encoder can reduce computation time at runtime and can reduce training costs (e.g., time and resources) when the system is updated to incorporate additional intents and entities. The system employs an attention mechanism to extract encoded representation data that each decoder can use for its specific task.
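One way to picture the shared-encoder arrangement is dot-product attention pooling, sketched below. This is an assumption about the general shape of such a system, not the patent's actual mechanism: the toy `shared_encoder`, the per-decoder query vectors, and the names `attend` and `run_decoders` are all hypothetical. The encoder runs once per input, and each decoder extracts its own weighted summary of the same encoded sequence.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, encoded: List[Vector]) -> Vector:
    """Dot-product attention: weight each encoded vector by its match to query."""
    scores = [sum(q * h for q, h in zip(query, vec)) for vec in encoded]
    weights = softmax(scores)
    dim = len(encoded[0])
    return [sum(w * vec[d] for w, vec in zip(weights, encoded)) for d in range(dim)]


def shared_encoder(tokens: List[str]) -> List[Vector]:
    """Toy encoder (hypothetical features): length, capitalization, position."""
    return [
        [len(tok) / 10.0, float(tok[0].isupper()), (i + 1) / len(tokens)]
        for i, tok in enumerate(tokens)
    ]


def run_decoders(tokens: List[str], queries: Dict[str, Vector]) -> Dict[str, Vector]:
    """Encode once, then let every decoder pull its own attended summary."""
    encoded = shared_encoder(tokens)  # shared work, done a single time
    return {name: attend(query, encoded) for name, query in queries.items()}


summaries = run_decoders(
    ["play", "Jazz", "music"],
    {"intent": [1.0, 0.0, 0.0], "entity": [0.0, 2.0, 0.0]},
)
```

Adding a decoder for a new task in this sketch means adding one entry to `queries`; the encoder call is untouched, which is the cost saving the abstract points to.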
-
Publication No.: US20230368796A1
Publication Date: 2023-11-16
Application No.: US18324440
Filing Date: 2023-05-26
Applicant: Amazon Technologies, Inc.
Inventor: Beiye Liu , Wael Hamza , Liwei Cai , Konstantine Arkoudas , Chengwei Su , Subendhu Rongali
CPC classification number: G10L15/26 , G10L15/1822
Abstract: Techniques for performing spoken language understanding (SLU) processing are described. An SLU component may include an audio encoder configured to perform an audio-to-text processing task and an audio-to-NLU processing task. The SLU component may also include a joint decoder configured to perform the audio-to-text processing task, the audio-to-NLU processing task and a text-to-NLU processing task. Input audio data, representing a spoken input, is processed by the audio encoder and the joint decoder to determine NLU data corresponding to the spoken input.
-
Publication No.: US12266355B2
Publication Date: 2025-04-01
Application No.: US17690609
Filing Date: 2022-03-09
Applicant: Amazon Technologies, Inc.
Inventor: Jonathan Jakob Hueser , Fabian Triefenbach , Chandana Satya Prakash , Jin Cao , Wael Hamza , Mariusz Momotko
IPC: G06F40/30 , G06F40/295 , G10L15/06 , G10L15/18 , G10L15/22 , G06F40/279 , G10L15/08 , G10L15/30
Abstract: Techniques for using a shared encoder and multiple different decoders for natural language understanding (NLU) tasks are described. The individual decoders are configured to perform different tasks using the output from one shared encoder. The decoders can process with respect to different domains and different languages. Using the shared encoder can reduce computation time at runtime and can reduce training costs (e.g., time and resources) when the system is updated to incorporate additional intents and entities. The system employs an attention mechanism to extract encoded representation data that each decoder can use for its specific task.
-
Publication No.: US12087305B2
Publication Date: 2024-09-10
Application No.: US18324440
Filing Date: 2023-05-26
Applicant: Amazon Technologies, Inc.
Inventor: Beiye Liu , Wael Hamza , Liwei Cai , Konstantine Arkoudas , Chengwei Su , Subendhu Rongali
CPC classification number: G10L15/26 , G10L15/1822
Abstract: Techniques for performing spoken language understanding (SLU) processing are described. An SLU component may include an audio encoder configured to perform an audio-to-text processing task and an audio-to-NLU processing task. The SLU component may also include a joint decoder configured to perform the audio-to-text processing task, the audio-to-NLU processing task and a text-to-NLU processing task. Input audio data, representing a spoken input, is processed by the audio encoder and the joint decoder to determine NLU data corresponding to the spoken input.
-
Publication No.: US12045288B1
Publication Date: 2024-07-23
Application No.: US17031062
Filing Date: 2020-09-24
Applicant: Amazon Technologies, Inc.
Inventor: Ahmet Emre Barut , Chengwei Su , Weitong Ruan , Wael Hamza
IPC: G06F16/30 , G06F16/532 , G06F16/583 , G06F16/9032 , G06V20/20 , G06N20/00
CPC classification number: G06F16/90332 , G06F16/532 , G06F16/583 , G06V20/20 , G06N20/00
Abstract: Devices and techniques are generally described for selection of objects in image data using natural language input. In various examples, first image data representing at least a first object and first natural language data may be received. In some examples, first embedding data representing the first natural language data may be generated. Second embedding data representing the first image data may be generated. Relative location data indicating a location of the first object in the first image data relative to at least one other object may be generated. The first embedding data, the second embedding data, and the relative location data may be input into a multi-modal transformer model. The multi-modal transformer model may determine that the first natural language data relates to the first object.
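The "relative location data" step can be illustrated with a small sketch. The abstract does not say how location is encoded, so the representation here is an assumption: each object is a bounding box, and the relative location of a target is the offset between box centers, normalized by image size. The `Box` class and `relative_locations` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (y grows downward)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def center(self) -> Tuple[float, float]:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def relative_locations(
    target: Box, others: List[Box], width: float, height: float
) -> List[Tuple[str, float, float]]:
    """Offset of target's center from each other object's center.

    Offsets are divided by the image size so they fall in [-1, 1].
    A negative dx means the target lies to the left of the other object;
    a negative dy means it lies above it.
    """
    tx, ty = target.center()
    features = []
    for other in others:
        ox, oy = other.center()
        features.append((other.label, (tx - ox) / width, (ty - oy) / height))
    return features


cup = Box("cup", 0, 0, 2, 2)
book = Box("book", 4, 0, 6, 2)
offsets = relative_locations(cup, [book], width=8, height=4)
```

Features like these would sit alongside the text and image embeddings as model input, which is what lets a phrase such as "the cup left of the book" be matched to one object rather than another.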
-
Publication No.: US11682400B1
Publication Date: 2023-06-20
Application No.: US17106600
Filing Date: 2020-11-30
Applicant: Amazon Technologies, Inc.
Inventor: Beiye Liu , Wael Hamza , Liwei Cai , Konstantine Arkoudas , Chengwei Su , Subendhu Rongali
CPC classification number: G10L15/26 , G10L15/1822
Abstract: Techniques for performing spoken language understanding (SLU) processing are described. An SLU component may include an audio encoder configured to perform an audio-to-text processing task and an audio-to-NLU processing task. The SLU component may also include a joint decoder configured to perform the audio-to-text processing task, the audio-to-NLU processing task and a text-to-NLU processing task. Input audio data, representing a spoken input, is processed by the audio encoder and the joint decoder to determine NLU data corresponding to the spoken input.