-
Publication No.: US11580968B1
Publication Date: 2023-02-14
Application No.: US16455165
Filing Date: 2019-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Arshit Gupta , Peng Zhang , Rashmi Gangadharaiah , Garima Lalwani , Roger Scott Jenke , Hassan Sawaf , Mona Diab , Katrin Kirchhoff , Adel A. Youssef , Kalpesh N. Sutaria
Abstract: Techniques are described for a contextual natural language understanding (cNLU) framework that is able to incorporate contextual signals of variable history length to perform joint intent classification (IC) and slot labeling (SL) tasks. A user utterance provided by a user within a multi-turn chat dialog between the user and a conversational agent is received. The user utterance and contextual information associated with one or more previous turns of the multi-turn chat dialog are provided to a machine learning (ML) model. An intent classification and one or more slot labels for the user utterance are then obtained from the ML model. The cNLU framework described herein thus uses, in addition to the current utterance itself, various contextual signals as input to the model to generate IC and SL predictions for each utterance of a multi-turn chat dialog.
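The abstract describes a model that consumes both the current utterance and prior dialog turns and emits an intent label plus per-token slot labels. A minimal PyTorch sketch of one way such a joint IC/SL model could be wired up follows; the architecture, dimensions, and all names (ContextualNLU, context_turns, etc.) are illustrative assumptions, not taken from the patent.

```python
# Sketch of joint intent classification (IC) and slot labeling (SL)
# conditioned on previous dialog turns. Not the patented implementation;
# every design choice below is an assumption for illustration.
import torch
import torch.nn as nn

class ContextualNLU(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=64, hidden_dim=128,
                 num_intents=10, num_slot_labels=20):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Encodes an utterance token by token (shared for current and past turns).
        self.utterance_enc = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                                     bidirectional=True)
        # Summarizes the sequence of previous turns into one context vector.
        self.context_enc = nn.GRU(2 * hidden_dim, hidden_dim, batch_first=True)
        self.intent_head = nn.Linear(2 * hidden_dim + hidden_dim, num_intents)
        self.slot_head = nn.Linear(2 * hidden_dim + hidden_dim, num_slot_labels)

    def forward(self, utterance_ids, context_ids):
        # utterance_ids: (batch, seq_len); context_ids: (batch, turns, seq_len)
        tok, _ = self.utterance_enc(self.embed(utterance_ids))
        b, turns, seq = context_ids.shape
        ctx_tok, _ = self.utterance_enc(
            self.embed(context_ids.view(b * turns, seq)))
        # Mean-pool each previous turn, then encode the turn sequence.
        turn_vecs = ctx_tok.mean(dim=1).view(b, turns, -1)
        _, ctx_state = self.context_enc(turn_vecs)
        ctx = ctx_state[-1]  # (batch, hidden_dim) dialog-context vector
        # Intent prediction: pooled current utterance + dialog context.
        intent_logits = self.intent_head(
            torch.cat([tok.mean(dim=1), ctx], dim=-1))
        # Slot prediction: per-token features + broadcast dialog context.
        ctx_per_tok = ctx.unsqueeze(1).expand(-1, tok.size(1), -1)
        slot_logits = self.slot_head(torch.cat([tok, ctx_per_tok], dim=-1))
        return intent_logits, slot_logits

model = ContextualNLU()
utt = torch.randint(1, 10000, (2, 12))        # batch of current utterances
ctx = torch.randint(1, 10000, (2, 3, 12))     # three previous turns each
intent_logits, slot_logits = model(utt, ctx)  # shapes: (2, 10) and (2, 12, 20)
```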
-
Publication No.: US11545134B1
Publication Date: 2023-01-03
Application No.: US16709792
Filing Date: 2019-12-10
Applicant: Amazon Technologies, Inc.
Inventor: Marcello Federico , Robert Enyedi , Yaser Al-Onaizan , Roberto Barra-Chicote , Andrew Paul Breen , Ritwik Giri , Mehmet Umut Isik , Arvindh Krishnaswamy , Hassan Sawaf
IPC: G10L13/08 , G10L15/22 , G11B20/10 , G06F3/16 , G10L13/10 , G06F40/47 , G10L25/90 , G10L15/06 , G10L13/00 , G10L15/26 , G06V40/16
Abstract: Techniques for the generation of dubbed audio for an audio/visual file are described. An exemplary approach is to receive a request to generate dubbed speech for an audio/visual file and, in response to the request: extract speech segments associated with identified speakers from an audio track of the audio/visual file; translate the extracted speech segments into a target language; determine a trained machine learning model per identified speaker, the trained models to be used to generate spoken versions of the translated, extracted speech segments; generate, per translated, extracted speech segment, a spoken version using the trained machine learning model that corresponds to the identified speaker of that segment and prosody information for the extracted speech segment; and replace the extracted speech segments in the audio track of the audio/visual file with the spoken versions of the translated, extracted speech segments to generate a modified audio track.
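The claimed pipeline chains speaker-attributed segment extraction, translation, per-speaker voice synthesis conditioned on prosody, and splicing the result back into the track. A minimal, runnable sketch of that orchestration follows; the data model and every helper here (SpeechSegment, translate_text, synthesize_speech, the voice_models mapping) are hypothetical stand-ins, not the patented implementation or any real API.

```python
# Sketch of the dubbing flow described in the abstract. All names below
# are hypothetical placeholders used only to show the control flow.
from dataclasses import dataclass, field

@dataclass
class SpeechSegment:
    speaker_id: str
    start_s: float          # position of the segment in the audio track, seconds
    end_s: float
    text: str               # source-language transcript of the segment
    prosody: dict = field(default_factory=dict)  # e.g. pitch/duration cues

def translate_text(text: str, target_lang: str) -> str:
    # Placeholder: a real system would invoke a machine translation model.
    return f"[{target_lang}] {text}"

def synthesize_speech(voice_model: str, text: str, prosody: dict) -> bytes:
    # Placeholder: a real system would run the identified speaker's trained
    # TTS model, conditioned on the segment's prosody, and return audio.
    return text.encode("utf-8")

def dub_segments(segments, voice_models, target_lang):
    """Translate each speaker-attributed segment and re-voice it with the
    corresponding speaker's model; the results would then be spliced back
    into the original audio track in place of the source speech."""
    dubbed = []
    for seg in segments:
        translated = translate_text(seg.text, target_lang)
        audio = synthesize_speech(voice_models[seg.speaker_id], translated,
                                  seg.prosody)
        dubbed.append((seg, audio))
    return dubbed

# Example: two identified speakers, one trained voice model each (stubbed).
segments = [SpeechSegment("spk_1", 0.0, 2.1, "Hello there."),
            SpeechSegment("spk_2", 2.4, 4.0, "How are you?")]
print(dub_segments(segments, {"spk_1": "model_a", "spk_2": "model_b"}, "es"))
```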
-