-
Publication No.: US20210074279A1
Publication Date: 2021-03-11
Application No.: US16952413
Filing Date: 2020-11-19
Applicant: Google LLC
Inventor: Abhinav Rastogi, Larry Paul Heck, Dilek Hakkani-Tur
Abstract: Determining a dialog state of an electronic dialog that includes an automated assistant and at least one user, and performing action(s) based on the determined dialog state. The dialog state can be represented as one or more slots and, for each of the slots, one or more candidate values for the slot and a corresponding score (e.g., a probability) for each of the candidate values. Candidate values for a slot can be determined based on language processing of user utterance(s) and/or system utterance(s) during the dialog. In generating scores for candidate value(s) of a given slot at a given turn of an electronic dialog, various features are determined based on processing of the user utterance and the system utterance using a memory network. The various generated features can be processed using a scoring model to generate scores for candidate value(s) of the given slot at the given turn.
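For illustration only: the sketch below follows the data flow this abstract describes, attending over user- and system-utterance token encodings (a memory-network-style read) to build per-candidate features, then scoring candidate slot values with a small feedforward scoring model. It is a minimal sketch under assumed dimensions and layer choices, not the patented implementation; the class and argument names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CandidateScorer(nn.Module):
    """Scores the candidate values of one slot at one turn of the dialog."""

    def __init__(self, dim: int = 64):
        super().__init__()
        # Scoring model: per-candidate features -> scalar logit.
        self.score = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, cand, user, system):
        # cand:   (num_candidates, dim) encodings of candidate slot values
        # user:   (user_len, dim) user-utterance token encodings (one memory)
        # system: (sys_len, dim) system-utterance token encodings (another)
        read_user = F.softmax(cand @ user.T, dim=-1) @ user      # attend over user memory
        read_sys = F.softmax(cand @ system.T, dim=-1) @ system   # attend over system memory
        feats = torch.cat([cand, read_user, read_sys], dim=-1)   # features per candidate
        logits = self.score(feats).squeeze(-1)                   # (num_candidates,)
        return F.softmax(logits, dim=-1)                         # score per candidate value
```

Running the module once per slot and per turn yields a probability for each candidate value, which is one concrete reading of the scored dialog state described above.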
-
Publication No.: US20210217408A1
Publication Date: 2021-07-15
Application No.: US17273555
Filing Date: 2019-09-04
Applicant: Google LLC
Inventor: Dilek Hakkani-Tur, Abhinav Kumar Rastogi, Raghav Gupta
IPC: G10L15/18, G06F40/117, G06F40/35, G10L15/22, G10L15/16, G06F40/284, G06N3/02
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for dialogue systems. A transcription of a user utterance is obtained. The transcription of the utterance is tokenized to identify multiple tokens for the utterance. Token-level utterance encodings corresponding to different tokens of the transcription are generated. A system action encoding is generated from data indicating system actions previously performed by the dialogue system. A dialogue context vector is generated based on the utterance encodings and the system action encoding. The token-level utterance encodings, the system action encoding, and the dialogue context vector are processed using a slot tagger to produce token-level output vectors. A limited set of candidate token classifications for the tokens of the user utterance is determined based on the token-level utterance encodings. A response to the user utterance is provided for output.
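For illustration only: a minimal sketch of the tokenize-encode-tag pipeline in this abstract. The BiLSTM encoders, multi-hot system-action input, and mean-pooled dialogue context vector are assumptions of this sketch, not details taken from the patent; all names are hypothetical.

```python
import torch
import torch.nn as nn

class SlotTaggerSketch(nn.Module):
    def __init__(self, vocab_size: int, num_actions: int, num_tags: int, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.utt_encoder = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.action_encoder = nn.Linear(num_actions, 2 * dim)  # system action encoding
        self.tagger = nn.LSTM(6 * dim, dim, batch_first=True)  # slot tagger
        self.classifier = nn.Linear(dim, num_tags)             # limited candidate tag set

    def forward(self, token_ids, action_multi_hot):
        # token_ids: (batch, seq_len) tokenized transcription of the utterance
        # action_multi_hot: (batch, num_actions) system actions previously performed
        tokens, _ = self.utt_encoder(self.embed(token_ids))    # token-level encodings
        action = self.action_encoder(action_multi_hot)         # system action encoding
        context = tokens.mean(dim=1) + action                  # dialogue context vector
        seq_len = tokens.size(1)
        stacked = torch.cat(
            [tokens,
             action.unsqueeze(1).expand(-1, seq_len, -1),
             context.unsqueeze(1).expand(-1, seq_len, -1)], dim=-1)
        outputs, _ = self.tagger(stacked)                      # token-level output vectors
        return self.classifier(outputs)                        # per-token tag logits
```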
-
Publication No.: US12118052B2
Publication Date: 2024-10-15
Application No.: US18234766
Filing Date: 2023-08-16
Applicant: GOOGLE LLC
Inventor: Aleksandra Faust, Dilek Hakkani-Tur, Izzeddin Gur, Ulrich Rueckert
IPC: G06F16/954, G06F16/953, G06N3/04
CPC classification number: G06F16/954, G06F16/953, G06N3/04
Abstract: The present disclosure is generally directed to methods, apparatus, and computer-readable media (transitory and non-transitory) for learning to automatically navigate interactive web documents and/or websites. More particularly, various approaches are presented for training various deep Q network (DQN) agents to perform various tasks associated with reinforcement learning, including hierarchical reinforcement learning, in challenging web navigation environments with sparse rewards and large state and action spaces. These agents include a web navigation agent that can use learned value function(s) to automatically navigate through interactive web documents, as well as a training agent, referred to herein as a “meta-trainer,” that can be trained to generate synthetic training examples. Some approaches described herein may be implemented when expert demonstrations are available. Other approaches described herein may be implemented when expert demonstrations are not available. In either case, dense, potential-based rewards may be used to augment the training.
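For illustration only: the piece of this abstract that fits in a few lines is dense, potential-based reward shaping, which augments a sparse web-navigation reward without changing the optimal policy. The goal-overlap potential below is a hypothetical choice, not the patent's.

```python
def shaped_reward(env_reward, state, next_state, potential, gamma=0.99):
    """Dense, potential-based shaping: r' = r + gamma * Phi(s') - Phi(s)."""
    return env_reward + gamma * potential(next_state) - potential(state)

def goal_overlap_potential(state):
    # Hypothetical potential Phi: fraction of goal fields whose value already
    # appears in the page's form elements (a proxy for navigation progress).
    goal, form = state["goal"], state["form"]
    matched = sum(1 for field, value in goal.items() if form.get(field) == value)
    return matched / max(len(goal), 1)

# Filling a goal field earns a dense reward even though env_reward is 0.
state = {"goal": {"name": "Ada"}, "form": {}}
next_state = {"goal": {"name": "Ada"}, "form": {"name": "Ada"}}
print(shaped_reward(0.0, state, next_state, goal_overlap_potential))  # 0.99
```

A DQN agent (the web-navigation agent or the meta-trainer) would then learn from the shaped reward in place of the raw sparse one.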
-
Publication No.: US11941504B2
Publication Date: 2024-03-26
Application No.: US17040299
Filing Date: 2019-03-22
Applicant: Google LLC
Inventor: Pararth Shah, Dilek Hakkani-Tur, Juliana Kew, Marek Fiser, Aleksandra Faust
IPC: G06N3/008, B25J9/16, B25J13/08, G05B13/02, G05D1/00, G05D1/02, G06F18/21, G06N3/044, G06T7/593, G06V20/10, G06V30/262, G10L15/16, G10L15/18, G10L15/22, G10L25/78
CPC classification number: G06N3/008, B25J9/161, B25J9/162, B25J9/163, B25J9/1697, B25J13/08, G05B13/027, G05D1/0221, G06F18/21, G06N3/044, G06T7/593, G06V20/10, G06V30/274, G10L15/16, G10L15/1815, G10L15/22, G10L25/78, G10L2015/223
Abstract: Implementations relate to using deep reinforcement learning to train a model that can be utilized, at each of a plurality of time steps, to determine a corresponding robotic action for completing a robotic task. Implementations additionally or alternatively relate to utilization of such a model in controlling a robot. The robotic action determined at a given time step utilizing such a model can be based on: current sensor data associated with the robot for the given time step, and free-form natural language input provided by a user. The free-form natural language input can direct the robot to accomplish a particular task, optionally with reference to one or more intermediary steps for accomplishing the particular task. For example, the free-form natural language input can direct the robot to navigate to a particular landmark, with reference to one or more intermediary landmarks to be encountered in navigating to the particular landmark.
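For illustration only: a toy policy head showing how a robotic action at each time step can be conditioned jointly on current sensor data and free-form natural language input. The bag-of-words instruction encoder and concatenation-based fusion are assumptions of this sketch, and the deep reinforcement learning training loop is omitted entirely.

```python
import torch
import torch.nn as nn

class InstructionConditionedPolicy(nn.Module):
    """Maps (current sensor data, language instruction) to per-action scores."""

    def __init__(self, sensor_dim: int, vocab_size: int, num_actions: int, dim: int = 64):
        super().__init__()
        self.instruction = nn.EmbeddingBag(vocab_size, dim)  # bag-of-words encoder
        self.sensors = nn.Linear(sensor_dim, dim)
        self.head = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, num_actions))

    def forward(self, sensor_features, instruction_ids):
        # sensor_features: (batch, sensor_dim) sensor data for the given time step
        # instruction_ids: (batch, instr_len) tokenized free-form direction
        lang = self.instruction(instruction_ids)
        fused = torch.cat([self.sensors(sensor_features), lang], dim=-1)
        return self.head(fused)  # one score per robotic action at this time step
```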
-
Publication No.: US20230419960A1
Publication Date: 2023-12-28
Application No.: US18367785
Filing Date: 2023-09-13
Applicant: GOOGLE LLC
Inventor: Abhinav Rastogi, Larry Paul Heck, Dilek Hakkani-Tur
CPC classification number: G10L15/197, G06N3/08, G10L15/16, G10L15/1815, G10L15/22, G10L15/30, G10L15/1822, G10L2015/223, G06N3/044
Abstract: Determining a dialog state of an electronic dialog that includes an automated assistant and at least one user, and performing action(s) based on the determined dialog state. The dialog state can be represented as one or more slots and, for each of the slots, one or more candidate values for the slot and a corresponding score (e.g., a probability) for each of the candidate values. Candidate values for a slot can be determined based on language processing of user utterance(s) and/or system utterance(s) during the dialog. In generating scores for candidate value(s) of a given slot at a given turn of an electronic dialog, various features are determined based on processing of the user utterance and the system utterance using a memory network. The various generated features can be processed using a scoring model to generate scores for candidate value(s) of the given slot at the given turn.
-
Publication No.: US20230394102A1
Publication Date: 2023-12-07
Application No.: US18234766
Filing Date: 2023-08-16
Applicant: GOOGLE LLC
Inventor: Aleksandra Faust, Dilek Hakkani-Tur, Izzeddin Gur, Ulrich Rueckert
IPC: G06F16/954, G06F16/953, G06N3/04
CPC classification number: G06F16/954, G06F16/953, G06N3/04
Abstract: The present disclosure is generally directed to methods, apparatus, and computer-readable media (transitory and non-transitory) for learning to automatically navigate interactive web documents and/or websites. More particularly, various approaches are presented for training various deep Q network (DQN) agents to perform various tasks associated with reinforcement learning, including hierarchical reinforcement learning, in challenging web navigation environments with sparse rewards and large state and action spaces. These agents include a web navigation agent that can use learned value function(s) to automatically navigate through interactive web documents, as well as a training agent, referred to herein as a “meta-trainer,” that can be trained to generate synthetic training examples. Some approaches described herein may be implemented when expert demonstrations are available. Other approaches described herein may be implemented when expert demonstrations are not available. In either case, dense, potential-based rewards may be used to augment the training.
-
Publication No.: US11790899B2
Publication Date: 2023-10-17
Application No.: US16952413
Filing Date: 2020-11-19
Applicant: Google LLC
Inventor: Abhinav Rastogi, Larry Paul Heck, Dilek Hakkani-Tur
CPC classification number: G10L15/197, G06N3/08, G10L15/16, G10L15/1815, G10L15/22, G10L15/30, G06N3/044, G10L15/1822, G10L2015/223
Abstract: Determining a dialog state of an electronic dialog that includes an automated assistant and at least one user, and performing action(s) based on the determined dialog state. The dialog state can be represented as one or more slots and, for each of the slots, one or more candidate values for the slot and a corresponding score (e.g., a probability) for each of the candidate values. Candidate values for a slot can be determined based on language processing of user utterance(s) and/or system utterance(s) during the dialog. In generating scores for candidate value(s) of a given slot at a given turn of an electronic dialog, various features are determined based on processing of the user utterance and the system utterance using a memory network. The various generated features can be processed using a scoring model to generate scores for candidate value(s) of the given slot at the given turn.
-
Publication No.: US20210086353A1
Publication Date: 2021-03-25
Application No.: US17040299
Filing Date: 2019-03-22
Applicant: Google LLC
Inventor: Pararth Shah, Dilek Hakkani-Tur, Juliana Kew, Marek Fiser, Aleksandra Faust
IPC: B25J9/16, G10L25/78, G10L15/22, G10L15/18, G06K9/00, G06K9/62, G10L15/16, G06T7/593, G06K9/72, B25J13/08, G05D1/02, G05B13/02, G06N3/04
Abstract: Implementations relate to using deep reinforcement learning to train a model that can be utilized, at each of a plurality of time steps, to determine a corresponding robotic action for completing a robotic task. Implementations additionally or alternatively relate to utilization of such a model in controlling a robot. The robotic action determined at a given time step utilizing such a model can be based on: current sensor data associated with the robot for the given time step, and free-form natural language input provided by a user. The free-form natural language input can direct the robot to accomplish a particular task, optionally with reference to one or more intermediary steps for accomplishing the particular task. For example, the free-form natural language input can direct the robot to navigate to a particular landmark, with reference to one or more intermediary landmarks to be encountered in navigating to the particular landmark.
-
Publication No.: US10424302B2
Publication Date: 2019-09-24
Application No.: US15782333
Filing Date: 2017-10-12
Applicant: Google LLC
Inventor: Pararth Shah, Larry Paul Heck, Dilek Hakkani-Tur
Abstract: Techniques are described related to turn-based reinforcement learning for dialog management. In various implementations, dialog states and corresponding responsive actions generated during a multi-turn human-to-computer dialog session may be obtained. A plurality of turn-level training instances may be generated, each including: a given dialog state of the plurality of dialog states at an outset of a given turn of the human-to-computer dialog session; and a given responsive action that was selected based on the given dialog state. One or more of the turn-level training instances may further include a turn-level feedback value that reflects on the given responsive action selected during the given turn. A reward value may be generated based on an outcome of the human-to-computer dialog session. The dialog management policy model may be trained based on turn-level feedback values of the turn-level training instance(s) and the reward value.
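For illustration only: a sketch of the turn-level training instances this abstract describes, with one plausible way to combine per-turn feedback values and the end-of-dialog reward into per-turn returns. The discounting and weighting scheme is an assumption, not the patent's formula.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TurnInstance:
    dialog_state: dict                      # dialog state at the outset of the turn
    responsive_action: str                  # action selected based on that state
    turn_feedback: Optional[float] = None   # optional turn-level feedback value

def turn_returns(turns: List[TurnInstance], outcome_reward: float,
                 gamma: float = 0.95, feedback_weight: float = 0.5) -> List[float]:
    """Per-turn return: discounted end-of-session reward plus immediate feedback."""
    returns = []
    for i, turn in enumerate(turns):
        discounted = (gamma ** (len(turns) - 1 - i)) * outcome_reward
        feedback = turn.turn_feedback if turn.turn_feedback is not None else 0.0
        returns.append(discounted + feedback_weight * feedback)
    return returns

turns = [TurnInstance({"slots": {}}, "ask_cuisine", turn_feedback=1.0),
         TurnInstance({"slots": {"cuisine": "thai"}}, "offer_booking")]
print(turn_returns(turns, outcome_reward=2.0))  # ~[2.4, 2.0]
```

These per-turn returns would then serve as training targets for the dialog management policy model.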
-
Publication No.: US20250077603A1
Publication Date: 2025-03-06
Application No.: US18952242
Filing Date: 2024-11-19
Applicant: GOOGLE LLC
Inventor: Aleksandra Faust, Dilek Hakkani-Tur, Izzeddin Gur, Ulrich Rueckert
IPC: G06F16/954, G06F16/953, G06N3/04
Abstract: The present disclosure is generally directed to methods, apparatus, and computer-readable media (transitory and non-transitory) for learning to automatically navigate interactive web documents and/or websites. More particularly, various approaches are presented for training various deep Q network (DQN) agents to perform various tasks associated with reinforcement learning, including hierarchical reinforcement learning, in challenging web navigation environments with sparse rewards and large state and action spaces. These agents include a web navigation agent that can use learned value function(s) to automatically navigate through interactive web documents, as well as a training agent, referred to herein as a “meta-trainer,” that can be trained to generate synthetic training examples. Some approaches described herein may be implemented when expert demonstrations are available. Other approaches described herein may be implemented when expert demonstrations are not available. In either case, dense, potential-based rewards may be used to augment the training.