-
Publication No.: US20240345873A1
Publication date: 2024-10-17
Application No.: US18294784
Filing date: 2022-08-03
Inventors: Tom Schaul, Miruna Pîslar
IPC classification: G06F9/48
CPC classification: G06F9/4875
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents. In particular, an agent can be controlled to perform a task episode by switching the control policy that is used to control the agent at one or more time steps during the task episode.
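Illustrative sketch (not taken from the patent text): the abstract describes switching the active control policy at one or more time steps within a single task episode, but it does not say how the switch is decided. The Python below is a minimal sketch under that gap. The names `run_episode`, `should_switch`, `exploit`, and `explore`, the integer observations, and the "switch every third step" rule are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: a policy maps an observation to an action.
Policy = Callable[[int], int]


def run_episode(
    policies: Sequence[Policy],
    should_switch: Callable[[int, int], bool],
    observations: Sequence[int],
) -> List[Tuple[int, int]]:
    """Control an agent for one episode, rotating to the next policy
    whenever should_switch(time_step, observation) returns True."""
    active = 0  # index of the policy currently controlling the agent
    trace: List[Tuple[int, int]] = []
    for t, obs in enumerate(observations):
        if should_switch(t, obs):
            active = (active + 1) % len(policies)
        action = policies[active](obs)
        trace.append((active, action))
    return trace


def exploit(obs: int) -> int:
    # Stand-in for a task-focused policy (assumption).
    return obs * 2


def explore(obs: int) -> int:
    # Stand-in for an exploratory policy (assumption).
    return -obs


# Assumed switching rule: switch at every third step after the first.
trace = run_episode(
    [exploit, explore],
    lambda t, obs: t > 0 and t % 3 == 0,
    [1, 2, 3, 4, 5, 6, 7],
)
```

Each entry of `trace` pairs the index of the policy that was active with the action it chose, so the switch points within the episode can be read directly from the first element of each pair.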
-
Publication No.: US20240104353A1
Publication date: 2024-03-28
Application No.: US18274748
Filing date: 2022-02-08
Inventors: Rémi Bertrand Francis Leblond, Jean-Baptiste Alayrac, Laurent Sifre, Miruna Pîslar, Jean-Baptiste Lespiau, Ioannis Antonoglou, Karen Simonyan, David Silver, Oriol Vinyals
IPC classification: G06N3/0455
CPC classification: G06N3/0455
Abstract: A computer-implemented method for generating an output token sequence from an input token sequence. The method combines a look-ahead tree search, such as a Monte Carlo tree search, with a sequence-to-sequence neural network system. The sequence-to-sequence neural network system has a policy output defining a next-token probability distribution, and may include a value neural network providing a value output to evaluate a sequence. An initial partial output sequence is extended using the look-ahead tree search, guided by the policy output and, in implementations, the value output of the sequence-to-sequence neural network system, until a complete output sequence is obtained.
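Illustrative sketch (not taken from the patent text): the abstract describes extending a partial output sequence with a look-ahead tree search guided by a policy output and, optionally, a value output. The Python below is a minimal sketch of that idea using a PUCT-style selection rule, which is an assumption; the abstract does not name a selection rule. The `policy` and `value` callables stand in for the neural network outputs and are assumed to already be conditioned on the input token sequence. `EOS`, `TARGET`, `toy_policy`, `toy_value`, and all numeric settings are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = 0  # hypothetical end-of-sequence token id


class Node:
    def __init__(self, prior: float):
        self.prior = prior          # policy probability of reaching this node
        self.visits = 0
        self.value_sum = 0.0
        self.children: Dict[int, "Node"] = {}

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def puct_child(node: Node, c_puct: float) -> Tuple[int, Node]:
    # Pick the child maximizing mean value plus a prior-weighted exploration bonus.
    scale = math.sqrt(node.visits + 1)
    return max(
        node.children.items(),
        key=lambda kv: kv[1].mean_value()
        + c_puct * kv[1].prior * scale / (1 + kv[1].visits),
    )


def search_next_token(prefix: List[int], policy, value,
                      simulations: int = 50, c_puct: float = 1.0,
                      max_len: int = 8) -> int:
    root = Node(prior=1.0)
    for _ in range(simulations):
        node, seq, path = root, list(prefix), [root]
        while node.children:                       # selection
            token, node = puct_child(node, c_puct)
            seq.append(token)
            path.append(node)
        terminal = (bool(seq) and seq[-1] == EOS) or len(seq) >= max_len
        if not terminal:                           # expansion with policy priors
            for token, p in policy(seq).items():
                node.children[token] = Node(prior=p)
        v = value(seq)                             # evaluation by the value output
        for n in path:                             # backup along the visited path
            n.visits += 1
            n.value_sum += v
    # Commit to the most-visited token at the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]


def decode(policy, value, max_len: int = 8, **kw) -> List[int]:
    out: List[int] = []
    while len(out) < max_len:
        token = search_next_token(out, policy, value, max_len=max_len, **kw)
        out.append(token)
        if token == EOS:
            break
    return out


TARGET = [1, 2, EOS]  # toy "correct" output used only by the toy value function


def toy_policy(seq: List[int]) -> Dict[int, float]:
    return {EOS: 0.2, 1: 0.4, 2: 0.4}


def toy_value(seq: List[int]) -> float:
    matches = 0
    for a, b in zip(seq, TARGET):
        if a != b:
            break
        matches += 1
    return matches / len(TARGET)


result = decode(toy_policy, toy_value, max_len=5, simulations=30)
```

The search is rerun from a fresh root for every emitted token, which keeps the sketch short; reusing the subtree under the chosen token would be a natural refinement.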
-