TRAINING REINFORCEMENT LEARNING AGENTS TO LEARN FARSIGHTED BEHAVIORS BY PREDICTING IN LATENT SPACE

    Publication Number: US20210158162A1

    Publication Date: 2021-05-27

    Application Number: US17103827

    Filing Date: 2020-11-24

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection policy neural network used to select an action to be performed by an agent interacting with an environment. In one aspect, a method includes: receiving a latent representation characterizing a current state of the environment; generating a trajectory of latent representations that starts with the received latent representation; for each latent representation in the trajectory: determining a predicted reward; and processing the state latent representation using a value neural network to generate a predicted state value; determining a corresponding target state value for each latent representation in the trajectory; determining, based on the target state values, an update to the current values of the policy neural network parameters; and determining an update to the current values of the value neural network parameters.
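    The abstract describes imagining a trajectory of latent states, predicting a reward and a state value for each, and turning those into target state values for training. As a rough illustration only, here is a minimal toy sketch of that loop in Python. Everything here is an assumption for illustration: the linear maps standing in for the learned networks, the `imagine_trajectory` and `lambda_targets` helpers, and the lambda-return mixing of predicted rewards and values are hypothetical choices, not the patent's claimed method.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical stand-ins for the learned networks in the abstract.
    # Each is a simple linear map here; in the patent these are neural networks.
    LATENT_DIM = 4
    W_dyn = rng.normal(scale=0.1, size=(LATENT_DIM, LATENT_DIM))  # latent transition model
    w_rew = rng.normal(size=LATENT_DIM)                            # reward predictor
    w_val = rng.normal(size=LATENT_DIM)                            # value network

    def imagine_trajectory(z0, horizon):
        """Generate a trajectory of latent representations starting from z0."""
        traj = [z0]
        for _ in range(horizon):
            traj.append(np.tanh(W_dyn @ traj[-1]))
        return traj

    def lambda_targets(traj, gamma=0.99, lam=0.95):
        """Compute a target state value for each latent state in the trajectory.

        For each state, a predicted reward and a predicted state value are
        combined backwards along the trajectory (a lambda-return-style mix,
        chosen here for illustration).
        """
        rewards = [float(w_rew @ z) for z in traj]
        values = [float(w_val @ z) for z in traj]
        targets = [0.0] * len(traj)
        targets[-1] = values[-1]  # bootstrap from the final predicted value
        for t in reversed(range(len(traj) - 1)):
            targets[t] = rewards[t] + gamma * (
                (1 - lam) * values[t + 1] + lam * targets[t + 1]
            )
        return targets

    z0 = rng.normal(size=LATENT_DIM)        # latent representation of the current state
    traj = imagine_trajectory(z0, horizon=5)
    targets = lambda_targets(traj)           # one target state value per latent state
    ```

    In a real training loop, the gap between each predicted state value and its target would drive updates to the value network, and the targets would also inform the policy update, as the abstract outlines.
    
    
    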

    REINFORCEMENT LEARNING USING ADVANTAGE ESTIMATES

    Publication Number: US20220284266A1

    Publication Date: 2022-09-08

    Application Number: US17704721

    Filing Date: 2022-03-25

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for computing Q values for actions to be performed by an agent interacting with an environment from a continuous action space of actions. In one aspect, a system includes a value subnetwork configured to receive an observation characterizing a current state of the environment and process the observation to generate a value estimate; a policy subnetwork configured to receive the observation and process the observation to generate an ideal point in the continuous action space; and a subsystem configured to receive a particular point in the continuous action space representing a particular action; generate an advantage estimate for the particular action; and generate a Q value for the particular action that is an estimate of an expected return resulting from the agent performing the particular action when the environment is in the current state.
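    The abstract (shared by this application and the granted patent below) decomposes a Q value into a state value estimate plus an advantage estimate computed from the distance between a particular action and the policy subnetwork's "ideal point" in the continuous action space. The toy sketch below illustrates that decomposition with a quadratic advantage; the `q_value` helper, the curvature matrix `P`, and the specific quadratic form are assumptions for illustration, not the claimed parameterization.

    ```python
    import numpy as np

    def q_value(value_estimate, ideal_point, action_point, P):
        """Combine a state value and an advantage estimate into a Q value.

        Here the advantage is -0.5 * (a - mu)^T P (a - mu): it is zero when
        the action equals the ideal point mu and negative everywhere else,
        so the ideal point is the Q-maximizing action by construction.
        """
        diff = action_point - ideal_point
        advantage = -0.5 * float(diff @ P @ diff)
        return value_estimate + advantage

    # Toy example: a 2-D continuous action space.
    mu = np.array([0.3, -0.1])   # ideal point from the policy subnetwork
    V = 1.5                      # value estimate from the value subnetwork
    P = np.eye(2)                # positive-definite curvature (assumed)

    q_at_mu = q_value(V, mu, mu, P)           # advantage is zero at the ideal point
    q_off = q_value(V, mu, np.zeros(2), P)    # any other action scores lower
    ```

    The design pays off in continuous action spaces: maximizing Q over actions, which is intractable for a generic Q network, reduces to reading off the ideal point.
    
    
    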

    Reinforcement learning using advantage estimates

    Publication Number: US11288568B2

    Publication Date: 2022-03-29

    Application Number: US15429088

    Filing Date: 2017-02-09

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for computing Q values for actions to be performed by an agent interacting with an environment from a continuous action space of actions. In one aspect, a system includes a value subnetwork configured to receive an observation characterizing a current state of the environment and process the observation to generate a value estimate; a policy subnetwork configured to receive the observation and process the observation to generate an ideal point in the continuous action space; and a subsystem configured to receive a particular point in the continuous action space representing a particular action; generate an advantage estimate for the particular action; and generate a Q value for the particular action that is an estimate of an expected return resulting from the agent performing the particular action when the environment is in the current state.
