Patent search: ap:("Google LLC") AND inv:"Kelvin Xu"

1.

Granted invention patent
Training policy neural networks using path consistency learning (Active)

Publication number: US11429844B2

Publication date: 2022-08-30

Application number: US16904785

Filing date: 2020-06-18

Applicant: Google LLC

Inventor: Ofir Nachum, Mohammad Norouzi, Dale Eric Schuurmans, Kelvin Xu

IPC: G06N3/04, G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network used to select actions to be performed by a reinforcement learning agent interacting with an environment. In one aspect, a method includes obtaining path data defining a path through the environment traversed by the agent. A consistency error is determined for the path from a combined reward, first and last soft-max state values, and a path likelihood. A value update for the current values of the policy neural network parameters is determined from at least the consistency error. The value update is used to adjust the current values of the policy neural network parameters.
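The abstract names the ingredients of the consistency error (a combined reward, the first and last soft-max state values, and a path likelihood) but not how they are combined. The sketch below follows the form used in the published path consistency learning literature, which is an assumption about what the claims cover, not a reading of them. The function name `consistency_error`, the discount `gamma`, and the entropy temperature `tau` are illustrative and do not appear in the listing.

```python
def consistency_error(rewards, log_probs, v_first, v_last, gamma=0.99, tau=0.1):
    """Consistency error for one path of d steps (illustrative sketch).

    rewards   : rewards r_0 .. r_{d-1} received along the path
    log_probs : log pi(a_j | s_j) for the actions taken along the path
    v_first   : soft-max state value of the first state on the path
    v_last    : soft-max state value of the last state on the path
    gamma     : discount factor (assumed; not stated in the abstract)
    tau       : entropy temperature (assumed; not stated in the abstract)
    """
    if len(rewards) != len(log_probs):
        raise ValueError("rewards and log_probs must have the same length")
    d = len(rewards)
    # Combined (discounted) reward collected along the path.
    combined_reward = sum(gamma ** j * r for j, r in enumerate(rewards))
    # Discounted log-likelihood of the path under the current policy.
    path_log_likelihood = sum(gamma ** j * lp for j, lp in enumerate(log_probs))
    return (-v_first + gamma ** d * v_last
            + combined_reward - tau * path_log_likelihood)


# Two-step path with no discounting.
print(consistency_error([1.0, 2.0], [-1.0, -2.0], 0.0, 0.0, gamma=1.0, tau=0.5))  # 4.5
```

In a training loop, a squared version of this error would typically be minimized with respect to the policy and value parameters, which matches the abstract's statement that the update is determined from at least the consistency error.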

2.

Invention patent application
TRAINING POLICY NEURAL NETWORKS USING PATH CONSISTENCY LEARNING (Under examination, published)

Publication number: US20200320372A1

Publication date: 2020-10-08

Application number: US16904785

Filing date: 2020-06-18

Applicant: Google LLC

Inventor: Ofir Nachum, Mohammad Norouzi, Dale Eric Schuurmans, Kelvin Xu

IPC: G06N3/04, G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network used to select actions to be performed by a reinforcement learning agent interacting with an environment. In one aspect, a method includes obtaining path data defining a path through the environment traversed by the agent. A consistency error is determined for the path from a combined reward, first and last soft-max state values, and a path likelihood. A value update for the current values of the policy neural network parameters is determined from at least the consistency error. The value update is used to adjust the current values of the policy neural network parameters.

3.

Granted invention patent
Training policy neural networks using path consistency learning (Active)

Publication number: US10733502B2

Publication date: 2020-08-04

Application number: US16504934

Filing date: 2019-07-08

Applicant: Google LLC

Inventor: Ofir Nachum, Mohammad Norouzi, Dale Eric Schuurmans, Kelvin Xu

IPC: G06N3/04, G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network used to select actions to be performed by a reinforcement learning agent interacting with an environment. In one aspect, a method includes obtaining path data defining a path through the environment traversed by the agent. A consistency error is determined for the path from a combined reward, first and last soft-max state values, and a path likelihood. A value update for the current values of the policy neural network parameters is determined from at least the consistency error. The value update is used to adjust the current values of the policy neural network parameters.

4.

Invention patent application
TRAINING POLICY NEURAL NETWORKS USING PATH CONSISTENCY LEARNING (Under examination, published)

Publication number: US20190332922A1

Publication date: 2019-10-31

Application number: US16504934

Filing date: 2019-07-08

Applicant: Google LLC

Inventor: Ofir Nachum, Mohammad Norouzi, Dale Eric Schuurmans, Kelvin Xu

IPC: G06N3/04, G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network used to select actions to be performed by a reinforcement learning agent interacting with an environment. In one aspect, a method includes obtaining path data defining a path through the environment traversed by the agent. A consistency error is determined for the path from a combined reward, first and last soft-max state values, and a path likelihood. A value update for the current values of the policy neural network parameters is determined from at least the consistency error. The value update is used to adjust the current values of the policy neural network parameters.
