Patent search ap:("Huawei Technologies Co., Ltd.") AND inv:"Shangtong ZHANG"

1.

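Invention application
ACTOR ENSEMBLE FOR CONTINUOUS CONTROL (status: in force)

Publication (announcement) number: US20210232922A1

Publication (announcement) date: 2021-07-29

Application number: US17167842

Application date: 2021-02-04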

Applicant: Huawei Technologies Co., Ltd.

Inventor: Shangtong ZHANG, Hengshuai YAO, Hao CHEN

IPC: G06N3/08, G06N3/04, G06N20/20, G06K9/62

Abstract: A method of training a reinforcement learning agent to output actions from a continuous action space, comprising: providing an actor ensemble that includes a plurality of actor neural networks that each output a respective action from the continuous action space in response to an observed state of an environment; providing a critic neural network that approximates a state-action value function indicating an impact of an action on the environment based on a reward from the environment and the observed state of the environment; and training the actor ensemble and the critic neural network to maximize a state-action value from the state-action value function over successive time steps by, in each time step: selecting, from the respective actions output by the plurality of actor neural networks, the action that will provide a best state-action value from the state-action value function; applying the selected action to the environment; and, based on an observed state of the environment in response to the selected action, determining a gradient ascent for the plurality of actor neural networks for updating the parameters of the plurality of actor neural networks and determining a gradient descent for the critic neural network for updating the parameters of the critic neural network.
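The action-selection step in the abstract (each actor proposes an action, and the critic's state-action value decides which one is applied) can be illustrated with a minimal sketch. It is not the patented implementation: the names `select_action`, `make_actor`, and `toy_critic` are hypothetical, the "actors" are fixed linear maps rather than neural networks, the "critic" is a hand-written quadratic score, and the gradient updates of actors and critic are omitted.

```python
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]
Action = List[float]
Actor = Callable[[State], Action]
Critic = Callable[[State, Action], float]


def select_action(
    actors: List[Actor], critic: Critic, state: State
) -> Tuple[int, Action, float]:
    """Return (actor index, action, value) for the proposal the critic scores highest."""
    # Every actor in the ensemble proposes one action for the observed state.
    proposals = [actor(state) for actor in actors]
    # The critic estimates a state-action value for each proposal.
    values = [critic(state, action) for action in proposals]
    # Keep the proposal with the largest estimated value.
    best = max(range(len(proposals)), key=lambda i: values[i])
    return best, proposals[best], values[best]


def make_actor(gain: float) -> Actor:
    # Hypothetical stand-in for an actor network: scales the state by a fixed gain.
    return lambda state: [gain * s for s in state]


def toy_critic(state: State, action: Action) -> float:
    # Hypothetical stand-in for a critic network: highest when action == -state.
    return -sum((a + s) ** 2 for a, s in zip(action, state))


actors = [make_actor(g) for g in (-2.0, -1.0, 0.5)]
index, action, value = select_action(actors, toy_critic, [1.0, -0.5])
print(index, action, value)
```

In this toy setup the second actor (gain -1.0) proposes exactly the action the critic prefers, so it is the one selected. In a full training loop, the selected action would be applied to the environment and the resulting observation and reward would drive the actor and critic updates described in the abstract.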
