ACTOR ENSEMBLE FOR CONTINUOUS CONTROL

    公开(公告)号:US20210232922A1

    公开(公告)日:2021-07-29

    申请号:US17167842

    申请日:2021-02-04

    Abstract: A method of training a reinforcement learning agent to output actions from a continuous action space, comprising: providing an actor ensemble that includes a plurality of actor neural networks that each output a respective action from the continuous action space in response to an observed state of an environment; providing a critic neural network that approximates a state-action value function indicating an impact of an action on the environment based on a reward from the environment and the observed state of the environment; training the actor ensemble and the critic neural network to maximize a state-action value from the state-action value function over successive time steps by, in each time step: selecting from the respective actions output by the plurality of actor neural networks the action that will provide a best state-action value from the state-action value function; applying the selected action to the environment; based on an observed state of the environment of in response to the selected action, determine a gradient ascent for the plurality of actor neural networks for updating the parameters of the plurality of actor neural networks and determine a gradient descent for the critic neural network for updating the parameters of the critic neural network.

Patent Agency Ranking