Efficient Off-Policy Credit Assignment
    Invention application

    Publication No.: US20200285993A1

    Publication Date: 2020-09-10

    Application No.: US16653890

    Filing Date: 2019-10-15

    Abstract: Systems and methods are provided for efficient off-policy credit assignment (ECA) in reinforcement learning. ECA allows principled credit assignment for off-policy samples, and therefore improves sample efficiency and asymptotic performance. One aspect of ECA is to formulate the optimization of expected return as approximate inference, where the policy approximates a learned prior distribution, which leads to a principled way of utilizing off-policy samples. Other features are also provided.

    Meta-Reinforcement Learning Gradient Estimation with Variance Reduction

    Publication No.: US20200234113A1

    Publication Date: 2020-07-23

    Application No.: US16395083

    Filing Date: 2019-04-25

    Inventor: Hao LIU

    Abstract: A method for deep reinforcement learning using a neural network model includes receiving a distribution including a plurality of related tasks. Parameters for the reinforcement learning neural network model are trained based on gradient estimation associated with the parameters using samples associated with the plurality of related tasks. Control variates are incorporated into the gradient estimation by automatic differentiation.
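
The general idea of a control variate in gradient estimation can be illustrated with a toy example. The sketch below is not the method claimed in the application: it uses a hand-derived score function rather than automatic differentiation, a single Gaussian "policy" rather than a neural network, and a constant baseline as the control variate. The reward function, the sample count, and every function name are assumptions made for illustration only.

```python
import random
import statistics


def reward(x):
    # Hypothetical reward, chosen so the true gradient is known in closed form.
    return (x - 3.0) ** 2


def score_function_gradients(mu, n_samples, baseline, seed=0):
    """Per-sample score-function estimates of d/dmu E[reward(x)], x ~ N(mu, 1).

    Subtracting a constant baseline leaves the estimator unbiased because
    the score has zero mean; it only changes the variance.
    """
    rng = random.Random(seed)
    grads = []
    for _ in range(n_samples):
        x = rng.gauss(mu, 1.0)
        score = x - mu  # d/dmu log N(x; mu, 1)
        grads.append((reward(x) - baseline) * score)
    return grads


mu = 0.0
# E[reward] = (mu - 3)^2 + 1 = 10 at mu = 0, so the true gradient is 2 * (mu - 3) = -6.
plain = score_function_gradients(mu, 20000, baseline=0.0)
with_cv = score_function_gradients(mu, 20000, baseline=10.0)

print("mean (no baseline):  ", statistics.fmean(plain))
print("mean (with baseline):", statistics.fmean(with_cv))
print("var  (no baseline):  ", statistics.pvariance(plain))
print("var  (with baseline):", statistics.pvariance(with_cv))
```

Both runs draw the same samples (same seed), so any difference in spread comes from the baseline alone. Both means estimate the same gradient, while the baselined estimator has a noticeably smaller variance.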
