-
Publication No.: US20200285993A1
Publication Date: 2020-09-10
Application No.: US16653890
Application Date: 2019-10-15
Applicant: salesforce.com, inc.
Inventor: Hao LIU, Richard SOCHER, Caiming XIONG
Abstract: Systems and methods are provided for efficient off-policy credit assignment (ECA) in reinforcement learning. ECA allows principled credit assignment for off-policy samples, and therefore improves sample efficiency and asymptotic performance. One aspect of ECA is to formulate the optimization of expected return as approximate inference, where the policy approximates a learned prior distribution, which leads to a principled way of utilizing off-policy samples. Other features are also provided.
-
Publication No.: US20200234113A1
Publication Date: 2020-07-23
Application No.: US16395083
Application Date: 2019-04-25
Applicant: salesforce.com, inc.
Inventor: Hao LIU
IPC: G06N3/08
Abstract: A method for deep reinforcement learning using a neural network model includes receiving a distribution including a plurality of related tasks. Parameters for the reinforcement learning neural network model are trained based on gradient estimation associated with the parameters using samples associated with the plurality of related tasks. Control variates are incorporated into the gradient estimation by automatic differentiation.
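Illustrative note: the sketch below shows the general idea of a control variate in score-function gradient estimation, using a constant baseline on a one-dimensional Gaussian. It is not the method claimed in the application. It uses no neural network, no task distribution, and no automatic differentiation, and the objective `(x - 3)^2`, the function name `score_function_gradients`, and the baseline value are all assumptions chosen for illustration.

```python
import numpy as np


def score_function_gradients(theta, n_samples, baseline, seed=0):
    """Per-sample score-function estimates of d/dtheta E[f(x)], x ~ N(theta, 1).

    f(x) = (x - 3)^2 is a hypothetical stand-in for a return signal.
    Subtracting a constant baseline leaves the estimator unbiased,
    because E[score] = 0, but it can lower the variance.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=theta, scale=1.0, size=n_samples)
    f = (x - 3.0) ** 2
    score = x - theta  # d/dtheta log N(x; theta, 1)
    return (f - baseline) * score


theta = 0.0
# Analytic gradient for this toy objective: d/dtheta [(theta - 3)^2 + 1]
true_grad = 2.0 * (theta - 3.0)

plain = score_function_gradients(theta, 100_000, baseline=0.0)
# Assumed baseline: the analytic E[f(x)] at theta = 0, i.e. (0 - 3)^2 + 1 = 10
with_cv = score_function_gradients(theta, 100_000, baseline=10.0)

print("true gradient:        ", true_grad)
print("plain mean / variance:", plain.mean(), plain.var())
print("baseline mean / var.: ", with_cv.mean(), with_cv.var())
```

Both estimators draw the same samples (same seed), so the comparison isolates the effect of the baseline: the means should agree with the analytic gradient up to sampling noise, while the baseline version should show the smaller variance.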
-