- Patent title: Determining action selection policies of an execution device
- Application number: US16712017
- Filing date: 2019-12-12
- Publication number: US10789810B1
- Publication date: 2020-09-29
- Inventors: Hui Li, Kailiang Hu, Le Song
- Applicant: Alibaba Group Holding Limited
- Applicant address: KY George Town, Grand Cayman
- Patentee: Alibaba Group Holding Limited
- Current patentee: Alibaba Group Holding Limited
- Current patentee address: KY George Town, Grand Cayman
- Agency: Fish & Richardson P.C.
- Main classification: A63F3/00
- IPC classification: A63F3/00; G07F17/32
Abstract:
Disclosed herein are methods, systems, and apparatus for generating an action selection policy (ASP) of an execution device. One method includes, in a current iteration, computing a first reward for a current state based on respective first rewards for actions in the current state and an ASP of the current state in the current iteration; computing an accumulative respective regret value of each action in the current state based on a difference between the respective first reward for the action and the first reward for the current state; computing an ASP of the current state in the next iteration; computing a second reward for the current state based on the respective first rewards for the actions and the ASP of the current state in the next iteration; and determining an ASP of the previous state in the next iteration based on the second reward for the current state.
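The per-state update described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented method: the abstract does not say how the next-iteration ASP is derived from the accumulated regrets, so the sketch assumes standard regret matching (normalizing the positive regrets). The function names `regret_matching`, `expected_reward`, and `update_state` are hypothetical.

```python
from typing import List, Tuple


def regret_matching(regrets: List[float]) -> List[float]:
    """ASSUMPTION: next policy is proportional to positive cumulative regret;
    falls back to a uniform policy when no regret is positive."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


def expected_reward(action_rewards: List[float], policy: List[float]) -> float:
    """Reward of a state: action rewards weighted by the policy's probabilities."""
    return sum(r * p for r, p in zip(action_rewards, policy))


def update_state(
    action_rewards: List[float],
    policy: List[float],
    cumulative_regrets: List[float],
) -> Tuple[List[float], List[float], float]:
    """One iteration for a single state (hypothetical helper).

    Returns the updated cumulative regrets, the next-iteration policy,
    and the second reward computed under that next-iteration policy.
    """
    # First reward: state reward under the current-iteration policy.
    first_reward = expected_reward(action_rewards, policy)
    # Accumulate regret: action reward minus the state's first reward.
    new_regrets = [
        c + (r - first_reward)
        for c, r in zip(cumulative_regrets, action_rewards)
    ]
    # Next-iteration policy (assumed to come from regret matching).
    next_policy = regret_matching(new_regrets)
    # Second reward: same action rewards, weighted by the next policy.
    second_reward = expected_reward(action_rewards, next_policy)
    return new_regrets, next_policy, second_reward


regrets, next_policy, second_reward = update_state(
    action_rewards=[2.0, 0.0],
    policy=[0.5, 0.5],
    cumulative_regrets=[0.0, 0.0],
)
```

In this sketch, `second_reward` is the value that would be passed up to the previous state in place of the first reward, so that the previous state's next-iteration ASP is determined from the already-updated policy of the current state, as the abstract describes.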