Determining action selection policies of an execution device

Granted invention patent

US10789810B1 Determining action selection policies of an execution device (in force)


Patent title: Determining action selection policies of an execution device
Application number: US16712017
Filing date: 2019-12-12
Publication number: US10789810B1
Publication date: 2020-09-29
Inventors: Hui Li, Kailiang Hu, Le Song
Applicant: Alibaba Group Holding Limited
Applicant address: KY George Town, Grand Cayman
Patentee: Alibaba Group Holding Limited
Current patentee: Alibaba Group Holding Limited
Current patentee address: KY George Town, Grand Cayman
Agent: Fish & Richardson P.C.
Main classification: A63F3/00
IPC classification: A63F3/00; G07F17/32


Abstract:

Disclosed herein are methods, systems, and apparatus for generating an action selection policy (ASP) of an execution device. One method includes, in a current iteration, computing a first reward for a current state based on respective first rewards for actions in the current state and an ASP of the current state in the current iteration; computing an accumulative respective regret value of each action in the current state based on a difference between the respective first reward for the action and the first reward for the current state; computing an ASP of the current state in the next iteration; computing a second reward for the current state based on the respective first rewards for the actions and the ASP of the current state in the next iteration; and determining an ASP of the previous state in the next iteration based on the second reward for the current state.
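The per-state update described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the abstract does not say how the next-iteration ASP is derived from the accumulated regrets, so regret matching is assumed here, and every function name and numeric value below is hypothetical.

```python
from typing import List, Tuple


def regret_matching(cum_regret: List[float]) -> List[float]:
    """Assumed rule: policy proportional to positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No positive regret: fall back to a uniform policy.
    return [1.0 / len(cum_regret)] * len(cum_regret)


def expected_reward(policy: List[float], action_rewards: List[float]) -> float:
    """Reward of a state: action rewards weighted by the ASP."""
    return sum(p * r for p, r in zip(policy, action_rewards))


def update_state(
    policy: List[float],
    action_rewards: List[float],
    cum_regret: List[float],
) -> Tuple[List[float], List[float], float]:
    # First reward of the state under the current-iteration ASP.
    first_reward = expected_reward(policy, action_rewards)
    # Accumulate regret: action reward minus the state's first reward.
    new_regret = [c + (r - first_reward)
                  for c, r in zip(cum_regret, action_rewards)]
    # ASP of the state for the next iteration (assumed regret matching).
    next_policy = regret_matching(new_regret)
    # Second reward: same action rewards, weighted by the next-iteration ASP.
    second_reward = expected_reward(next_policy, action_rewards)
    return new_regret, next_policy, second_reward


# Hypothetical current state with two actions.
child_regret, child_next, child_second = update_state(
    policy=[0.5, 0.5], action_rewards=[1.0, 3.0], cum_regret=[0.0, 0.0])

# The previous state uses the child's second reward as the reward of the
# action leading to it; 2.5 is a made-up reward for its other action.
parent_regret, parent_next, _ = update_state(
    policy=[0.5, 0.5], action_rewards=[child_second, 2.5],
    cum_regret=[0.0, 0.0])
```

In this sketch the previous state is updated with the current state's second reward (computed under the next-iteration ASP) rather than its first reward, which mirrors the last step of the abstract.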


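IPC classification:

A	Human necessities
A63	Sports; games; amusements
A63F	Card, board, or roulette games; indoor games using small moving playing bodies; video games; games not otherwise provided for
A63F3/00	Board games; raffle games (racing games, traffic games, or obstacle games characterised by figures moved by action of the players: see A63F9/14; aspects of games using an electronically generated display having two or more dimensions to show an image related to the game: see A63F13/00)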