Value function representation method of reinforcement learning and apparatus using this

Granted invention patent

US08175982B2 Value function representation method of reinforcement learning and apparatus using this (lapsed)



Patent title: Value function representation method of reinforcement learning and apparatus using this
Application number: US12065558

Filing date: 2006-08-18
Publication number: US08175982B2

Publication date: 2012-05-08
Inventors: Tomoki Hamagami, Takesi Shibuya
Applicants: Tomoki Hamagami, Takesi Shibuya
Applicant address: Yokohama, JP
Assignee: Nat'l University Corp. Yokohama Nat'l University
Current assignee: Nat'l University Corp. Yokohama Nat'l University
Current assignee address: Yokohama, JP
Agent: Westerman, Hattori, Daniels & Adrian, LLP
Priority: JP2005-254763 20050902
International application: PCT/JP2006/316659 WO 20060818
International publication: WO2007/029516 WO 20070315
Main classification: G06F15/18
IPC classification: G06F15/18; G06E1/00


Abstract:

Reinforcement learning is one of the intelligent operations applied to autonomous mobile robots and similar systems. It has notable advantages, such as enabling operation in unknown environments. However, it suffers from a fundamental problem known as the “incomplete perception problem”. A variety of solutions have been proposed, but none has been decisive, and the resulting systems become complex. A simple and effective solution has been desired.

A complex value function that defines a state-action value as a complex number is introduced, and time series information is introduced into the phase part of the complex value. The time series information thus enters the value function without the use of a complicated algorithm, so the incomplete perception problem is effectively solved by a simple implementation of the method.
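
The abstract does not give the update rule, so the following Python sketch is only one possible reading of the idea, not the claimed method. It assumes a Q-learning-style update in which the target is phase-rotated by a fixed angle `omega`, and an action choice that projects each complex value onto a running reference phase. All names (`rotate`, `effective_value`, `select_action`, `update`) and all parameters are hypothetical.

```python
import cmath


def rotate(value: complex, omega: float) -> complex:
    """Advance the phase of a complex value by omega radians (magnitude unchanged)."""
    return value * cmath.exp(1j * omega)


def effective_value(q: complex, reference: complex) -> float:
    """Real-valued preference: projection of q onto the reference direction.

    Assumption: the phase of `reference` stands for "where in the time series
    the agent currently is"; values whose phase matches it score highest.
    """
    return (q * reference.conjugate()).real


def select_action(q_row: dict, reference: complex):
    """Greedy choice among actions using the phase-aware effective value."""
    return max(q_row, key=lambda a: effective_value(q_row[a], reference))


def update(q, state, action, reward, next_value, alpha, gamma, omega):
    """Assumed Q-learning-style update with a phase-rotated target."""
    target = rotate(reward + gamma * next_value, omega)
    q[state][action] = (1 - alpha) * q[state][action] + alpha * target


# Two visits to the same observation that differ only in the reference phase.
q_row = {"left": 1 + 0j, "right": 0 + 1j}
first_visit = select_action(q_row, 1 + 0j)   # reference phase 0
later_visit = select_action(q_row, 0 + 1j)   # reference phase pi/2
```

In this sketch the same observation yields different greedy actions depending on the reference phase, which is how phase can carry time series information that a real-valued table cannot.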


Published/granted documents

US20090234783A1 VALUE FUNCTION REPRESENTATION METHOD OF REINFORCEMENT LEARNING AND APPARATUS USING THIS Publication/grant date: 2009-09-17
