Value function representation method of reinforcement learning and apparatus using this
    1.
    Granted invention patent
    Value function representation method of reinforcement learning and apparatus using this (Lapsed)

    Publication No.: US08175982B2

    Publication date: 2012-05-08

    Application No.: US12065558

    Filing date: 2006-08-18

    IPC classification: G06F15/18 G06E1/00

    CPC classification: G06N99/005

    Abstract: Reinforcement learning is one of the intelligent operations applied to autonomous mobile robots and similar systems. It has notable strengths, such as enabling operation in unknown environments. However, it suffers from a basic problem known as the “incomplete perception problem”. A variety of solutions have been proposed, but none has been decisive, and the resulting systems tend to become complex. A simple and effective solution has been desired. A complex value function that defines a state-action value by a complex number is introduced. Time series information is introduced into the phase part of the complex value. As a result, the time series information enters the value function without the use of a complicated algorithm, so the incomplete perception problem is effectively solved with a simple implementation of the method.
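The idea in the abstract, a state-action value stored as a complex number whose phase carries time series information, can be illustrated with a small sketch. This is not the patented method: the update rule, the constants ALPHA, GAMMA and OMEGA, the reference phase, and the function names score, select_action and update are all assumptions chosen for illustration. The sketch shows only how the same observation can lead to different actions depending on a phase context.

```python
import cmath

# All constants below are illustrative assumptions, not values from the patent.
ALPHA = 0.25          # learning rate
GAMMA = 0.9           # discount factor
OMEGA = cmath.pi / 6  # phase shift applied per time step


def score(q, reference):
    """Project a complex value onto a reference phase.

    Equals |q| * |reference| * cos(phase difference), so values whose
    phase matches the reference score highest.
    """
    return (q * reference.conjugate()).real


def select_action(q_table, state, actions, reference):
    """Greedy action for `state`, judged against the current reference phase."""
    return max(actions, key=lambda a: score(q_table.get((state, a), 0j), reference))


def update(q_table, state, action, reward, next_state, actions, reference):
    """One Q-learning-style update with a phase rotation on the target (assumed rule)."""
    # The next step is judged against the reference advanced by one phase step.
    next_reference = reference * cmath.exp(1j * OMEGA)
    best_next = max(
        (q_table.get((next_state, a), 0j) for a in actions),
        key=lambda q: score(q, next_reference),
    )
    # Rotate the target back by one step so the phase encodes position in time.
    target = (reward + GAMMA * best_next) * cmath.exp(-1j * OMEGA)
    old = q_table.get((state, action), 0j)
    q_table[(state, action)] = (1 - ALPHA) * old + ALPHA * target
    return q_table[(state, action)]


if __name__ == "__main__":
    # Two actions with equal magnitude but different phases in the same state.
    table = {("s", "left"): cmath.rect(1.0, 0.0),
             ("s", "right"): cmath.rect(1.0, cmath.pi / 2)}
    for phase in (0.0, cmath.pi / 2):
        ref = cmath.rect(1.0, phase)
        print(phase, select_action(table, "s", ["left", "right"], ref))
```

In this sketch a real-valued table could not separate the two cases, because both actions have the same magnitude. The phase comparison is what lets the history, summarized in the reference value, break the tie. How the reference value is maintained across steps is left to the caller here.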


    VALUE FUNCTION REPRESENTATION METHOD OF REINFORCEMENT LEARNING AND APPARATUS USING THIS
    2.
    Invention patent application
    VALUE FUNCTION REPRESENTATION METHOD OF REINFORCEMENT LEARNING AND APPARATUS USING THIS (Lapsed)

    Publication No.: US20090234783A1

    Publication date: 2009-09-17

    Application No.: US12065558

    Filing date: 2006-08-18

    IPC classification: G06F15/18

    CPC classification: G06N99/005

    Abstract: Reinforcement learning is one of the intelligent operations applied to autonomous mobile robots and similar systems. It has notable strengths, such as enabling operation in unknown environments. However, it suffers from a basic problem known as the “incomplete perception problem”. A variety of solutions have been proposed, but none has been decisive, and the resulting systems tend to become complex. A simple and effective solution has been desired. A complex value function that defines a state-action value by a complex number is introduced. Time series information is introduced into the phase part of the complex value. As a result, the time series information enters the value function without the use of a complicated algorithm, so the incomplete perception problem is effectively solved with a simple implementation of the method.
