TRAINING REINFORCEMENT LEARNING AGENTS TO LEARN EXPERT EXPLORATION BEHAVIORS FROM DEMONSTRATORS

    Publication No.: US20210397959A1

    Publication Date: 2021-12-23

    Application No.: US17354991

    Application Date: 2021-06-22

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network used to select actions performed by an agent interacting with an environment by performing actions that cause the environment to transition states. One of the methods includes obtaining a transition generated as a result of the reinforcement learning agent interacting with the environment; processing a bonus input using a bonus estimation neural network to generate an exploration bonus estimate that encourages the agent to explore the environment in accordance with an expert exploration strategy that would be adopted by an expert agent; generating a modified reward from the reward included in the transition and the exploration bonus estimate; and determining an update to current parameter values of the neural network to optimize a reinforcement learning objective function that maximizes the returns received by the agent with respect to the modified reward.
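    The reward-shaping loop described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the bonus and the reward are combined, what the bonus input is, or which reinforcement learning objective is optimized, so everything here is an assumption: the bonus estimation network is a hypothetical one-layer sigmoid model fed the next observation, the modified reward is the task reward plus `beta` times the bonus, and the update is ordinary one-step Q-learning on a linear Q-function. Nothing in it models how an expert exploration strategy is learned from demonstrators.

```python
import math
from dataclasses import dataclass


@dataclass
class Transition:
    """One step of agent-environment interaction (assumed layout)."""
    observation: list[float]
    action: int
    reward: float
    next_observation: list[float]


def bonus_estimate(bonus_params: list[float], bonus_input: list[float]) -> float:
    """Hypothetical stand-in for the bonus estimation neural network:
    a single linear layer followed by a sigmoid."""
    logit = sum(w * x for w, x in zip(bonus_params, bonus_input))
    return 1.0 / (1.0 + math.exp(-logit))


def modified_reward(reward: float, bonus: float, beta: float = 0.1) -> float:
    """Assumed additive combination of the task reward and the exploration bonus."""
    return reward + beta * bonus


def q_values(q_params: list[list[float]], obs: list[float]) -> list[float]:
    """Linear Q-function: q_params[a] is the weight vector for action a."""
    return [sum(w * x for w, x in zip(weights, obs)) for weights in q_params]


def q_learning_update(q_params, t: Transition, bonus_params,
                      lr=0.1, gamma=0.99, beta=0.1):
    """One semi-gradient Q-learning step that uses the modified reward."""
    # Assumption: the bonus input is the next observation.
    bonus = bonus_estimate(bonus_params, t.next_observation)
    r_mod = modified_reward(t.reward, bonus, beta)
    target = r_mod + gamma * max(q_values(q_params, t.next_observation))
    td_error = target - q_values(q_params, t.observation)[t.action]
    # For a linear Q-function, dQ(s, a)/dw_a is the observation itself.
    new_params = [list(w) for w in q_params]
    new_params[t.action] = [w + lr * td_error * x
                            for w, x in zip(q_params[t.action], t.observation)]
    return new_params, r_mod
```

    With all-zero parameters the bonus is sigmoid(0) = 0.5, so a transition with reward 1.0 receives a modified reward of 1.05, and only the weights of the action taken move toward that shaped target.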

    Efficient Knowledge Distillation Framework for Training Machine-Learned Models

    Publication No.: US20250124256A1

    Publication Date: 2025-04-17

    Application No.: US18486792

    Application Date: 2023-10-13

    Applicant: Google LLC

    Abstract: An example method is provided for training a machine-learned student sequence processing model, the method comprising: obtaining a respective input; obtaining, from the machine-learned student sequence processing model, a respective output corresponding to the respective input; generating a multiscale refinement objective configured to jointly distill knowledge from a machine-learned teacher sequence processing model and reinforce preferred behavior of the machine-learned student sequence processing model, wherein the multiscale refinement objective comprises: a first component based on a divergence metric characterizing, for the respective input, a comparison of a plurality of predictions of the machine-learned student sequence processing model to a plurality of predictions of the machine-learned teacher sequence processing model; and a second component based on a reinforcement learning signal associated with the respective output; and updating the machine-learned student sequence processing model based on the multiscale refinement objective.
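    The two-component objective in this abstract can be sketched as follows, under stated assumptions. The "plurality of predictions" is taken to mean per-position next-token distributions, the divergence metric is assumed to be KL(teacher || student) averaged over positions, the reinforcement learning signal is assumed to be a scalar reward applied through a REINFORCE-style log-likelihood term, and the weighting `alpha` is hypothetical. The sketch only computes the scalar objective (a real model would differentiate it with an autodiff framework), and it does not attempt to capture what the "multiscale" qualifier refers to.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one position's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_component(student_logits, teacher_logits):
    """Assumed divergence term: mean per-position KL(teacher || student)."""
    kls = [kl_divergence(softmax(t), softmax(s))
           for s, t in zip(student_logits, teacher_logits)]
    return sum(kls) / len(kls)


def reinforcement_component(student_logits, output_tokens, reward):
    """Assumed RL term: -reward * log-probability of the student's output."""
    log_prob = sum(math.log(softmax(logits)[tok])
                   for logits, tok in zip(student_logits, output_tokens))
    return -reward * log_prob


def refinement_objective(student_logits, teacher_logits, output_tokens,
                         reward, alpha=0.5):
    """Hypothetical weighted sum of the distillation and RL components."""
    return (alpha * distillation_component(student_logits, teacher_logits)
            + (1.0 - alpha) * reinforcement_component(
                student_logits, output_tokens, reward))
```

    When student and teacher logits are identical the distillation term is exactly zero, so only the reward-weighted likelihood term remains; for two uniform positions over a two-token vocabulary and reward 1.0, the objective with `alpha=0.5` equals log 2.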

    TRAINING REINFORCEMENT LEARNING AGENTS USING AUGMENTED TEMPORAL DIFFERENCE LEARNING

    Publication No.: US20210390409A1

    Publication Date: 2021-12-16

    Application No.: US17347264

    Application Date: 2021-06-14

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network used to select actions performed by an agent interacting with an environment by performing actions that cause the environment to transition states. One of the methods includes training the neural network on one or more transitions selected from a replay memory, including: generating, using the neural network, an action selection output for the current observation; determining, based on the action selection output and the current action performed by the agent in response to the current observation, a state-action target for the current observation; determining a gradient of a temporal difference (TD) loss function with respect to parameters of the neural network, wherein the TD loss function comprises a first term that depends on the state-action target for the current observation; and adjusting current parameter values of the neural network based on the gradient.
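    The training loop in this abstract (sample transitions from a replay memory, compute the network's action selection output, form a TD loss, take a gradient step) can be sketched generically. The abstract does not specify how the "state-action target" is built from the action selection output or what the augmented loss terms are, so this sketch swaps in a conventional one-step Q-learning target, treated as a constant (semi-gradient), together with a squared TD error on a hypothetical linear Q-network. It shows the loop structure only, not the patented augmentation.

```python
import random
from collections import deque


def make_replay_memory(capacity=10_000):
    """Bounded FIFO replay memory; the oldest transitions are evicted first."""
    return deque(maxlen=capacity)


def action_selection_output(params, obs):
    """Linear Q-network (assumed architecture): one weight vector per action."""
    return [sum(w * x for w, x in zip(weights, obs)) for weights in params]


def td_loss_and_grad(params, batch, gamma=0.99):
    """Mean of 0.5 * (target - Q(s, a))^2 over the batch, and its semi-gradient."""
    grad = [[0.0] * len(w) for w in params]
    loss = 0.0
    for obs, action, reward, next_obs, done in batch:
        q = action_selection_output(params, obs)
        # Swapped-in target: ordinary one-step Q-learning target, held constant.
        bootstrap = 0.0 if done else gamma * max(action_selection_output(params, next_obs))
        target = reward + bootstrap
        error = target - q[action]
        loss += 0.5 * error ** 2
        # d/dw_a of 0.5 * (target - w_a . s)^2 is -(error) * s.
        for i, x in enumerate(obs):
            grad[action][i] -= error * x
    n = len(batch)
    return loss / n, [[g / n for g in row] for row in grad]


def train_step(params, replay, batch_size=32, lr=0.1, gamma=0.99, rng=random):
    """Sample a batch from replay memory and take one gradient-descent step."""
    batch = rng.sample(list(replay), batch_size)
    loss, grad = td_loss_and_grad(params, batch, gamma)
    new_params = [[w - lr * g for w, g in zip(row, grow)]
                  for row, grow in zip(params, grad)]
    return new_params, loss
```

    Transitions are stored as `(observation, action, reward, next_observation, done)` tuples, an assumed layout. Replacing `target` inside `td_loss_and_grad` is the natural place to experiment with other state-action targets while leaving the replay and update logic unchanged.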
