-
Publication No.: US20210397959A1
Publication Date: 2021-12-23
Application No.: US17354991
Filing Date: 2021-06-22
Applicant: Google LLC
Inventor: Olivier Claude Pietquin, Léonard Hussenot Desenonges, Robert Dadashi-Tazehozi, Matthieu Florent Geist
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network used to select actions to be performed by an agent interacting with an environment, where the actions cause the environment to transition between states. One of the methods includes: obtaining a transition generated as a result of the agent interacting with the environment; processing a bonus input using a bonus estimation neural network to generate an exploration bonus estimate that encourages the agent to explore the environment in accordance with an expert exploration strategy, i.e., the strategy that would be adopted by an expert agent; generating a modified reward from the reward included in the transition and the exploration bonus estimate; and determining an update to the current parameter values of the neural network to optimize a reinforcement learning objective function that maximizes the returns received by the agent, computed with respect to the modified reward.
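The core step the abstract describes is shaping the environment reward with a learned exploration bonus before running a standard RL update. Below is a minimal PyTorch sketch of that step; the names (BonusNet, alpha), the network shape, and the additive combination are illustrative assumptions, not the patent's exact formulation.

```python
import torch
import torch.nn as nn

class BonusNet(nn.Module):
    """Bonus estimation network: maps a transition's (observation, action)
    pair to a scalar exploration-bonus estimate."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def modified_reward(reward: torch.Tensor, obs: torch.Tensor,
                    act: torch.Tensor, bonus_net: BonusNet,
                    alpha: float = 0.1) -> torch.Tensor:
    """Combine the transition's reward with the exploration bonus estimate.

    The additive mix and the weight `alpha` are assumptions; the abstract
    only states that a modified reward is generated from the two quantities.
    """
    with torch.no_grad():
        bonus = bonus_net(obs, act)
    return reward + alpha * bonus
```

The action-selection network is then updated with any standard RL objective (e.g., a TD loss), using `modified_reward` in place of the raw environment reward.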
-
Publication No.: US20250124256A1
Publication Date: 2025-04-17
Application No.: US18486792
Filing Date: 2023-10-13
Applicant: Google LLC
IPC: G06N3/0455, G06N3/092
Abstract: An example method is provided for training a student machine-learned sequence processing model, the method comprising: obtaining a respective input; obtaining, from the student machine-learned sequence processing model, a respective output corresponding to the respective input; generating a multiscale refinement objective configured to jointly distill knowledge from a teacher machine-learned sequence processing model and reinforce preferred behavior of the student machine-learned sequence processing model, wherein the multiscale refinement objective comprises: a first component based on a divergence metric characterizing, for the respective input, a comparison of a plurality of predictions of the student machine-learned sequence processing model to a plurality of predictions of the teacher machine-learned sequence processing model; and a second component based on a reinforcement learning signal associated with the respective output; and updating the student machine-learned sequence processing model based on the multiscale refinement objective.
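One way to read the two components is as a token-level distillation divergence plus a sequence-level reinforcement term. The PyTorch sketch below instantiates that reading with a KL divergence and a REINFORCE-style surrogate; both estimator choices, the function name, and the weight `beta` are assumptions for illustration, since the abstract specifies neither.

```python
import torch
import torch.nn.functional as F

def multiscale_refinement_loss(student_logits: torch.Tensor,
                               teacher_logits: torch.Tensor,
                               output_log_prob: torch.Tensor,
                               reward: torch.Tensor,
                               beta: float = 1.0) -> torch.Tensor:
    """Joint distillation + reinforcement objective (illustrative).

    student_logits, teacher_logits: [batch, seq, vocab] per-token predictions.
    output_log_prob: [batch] log-probability of the sampled output under
        the student model.
    reward: [batch] reinforcement learning signal for each output.
    """
    # First component: a divergence metric comparing the student's per-token
    # predictions to the (frozen) teacher's (KL chosen as one concrete metric).
    kl = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits.detach(), dim=-1),
        reduction="batchmean",
    )
    # Second component: a REINFORCE-style surrogate that reinforces
    # preferred behavior (higher-reward outputs become more likely).
    rl_term = -(reward * output_log_prob).mean()
    return kl + beta * rl_term
```

The student is then updated by taking gradient steps on this combined objective, exactly as for any other differentiable loss.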
-
Publication No.: US20230093451A1
Publication Date: 2023-03-23
Application No.: US17947985
Filing Date: 2022-09-19
Applicant: Google LLC
Inventor: Robert Dadashi-Tazehozi, Olivier Claude Pietquin, Léonard Hussenot Desenonges, Matthieu Florent Geist, Anton Raichuk, Damien Vincent, Sertan Girgin
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents. In particular, an agent can be controlled using a discretization neural network that generates a state-dependent discretization of an original action space and a policy neural network that selects an action from the state-dependent discretization rather than from the original action space.
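A minimal sketch of the two-network arrangement follows: a discretization network proposes K candidate actions for the current state, and a policy network selects among those candidates instead of searching the original continuous space. The network shapes, the value of K, and the categorical selection are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DiscretizationNet(nn.Module):
    """Maps a state to K candidate actions: a state-dependent
    discretization of the original continuous action space."""
    def __init__(self, obs_dim: int, act_dim: int, k: int = 8):
        super().__init__()
        self.k, self.act_dim = k, act_dim
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, k * act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).view(-1, self.k, self.act_dim)  # [B, K, act_dim]

class PolicyNet(nn.Module):
    """Scores the K state-dependent candidates; the agent acts with a
    sample from the resulting distribution over the discrete set."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, obs: torch.Tensor, candidates: torch.Tensor) -> torch.Tensor:
        b, k, _ = candidates.shape
        obs_rep = obs.unsqueeze(1).expand(-1, k, -1)        # [B, K, obs_dim]
        scores = self.net(torch.cat([obs_rep, candidates], dim=-1)).squeeze(-1)
        return torch.distributions.Categorical(logits=scores).sample()  # [B]
```

At act time: `candidates = disc_net(obs)`, `idx = policy(obs, candidates)`, `action = candidates[torch.arange(len(obs)), idx]`; the policy's choice is over K candidates rather than the full continuous space.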
-
Publication No.: US20210390409A1
Publication Date: 2021-12-16
Application No.: US17347264
Filing Date: 2021-06-14
Applicant: Google LLC
Inventor: Matthieu Florent Geist, Nino Vieillard, Olivier Claude Pietquin
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network used to select actions to be performed by an agent interacting with an environment, where the actions cause the environment to transition between states. One of the methods includes training the neural network on one or more transitions selected from a replay memory, including: generating, using the neural network, an action selection output for the current observation; determining, based on the action selection output and the current action performed by the agent in response to the current observation, a state-action target for the current observation; determining a gradient of a temporal difference (TD) loss function with respect to the parameters of the neural network, wherein the TD loss function comprises a first term that depends on the state-action target for the current observation; and adjusting the current parameter values of the neural network based on the gradient.
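A state-action target that combines the network's own action-selection output with the action actually taken has the shape of a Munchausen-style augmented target, a line of work published by these inventors; the PyTorch sketch below shows one such TD loss for a discrete action space under that assumption. The log-policy bonus, temperature `tau`, and weight `alpha` are illustrative, not a claim about the patent's exact formulation.

```python
import torch
import torch.nn.functional as F

def munchausen_style_td_loss(q_net, target_q_net, batch,
                             gamma: float = 0.99, tau: float = 0.03,
                             alpha: float = 0.9) -> torch.Tensor:
    """TD loss whose first term depends on a state-action target built from
    the network's action-selection output (illustrative)."""
    obs, act, rew, next_obs, done = batch  # tensors from the replay memory
    with torch.no_grad():
        # Action-selection output for the current observation: a softmax
        # policy derived from the Q-values.
        log_pi = F.log_softmax(q_net(obs) / tau, dim=-1)
        # State-action target: the reward is augmented with a term that
        # depends on the current action the agent actually performed.
        log_pi_a = log_pi.gather(1, act.unsqueeze(1)).squeeze(1)
        next_q = target_q_net(next_obs)
        next_pi = F.softmax(next_q / tau, dim=-1)
        next_log_pi = F.log_softmax(next_q / tau, dim=-1)
        soft_v = (next_pi * (next_q - tau * next_log_pi)).sum(-1)
        target = rew + alpha * tau * log_pi_a + gamma * (1.0 - done) * soft_v
    q_sa = q_net(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    # The gradient of this loss w.r.t. the network parameters drives the
    # adjustment of the current parameter values.
    return F.mse_loss(q_sa, target)
```

Published Munchausen formulations additionally clip the log-policy bonus for numerical stability; that detail is omitted here for brevity.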