LEARNING POLICIES USING SPARSE AND UNDERSPECIFIED REWARDS

    Publication No.: US20210256313A1

    Publication Date: 2021-08-19

    Application No.: US17180682

    Application Date: 2021-02-19

    Applicant: Google LLC

    Abstract: Methods and systems for learning policies using sparse and underspecified rewards. One of the methods includes training the policy jointly with an auxiliary reward function having a plurality of auxiliary reward parameters, the auxiliary reward function being configured to map, in accordance with the auxiliary reward parameters, trajectory features of at least one trajectory to an auxiliary reward value that indicates how well the trajectory performed a task in response to a context input.
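The abstract above can be illustrated with a minimal sketch. All names and details here are assumptions, not from the patent: the auxiliary reward is modeled as a linear function of trajectory features, and its parameters are regressed toward the sparse (0/1 success) task reward so the policy can be reinforced with the denser auxiliary signal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: trajectory features (e.g., overlap with the context
# input, trajectory length) are mapped to a scalar auxiliary reward by a
# learned linear function with parameters w.
n_features = 4
w = rng.normal(size=n_features)  # auxiliary reward parameters

def auxiliary_reward(trajectory_features, w):
    """Map trajectory features to a scalar auxiliary reward value."""
    return float(np.dot(w, trajectory_features))

def joint_update(trajectory_features, sparse_reward, w, lr=0.1):
    """One joint-training step (sketch): regress the auxiliary reward
    toward the sparse task reward; the returned auxiliary value would
    then reinforce the policy in place of the sparse signal."""
    aux = auxiliary_reward(trajectory_features, w)
    w = w + lr * (sparse_reward - aux) * trajectory_features
    return w, aux

features = rng.normal(size=n_features)
w, aux = joint_update(features, sparse_reward=1.0, w=w)
```

The design intuition is that a learned dense reward shaped by trajectory features gives the policy a usable gradient signal even when the underlying task reward is sparse and underspecified.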

    CONTRASTIVE BEHAVIORAL SIMILARITY EMBEDDINGS FOR GENERALIZATION IN REINFORCEMENT LEARNING

    Publication No.: US20230102544A1

    Publication Date: 2023-03-30

    Application No.: US17487769

    Application Date: 2021-09-28

    Applicant: Google LLC

    Abstract: Approaches are described for training an action selection neural network system for use in controlling an agent interacting with an environment to perform a task, using a contrastive loss function based on a policy similarity metric. In one aspect, a method includes: obtaining a first observation of a first training environment; obtaining a plurality of second observations of a second training environment; for each second observation, determining a respective policy similarity metric between the second observation and the first observation; processing the first observation and the second observations using a representation neural network of the system to generate a first representation of the first observation and a respective second representation of each second observation; and training the representation neural network on a contrastive loss function computed using the policy similarity metrics and the first and second representations.
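A minimal sketch of the contrastive training step described above. Everything concrete here is an assumption: the representation network is stood in for by a single normalized linear layer, the policy similarity metric is a placeholder array, and the contrastive loss is a softmax (InfoNCE-style) loss that treats the second observation with the highest policy similarity as the positive pair.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(obs, W):
    """Stand-in representation network: one linear layer, L2-normalized
    so dot products between embeddings are cosine similarities."""
    z = W @ obs
    return z / np.linalg.norm(z)

# Toy data: one first observation and 5 second observations.
obs_dim, emb_dim = 8, 4
W = rng.normal(size=(emb_dim, obs_dim))
first_obs = rng.normal(size=obs_dim)
second_obs = rng.normal(size=(5, obs_dim))

# Placeholder policy similarity metrics in [0, 1]; in the described
# approach these would compare the agent's behavior starting from the
# first observation versus each second observation.
policy_sim = rng.uniform(size=5)

z1 = embed(first_obs, W)
z2 = np.stack([embed(o, W) for o in second_obs])

# Contrastive loss: the behaviorally most similar second observation is
# the positive; the remaining second observations act as negatives.
logits = z2 @ z1                                  # cosine similarities
positive = int(np.argmax(policy_sim))
log_probs = logits - np.log(np.sum(np.exp(logits)))
loss = -log_probs[positive]                       # cross-entropy on positive
```

Minimizing this loss pulls representations of behaviorally similar observations together across environments, which is the stated route to generalization.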

    Efficient Knowledge Distillation Framework for Training Machine-Learned Models

    Publication No.: US20250124256A1

    Publication Date: 2025-04-17

    Application No.: US18486792

    Application Date: 2023-10-13

    Applicant: Google LLC

    Abstract: An example method is provided for training a student machine-learned sequence processing model, the method comprising: obtaining a respective input; obtaining, from the student machine-learned sequence processing model, a respective output corresponding to the respective input; generating a multiscale refinement objective configured to jointly distill knowledge from a teacher machine-learned sequence processing model and reinforce preferred behavior of the student machine-learned sequence processing model, wherein the multiscale refinement objective comprises: a first component based on a divergence metric characterizing, for the respective input, a comparison of a plurality of predictions of the student machine-learned sequence processing model to a plurality of predictions of the teacher machine-learned sequence processing model; and a second component based on a reinforcement learning signal associated with the respective output; and updating the student machine-learned sequence processing model based on the multiscale refinement objective.
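The two-component objective above can be sketched as follows. The specifics are assumptions, not from the patent: the divergence metric is taken to be a token-level KL divergence between teacher and student distributions, and the reinforcement component is a REINFORCE-style term with a placeholder scalar reward.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical logits over a small vocabulary for one output sequence:
# 3 token positions (rows) x 5 vocabulary entries (columns).
rng = np.random.default_rng(0)
student_logits = rng.normal(size=(3, 5))
teacher_logits = rng.normal(size=(3, 5))

p_student = softmax(student_logits)
p_teacher = softmax(teacher_logits)

# First component: token-level KL(teacher || student), averaged over
# positions -- distills the teacher's predictions into the student.
kl = np.mean(np.sum(
    p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1))

# Second component: a sequence-level reinforcement signal. Here the
# reward is a placeholder scalar for the student's sampled output, and
# the log-probability of that output is approximated greedily.
reward = 0.7
log_prob_output = np.sum(np.log(p_student.max(axis=-1)))
rl_term = -reward * log_prob_output  # REINFORCE-style surrogate loss

alpha = 0.5  # assumed trade-off between the two components
objective = alpha * kl + (1 - alpha) * rl_term
```

Combining a fine-grained (per-token) divergence with a coarse (per-sequence) reward signal is one plausible reading of "multiscale" here: the student is pulled toward the teacher locally while being reinforced globally.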
