-
Publication Number: US20240256873A1
Publication Date: 2024-08-01
Application Number: US18424633
Filing Date: 2024-01-26
Applicant: Google LLC
Inventor: Utku Evci , Pablo Samuel Castro Rivadeneira , Ghada AbdElRahman Zaki Nabawy Sokar , Rishabh Agarwal
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network and, during the training, resetting neurons that have been classified as being dormant.
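The abstract above describes detecting and resetting dormant neurons during training. A minimal numpy sketch of one plausible reading: score each neuron by its normalized mean activation magnitude, classify neurons below a threshold as dormant, reinitialize their incoming weights, and zero their outgoing weights. The function names, the threshold `tau`, and the fan-in initialization are illustrative assumptions, not details taken from the filing.

```python
import numpy as np

def dormant_mask(activations, tau=0.025):
    """Score each neuron by its mean absolute activation, normalized by the
    layer-wide average; neurons scoring at or below tau are dormant.
    (tau=0.025 is an assumed threshold, not one stated in the abstract.)"""
    score = np.abs(activations).mean(axis=0)   # per-neuron mean |activation|
    score = score / (score.mean() + 1e-8)      # normalize by layer average
    return score <= tau

def reset_dormant(w_in, b_in, w_out, activations, tau=0.025, rng=None):
    """Reinitialize incoming weights of dormant neurons and zero their
    outgoing weights, leaving active neurons untouched."""
    if rng is None:
        rng = np.random.default_rng(0)
    mask = dormant_mask(activations, tau)
    scale = 1.0 / np.sqrt(w_in.shape[0])       # simple fan-in uniform init
    w_in, b_in, w_out = w_in.copy(), b_in.copy(), w_out.copy()
    w_in[:, mask] = rng.uniform(-scale, scale, (w_in.shape[0], int(mask.sum())))
    b_in[mask] = 0.0
    w_out[mask, :] = 0.0                       # dormant neurons contribute nothing downstream
    return w_in, b_in, w_out, mask
```

Zeroing the outgoing weights keeps the network's function unchanged at the moment of reset, so training resumes smoothly while the recycled neurons relearn.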
-
Publication Number: US20210256313A1
Publication Date: 2021-08-19
Application Number: US17180682
Filing Date: 2021-02-19
Applicant: Google LLC
Inventor: Rishabh Agarwal , Chen Liang , Dale Eric Schuurmans , Mohammad Norouzi
Abstract: Methods and systems for learning policies using sparse and underspecified rewards. One of the methods includes training the policy jointly with an auxiliary reward function having a plurality of auxiliary reward parameters, the auxiliary reward function being configured to map, in accordance with the auxiliary reward parameters, trajectory features of at least a trajectory to an auxiliary reward value that indicates how well the trajectory performed a task in response to a context input.
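The auxiliary reward function described above maps trajectory features to a scalar that measures how well a trajectory performed the task. A minimal sketch, assuming a linear score squashed to (0, 1); the feature definitions, the sigmoid form, and the helper names are illustrative, not from the filing:

```python
import numpy as np

def auxiliary_reward(phi, features):
    """Map a trajectory's feature vector to a scalar auxiliary reward in
    (0, 1), here via a learned linear score passed through a sigmoid.
    phi plays the role of the auxiliary reward parameters."""
    return 1.0 / (1.0 + np.exp(-float(features @ phi)))

def rank_trajectories(phi, feature_list):
    """Among trajectories that all earn the same sparse task reward, the
    auxiliary reward disambiguates which ones are preferable; return
    trajectory indices sorted from highest to lowest auxiliary reward."""
    scores = [auxiliary_reward(phi, f) for f in feature_list]
    return sorted(range(len(feature_list)), key=lambda i: -scores[i])
```

In the joint-training setup the abstract describes, phi would be updated alongside the policy parameters so that the auxiliary reward ranking agrees with the sparse task signal.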
-
Publication Number: US20230102544A1
Publication Date: 2023-03-30
Application Number: US17487769
Filing Date: 2021-09-28
Applicant: Google LLC
Inventor: Rishabh Agarwal , Marlos Cholodovskis Machado , Pablo Samuel Castro Rivadeneira , Marc Gendron-Bellemare
Abstract: Approaches are described for training an action selection neural network system for use in controlling an agent interacting with an environment to perform a task, using a contrastive loss function based on a policy similarity metric. In one aspect, a method includes: obtaining a first observation of a first training environment; obtaining a plurality of second observations of a second training environment; for each second observation, determining a respective policy similarity metric between the second observation and the first observation; processing the first observation and the second observations using a representation neural network of the system to generate a first representation of the first observation and a respective second representation of each second observation; and training the representation neural network on a contrastive loss function computed using the policy similarity metrics and the first and second representations.
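The contrastive loss in the abstract pulls together representations of observations whose policies are similar. A simplified sketch of one reading: treat the second observation with the highest policy similarity to the anchor as the positive and the rest as negatives in an InfoNCE-style loss. The single-positive formulation and the temperature value are assumptions; the actual filing may instead weight all pairs continuously by the similarity metric.

```python
import numpy as np

def psm_contrastive_loss(z1, z2s, psm, temp=0.1):
    """InfoNCE-style contrastive loss where the positive for anchor
    representation z1 is the second-observation representation with the
    highest policy similarity metric (psm); others act as negatives."""
    z1 = z1 / np.linalg.norm(z1)                          # unit-normalize anchor
    z2s = z2s / np.linalg.norm(z2s, axis=1, keepdims=True)
    logits = (z2s @ z1) / temp                            # cosine similarities / temperature
    pos = int(np.argmax(psm))                             # most policy-similar observation
    m = logits.max()                                      # stable log-softmax
    logp = logits - (m + np.log(np.exp(logits - m).sum()))
    return float(-logp[pos])
```

Minimizing this loss makes the anchor's representation align with the candidate its policy treats most similarly, which is the behavior the abstract's training objective targets.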
-
Publication Number: US20250124256A1
Publication Date: 2025-04-17
Application Number: US18486792
Filing Date: 2023-10-13
Applicant: Google LLC
IPC: G06N3/0455 , G06N3/092
Abstract: An example method is provided for training a student machine-learned sequence processing model, the method comprising: obtaining a respective input; obtaining, from the student machine-learned sequence processing model, a respective output corresponding to the respective input; generating a multiscale refinement objective configured to jointly distill knowledge from a teacher machine-learned sequence processing model and reinforce preferred behavior of the student machine-learned sequence processing model, wherein the multiscale refinement objective comprises: a first component based on a divergence metric characterizing, for the respective input, a comparison of a plurality of predictions of the student machine-learned sequence processing model to a plurality of predictions of the teacher machine-learned sequence processing model; and a second component based on a reinforcement learning signal associated with the respective output; and updating the student machine-learned sequence processing model based on the multiscale refinement objective.
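The two-component objective above can be sketched as a token-level KL divergence between teacher and student predictions plus a sequence-level reinforcement term. The KL direction, the REINFORCE-style second term, and the mixing weight `alpha` are assumptions for illustration; the filing does not specify these choices.

```python
import numpy as np

def multiscale_objective(student_logits, teacher_logits, reward, seq_logprob, alpha=0.5):
    """First component: mean KL(teacher || student) over token positions
    (the divergence metric comparing the two models' predictions).
    Second component: a REINFORCE-style reward-weighted negative
    log-likelihood of the student's sampled output (the RL signal)."""
    def log_softmax(x):
        m = x.max(axis=-1, keepdims=True)
        return x - m - np.log(np.exp(x - m).sum(axis=-1, keepdims=True))
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    kl = (np.exp(t_logp) * (t_logp - s_logp)).sum(axis=-1).mean()
    rl = -reward * seq_logprob          # minimizing this maximizes reward-weighted log-likelihood
    return float(alpha * kl + (1.0 - alpha) * rl)
```

With `alpha` interpolating between the two components, the student is simultaneously pulled toward the teacher's token distributions and toward outputs that score well under the reward.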