Publication No.: US20230102544A1
Publication Date: 2023-03-30
Application No.: US17487769
Filing Date: 2021-09-28
Applicant: Google LLC
Inventor: Rishabh Agarwal, Marlos Cholodovskis Machado, Pablo Samuel Castro Rivadeneira, Marc Gendron-Bellemare
Abstract: Approaches are described for training an action selection neural network system for use in controlling an agent interacting with an environment to perform a task, using a contrastive loss function based on a policy similarity metric. In one aspect, a method includes: obtaining a first observation of a first training environment; obtaining a plurality of second observations of a second training environment; for each second observation, determining a respective policy similarity metric between the second observation and the first observation; processing the first observation and the second observations using a representation neural network to generate a first representation of the first observation and a respective second representation of each second observation; and training the representation neural network on a contrastive loss function computed using the policy similarity metrics and the first and second representations.
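The loss computation described in the abstract might be sketched as follows. This is an illustrative sketch only, not the claimed method: the abstract does not give the form of the loss, so the code assumes that the policy similarity metric is supplied as a non-negative distance (smaller means more similar), that distances are turned into weights with an exponential kernel, that the closest second observation is treated as the positive pair, and that representations are compared by cosine similarity. The names `contrastive_loss`, `psm_distances`, `temperature`, and `beta` are hypothetical, and the representations are assumed to have already been produced by the representation neural network.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(first_repr, second_reprs, psm_distances,
                     temperature=0.5, beta=1.0):
    """Similarity-weighted contrastive loss for one first observation.

    Assumptions (not stated in the abstract):
      * psm_distances[i] >= 0 is the policy similarity metric between the
        first observation and the i-th second observation, expressed as
        a distance.
      * weight_i = exp(-distance_i / beta) lies in (0, 1].
      * the second observation with the smallest distance is the positive.
    """
    weights = [math.exp(-d / beta) for d in psm_distances]
    pos = min(range(len(psm_distances)), key=lambda i: psm_distances[i])

    # Scaled similarities between the first and each second representation.
    logits = [cosine(first_repr, z) / temperature for z in second_reprs]

    # The positive pair is weighted by its similarity; each negative is
    # weighted by how dissimilar it is under the metric.
    numerator = weights[pos] * math.exp(logits[pos])
    negatives = sum((1.0 - weights[j]) * math.exp(logits[j])
                    for j in range(len(logits)) if j != pos)
    return -math.log(numerator / (numerator + negatives))


# Second observation 0 is behaviorally identical (distance 0.0) to the
# first observation; second observation 1 is dissimilar (distance 2.0).
distances = [0.0, 2.0]
first = [1.0, 0.0]

# Representations that agree with the metric give a lower loss ...
aligned = contrastive_loss(first, [[1.0, 0.0], [0.0, 1.0]], distances)
# ... than representations that place the dissimilar observation closer.
misaligned = contrastive_loss(first, [[0.0, 1.0], [1.0, 0.0]], distances)
print(aligned < misaligned)
```

In a training loop, a loss of this kind would be averaged over many first observations and minimized with respect to the parameters of the representation neural network; the gradient step is omitted here because it depends on the framework used.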