Generating a robot control policy from demonstrations collected via kinesthetic teaching of a robot

    公开(公告)号:US11565412B2

    公开(公告)日:2023-01-31

    申请号:US16622027

    申请日:2018-09-15

    Applicant: Google LLC

    Inventor: Vikas Sindhwani

    Abstract: Techniques are described herein for generating a dynamical systems control policy. A non-parametric family of smooth maps is defined on which vector-field learning problems can be formulated and solved using convex optimization. In some implementations, techniques described herein address the problem of generating contracting vector fields for certifying stability of the dynamical systems arising in robotics applications, e.g., designing stable movement primitives. These learning problems may utilize a set of demonstration trajectories, one or more desired equilibria (e.g., a target point), and once or more statistics including at least an average velocity and average duration of the set of demonstration trajectories. The learned contracting vector fields may induce a contraction tube around a targeted trajectory for an end effector of the robot. In some implementations, the disclosed framework may use curl-free vector-valued Reproducing Kernel Hilbert Spaces.

    TRAINING NEURAL NETWORKS USING LEARNED OPTIMIZERS

    公开(公告)号:US20240256865A1

    公开(公告)日:2024-08-01

    申请号:US18430586

    申请日:2024-02-01

    Applicant: Google LLC

    CPC classification number: G06N3/08 G06N3/0455

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training neural networks. One of the methods for training a neural network configured to perform a machine learning task includes performing, at each of a plurality of iterations: performing a training step to obtain respective new gradients of a loss function; for each network parameter: generating an optimizer network input; processing the optimizer network input using an optimizer neural network, wherein the processing comprises, for each cell: generating a cell input for the cell; and processing the cell input for the cell to generate a cell output, wherein the processing comprises: obtaining latent embeddings from the cell input; generating the cell output from the hidden state; and determining an update to the hidden state; and generating an optimizer network output defining an update for the network parameter; and applying the update to the network parameter.

    Robust and Data-Efficient Blackbox Optimization

    公开(公告)号:US20220108215A1

    公开(公告)日:2022-04-07

    申请号:US17423601

    申请日:2019-12-16

    Applicant: Google LLC

    Abstract: The present disclosure provides iterative blackbox optimization techniques that estimate the gradient of a function. According to an aspect of the present disclosure, a plurality of perturbations used at each iteration can be sampled from a non-orthogonal sampling distribution. As one example, in some implementations, perturbations that have been previously evaluated in previous iterations can be re-used at the current iteration. thereby conserving computing resources because the re-used perturbations do not need to be re-evaluated at the current iteration. In another example, in addition or alternatively to the use of previously evaluated perturbations, the perturbations evaluated at the current iteration can be sampled from a non-orthogonal sampling distribution.

    Compressed recurrent neural network models

    公开(公告)号:US10515307B2

    公开(公告)日:2019-12-24

    申请号:US15172457

    申请日:2016-06-03

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing long-short term memory layers with compressed gating functions. One of the systems includes a first long short-term memory (LSTM) layer, wherein the first LSTM layer is configured to, for each of the plurality of time steps, generate a new layer state and a new layer output by applying a plurality of gates to a current layer input, a current layer state, and a current layer output, each of the plurality of gates being configured to, for each of the plurality of time steps, generate a respective intermediate gate output vector by multiplying a gate input vector and a gate parameter matrix. The gate parameter matrix for at least one of the plurality of gates is a structured matrix or is defined by a compressed parameter matrix and a projection matrix.

    Compressed recurrent neural network models

    公开(公告)号:US11741366B2

    公开(公告)日:2023-08-29

    申请号:US16726119

    申请日:2019-12-23

    Applicant: Google LLC

    CPC classification number: G06N3/084 G06N3/044

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing long-short term memory layers with compressed gating functions. One of the systems includes a first long short-term memory (LSTM) layer, wherein the first LSTM layer is configured to, for each of the plurality of time steps, generate a new layer state and a new layer output by applying a plurality of gates to a current layer input, a current layer state, and a current layer output, each of the plurality of gates being configured to, for each of the plurality of time steps, generate a respective intermediate gate output vector by multiplying a gate input vector and a gate parameter matrix. The gate parameter matrix for at least one of the plurality of gates is a structured matrix or is defined by a compressed parameter matrix and a projection matrix.

    Generating a robot control policy from demonstrations

    公开(公告)号:US11420328B2

    公开(公告)日:2022-08-23

    申请号:US17425257

    申请日:2020-01-31

    Applicant: Google LLC

    Abstract: Learning to effectively imitate human teleoperators, even in unseen, dynamic environments is a promising path to greater autonomy, enabling robots to steadily acquire complex skills from supervision. Various motion generation techniques are described herein that are rooted in contraction theory and sum-of-squares programming for learning a dynamical systems control policy in the form of a polynomial vector field from a given set of demonstrations. Notably, this vector field is provably optimal for the problem of minimizing imitation loss while providing certain continuous-time guarantees on the induced imitation behavior. Techniques herein generalize to new initial and goal poses of the robot and can adapt in real time to dynamic obstacles during execution, with convergence to teleoperator behavior within a well-defined safety tube.

    Selecting actions to be performed by a robotic agent

    公开(公告)号:US11179847B2

    公开(公告)日:2021-11-23

    申请号:US16341184

    申请日:2017-10-12

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a system configured to plan actions to be performed by a robotic agent interacting with an environment to accomplish an objective by determining an optimized trajectory of state—action pairs for accomplishing the objective. The system maintains a current optimized trajectory and a current trust region radius, and optimizes a localized objective within the current trust region radius of the current optimized trajectory to determine a candidate updated optimized trajectory. The system determines whether the candidate updated optimized trajectory improves over the current optimized trajectory. In response to determining that the candidate updated optimized trajectory improves over the current optimized trajectory, the system updates the current optimized trajectory to the candidate updated optimized trajectory and updates the current trust region radius.

Patent Agency Ranking