-
Publication Number: US11565412B2
Publication Date: 2023-01-31
Application Number: US16622027
Filing Date: 2018-09-15
Applicant: Google LLC
Inventor: Vikas Sindhwani
IPC: B25J9/16 , G06N20/10 , G06N3/04 , G06N3/08 , G05B19/423
Abstract: Techniques are described herein for generating a dynamical systems control policy. A non-parametric family of smooth maps is defined on which vector-field learning problems can be formulated and solved using convex optimization. In some implementations, techniques described herein address the problem of generating contracting vector fields for certifying stability of the dynamical systems arising in robotics applications, e.g., designing stable movement primitives. These learning problems may utilize a set of demonstration trajectories, one or more desired equilibria (e.g., a target point), and one or more statistics including at least an average velocity and average duration of the set of demonstration trajectories. The learned contracting vector fields may induce a contraction tube around a targeted trajectory for an end effector of the robot. In some implementations, the disclosed framework may use curl-free vector-valued Reproducing Kernel Hilbert Spaces.
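As a rough illustration of the vector-field-learning idea in this abstract (not the patented method itself), the sketch below fits a vector field to demonstration (position, velocity) pairs by ridge regression with a curl-free matrix-valued Gaussian kernel; the contraction constraints that certify stability are omitted, and the kernel width, regularizer, and toy trajectory are illustrative assumptions.

```python
# Minimal sketch, not the patented method: fit f(x) = sum_i K_cf(x, x_i) @ alpha_i
# to demonstration (position, velocity) pairs with a curl-free matrix-valued
# Gaussian kernel and ridge regression. Contraction constraints are omitted;
# sigma, lam, and the toy trajectory are illustrative choices.
import numpy as np

def curl_free_kernel(x, y, sigma=1.0):
    """Curl-free kernel from a Gaussian: (k / sigma^2) * (I - d d^T / sigma^2), d = x - y."""
    d = x - y
    k = np.exp(-(d @ d) / (2.0 * sigma**2))
    return (k / sigma**2) * (np.eye(x.shape[0]) - np.outer(d, d) / sigma**2)

def fit_vector_field(positions, velocities, sigma=1.0, lam=1e-3):
    """Ridge regression in the curl-free RKHS: solve (G + lam I) alpha = v."""
    m, n = positions.shape
    G = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(m):
            G[i*n:(i+1)*n, j*n:(j+1)*n] = curl_free_kernel(positions[i], positions[j], sigma)
    alpha = np.linalg.solve(G + lam * np.eye(m * n), velocities.reshape(-1)).reshape(m, n)

    def field(x):
        return sum(curl_free_kernel(x, positions[i], sigma) @ alpha[i] for i in range(m))
    return field

# Toy demonstration: a spiral converging to the origin (the desired equilibrium).
t = np.linspace(0.0, 4.0, 80)
demo_x = np.stack([np.exp(-t) * np.cos(2*t), np.exp(-t) * np.sin(2*t)], axis=1)
demo_v = np.gradient(demo_x, t, axis=0)
f = fit_vector_field(demo_x, demo_v)
print(f(np.array([0.5, 0.2])))  # predicted velocity at a query state
```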
-
Publication Number: US20200276704A1
Publication Date: 2020-09-03
Application Number: US16649598
Filing Date: 2018-09-21
Applicant: GOOGLE LLC
Inventor: Vikas Sindhwani , Atil Iscen , Krzysztof Marcin Choromanski
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for optimizing control policies for robots by performing simulations of the robots and their real-world context to determine control policy parameters.
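As a loose, hypothetical illustration of determining control policy parameters through simulation (not the patented system), the sketch below scores a linear policy's parameters by rolling them out in a toy simulator and keeps the best-scoring parameters found by random search; the simulator, policy form, and constants are placeholders.

```python
# Minimal sketch, not the patented system: search over policy parameters by
# scoring each candidate with rollouts in a toy simulator.
import numpy as np

def simulate(theta, steps=50):
    """Toy 1-D simulator: reward for driving the state toward zero with small actions."""
    state, total_reward = np.array([1.0, 0.0]), 0.0
    for _ in range(steps):
        action = float(theta @ state)                        # linear control policy
        state = state + 0.1 * np.array([state[1], action])   # simple double-integrator-like dynamics
        total_reward -= state[0]**2 + 0.01 * action**2
    return total_reward

rng = np.random.default_rng(0)
best_theta, best_reward = np.zeros(2), -np.inf
for _ in range(200):                                          # random search over parameters
    theta = best_theta + 0.3 * rng.standard_normal(2)
    reward = simulate(theta)
    if reward > best_reward:
        best_theta, best_reward = theta, reward
print(best_theta, best_reward)
```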
-
Publication Number: US20240256865A1
Publication Date: 2024-08-01
Application Number: US18430586
Filing Date: 2024-02-01
Applicant: Google LLC
Inventor: Deepali Jain , Krzysztof Marcin Choromanski , Sumeet Singh , Vikas Sindhwani , Tingnan Zhang , Jie Tan , Kumar Avinava Dubey
IPC: G06N3/08 , G06N3/0455
CPC classification number: G06N3/08 , G06N3/0455
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training neural networks. One of the methods for training a neural network configured to perform a machine learning task includes performing, at each of a plurality of iterations: performing a training step to obtain respective new gradients of a loss function; for each network parameter: generating an optimizer network input; processing the optimizer network input using an optimizer neural network, wherein the processing comprises, for each cell: generating a cell input for the cell; and processing the cell input for the cell to generate a cell output, wherein the processing comprises: obtaining latent embeddings from the cell input; generating the cell output from a hidden state of the cell; and determining an update to the hidden state; and generating an optimizer network output defining an update for the network parameter; and applying the update to the network parameter.
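As a rough, hypothetical sketch of the data flow this abstract describes (not the patented optimizer architecture), the code below uses a tiny recurrent cell that maps each parameter's gradient and a per-parameter hidden state to a parameter update; the cell here is hand-initialized rather than meta-trained, so it only illustrates how gradients flow through the cell into updates.

```python
# Minimal sketch, not the patented architecture: a small recurrent cell that
# turns per-parameter gradients into per-parameter updates. The cell is
# hand-initialized (not meta-trained), so the updates are arbitrary; only the
# data flow is illustrated. All shapes and names are illustrative.
import numpy as np

class OptimizerCell:
    def __init__(self, hidden_size=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = 0.1 * rng.standard_normal((hidden_size, 1))          # gradient -> latent embedding
        self.W_h = 0.1 * rng.standard_normal((hidden_size, hidden_size)) # hidden state recurrence
        self.w_out = 0.1 * rng.standard_normal(hidden_size)              # hidden state -> update

    def step(self, grad, hidden):
        latent = np.tanh(self.W_in @ np.array([grad]) + self.W_h @ hidden)  # cell input -> latent embedding
        update = float(self.w_out @ latent)                                 # cell output: parameter update
        return update, latent                                               # latent doubles as the new hidden state

cell = OptimizerCell()
params = np.array([2.0, -3.0])
hidden = [np.zeros(4) for _ in params]
for _ in range(100):
    grads = 2.0 * params                      # gradients of the toy loss sum(p^2)
    for i, g in enumerate(grads):
        update, hidden[i] = cell.step(g, hidden[i])
        params[i] += update                   # apply the update to the network parameter
print(params)
```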
-
Publication Number: US11697205B2
Publication Date: 2023-07-11
Application Number: US16649598
Filing Date: 2018-09-21
Applicant: GOOGLE LLC
Inventor: Vikas Sindhwani , Atil Iscen , Krzysztof Marcin Choromanski
CPC classification number: B25J9/163 , B25J9/1661 , B25J9/1671 , G06N3/08 , G06N20/00
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for optimizing control policies for robots by performing simulations of the robots and their real-world context to determine control policy parameters.
-
Publication Number: US20220108215A1
Publication Date: 2022-04-07
Application Number: US17423601
Filing Date: 2019-12-16
Applicant: Google LLC
Inventor: Krzysztof Choromanski , Vikas Sindhwani , Aldo Pacchiano Camacho
Abstract: The present disclosure provides iterative blackbox optimization techniques that estimate the gradient of a function. According to an aspect of the present disclosure, a plurality of perturbations used at each iteration can be sampled from a non-orthogonal sampling distribution. As one example, in some implementations, perturbations that have been evaluated in previous iterations can be re-used at the current iteration, thereby conserving computing resources because the re-used perturbations do not need to be re-evaluated at the current iteration. In another example, in addition to or as an alternative to re-using previously evaluated perturbations, the perturbations evaluated at the current iteration can be sampled from a non-orthogonal sampling distribution.
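A minimal sketch of the basic estimator behind such techniques (not the specific method claimed here): the gradient of a blackbox objective is estimated from random Gaussian perturbations with an antithetic Monte Carlo estimator. The re-use of previously evaluated perturbations and the non-orthogonal sampling distributions described above are not implemented, and the objective, step size, and constants are illustrative.

```python
# Minimal sketch: iterative blackbox optimization with a Monte Carlo
# (evolution-strategies-style) gradient estimate from random perturbations.
# Re-used perturbations and non-orthogonal sampling are not implemented here.
import numpy as np

def blackbox(x):
    """Blackbox objective to minimize (illustrative)."""
    return float(np.sum((x - 1.0) ** 2))

def estimate_gradient(f, x, num_perturbations=32, sigma=0.1, rng=None):
    """Antithetic estimate: mean of (f(x + s*eps) - f(x - s*eps)) * eps / (2*s)."""
    if rng is None:
        rng = np.random.default_rng(0)
    grad = np.zeros_like(x)
    for _ in range(num_perturbations):
        eps = rng.standard_normal(x.shape)
        grad += (f(x + sigma * eps) - f(x - sigma * eps)) * eps
    return grad / (2.0 * sigma * num_perturbations)

x = np.zeros(5)
rng = np.random.default_rng(42)
for step in range(200):                          # iterative blackbox optimization
    x -= 0.05 * estimate_gradient(blackbox, x, rng=rng)
print(x)                                         # should approach the minimizer at all-ones
```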
-
Publication Number: US10515307B2
Publication Date: 2019-12-24
Application Number: US15172457
Filing Date: 2016-06-03
Applicant: Google LLC
Inventor: Tara N. Sainath , Vikas Sindhwani
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing long short-term memory layers with compressed gating functions. One of the systems includes a first long short-term memory (LSTM) layer, wherein the first LSTM layer is configured to, for each of a plurality of time steps, generate a new layer state and a new layer output by applying a plurality of gates to a current layer input, a current layer state, and a current layer output, each of the plurality of gates being configured to, for each of the plurality of time steps, generate a respective intermediate gate output vector by multiplying a gate input vector and a gate parameter matrix. The gate parameter matrix for at least one of the plurality of gates is a structured matrix or is defined by a compressed parameter matrix and a projection matrix.
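As a rough illustration of a compressed gating function (not the patented layer), the sketch below computes one LSTM-style gate whose parameter matrix is defined by a compressed parameter matrix and a projection matrix, i.e. a low-rank factorization that is never materialized as a full dense matrix; the shapes, rank, and names are illustrative.

```python
# Minimal sketch: one LSTM-style gate whose parameter matrix is factored as
# W ~ C @ P (compressed parameter matrix times projection matrix) instead of a
# full dense matrix. A complete compressed LSTM layer would factor every gate.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size, rank = 256, 128, 32
rng = np.random.default_rng(0)

C = 0.1 * rng.standard_normal((hidden_size, rank))                 # compressed parameter matrix
P = 0.1 * rng.standard_normal((rank, input_size + hidden_size))    # projection matrix
bias = np.zeros(hidden_size)

def gate(x, h):
    """Intermediate gate output: multiply the gate input by the factored matrix."""
    gate_input = np.concatenate([x, h])
    return sigmoid(C @ (P @ gate_input) + bias)   # the full dense W is never formed

x, h = rng.standard_normal(input_size), np.zeros(hidden_size)
print(gate(x, h).shape)                                       # (256,)
dense_params = hidden_size * (input_size + hidden_size)
compressed_params = C.size + P.size
print(dense_params, compressed_params)                        # parameter count comparison
```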
-
Publication Number: US11741366B2
Publication Date: 2023-08-29
Application Number: US16726119
Filing Date: 2019-12-23
Applicant: Google LLC
Inventor: Tara N. Sainath , Vikas Sindhwani
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing long short-term memory layers with compressed gating functions. One of the systems includes a first long short-term memory (LSTM) layer, wherein the first LSTM layer is configured to, for each of a plurality of time steps, generate a new layer state and a new layer output by applying a plurality of gates to a current layer input, a current layer state, and a current layer output, each of the plurality of gates being configured to, for each of the plurality of time steps, generate a respective intermediate gate output vector by multiplying a gate input vector and a gate parameter matrix. The gate parameter matrix for at least one of the plurality of gates is a structured matrix or is defined by a compressed parameter matrix and a projection matrix.
-
Publication Number: US11420328B2
Publication Date: 2022-08-23
Application Number: US17425257
Filing Date: 2020-01-31
Applicant: Google LLC
Inventor: Bachir El Khadir , Vikas Sindhwani , Jacob Varley
Abstract: Learning to effectively imitate human teleoperators, even in unseen, dynamic environments, is a promising path to greater autonomy, enabling robots to steadily acquire complex skills from supervision. Various motion generation techniques are described herein that are rooted in contraction theory and sum-of-squares programming for learning a dynamical systems control policy in the form of a polynomial vector field from a given set of demonstrations. Notably, this vector field is provably optimal for the problem of minimizing imitation loss while providing certain continuous-time guarantees on the induced imitation behavior. Techniques herein generalize to new initial and goal poses of the robot and can adapt in real time to dynamic obstacles during execution, with convergence to teleoperator behavior within a well-defined safety tube.
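As a rough illustration of the imitation-loss part of this abstract (not the patented method), the sketch below fits a polynomial vector field to a demonstration trajectory by least squares; the sum-of-squares and contraction machinery that provides the stability certificates is omitted, and the degree, toy data, and names are illustrative.

```python
# Minimal sketch: fit a polynomial vector field to demonstration data by
# minimizing imitation (velocity prediction) loss with least squares. The
# sum-of-squares / contraction constraints from the abstract are not enforced.
import numpy as np
from itertools import combinations_with_replacement

def poly_features(x, degree=3):
    """Monomials of x up to the given total degree (including the constant 1)."""
    feats = [1.0]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(x)), d):
            feats.append(np.prod(x[list(idx)]))
    return np.array(feats)

# Demonstration: a trajectory spiraling in to the goal at the origin.
t = np.linspace(0.0, 4.0, 100)
demo_x = np.stack([np.exp(-t) * np.cos(3*t), np.exp(-t) * np.sin(3*t)], axis=1)
demo_v = np.gradient(demo_x, t, axis=0)

Phi = np.stack([poly_features(x) for x in demo_x])         # (samples, monomials)
coeffs, *_ = np.linalg.lstsq(Phi, demo_v, rcond=None)      # imitation-loss least squares

def vector_field(x):
    return poly_features(x) @ coeffs

print(vector_field(np.array([0.4, -0.1])))   # predicted velocity at a new pose
```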
-
Publication Number: US11179847B2
Publication Date: 2021-11-23
Application Number: US16341184
Filing Date: 2017-10-12
Applicant: Google LLC
Inventor: Mrinal Kalakrishnan , Vikas Sindhwani
IPC: B25J9/16
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a system configured to plan actions to be performed by a robotic agent interacting with an environment to accomplish an objective by determining an optimized trajectory of state-action pairs for accomplishing the objective. The system maintains a current optimized trajectory and a current trust region radius, and optimizes a localized objective within the current trust region radius of the current optimized trajectory to determine a candidate updated optimized trajectory. The system determines whether the candidate updated optimized trajectory improves over the current optimized trajectory. In response to determining that the candidate updated optimized trajectory improves over the current optimized trajectory, the system updates the current optimized trajectory to the candidate updated optimized trajectory and updates the current trust region radius.
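As a rough illustration of the trust-region update loop this abstract describes (not the full planner), the sketch below keeps a current trajectory and trust-region radius, generates a candidate by searching within the radius, and accepts the candidate and grows the radius only when it improves the objective; the toy objective, random local search in place of the localized objective, and shrink/grow factors are illustrative assumptions.

```python
# Minimal sketch: trust-region trajectory optimization loop. The "local
# optimizer" here is a random search within the trust region rather than the
# localized objective of the abstract; objective and constants are toys.
import numpy as np

def objective(trajectory):
    """Toy cost: reach the goal state while penalizing large motions."""
    goal = np.array([1.0, 1.0])
    return float(np.sum((trajectory[-1] - goal) ** 2) + 0.1 * np.sum(np.diff(trajectory, axis=0) ** 2))

def optimize_local(trajectory, radius, rng, num_samples=64):
    """Best random perturbation of the trajectory within the trust-region radius."""
    candidates = [trajectory + radius * rng.uniform(-1, 1, trajectory.shape) for _ in range(num_samples)]
    return min(candidates, key=objective)

rng = np.random.default_rng(0)
trajectory = np.zeros((10, 2))     # current optimized trajectory (10 states)
radius = 0.5                       # current trust-region radius
for _ in range(100):
    candidate = optimize_local(trajectory, radius, rng)
    if objective(candidate) < objective(trajectory):    # candidate improves
        trajectory, radius = candidate, min(radius * 1.5, 1.0)
    else:                                               # reject and shrink the region
        radius *= 0.5
print(objective(trajectory), trajectory[-1])
```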