SHARPNESS-AWARE MINIMIZATION FOR ROBUSTNESS IN SPARSE NEURAL NETWORKS

    Publication number: US20240127067A1

    Publication date: 2024-04-18

    Application number: US18459083

    Filing date: 2023-08-31

    IPC classification: G06N3/082

    CPC classification: G06N3/082

    Abstract: Systems and methods are disclosed for improving the natural robustness of sparse neural networks. Pruning a dense neural network may improve inference speed and reduce the memory footprint and energy consumption of the resulting sparse neural network while maintaining a desired level of accuracy. In real-world scenarios in which sparse neural networks deployed in autonomous vehicles perform tasks such as object detection and classification on acquired inputs (images), the neural networks need to be robust to new environments, weather conditions, camera effects, etc. Applying sharpness-aware minimization (SAM) optimization during training of the sparse neural network improves performance for out-of-distribution (OOD) images compared with conventional stochastic gradient descent (SGD) optimization. SAM optimizes a neural network to find a flat minimum: a point that not only has a small loss value but also lies within a neighborhood of uniformly low loss.
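    The two-step SAM update described in the abstract can be illustrated with a minimal, self-contained sketch. This is not the patent's implementation: the toy quadratic loss, the step size `lr`, and the perturbation radius `rho` are illustrative assumptions; the structure (ascend to a worst-case nearby point, then apply that point's gradient to the original weights) is the core of SAM.

```python
import numpy as np

def loss(w):
    # Toy quadratic loss standing in for the network's training loss.
    return 0.5 * np.sum(w ** 2)

def grad(w):
    # Analytic gradient of the toy loss above.
    return w

def sam_step(w, lr=0.1, rho=0.05):
    """One sharpness-aware minimization (SAM) update.

    SAM first perturbs the weights toward the worst-case nearby point
    (steepest ascent, scaled to radius rho), then applies the gradient
    computed at that perturbed point to the original weights, steering
    the optimizer toward flat minima rather than sharp ones.
    """
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent perturbation
    return w - lr * grad(w + eps)                # descend with perturbed gradient

w = np.array([1.0, -2.0])
for _ in range(20):
    w = sam_step(w)
```

In a real training loop the two gradient evaluations per step are the price paid for the flatter minimum, which is what improves OOD robustness here.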

    PRUNING A VISION TRANSFORMER
    Invention Application

    Publication number: US20230080247A1

    Publication date: 2023-03-16

    Application number: US17551005

    Filing date: 2021-12-14

    IPC classification: G06V10/94 G06V10/70

    Abstract: A vision transformer is a deep learning model used to perform vision processing tasks such as image recognition. Vision transformers are currently designed with a plurality of same-size blocks that perform the vision processing tasks. However, some portions of these blocks are unnecessary and not only slow down the vision transformer but also use more memory than required. In response, the parameters of these blocks are analyzed to determine a score for each parameter, and if the score falls below a threshold, the parameter is removed from the associated block. This reduces the size of the resulting vision transformer, which reduces unnecessary memory usage and increases performance.
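    The score-and-threshold step described above can be sketched as follows. The abstract does not specify the scoring function, so weight magnitude is assumed here purely for illustration; the pattern (score each parameter, drop those below a threshold) is what matters.

```python
import numpy as np

def prune_by_score(weights, threshold):
    """Zero out parameters whose score falls below the threshold.

    Weight magnitude serves as the per-parameter score for
    illustration; the surviving mask shrinks the effective block.
    """
    scores = np.abs(weights)        # score each parameter
    mask = scores >= threshold      # keep only high-scoring parameters
    return weights * mask, mask

# A tiny stand-in for one transformer block's weight matrix.
w = np.array([[0.9, -0.01, 0.3],
              [0.02, -0.7, 0.05]])
pruned, mask = prune_by_score(w, threshold=0.1)
print(int(mask.sum()))  # number of surviving parameters → 3
```

With a sparse storage format, the zeroed parameters need not be stored or multiplied at all, which is the source of the memory and speed gains.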

    Articulated body mesh estimation using three-dimensional (3D) body keypoints

    Publication number: US11361507B1

    Publication date: 2022-06-14

    Application number: US17315060

    Filing date: 2021-05-07

    IPC classification: G06T17/20 G06T19/20

    Abstract: Estimating the three-dimensional (3D) pose and shape of an articulated body mesh is useful for many different applications, including health and fitness, entertainment, and computer graphics. A set of estimated 3D keypoint positions for a human body structure is processed to compute parameters defining the pose and shape of a parametric human body mesh using a set of geometric operations. During processing, 3D keypoints are extracted from the parametric human body mesh and a set of rotations is computed to align the extracted 3D keypoints with the estimated 3D keypoints. The set of rotations may correctly position a particular 3D keypoint at a "joint", but an arbitrary rotation of the "joint" keypoint may produce a twist in the connection to a child keypoint. Rules are applied to the set of rotations to resolve ambiguous twists and articulate the parametric human body mesh according to the computed parameters.
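    The twist ambiguity mentioned in the abstract can be sketched with a small geometric example. A rotation that aligns an extracted bone direction with an estimated one fixes the "swing" of a joint, but any extra rotation about the bone axis (the twist) would align the keypoints equally well. This minimal NumPy sketch (Rodrigues' rotation formula, assuming the directions are not anti-parallel) computes only the swing part; it is an illustration of the ambiguity, not the patent's rules for resolving it.

```python
import numpy as np

def swing_rotation(src, dst):
    """Rotation matrix aligning direction src to direction dst.

    This fixes the joint's swing; any further rotation about dst
    (the twist) would leave the keypoint alignment unchanged, which
    is the ambiguity the disclosed rules must resolve.
    """
    a = src / np.linalg.norm(src)
    b = dst / np.linalg.norm(dst)
    v = np.cross(a, b)                 # rotation axis (unnormalized)
    c = np.dot(a, b)                   # cosine of the rotation angle
    K = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])  # skew-symmetric cross-product matrix
    # Rodrigues' formula in simplified form; assumes c != -1.
    return np.eye(3) + K + K @ K / (1.0 + c)

R = swing_rotation(np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))
print(np.allclose(R @ np.array([1.0, 0.0, 0.0]), [0.0, 1.0, 0.0]))  # → True
```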

    3D HUMAN BODY POSE ESTIMATION USING A MODEL TRAINED FROM UNLABELED MULTI-VIEW DATA

    Publication number: US20210248772A1

    Publication date: 2021-08-12

    Application number: US16897057

    Filing date: 2020-06-09

    Abstract: Learning to estimate a 3D body pose (and, likewise, the pose of any type of object) from a single 2D image is of great interest for many practical graphics applications, and generally relies on neural networks that have been trained with sample data annotating (labeling) each sample 2D image with a known 3D pose. Requiring this labeled training data, however, has various drawbacks, including that traditionally used training data sets lack diversity and therefore limit the extent to which neural networks are able to estimate 3D pose. Expanding these training data sets is also difficult, since it requires manually provided annotations for 2D images, which is time-consuming and prone to errors. The present disclosure overcomes these and other limitations of existing techniques by providing a model that is trained from unlabeled multi-view data for use in 3D pose estimation.

    NEURAL NETWORK BASED FACIAL ANALYSIS USING FACIAL LANDMARKS AND ASSOCIATED CONFIDENCE VALUES

    Publication number: US20210182625A1

    Publication date: 2021-06-17

    Application number: US17004252

    Filing date: 2020-08-27

    IPC classification: G06K9/62 G06K9/00

    Abstract: Systems and methods are disclosed for more accurate and robust determination of subject characteristics from an image of the subject. One or more machine learning models receive as input an image of a subject and output both facial landmarks and associated confidence values. Confidence values represent the degrees to which the portions of the subject's face corresponding to those landmarks are occluded, i.e., the amount of uncertainty in each landmark's position. These landmark points and their associated confidence values, and/or associated information, may then be input to another set of one or more machine learning models, which may output any facial analysis quantity or quantities, such as the subject's gaze direction, head pose, drowsiness state, cognitive load, or distraction state.
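    A minimal sketch of how downstream processing can exploit the per-landmark confidence values: occluded (low-confidence) landmarks are down-weighted so they contribute little to the aggregate. The weighted mean used here is an illustrative aggregation, not the patent's actual downstream machine learning models.

```python
import numpy as np

def confidence_weighted_center(landmarks, confidences):
    """Confidence-weighted mean of 2D landmark positions.

    Landmarks flagged as occluded (low confidence) are down-weighted,
    so an unreliable detection barely affects the aggregate estimate.
    """
    w = np.asarray(confidences, dtype=float)
    w = w / w.sum()                              # normalize the weights
    return (np.asarray(landmarks) * w[:, None]).sum(axis=0)

lms = [[0.0, 0.0], [2.0, 0.0], [100.0, 100.0]]   # third landmark occluded/noisy
conf = [1.0, 1.0, 0.0]                           # zero confidence for the occlusion
center = confidence_weighted_center(lms, conf)
print(center)  # the outlier is ignored → [1. 0.]
```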

    3D human body pose estimation using a model trained from unlabeled multi-view data

    Publication number: US11417011B2

    Publication date: 2022-08-16

    Application number: US16897057

    Filing date: 2020-06-09

    Abstract: Learning to estimate a 3D body pose (and, likewise, the pose of any type of object) from a single 2D image is of great interest for many practical graphics applications, and generally relies on neural networks that have been trained with sample data annotating (labeling) each sample 2D image with a known 3D pose. Requiring this labeled training data, however, has various drawbacks, including that traditionally used training data sets lack diversity and therefore limit the extent to which neural networks are able to estimate 3D pose. Expanding these training data sets is also difficult, since it requires manually provided annotations for 2D images, which is time-consuming and prone to errors. The present disclosure overcomes these and other limitations of existing techniques by providing a model that is trained from unlabeled multi-view data for use in 3D pose estimation.