THREE-DIMENSIONAL POINT CLOUDS BASED ON IMAGES AND DEPTH DATA

    Publication number: US20230033177A1

    Publication date: 2023-02-02

    Application number: US17390174

    Filing date: 2021-07-30

    Applicant: Zoox, Inc.

    Inventor: Kratarth Goel

    Abstract: Techniques are discussed herein for generating three-dimensional (3D) representations of an environment based on two-dimensional (2D) image data, and using the 3D representations to perform 3D object detection and other 3D analyses of the environment. 2D image data may be received, along with depth estimation data associated with the 2D image data. Using the 2D image data and associated depth data, an image-based object detector may generate 3D representations, including point clouds and/or 3D pixel grids, for the 2D image or particular regions of interest. In some examples, a 3D point cloud may be generated by projecting pixels from the 2D image into 3D space followed by a trained 3D convolutional neural network (CNN) performing object detection. Additionally or alternatively, a top-down view of a 3D pixel grid representation may be used to perform object detection using 2D convolutions.
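
The projection of 2D pixels into 3D space mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simple pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy` and a per-pixel depth map; the abstract does not specify a camera model, and the function name `unproject_depth_image` is hypothetical, so this shows one common approach rather than the claimed method.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def unproject_depth_image(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Lift each pixel with a valid depth into a 3D point (camera frame).

    Assumes a pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    Pixels with non-positive depth are treated as missing and skipped.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):        # v: pixel row index
        for u, z in enumerate(row):        # u: pixel column index
            if z <= 0.0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Tiny 2x2 depth map in metres; 0.0 marks a pixel without a depth estimate.
depth = [[2.0, 0.0],
         [4.0, 2.0]]
cloud = unproject_depth_image(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

The resulting list of `(x, y, z)` tuples is the kind of point cloud that a 3D detector could consume; a practical pipeline would typically vectorize this step and restrict it to regions of interest.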

    Vehicle control system and method for pedestrian detection based on head detection in sensor data

    Publication number: US11163990B2

    Publication date: 2021-11-02

    Application number: US16457524

    Filing date: 2019-06-28

    Applicant: Zoox, Inc.

    Inventor: Kratarth Goel

    Abstract: Techniques described herein relate to using head detection to improve pedestrian detection. In an example, a head can be detected in sensor data received from a sensor associated with a vehicle using a machine learned model. Based at least partly on detecting the head in the sensor data, a pedestrian can be determined to be present in an environment within which the vehicle is positioned. In an example, an indication of the pedestrian can be provided to at least one system of the vehicle, for instance, for use by the at least one system to make a determination associated with controlling the vehicle.

    DEPTH DATA MODEL TRAINING WITH UPSAMPLING, LOSSES, AND LOSS BALANCING

    Publication number: US20210150279A1

    Publication date: 2021-05-20

    Application number: US16684568

    Filing date: 2019-11-14

    Applicant: Zoox, Inc.

    Abstract: Techniques for training a machine learned (ML) model to determine depth data based on image data are discussed herein. Training can use stereo image data and depth data (e.g., lidar data). A first (e.g., left) image can be input to the ML model, which can output predicted disparity and/or depth data. The predicted disparity data can be used with second image data (e.g., a right image) to reconstruct the first image. Differences between the first and reconstructed images can be used to determine a loss. Losses may include pixel, smoothing, structural similarity, and/or consistency losses. Further, differences between the depth data and the predicted depth data and/or differences between the predicted disparity data and the predicted depth data can be determined, and the ML model can be trained based on the various losses. Thus, the techniques can use self-supervised training and supervised training to train the ML model.
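
The self-supervised part of this abstract (reconstruct the left image from the right image using predicted disparity, then penalize the difference) can be sketched on a single image row. This is a simplified illustration under stated assumptions: rectified stereo, integer-rounded disparities with nearest-pixel sampling and border clamping, and a plain L1 pixel loss. A real training setup would use differentiable sampling and the additional losses the abstract lists. All function names here are hypothetical.

```python
from typing import List, Sequence


def reconstruct_left_row(
    right_row: Sequence[float], disparity_row: Sequence[float]
) -> List[float]:
    """Rebuild a left-image row by sampling the right image.

    For rectified stereo, a left pixel at column u appears in the right
    image at column u - d, where d is the disparity in pixels.
    """
    width = len(right_row)
    recon = []
    for u, d in enumerate(disparity_row):
        src = u - int(round(d))
        src = min(max(src, 0), width - 1)  # clamp to the image border
        recon.append(right_row[src])
    return recon


def l1_pixel_loss(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two rows of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def disparity_to_depth(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Standard stereo relation: depth = focal length * baseline / disparity."""
    return focal_px * baseline_m / disparity_px


left_row = [10.0, 10.0, 20.0, 30.0]
right_row = [10.0, 20.0, 30.0, 40.0]

good = reconstruct_left_row(right_row, [0, 1, 1, 1])   # plausible disparities
bad = reconstruct_left_row(right_row, [0, 0, 0, 0])    # zero disparity everywhere

loss_good = l1_pixel_loss(left_row, good)
loss_bad = l1_pixel_loss(left_row, bad)
depth_m = disparity_to_depth(2.0, focal_px=1000.0, baseline_m=0.5)
```

A lower reconstruction loss for the first disparity row than for the second is the signal that drives training; the supervised term described in the abstract would additionally compare predicted depth against lidar depth.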

    Three-dimensional object detection based on image data

    Publication number: US12056934B2

    Publication date: 2024-08-06

    Application number: US17390234

    Filing date: 2021-07-30

    Applicant: Zoox, Inc.

    Inventor: Kratarth Goel

    CPC classification number: G06V20/56 G06V20/64

    Abstract: Techniques are discussed herein for generating three-dimensional (3D) representations of an environment based on two-dimensional (2D) image data, and using the 3D representations to perform 3D object detection and other 3D analyses of the environment. 2D image data may be received, along with depth estimation data associated with the 2D image data. Using the 2D image data and associated depth data, an image-based object detector may generate 3D representations, including point clouds and/or 3D pixel grids, for the 2D image or particular regions of interest. In some examples, a 3D point cloud may be generated by projecting pixels from the 2D image into 3D space followed by a trained 3D convolutional neural network (CNN) performing object detection. Additionally or alternatively, a top-down view of a 3D pixel grid representation may be used to perform object detection using 2D convolutions.
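
The top-down view mentioned at the end of this abstract can be illustrated by binning 3D points into a 2D grid that ordinary 2D convolutions could then process. The sketch below is an assumption-laden example, not the patented representation: it assumes points in a camera frame with `x` to the right, `y` as the height axis, and `z` forward, and it stores a simple per-cell point count. The name `top_down_grid` and its parameters are hypothetical.

```python
from typing import Iterable, List, Tuple

Point3D = Tuple[float, float, float]


def top_down_grid(
    points: Iterable[Point3D],
    x_min: float,
    x_max: float,
    z_min: float,
    z_max: float,
    cell: float,
) -> List[List[int]]:
    """Collapse the height axis (y) and count points per (z, x) cell.

    Rows index forward distance z, columns index lateral position x.
    Points outside the half-open ranges [min, max) are ignored.
    """
    cols = int((x_max - x_min) / cell)
    rows = int((z_max - z_min) / cell)
    grid = [[0] * cols for _ in range(rows)]
    for x, _y, z in points:
        if not (x_min <= x < x_max and z_min <= z < z_max):
            continue
        col = min(int((x - x_min) / cell), cols - 1)  # guard float rounding
        row = min(int((z - z_min) / cell), rows - 1)
        grid[row][col] += 1
    return grid


points = [
    (-0.5, -0.5, 2.0),
    (-1.0, 1.0, 4.0),   # z == z_max, so it falls outside the grid
    (0.5, 0.5, 2.0),
    (-0.5, 3.0, 2.5),   # same cell as the first point, different height
]
grid = top_down_grid(points, x_min=-2.0, x_max=2.0, z_min=0.0, z_max=4.0, cell=1.0)
```

Because height is collapsed, two points stacked vertically land in the same cell; a richer grid could keep per-cell features such as maximum height or mean color instead of a count.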
