-
Publication No.: US20250166395A1
Publication date: 2025-05-22
Application No.: US18585444
Application date: 2024-02-23
Applicant: QUALCOMM Incorporated
Inventor: Shizhong Steve HAN , Hong CAI , Haiyan WANG , Yinhao ZHU , Yunxiao SHI , Fatih Murat PORIKLI , Sourab BAPU SRIDHAR , Senthil Kumar YOGAMANI
Abstract: Certain aspects of the present disclosure provide techniques for performing 3D object detection. Such techniques may include obtaining a first set of features based on a first 2D view; obtaining a second set of features based on a second 2D view; obtaining a third set of features based on a third 2D view; and obtaining a fourth set of features based on a fourth 2D view, wherein the first 2D view and the second 2D view are based on input from a first input sensor and the third 2D view and the fourth 2D view are based on input from a second input sensor. The techniques may also include performing cross-attention between the first set of features and the second set of features and between the third set of features and the fourth set of features; and performing 3D object detection.
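The pairwise cross-attention step can be illustrated with a minimal scaled dot-product sketch. It assumes identity query/key/value projections and toy feature vectors. The function name `cross_attention` and the sample data are hypothetical; the abstract does not say how the attention is parameterized.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per token, one column per feature channel


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Each query token (e.g., from the first 2D view) attends over the
    key/value tokens (e.g., from the second 2D view). Projections are
    assumed to be identity maps in this sketch."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


# Hypothetical usage: fuse two views that come from the same sensor.
view1 = [[1.0, 0.0], [0.0, 1.0]]
view2 = [[1.0, 1.0], [0.0, 2.0], [2.0, 0.0]]
fused = cross_attention(view1, view2, view2)
```

The same call would apply to the third and fourth feature sets. The abstract does not describe how the fused features reach the 3D detection stage.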
-
Publication No.: US20250148633A1
Publication date: 2025-05-08
Application No.: US18666502
Application date: 2024-05-16
Applicant: QUALCOMM Incorporated
Inventor: Rajeev YASARLA , Hong CAI , Risheek GARREPALLI , Yinhao ZHU , Jisoo JEONG , Yunxiao SHI , Manish Kumar SINGH , Fatih Murat PORIKLI
Abstract: Systems and techniques are provided for generating depth information. For example, a process can include obtaining a first feature volume including visual features corresponding to each respective frame included in a first set of frames. A first query generator network can generate reconstruction features associated with a reconstructed feature volume corresponding to the first feature volume. Based on the first feature volume, a second query generator network can generate motion features associated with predicted future motion corresponding to the first feature volume. An initial depth prediction can be generated for each respective frame based on cross-attention between features of a depth prediction decoder, the reconstruction features, and the motion features. A refined depth prediction can be generated for each respective frame based on cross-attention between the initial depth prediction, the reconstruction features, and the motion features.
-
Publication No.: US20240144589A1
Publication date: 2024-05-02
Application No.: US18177028
Application date: 2023-03-01
Applicant: QUALCOMM Incorporated
Inventor: Minghua LIU , Yinhao ZHU , Hong CAI , Fatih Murat PORIKLI , Hao SU
CPC classification number: G06T17/00 , G06T7/12 , G06V10/25 , G06V20/70 , G06T2207/10028 , G06V2201/07
Abstract: Systems and techniques are provided for part segmentation. For example, a process for performing part segmentation can include obtaining a three-dimensional capture of an object. The method can include generating one or more two-dimensional images of the object from the three-dimensional capture of the object. The method can further include processing the one or more two-dimensional images of the object to generate at least one two-dimensional bounding box associated with a part of the object. The method can include performing three-dimensional part segmentation of the part of the object based on a three-dimensional point cloud generated from the one or more two-dimensional images of the object and the at least one two-dimensional bounding box and based on semantically labeled super points which are merged into subgroups associated with the part of the object.
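One way to link a 2D part bounding box to 3D super points is to project each super point's member points into the image and keep the super points that fall mostly inside the box. The sketch below assumes a single pinhole view and a simple majority vote. The names `superpoints_in_box` and `min_fraction` are hypothetical, and the filing's own merging and labeling procedure is not disclosed in the abstract.

```python
from typing import Dict, List, Tuple

Point3D = Tuple[float, float, float]       # camera-frame coordinates, z > 0
Box2D = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def project(p: Point3D, f: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection with focal length f and principal point (cx, cy)."""
    x, y, z = p
    return (f * x / z + cx, f * y / z + cy)


def superpoints_in_box(
    points: List[Point3D],
    superpoints: Dict[int, List[int]],  # super point ID -> indices into `points`
    box: Box2D,
    f: float,
    cx: float,
    cy: float,
    min_fraction: float = 0.5,  # hypothetical voting threshold
) -> List[int]:
    """Return IDs of super points whose members project inside the 2D part
    box for at least `min_fraction` of their points."""
    x0, y0, x1, y1 = box
    selected = []
    for sp_id, indices in superpoints.items():
        inside = 0
        for i in indices:
            u, v = project(points[i], f, cx, cy)
            if x0 <= u <= x1 and y0 <= v <= y1:
                inside += 1
        if indices and inside / len(indices) >= min_fraction:
            selected.append(sp_id)
    return sorted(selected)
```

With several rendered views, the per-view votes could be accumulated before thresholding. That choice is an assumption made here, not something the abstract states.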
-
Publication No.: US20240386650A1
Publication date: 2024-11-21
Application No.: US18509113
Application date: 2023-11-14
Applicant: QUALCOMM Incorporated
Inventor: Farhad GHAZVINIAN ZANJANI , Leyla MIRVAKHABOVA , Yinhao ZHU , Hong CAI , Fatih Murat PORIKLI
Abstract: Systems and techniques are provided for processing image data corresponding to a scene. A process can include generating a planar distance map including a planar distance value for each pixel of at least one image corresponding to the scene. Planar segmentation is performed based on the planar distance map, a normal map corresponding to the at least one image, and positional encoding information of the planar distance map. A triangular mesh fragment is initialized based on sampling points from each planar segment of a plurality of planar segments from the planar segmentation. Ray-triangle intersections are determined based on performing ray casting for a reconstructed planar mesh including a plurality of triangular mesh fragments each corresponding to a different image. A planar reconstruction and segmentation machine learning network is optimized for the scene, based on training the planar reconstruction and segmentation machine learning network using one or more loss functions.
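The ray-triangle intersection step can be sketched with the standard Möller-Trumbore test. The abstract does not say which intersection routine is used, so treat this as one common choice rather than the filing's method. The function returns the ray parameter `t` of the hit, or `None` on a miss.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(origin: Vec3, direction: Vec3, v0: Vec3, v1: Vec3, v2: Vec3,
                 eps: float = 1e-9) -> Optional[float]:
    """Moller-Trumbore: distance t along the ray to the hit point, or None."""
    e1 = sub(v1, v0)
    e2 = sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle's plane
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det
    return t if t > eps else None  # ignore hits behind the ray origin
```

For a reconstructed planar mesh, each camera ray would be tested against every triangular fragment, and the smallest positive `t` would be kept.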
-
公开(公告)号:US20230100413A1
公开(公告)日:2023-03-30
申请号:US17486732
申请日:2021-09-27
Applicant: QUALCOMM Incorporated
Inventor: Yinhao ZHU , Yang YANG , Taco Sebastiaan COHEN
IPC: H04N19/60
Abstract: Systems and techniques are described herein for processing media data using a neural network system. For instance, a process can include obtaining a latent representation of a frame of encoded image data and generating, by a plurality of decoder transformer layers of a decoder sub-network using the latent representation of the frame of encoded image data as input, a frame of decoded image data. At least one decoder transformer layer of the plurality of decoder transformer layers includes: one or more transformer blocks for generating one or more patches of features and determining self-attention locally within one or more window partitions and shifted window partitions applied over the one or more patches; and a patch un-merging engine for decreasing a respective size of each patch of the one or more patches.
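The window and shifted-window partitions can be illustrated on a grid of patch IDs. The sketch assumes a Swin-style cyclic shift and omits both the attention masking applied to wrapped patches and the patch un-merging engine. The names `cyclic_shift` and `window_partition` are hypothetical.

```python
from typing import List

Grid = List[List[int]]  # a 2D grid of patch IDs (stand-ins for patch features)


def cyclic_shift(grid: Grid, shift: int) -> Grid:
    """Roll the grid toward the top-left by `shift` patches on both axes,
    as in the Swin Transformer's shifted-window step."""
    h, w = len(grid), len(grid[0])
    return [[grid[(i + shift) % h][(j + shift) % w] for j in range(w)]
            for i in range(h)]


def window_partition(grid: Grid, window: int) -> List[List[int]]:
    """Split the grid into non-overlapping window x window blocks (row-major).
    Self-attention would then be computed only among patches in one block."""
    h, w = len(grid), len(grid[0])
    assert h % window == 0 and w % window == 0, "grid must tile evenly"
    return [
        [grid[i][j] for i in range(i0, i0 + window) for j in range(j0, j0 + window)]
        for i0 in range(0, h, window)
        for j0 in range(0, w, window)
    ]


grid = [[4 * i + j for j in range(4)] for i in range(4)]
regular = window_partition(grid, 2)                    # first window: [0, 1, 4, 5]
shifted = window_partition(cyclic_shift(grid, 1), 2)   # first window: [5, 6, 9, 10]
```

The first shifted window draws one patch from each of the four regular windows. Alternating regular and shifted partitions is what lets information cross window boundaries.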
-
公开(公告)号:US20220224926A1
公开(公告)日:2022-07-14
申请号:US17573568
申请日:2022-01-11
Applicant: QUALCOMM Incorporated
Inventor: Yadong LU , Yang YANG , Yinhao ZHU , Amir SAID , Reza POURREZA , Taco Sebastiaan COHEN
IPC: H04N19/42 , H04N19/30 , H04N19/13 , H04N19/136 , H04N19/124
Abstract: A computer-implemented method for operating an artificial neural network (ANN) includes receiving an input by the ANN. The ANN generates a latent representation of the input. The latent representation is communicated according to a bit rate based on a learned latent scaling parameter. The latent scaling parameter is learned based on a channel index and a tradeoff parameter value that balances the bit rate against distortion.
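The following shows how a per-channel latent scale trades bit rate against distortion. A coarser scale yields fewer distinct symbols (fewer bits) but larger quantization error. This is a minimal sketch under several assumptions:

- The scales are given rather than learned.
- Rate is the empirical entropy of the quantized symbols, a stand-in for a learned entropy model.
- Distortion is the mean squared error in latent space, not in the decoded image.
- The objective is `rate + lam * mse`.

All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

Latent = List[List[float]]  # latent[c] holds the values of channel c


def quantize_symbols(channel: List[float], scale: float) -> List[int]:
    # Divide by the channel's scale, then round to integer symbols.
    return [round(v / scale) for v in channel]


def empirical_bits(symbols: List[int]) -> float:
    """Total bits under the empirical symbol distribution (n * entropy)."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())


def rate_distortion(latent: Latent, scales: List[float],
                    lam: float) -> Tuple[float, float, float]:
    """Return (rate_bits, mse, rate_bits + lam * mse) for per-channel scales."""
    rate, sq_err, count = 0.0, 0.0, 0
    for channel, s in zip(latent, scales):
        symbols = quantize_symbols(channel, s)
        rate += empirical_bits(symbols)
        # Dequantize (symbol * scale) and accumulate squared error.
        sq_err += sum((v - q * s) ** 2 for v, q in zip(channel, symbols))
        count += len(channel)
    mse = sq_err / count
    return rate, mse, rate + lam * mse
```

Because the abstract ties the scale to both the channel index and the tradeoff value, a trained model could, in principle, serve several rate points by changing only the scales.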
-
公开(公告)号:US20250166391A1
公开(公告)日:2025-05-22
申请号:US18585480
申请日:2024-02-23
Applicant: QUALCOMM Incorporated
Inventor: Shizhong Steve HAN , Hong CAI , Haiyan WANG , Yinhao ZHU , Yunxiao SHI , Fatih Murat PORIKLI , Sourab BAPU SRIDHAR , Senthil Kumar YOGAMANI
Abstract: Certain aspects of the present disclosure provide techniques for performing 3D object detection. Such techniques may include obtaining one or more inputs associated with one or more two-dimensional (2D) views of a scene; selecting a set of 2D views of the scene from a plurality of 2D views of the scene based on the one or more inputs, the set of 2D views comprising a first 2D view of the scene and a second 2D view of the scene; and performing three-dimensional (3D) object detection in the scene based on the set of 2D views.
-
公开(公告)号:US20250148628A1
公开(公告)日:2025-05-08
申请号:US18633302
申请日:2024-04-11
Applicant: QUALCOMM Incorporated
Inventor: Yunxiao SHI , Hong CAI , Manish Kumar SINGH , Shizhong Steve HAN , Yinhao ZHU , Fatih Murat PORIKLI
Abstract: Systems and techniques are provided for generating depth information from one or more images. For example, a process can include obtaining a first depth map corresponding to an input comprising an image of the one or more images and a sparse depth measurement. A three-dimensional (3D) point cloud can be generated based on the first depth map and multi-scale visual features of the input, wherein the 3D point cloud includes a plurality of 3D point features uplifted from the multi-scale visual features. At least a portion of the plurality of 3D point features can be processed using one or more self-attention layers to generate refined 3D point features. A two-dimensional (2D) projection of the refined 3D point features can be generated and a second depth map can be generated based on the 2D projection of the refined 3D point features.
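The uplift-then-project structure can be sketched with pinhole geometry. Each valid depth pixel is lifted to a 3D point, and 3D points are projected back into a depth map with a nearest-depth (z-buffer) rule. The sketch assumes known intrinsics `fx, fy, cx, cy` and leaves out the multi-scale features and the self-attention refinement. The function names are hypothetical.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def depth_to_points(depth: List[List[float]], fx: float, fy: float,
                    cx: float, cy: float) -> List[Point3D]:
    """Back-project every pixel with positive depth into camera coordinates."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:  # skip invalid (zero) depth
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def points_to_depth(points: List[Point3D], height: int, width: int,
                    fx: float, fy: float, cx: float, cy: float) -> List[List[float]]:
    """Project points to the image plane, keeping the nearest depth per pixel."""
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:
            continue
        u = round(fx * x / z + cx)
        v = round(fy * y / z + cy)
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
    return depth


# Round trip on a toy 2 x 2 depth map with one invalid pixel.
d = [[1.0, 2.0], [0.0, 4.0]]
restored = points_to_depth(depth_to_points(d, 1.0, 1.0, 0.5, 0.5), 2, 2, 1.0, 1.0, 0.5, 0.5)
```

In the described process, the refined point features would change between these two steps. Here the round trip simply recovers the input.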
-
公开(公告)号:US20240412493A1
公开(公告)日:2024-12-12
申请号:US18537404
申请日:2023-12-12
Applicant: QUALCOMM Incorporated
Inventor: Risheek GARREPALLI , Yunxiao SHI , Hong CAI , Yinhao ZHU , Shubhankar Mangesh BORSE , Jisoo JEONG , Debasmit DAS , Manish Kumar SINGH , Rajeev YASARLA , Shizhong Steve HAN , Fatih Murat PORIKLI
IPC: G06V10/776 , G06T7/50 , G06V10/764 , G06V10/82 , G06V20/70
Abstract: Systems and techniques are provided for processing image data. According to some aspects, a computing device can generate a gradient (e.g., a classifier gradient using a trained classifier) associated with a current sample. The computing device can combine the gradient with an iterative model estimated score function or data associated with the current sample to generate a score function estimate. The computing device can predict, using the diffusion machine learning model and based on the score function estimate, a new sample.
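The score-combination step resembles classifier guidance: the model's score is added to a scaled classifier gradient, and the sum drives the next sample. This 1-D sketch makes three assumptions:

- A standard-normal score stands in for the diffusion model.
- A logistic classifier supplies the gradient.
- A Langevin update serves as the sampler.

None of these choices comes from the filing, and all names are hypothetical.

```python
import math
import random


def model_score(x: float) -> float:
    # Hypothetical stand-in for the model's score estimate:
    # d/dx log N(x; 0, 1) = -x.
    return -x


def classifier_grad(x: float, a: float = 2.0) -> float:
    # Gradient of log p(y=1 | x) for a toy classifier p = sigmoid(a * x).
    return a * (1.0 - 1.0 / (1.0 + math.exp(-a * x)))


def score_estimate(x: float, guidance_scale: float) -> float:
    # Combine the model score with the scaled classifier gradient.
    return model_score(x) + guidance_scale * classifier_grad(x)


def guided_step(x: float, step: float, guidance_scale: float,
                rng: random.Random) -> float:
    # One Langevin update: drift along the combined score plus Gaussian noise.
    noise = rng.gauss(0.0, 1.0)
    return x + step * score_estimate(x, guidance_scale) + math.sqrt(2.0 * step) * noise
```

With the same noise draw, a guided step moves further toward the classifier's target class than an unguided one by `step * guidance_scale * classifier_grad(x)`.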
-
公开(公告)号:US20240303913A1
公开(公告)日:2024-09-12
申请号:US18180797
申请日:2023-03-08
Applicant: QUALCOMM Incorporated
Inventor: Yinhao ZHU , Rui ZHU , Hong CAI , Fatih Murat PORIKLI
CPC classification number: G06T15/506 , G06T7/593
Abstract: Systems and techniques are provided for physical-based light estimation for inverse rendering of indoor scenes. For example, a computing device can obtain an estimated scene geometry based on a multi-view observation of a scene. The computing device can further obtain a light emission mask based on the multi-view observation of the scene. The computing device can also obtain an emitted radiance field based on the multi-view observation of the scene. The computing device can then determine, based on the light emission mask and the emitted radiance field, a geometry of at least one light source of the estimated scene geometry.
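One very simple reading of "determine a geometry of a light source from the emission mask and emitted radiance" is the following: keep the surface points that are both masked as emissive and bright enough, then summarize them with an axis-aligned bounding box. The sketch assumes per-point mask and radiance values and picks both thresholds arbitrarily. The abstract does not state how the geometry is represented, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


def light_source_bounds(points: List[Point], mask: List[float], radiance: List[float],
                        mask_threshold: float = 0.5,
                        radiance_threshold: float = 1.0) -> Optional[Tuple[Point, Point]]:
    """Return (min_corner, max_corner) of the emissive points, or None if none qualify."""
    emitters = [p for p, m, r in zip(points, mask, radiance)
                if m >= mask_threshold and r >= radiance_threshold]
    if not emitters:
        return None
    lo = tuple(min(p[i] for p in emitters) for i in range(3))
    hi = tuple(max(p[i] for p in emitters) for i in range(3))
    return lo, hi
```

A fuller treatment would first cluster the emitters into separate lights and fit a shape (for example, a planar area light) to each cluster.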