-
Publication number: US20230386052A1
Publication date: 2023-11-30
Application number: US17828962
Filing date: 2022-05-31
Applicant: QUALCOMM Incorporated
Inventor: Jiancheng LYU , Dashan GAO , Yingyong QI , Shuai ZHANG , Ning BI
CPC classification number: G06T7/248 , G06V10/62 , G06T7/194 , G06T7/11 , G06V10/764 , G06V10/806 , G06T7/74 , G06V20/70 , G06V2201/07 , G06T2207/20081 , G06T2207/20221 , G06T2207/10016
Abstract: Systems and techniques are provided for performing scene segmentation and object tracking. For example, a method is provided for processing one or more frames. The method may include determining first one or more features from a first frame. The first frame includes a target object. The method may include obtaining a first mask associated with the first frame. The first mask includes an indication of the target object. The method may further include generating, based on the first mask and the first one or more features, a representation of a foreground and a background of the first frame. The method may include determining second one or more features from a second frame and determining, based on the representation of the foreground and the background of the first frame and the second one or more features, a location of the target object in the second frame.
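The flow in this abstract (first-frame features plus a mask give a foreground/background representation, which is then matched against second-frame features) can be illustrated with a minimal sketch. The abstract does not say how the representation is built or matched, so everything here is an assumption made for illustration: the representation is a pair of mean feature vectors (prototypes), matching uses cosine similarity, and the function names `build_representation` and `locate_target` are hypothetical.

```python
from math import sqrt

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_representation(features, mask):
    """Assumed representation: one mean feature vector for the masked
    (foreground) pixels and one for the rest (background).
    The mask must mark at least one pixel of each kind."""
    pairs = [(f, m) for frow, mrow in zip(features, mask)
             for f, m in zip(frow, mrow)]
    fg = [f for f, m in pairs if m]
    bg = [f for f, m in pairs if not m]
    return mean_vector(fg), mean_vector(bg)

def locate_target(representation, features):
    """Return the (row, col) centroid of pixels whose features are closer
    to the foreground prototype than to the background prototype."""
    fg_proto, bg_proto = representation
    hits = [(r, c) for r, row in enumerate(features)
            for c, f in enumerate(row)
            if cosine(f, fg_proto) > cosine(f, bg_proto)]
    if not hits:
        return None
    rows, cols = zip(*hits)
    return (sum(rows) / len(hits), sum(cols) / len(hits))

# Toy 3x3 frames with 2-dimensional features (made-up values).
obj, bg = [1.0, 0.1], [0.1, 1.0]
frame1 = [[obj, obj, bg], [bg, bg, bg], [bg, bg, bg]]
mask1 = [[1, 1, 0], [0, 0, 0], [0, 0, 0]]
frame2 = [[bg, bg, bg], [bg, bg, bg], [bg, obj, obj]]

representation = build_representation(frame1, mask1)
location = locate_target(representation, frame2)
```

In this toy case the object moves from the top-left of the first frame to the bottom-right of the second, and `location` is the centroid of the two pixels that match the foreground prototype.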
-
Publication number: US20230306600A1
Publication date: 2023-09-28
Application number: US17669040
Filing date: 2022-02-10
Applicant: QUALCOMM Incorporated
Inventor: Shuai ZHANG , Xiaowen YING , Jiancheng LYU , Yingyong QI
CPC classification number: G06T7/10 , G06N3/063 , G06T7/50 , G06T2207/10024 , G06T2207/20081
Abstract: Systems and techniques are provided for performing semantic image segmentation using a machine learning system (e.g., including one or more cross-attention transformer layers). For instance, a process can include generating one or more input image features for a frame of image data and generating one or more input depth features for a frame of depth data. One or more fused image features can be determined, at least in part, by fusing the one or more input depth features with the one or more input image features, using a first cross-attention transformer network. One or more segmentation masks can be generated for the frame of image data based on the one or more fused image features.
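The fusion step described here can be sketched as single-head scaled dot-product cross-attention in which image tokens act as queries and depth tokens act as keys and values. This is only an illustrative sketch: the abstract does not specify the attention direction, projections, or the mask head, so the identity projections, the residual addition, the per-class weight vectors, and the names `cross_attention_fuse` and `segment` are all assumptions.

```python
from math import exp, sqrt

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_fuse(image_feats, depth_feats):
    """Fuse depth tokens into image tokens.

    Queries are the image tokens; keys and values are the depth tokens.
    Learned Q/K/V projections are omitted (identity) to keep this short,
    which is an assumption of the sketch, not part of the source.
    """
    d = len(image_feats[0])
    fused = []
    for q in image_feats:
        scores = [sum(a * b for a, b in zip(q, k)) / sqrt(d)
                  for k in depth_feats]
        weights = softmax(scores)
        attended = [sum(w * v[i] for w, v in zip(weights, depth_feats))
                    for i in range(d)]
        # Residual connection: keep the image token, add depth context.
        fused.append([x + a for x, a in zip(q, attended)])
    return fused

def segment(fused, class_weights):
    """Hypothetical mask head: label each token with the class whose
    weight vector gives the highest dot product."""
    return [max(range(len(class_weights)),
                key=lambda c: sum(w * x for w, x in zip(class_weights[c], f)))
            for f in fused]

# Two image tokens and two depth tokens with 2-dimensional features.
image_feats = [[1.0, 0.0], [0.0, 1.0]]
depth_feats = [[2.0, 0.0], [0.0, 2.0]]
fused = cross_attention_fuse(image_feats, depth_feats)
labels = segment(fused, [[1.0, 0.0], [0.0, 1.0]])
```

Each image token attends most strongly to the depth token it aligns with, so the fused features stay separable and the toy mask head assigns the two tokens to different classes.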
-
Publication number: US20240378727A1
Publication date: 2024-11-14
Application number: US18316823
Filing date: 2023-05-12
Applicant: QUALCOMM Incorporated
Inventor: Xin LI , Jiancheng LYU , Yingyong QI
IPC: G06T7/11
Abstract: Techniques are provided for image processing. For instance, a process can include obtaining an image; extracting a first set of features at a first scale resolution; extracting a second set of features at a second scale resolution (lower than the first scale resolution); performing a self-attention transform to generate similarity scores for the second set of features; adding the similarity scores to the second set of features to generate a first feature extractor output; up-sampling the first feature extractor output to generate a second feature extractor output; adding the second feature extractor output to the first set of features to generate a third feature extractor output; receiving an instance query; performing a cross-attention transform on the instance query and the first feature extractor output to generate a set of weights; and matrix multiplying the set of weights and the third feature extractor output to generate instance masks.
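The sequence of operations in this abstract can be traced with a small sketch on flattened token lists. Several details are assumptions made only for illustration: the "similarity scores" are taken to be the output of scaled dot-product self-attention (so that they can be added to the features), learned projections are omitted, up-sampling is nearest-neighbour along the token axis, and the helper names (`attend`, `upsample`, `instance_masks`) are hypothetical.

```python
from math import exp, sqrt

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Multiply a (n x k) by b (k x m); both are lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def softmax_rows(a):
    out = []
    for row in a:
        m = max(row)
        exps = [exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def attend(queries, context):
    """Scaled dot-product attention of queries over context."""
    scale = sqrt(len(context[0]))
    scores = [[s / scale for s in row]
              for row in matmul(queries, transpose(context))]
    return matmul(softmax_rows(scores), context)

def upsample(tokens, factor):
    """Nearest-neighbour up-sampling: repeat each token `factor` times."""
    return [list(t) for t in tokens for _ in range(factor)]

def instance_masks(high_res, low_res, queries):
    # First output: low-resolution features plus their self-attention result.
    out1 = add(low_res, attend(low_res, low_res))
    # Second output: first output up-sampled to the high-resolution length.
    out2 = upsample(out1, len(high_res) // len(low_res))
    # Third output: high-resolution features plus the up-sampled context.
    out3 = add(high_res, out2)
    # Cross-attention between instance queries and the first output.
    weights = attend(queries, out1)
    # One row of mask scores per instance query, one column per position.
    return matmul(weights, transpose(out3))

# Four high-resolution tokens, two low-resolution tokens, one query.
high_res = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
low_res = [[1.0, 0.0], [0.0, 1.0]]
queries = [[5.0, 0.0]]
masks = instance_masks(high_res, low_res, queries)
```

With these toy values the single query aligns with the first feature channel, so its mask scores are higher at the first two positions than at the last two. A real system would typically apply a sigmoid and a threshold to turn such scores into binary masks; that step is left out here because the abstract does not describe it.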
-