-
Publication number: US20240161472A1
Publication date: 2024-05-16
Application number: US18499066
Filing date: 2023-10-31
Inventor: Yi Jiang, Jiannan Wu, Bin Yan, Zehuan Yuan
IPC: G06V10/774, G06T7/73, G06V10/22, G06V10/764, G06V10/77
CPC classification number: G06V10/7753, G06T7/74, G06V10/225, G06V10/764, G06V10/7715, G06T2207/20081
Abstract: A method, device, and medium are provided for processing an image using a machine learning model that identifies at least one candidate object from an image. The model comprises: a feature extraction model for describing an association between the image and a feature of the at least one candidate object; and a classification scoring model for describing an association between the feature and a classification score of the at least one candidate object. An update parameter associated with the classification scoring model is determined based on the classification score of the at least one candidate object and a ground truth classification score of at least one ground truth object in the image. The classification scoring model is updated based on the update parameter associated with the classification scoring model. The feature extraction model is prevented from being updated with the update parameter associated with the classification scoring model.
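The update rule described in this abstract (update the classification scoring model, but keep that update from reaching the feature extraction model) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the two linear models, the squared-error loss, the learning rate, and the names `extract_features`, `score`, and `train_step` are hypothetical and are not taken from the patent.

```python
# Hypothetical toy setup: the linear models, the squared-error loss, and all
# names here are illustrative assumptions, not the patent's implementation.

def extract_features(w_feat, image):
    """Feature extraction model: one linear map from pixel values to features."""
    return [sum(w * x for w, x in zip(row, image)) for row in w_feat]


def score(w_cls, features):
    """Classification scoring model: a linear score over the features."""
    return sum(w * f for w, f in zip(w_cls, features))


def train_step(w_feat, w_cls, image, target_score, lr=0.1):
    """Update only the scoring model; the feature extractor is left untouched."""
    features = extract_features(w_feat, image)
    predicted = score(w_cls, features)
    error = predicted - target_score  # derivative of 0.5 * error**2 w.r.t. predicted
    # Update parameter for the scoring model: gradient with respect to w_cls only.
    grad_cls = [error * f for f in features]
    new_w_cls = [w - lr * g for w, g in zip(w_cls, grad_cls)]
    # The gradient is deliberately not propagated into w_feat (a stop-gradient).
    return w_feat, new_w_cls, 0.5 * error ** 2


if __name__ == "__main__":
    w_feat = [[0.5, -0.2], [0.1, 0.3]]
    w_cls = [0.0, 0.0]
    image = [1.0, 2.0]
    for _ in range(3):
        w_feat, w_cls, loss = train_step(w_feat, w_cls, image, target_score=1.0)
        print(loss)
```

In an automatic-differentiation framework the same effect is usually obtained by detaching the features before they enter the scoring model, so that the classification loss has no gradient path back into the feature extractor.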
-
Publication number: US20240220864A1
Publication date: 2024-07-04
Application number: US18531091
Filing date: 2023-12-06
Inventor: Yi Jiang, Bin Yan, Jiannan Wu, Zehuan Yuan
IPC: G06N20/00
CPC classification number: G06N20/00
Abstract: A method, apparatus, device, and medium are provided for processing a visual task by a generic model. In a method, visual data and prompt data associated with a visual task are received, the visual task specifying that a processing result associated with the prompt data is to be determined from the visual data. A generic prompt representation of the prompt data is obtained, the prompt data including either an image format or a language expression format. A generic visual representation of the visual data is obtained, the visual data including either an image format or a video format. The processing result is determined based on the generic prompt representation and the generic visual representation. Here, different visual tasks can be processed in a unified way, training data can be shared across a plurality of visual tasks, and the processing performance of the generic processing model can be improved.
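The data flow in this abstract (a prompt in image or language form, visual data in image or video form, both mapped to shared representations from which the result is computed) can be sketched as follows. This is only a toy illustration under stated assumptions: the types `ImagePrompt` and `LanguagePrompt`, the helper `_pool`, the character-based text encoding, and the dot-product scoring are all hypothetical stand-ins and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Union

Image = List[float]   # hypothetical: an image as a flat list of pixel values
Video = List[Image]   # hypothetical: a video as a list of frames

DIM = 4  # size of the shared representation (arbitrary choice)


@dataclass
class ImagePrompt:
    pixels: Image


@dataclass
class LanguagePrompt:
    text: str


def _pool(values: List[float]) -> List[float]:
    """Fold a variable-length list into a fixed-size vector by bucketed sums."""
    out = [0.0] * DIM
    for i, v in enumerate(values):
        out[i % DIM] += v
    return out


def encode_prompt(prompt: Union[ImagePrompt, LanguagePrompt]) -> List[float]:
    """Map either prompt format to one generic prompt representation."""
    if isinstance(prompt, ImagePrompt):
        return _pool(prompt.pixels)
    if isinstance(prompt, LanguagePrompt):
        # Toy text encoding: one small number per character.
        return _pool([float(ord(c) % 7) for c in prompt.text])
    raise TypeError("unsupported prompt format")


def encode_visual(visual: Union[Image, Video]) -> List[List[float]]:
    """Map an image or a video to one generic vector per frame."""
    is_video = bool(visual) and isinstance(visual[0], list)
    frames = visual if is_video else [visual]
    return [_pool(frame) for frame in frames]


def process(visual: Union[Image, Video],
            prompt: Union[ImagePrompt, LanguagePrompt]) -> List[float]:
    """Score each frame against the prompt with a dot product."""
    p = encode_prompt(prompt)
    return [sum(a * b for a, b in zip(p, v)) for v in encode_visual(visual)]


if __name__ == "__main__":
    frame = [0.1, 0.2, 0.3, 0.4, 0.5]
    print(process(frame, LanguagePrompt("the red car")))     # one score (image)
    print(process([frame, frame], ImagePrompt(frame)))       # one score per frame
```

The point of the sketch is that `process` never branches on the task: once both inputs are in the shared representation, the same code path serves every combination of prompt format and visual format.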
-