-
Publication No.: US20220156527A1
Publication Date: 2022-05-19
Application No.: US17209011
Application Date: 2021-03-22
Applicant: salesforce.com, inc.
Inventor: Ramprasaath Ramasamy Selvaraju, Nikhil Naik
Abstract: Embodiments described herein provide Contrastive Attention-Supervised Tuning (CAST), a training method that fixes the visual grounding ability of contrastive self-supervised learning (SSL) methods through a data augmentation strategy that uses unsupervised saliency maps. In addition to the contrastive loss, which encourages the model to pick the crop that comes from the corresponding image, CAST provides explicit grounding supervision through a Grad-CAM-based attention loss that forces the model to look at the specified object of interest common to the different crops when making this decision. A new geometric transform is introduced for randomly cropping different views from an input image under constraints derived from a saliency map.
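The abstract says crops are sampled "based on certain constraints derived from a saliency map" but does not state the constraint. The sketch below assumes one plausible form: a random crop is accepted only if it keeps at least a fraction `min_frac` of the total saliency mass. The names `saliency_constrained_crop`, `min_frac`, and `min_side` are hypothetical, and rejection sampling with a summed-area table is an illustrative choice, not the patented transform.

```python
import math
import random
from typing import List, Optional, Tuple

# (top, left, height, width) in pixel units
Box = Tuple[int, int, int, int]


def integral_image(sal: List[List[float]]) -> List[List[float]]:
    """Summed-area table padded with a leading zero row and column."""
    h, w = len(sal), len(sal[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += sal[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_mass(ii: List[List[float]], box: Box) -> float:
    """Total saliency inside `box`, computed in O(1) from the summed-area table."""
    top, left, bh, bw = box
    return (ii[top + bh][left + bw] - ii[top][left + bw]
            - ii[top + bh][left] + ii[top][left])


def saliency_constrained_crop(
    sal: List[List[float]],
    min_frac: float = 0.3,  # hypothetical: minimum share of saliency mass a crop must keep
    min_side: float = 0.3,  # hypothetical: minimum crop side as a fraction of the image side
    max_tries: int = 100,
    rng: Optional[random.Random] = None,
) -> Optional[Box]:
    """Rejection-sample a random crop that keeps at least `min_frac` of the saliency mass.

    Returns None if the map is empty or no valid crop is found within `max_tries`;
    a caller could then fall back to an ordinary random crop.
    """
    rng = rng or random.Random()
    h, w = len(sal), len(sal[0])
    ii = integral_image(sal)
    total = ii[h][w]
    if total <= 0:
        return None
    for _ in range(max_tries):
        bh = rng.randint(max(1, math.ceil(min_side * h)), h)
        bw = rng.randint(max(1, math.ceil(min_side * w)), w)
        top = rng.randint(0, h - bh)
        left = rng.randint(0, w - bw)
        box = (top, left, bh, bw)
        if box_mass(ii, box) / total >= min_frac:
            return box
    return None
```

Calling the sampler twice on the same saliency map yields two views that both contain most of the salient object, which is the property the contrastive and attention losses rely on.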
-
Publication No.: US20230154139A1
Publication Date: 2023-05-18
Application No.: US17589709
Application Date: 2022-01-31
Applicant: salesforce.com, inc.
Inventor: Brian Chen, Ramprasaath Ramasamy Selvaraju, Juan Carlos Niebles Duque, Nikhil Naik
CPC classification number: G06V10/454, G06V10/462, G06V10/62
Abstract: Embodiments described herein provide an intelligent method for selecting instances by using unsupervised tracking in videos. Using this freely available form of supervision, a temporal constraint is adopted for instance selection that ensures different instances contain the same object while the temporal augmentation is sampled from the video. In addition, using information on the spatial extent of the tracked object, spatial constraints are applied to ensure that sampled instances overlap meaningfully with the tracked object. Taken together, these spatiotemporal constraints provide a better supervisory signal for contrastive learning from videos.
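A minimal sketch of the two constraints, assuming an unsupervised tracker has already produced one bounding box per frame for a single object (the `track` dictionary below is a hypothetical input format). The temporal constraint draws both frames from the same track within a hypothetical `max_gap`; the spatial constraint accepts a crop only if it covers at least `min_cov` of the tracked box. The thresholds and the coverage measure are illustrative assumptions, not values from the source.

```python
import random
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def intersection_area(a: Box, b: Box) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def object_coverage(crop: Box, obj: Box) -> float:
    """Fraction of the tracked object's box that lies inside the crop."""
    obj_area = (obj[2] - obj[0]) * (obj[3] - obj[1])
    return intersection_area(crop, obj) / obj_area if obj_area > 0 else 0.0


def sample_crop(obj: Box, frame_w: float, frame_h: float, min_cov: float,
                rng: random.Random, max_tries: int = 50) -> Optional[Box]:
    """Spatial constraint: rejection-sample a crop that overlaps the tracked box enough."""
    for _ in range(max_tries):
        cw = rng.uniform(0.3, 1.0) * frame_w
        ch = rng.uniform(0.3, 1.0) * frame_h
        x1 = rng.uniform(0, frame_w - cw)
        y1 = rng.uniform(0, frame_h - ch)
        crop = (x1, y1, x1 + cw, y1 + ch)
        if object_coverage(crop, obj) >= min_cov:
            return crop
    return None


def sample_instance_pair(track: Dict[int, Box], frame_w: float, frame_h: float,
                         max_gap: int = 8, min_cov: float = 0.5,
                         rng: Optional[random.Random] = None):
    """Return ((t1, crop1), (t2, crop2)) showing the same tracked object, or None."""
    rng = rng or random.Random()
    frames = sorted(track)
    if not frames:
        return None
    t1 = rng.choice(frames)
    # Temporal constraint: the second frame comes from the same track, within max_gap frames.
    candidates = [t for t in frames if t != t1 and abs(t - t1) <= max_gap]
    if not candidates:
        return None
    t2 = rng.choice(candidates)
    c1 = sample_crop(track[t1], frame_w, frame_h, min_cov, rng)
    c2 = sample_crop(track[t2], frame_w, frame_h, min_cov, rng)
    if c1 is None or c2 is None:
        return None
    return (t1, c1), (t2, c2)
```

Because both crops are tied to the same track, a positive pair is much less likely to show unrelated content than two independent random crops from different moments of a video.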
-
Publication No.: US12106541B2
Publication Date: 2024-10-01
Application No.: US17589709
Application Date: 2022-01-31
Applicant: Salesforce.com, Inc.
Inventor: Brian Chen, Ramprasaath Ramasamy Selvaraju, Juan Carlos Niebles Duque, Nikhil Naik
CPC classification number: G06V10/454, G06V10/462, G06V10/62
Abstract: Embodiments described herein provide an intelligent method for selecting instances by using unsupervised tracking in videos. Using this freely available form of supervision, a temporal constraint is adopted for instance selection that ensures different instances contain the same object while the temporal augmentation is sampled from the video. In addition, using information on the spatial extent of the tracked object, spatial constraints are applied to ensure that sampled instances overlap meaningfully with the tracked object. Taken together, these spatiotemporal constraints provide a better supervisory signal for contrastive learning from videos.
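The abstract does not name the contrastive objective that consumes the selected instances. Assuming a standard InfoNCE loss (a swapped-in choice, not confirmed by the source), a batch of pairs could be scored as below: each query's positive is the key from the same track, and the other keys in the batch act as negatives. The embeddings are assumed to come from an encoder that is not shown, and `temperature=0.1` is an illustrative value.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def info_nce(queries: List[Sequence[float]], keys: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss; queries[i] and keys[i] embed two views of the same tracked object."""
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    loss = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity to every key in the batch, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]  # cross-entropy with the positive at index i
    return loss / len(q)
```

The loss is near zero when each query is most similar to its own key and grows when a negative from another track scores higher, which is why better-matched positive pairs give a cleaner training signal.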
-
Publication No.: US20220156592A1
Publication Date: 2022-05-19
Application No.: US17209013
Application Date: 2021-03-22
Applicant: salesforce.com, inc.
Inventor: Ramprasaath Ramasamy Selvaraju, Nikhil Naik
Abstract: Embodiments described herein provide Contrastive Attention-Supervised Tuning (CAST), a training method that fixes the visual grounding ability of contrastive self-supervised learning (SSL) methods through a data augmentation strategy that uses unsupervised saliency maps. In addition to the contrastive loss, which encourages the model to pick the crop that comes from the corresponding image, CAST provides explicit grounding supervision through a Grad-CAM-based attention loss that forces the model to look at the specified object of interest common to the different crops when making this decision. A new geometric transform is introduced for randomly cropping different views from an input image under constraints derived from a saliency map.
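The Grad-CAM step can be sketched as follows. Grad-CAM weights each feature channel by the spatial mean of the gradient of a score with respect to that channel, sums the weighted channels, and applies a ReLU. Here the activations and gradients are passed in as plain nested lists; in training they would come from an autograd framework, with the score assumed to be the contrastive similarity. The form of `attention_loss` (share of Grad-CAM mass falling outside the saliency mask) is a hypothetical stand-in, since the abstract does not give the exact loss.

```python
from typing import List

Map = List[List[float]]  # one H x W spatial map


def grad_cam(activations: List[Map], gradients: List[Map]) -> Map:
    """Grad-CAM: weight each channel by its spatially averaged gradient, sum, then ReLU."""
    h, w = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for act, grad in zip(activations, gradients):
        alpha = sum(sum(row) for row in grad) / (h * w)  # global-average-pooled gradient
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha * act[i][j]
    return [[max(0.0, v) for v in row] for row in cam]


def attention_loss(cam: Map, saliency_mask: Map, eps: float = 1e-8) -> float:
    """Hypothetical loss: share of Grad-CAM mass outside the saliency mask (0 = fully grounded)."""
    total = sum(sum(row) for row in cam)
    if total <= eps:
        return 0.0  # no positive attention at all; nothing to penalize in this sketch
    outside = sum(c * (1.0 - m)
                  for crow, mrow in zip(cam, saliency_mask)
                  for c, m in zip(crow, mrow))
    return outside / total
```

This forward-only version only illustrates the quantities involved; minimizing such a loss requires computing it inside a differentiable framework so gradients flow back into the encoder.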
-