-
Publication No.: US20240169692A1
Publication Date: 2024-05-23
Application No.: US17991410
Application Date: 2022-11-21
Inventor: Kanchana RANASINGHE , Muhammad Muzammal NASEER , Salman KHAN , Fahad KHAN
IPC: G06V10/74 , G06V20/59 , G06V40/20 , H04N19/132
CPC classification number: G06V10/761 , G06V20/597 , G06V40/23 , H04N19/132
Abstract: A system, computer-readable medium, and method train a video transformer, using a machine learning engine, for human action recognition in a video. The method includes sampling video clips at varying temporal resolutions for the global views and sampling the video clips from different spatiotemporal windows for the local views. The machine learning engine is configured to match the global and local views in a student-teacher network framework, learning cross-view correspondence between local and global views and motion correspondence across the varying temporal resolutions. The video transformer can then output video clips for display in a manner that emphasizes attention to the recognized human action.
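The sampling and matching pipeline can be pictured with a short, hypothetical PyTorch-style sketch. The frame counts, crop size, temperatures, and all function names below are illustrative assumptions, not the patented implementation: the teacher encodes global views sampled at two temporal resolutions, the student additionally encodes local spatiotemporal windows, and their outputs are matched with a distillation-style loss.

    # Hypothetical sketch of cross-view / motion correspondence training.
    # Shapes, frame counts, and temperatures are illustrative assumptions.
    import torch
    import torch.nn.functional as F

    def sample_global_view(video, num_frames):
        # Uniformly sample frames at a chosen temporal resolution, keeping the full spatial field.
        # video: (batch, time, channels, height, width)
        t = video.shape[1]
        idx = torch.linspace(0, t - 1, num_frames).long()
        return video[:, idx]

    def sample_local_view(video, num_frames, crop=96):
        # Sample a random spatiotemporal window: a short contiguous frame range plus a spatial crop.
        b, t, c, h, w = video.shape
        t0 = torch.randint(0, t - num_frames + 1, (1,)).item()
        y0 = torch.randint(0, h - crop + 1, (1,)).item()
        x0 = torch.randint(0, w - crop + 1, (1,)).item()
        return video[:, t0:t0 + num_frames, :, y0:y0 + crop, x0:x0 + crop]

    def match_loss(student_out, teacher_out, temp_s=0.1, temp_t=0.04):
        # Cross-entropy between the sharpened teacher distribution and the student distribution;
        # the teacher branch is not back-propagated.
        t = F.softmax(teacher_out / temp_t, dim=-1).detach()
        s = F.log_softmax(student_out / temp_s, dim=-1)
        return -(t * s).sum(dim=-1).mean()

    def training_step(student, teacher, video):
        # Teacher encodes global views at two temporal resolutions; the student encodes
        # every view, and each student view is matched against each teacher view.
        global_views = [sample_global_view(video, n) for n in (8, 16)]
        local_views = [sample_local_view(video, 4) for _ in range(4)]
        loss = 0.0
        for g in global_views:
            with torch.no_grad():
                t_out = teacher(g)
            for v in global_views + local_views:
                if v is g:
                    continue
                loss = loss + match_loss(student(v), t_out)
        return loss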
-
Publication No.: US20240203098A1
Publication Date: 2024-06-20
Application No.: US18084152
Application Date: 2022-12-19
Inventor: Maryam SULTANA , Muhammad Muzammal NASEER , Muhammad Haris KHAN , Salman KHAN , Fahad Shahbaz KHAN
IPC: G06V10/774 , G06V10/764 , G06V10/77 , G06V10/776 , G06V10/82
CPC classification number: G06V10/774 , G06V10/764 , G06V10/7715 , G06V10/776 , G06V10/82 , G06V2201/03
Abstract: An apparatus and method provide a machine learning engine for domain generalization that trains a vision transformer neural network, using a training dataset including at least two domains, for diagnosis of a medical condition. Image patches and class tokens are processed through a sequence of feature-extraction transformer blocks to obtain a predicted class token. In parallel, intermediate class tokens are extracted as the outputs of each feature-extraction transformer block, where each transformer block is a sub-model. One sub-model is randomly sampled from the sub-models to obtain a sampled intermediate class token, which is used to make a sub-model prediction. The vision transformer neural network is optimized based on a difference between the predicted class token and the sub-model prediction. Inferencing is then performed on a target medical image from a target domain that is different from the at least two training domains.
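A hypothetical sketch of the sub-model idea follows. The block structure, the shared classification head, and the exact consistency loss are assumptions made for illustration, not the claimed implementation: intermediate class tokens are collected after each transformer block, one is randomly sampled as a sub-model prediction, and the network is optimized on the gap between that prediction and the full-model prediction.

    # Hypothetical sketch of sub-model (intermediate class-token) regularization
    # for a vision transformer; the shared head and loss terms are assumptions.
    import random
    import torch.nn as nn
    import torch.nn.functional as F

    class ViTWithSubModels(nn.Module):
        def __init__(self, blocks, embed_dim, num_classes):
            super().__init__()
            self.blocks = nn.ModuleList(blocks)        # feature-extraction transformer blocks
            self.head = nn.Linear(embed_dim, num_classes)

        def forward(self, tokens):
            # tokens: (batch, 1 + num_patches, embed_dim); index 0 holds the class token.
            intermediate_cls = []
            for blk in self.blocks:
                tokens = blk(tokens)
                intermediate_cls.append(tokens[:, 0])  # class token after each block = one sub-model
            final_logits = self.head(tokens[:, 0])     # prediction from the full model
            # Randomly sample one earlier block as a sub-model (assumes more than one block)
            # and classify its class token with the shared head.
            sampled_cls = random.choice(intermediate_cls[:-1])
            sub_logits = self.head(sampled_cls)
            return final_logits, sub_logits

    def training_loss(final_logits, sub_logits, labels):
        # Supervised loss on the full model plus a term penalizing the difference
        # between the sampled sub-model prediction and the full-model prediction.
        ce = F.cross_entropy(final_logits, labels)
        kl = F.kl_div(F.log_softmax(sub_logits, dim=-1),
                      F.softmax(final_logits, dim=-1).detach(),
                      reduction="batchmean")
        return ce + kl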
-
Publication No.: US20240212330A1
Publication Date: 2024-06-27
Application No.: US18089107
Application Date: 2022-12-27
Inventor: Mohammad Hanan GANI , Muhammad Muzammal NASEER , Mohammad YAQUB
IPC: G06V10/774 , G06V10/94 , G06V20/69 , G06V20/70
CPC classification number: G06V10/7753 , G06V10/95 , G06V20/695 , G06V20/698 , G06V20/70 , G06V2201/03
Abstract: A deep learning training system and method include an imaging system for capturing medical images, a machine learning engine, and a display. The machine learning engine selects a small-scale set of images from a training dataset, generates global views by randomly selecting regions in an image, and generates local views by randomly selecting regions covering less than a majority of the image. It receives the generated global views as a first sequence of non-overlapping image patches, receives the generated global views and the generated local views as a second sequence of non-overlapping image patches, and trains parameters of a student-teacher network to predict a class of objects by self-supervised view prediction using the first sequence and the second sequence. The teacher network's parameters are updated as an exponential moving average of the student network's parameters. The teacher network's parameters are then transferred to a vision transformer, which is trained by supervised learning.
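A minimal, hypothetical sketch of the pretraining loop is given below. The optimizer, momentum value, temperatures, and data-loader interface are assumptions rather than the claimed method: the student is trained by self-supervised view prediction against a teacher whose parameters are an exponential moving average of the student's, and the resulting teacher weights are then handed off for supervised training of the vision transformer.

    # Hypothetical sketch of student-teacher pretraining with an EMA-updated teacher;
    # hyperparameters and the loader's (global_views, local_views) format are assumptions.
    import copy
    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def ema_update(teacher, student, momentum=0.996):
        # The teacher's parameters track an exponential moving average of the student's.
        for p_t, p_s in zip(teacher.parameters(), student.parameters()):
            p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)

    def view_prediction_loss(student, teacher, global_views, local_views,
                             temp_s=0.1, temp_t=0.04):
        # The teacher encodes global views only; the student encodes global and local
        # views, and each student view is trained to predict each teacher distribution.
        loss, pairs = 0.0, 0
        for g in global_views:
            t_out = F.softmax(teacher(g) / temp_t, dim=-1).detach()
            for v in global_views + local_views:
                if v is g:
                    continue
                s_out = F.log_softmax(student(v) / temp_s, dim=-1)
                loss = loss - (t_out * s_out).sum(dim=-1).mean()
                pairs += 1
        return loss / pairs

    def pretrain(student, loader, steps, lr=1e-4):
        teacher = copy.deepcopy(student)
        for p in teacher.parameters():
            p.requires_grad_(False)
        optimizer = torch.optim.AdamW(student.parameters(), lr=lr)
        data_iter = iter(loader)
        for _ in range(steps):
            global_views, local_views = next(data_iter)  # view crops from the data pipeline
            loss = view_prediction_loss(student, teacher, global_views, local_views)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            ema_update(teacher, student)
        # Teacher weights are subsequently transferred to a vision transformer
        # for supervised fine-tuning.
        return teacher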
-