VIDEO ACTION SEGMENTATION BY MIXED TEMPORAL DOMAIN ADAPTION
摘要:
Embodiments herein treat the action segmentation as a domain adaption (DA) problem and reduce the domain discrepancy by performing unsupervised DA with auxiliary unlabeled videos. In one or more embodiments, to reduce domain discrepancy for both the spatial and temporal directions, embodiments of a Mixed Temporal Domain Adaptation (MTDA) approach are presented to jointly align frame-level and video-level embedded feature spaces across domains, and, in one or more embodiments, further integrate with a domain attention mechanism to focus on aligning the frame-level features with higher domain discrepancy, leading to more effective domain adaptation. Comprehensive experiment results validate that embodiments outperform previous state-of-the-art methods. Embodiments can adapt models effectively by using auxiliary unlabeled videos, leading to further applications of large-scale problems, such as video surveillance and human activity analysis.
公开/授权文献
信息查询
0/0