Publication Number: US12105755B1
Publication Date: 2024-10-01
Application Number: US17852063
Filing Date: 2022-06-28
Applicant: Amazon Technologies, Inc.
Inventor: Mohamed Kamal Omar, Xiaohang Sun, Han-Kai Hsu, Ashutosh Sanan, Wentao Zhu
Abstract: Systems and techniques for retrieving video data associated with a selected attribute are described. The systems and techniques include receiving a multimodal input associated with the attribute for querying a catalog of video data to identify video data including the attribute. A first embedding is determined for the input using a first encoder to map the input to a representation space. A second embedding is determined for the video data to map the video data to the representation space. A similarity score is determined between the video data and the input based on a distance between the embeddings. The video data associated with the attribute may be selected based on the similarity score.
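The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not name the distance measure, so cosine similarity is assumed here, the encoders are omitted entirely, and the embeddings, the video identifiers, and the functions `cosine_similarity` and `rank_videos` are hypothetical names chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_videos(query_embedding, video_embeddings, top_k=2):
    """Score every catalog entry against the query and keep the top_k."""
    scored = [
        (video_id, cosine_similarity(query_embedding, embedding))
        for video_id, embedding in video_embeddings.items()
    ]
    # A higher similarity score means a smaller distance in the shared space.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical, hand-written embeddings standing in for encoder outputs
# that already live in the same representation space.
query = [1.0, 0.0, 1.0]
catalog = {
    "video_a": [1.0, 0.0, 1.0],
    "video_b": [0.0, 1.0, 0.0],
    "video_c": [1.0, 1.0, 1.0],
}

results = rank_videos(query, catalog, top_k=2)
for video_id, score in results:
    print(video_id, round(score, 3))
```

In this toy catalog the query ranks `video_a` first and `video_c` second, while `video_b` is orthogonal to the query and scores zero. A real system would replace the hand-written vectors with the outputs of the first and second encoders and would likely use an approximate nearest-neighbor index rather than a full scan.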