Contrastive learning of scene representation guided by video similarities
Abstract:
A plurality of similar video pairs may be determined based on one or more similarity information types. Each video pair of the plurality of similar video pairs may include a respective first video and a respective second video. For each video pair, one or more similar scene pairs may be determined. Each of the one or more similar scene pairs may include a respective first scene from the respective first video and a respective second scene from the respective second video. An encoder may be trained using a contrastive learning model that contrasts a plurality of similar scene pairs with a plurality of random scenes. The plurality of similar scene pairs may include the one or more similar scene pairs determined for each video pair. One or more scene features of one or more other scenes of one or more other videos may be determined using the trained encoder.
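The contrastive step described above can be illustrated with a minimal sketch. The abstract does not name a specific loss, so the InfoNCE-style objective, the cosine similarity measure, the `temperature` value, and the function names `cosine` and `info_nce_loss` are all assumptions made for illustration. The inputs stand in for encoder outputs: an anchor scene embedding, the embedding of its similar scene from the paired video, and embeddings of random scenes used as negatives.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed choice, not stated in the abstract).

    anchor, positive: embeddings of the two scenes in a similar scene pair.
    negatives: embeddings of random scenes contrasted against the pair.
    """
    # The positive logit goes first, followed by one logit per random scene.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_denominator - logits[0]


# Hypothetical 2-D embeddings, chosen only to show the behaviour of the loss.
anchor = [1.0, 0.0]
random_scenes = [[0.0, 1.0], [-1.0, 0.0]]

# The similar scene lies close to the anchor: the loss is low.
aligned = info_nce_loss(anchor, [0.9, 0.1], random_scenes)

# The "similar" scene lies far from the anchor: the loss is high.
misaligned = info_nce_loss(anchor, [-0.9, 0.1], random_scenes)
```

Minimizing a loss of this kind over many similar scene pairs would push the encoder to place similar scenes near each other and random scenes farther apart, which is the behaviour the abstract attributes to the contrastive learning model.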