Publication No.: US20240404283A1
Publication Date: 2024-12-05
Application No.: US18328597
Filing Date: 2023-06-02
Applicant: Adobe Inc.
Inventor: Zhaowen WANG, Trung BUI, Bo HE
IPC: G06V20/40, G06F40/166, G06F40/40, G06V10/774, G06V10/776, G06V10/80
Abstract: A method includes receiving a video input and a text transcription of the video input. The video input includes a plurality of frames and the text transcription includes a plurality of sentences. The method further includes determining, by a multimodal summarization model, a subset of key frames of the plurality of frames and a subset of key sentences of the plurality of sentences. The method further includes providing a summary of the video input and a summary of the text transcription based on the subset of key frames and the subset of key sentences.
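Illustrative sketch: the selection step described in the abstract can be approximated as choosing the highest-scoring frames and sentences. The per-item importance scores below stand in for the output of the multimodal summarization model, which the abstract does not specify. The function names `top_k_indices` and `summarize`, the top-k selection rule, and the summary budgets are assumptions made for illustration only, not the claimed method.

```python
from typing import List, Sequence, Tuple


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Return indices of the k highest scores, restored to original order."""
    k = max(0, min(k, len(scores)))
    # Rank indices by descending score; the stable sort keeps earlier items on ties.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    # Re-sort the chosen indices so the summary preserves temporal order.
    return sorted(ranked[:k])


def summarize(
    frames: Sequence[str],
    sentences: Sequence[str],
    frame_scores: Sequence[float],
    sentence_scores: Sequence[float],
    num_key_frames: int,
    num_key_sentences: int,
) -> Tuple[List[str], List[str]]:
    """Select a subset of key frames and a subset of key sentences.

    The scores are hypothetical model outputs (assumption): one importance
    value per frame and one per sentence.
    """
    if len(frames) != len(frame_scores):
        raise ValueError("expected one score per frame")
    if len(sentences) != len(sentence_scores):
        raise ValueError("expected one score per sentence")

    key_frames = [frames[i] for i in top_k_indices(frame_scores, num_key_frames)]
    key_sentences = [
        sentences[i] for i in top_k_indices(sentence_scores, num_key_sentences)
    ]
    return key_frames, key_sentences


if __name__ == "__main__":
    # Placeholder identifiers stand in for decoded frames and transcript text.
    video_summary, text_summary = summarize(
        frames=["f0", "f1", "f2", "f3"],
        sentences=["a", "b", "c"],
        frame_scores=[0.1, 0.9, 0.3, 0.8],
        sentence_scores=[0.5, 0.2, 0.7],
        num_key_frames=2,
        num_key_sentences=1,
    )
    print(video_summary, text_summary)
```

In this sketch the two modalities are scored independently once the scores exist; in the method as abstracted, a single multimodal model determines both subsets, so the scores would presumably be produced jointly.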