MULTIMODAL VIDEO SUMMARIZATION
    Invention Application

    Publication No.: US20240404283A1

    Publication Date: 2024-12-05

    Application No.: US18328597

    Application Date: 2023-06-02

    Applicant: Adobe Inc.

    Abstract: A method includes receiving a video input and a text transcription of the video input. The video input includes a plurality of frames and the text transcription includes a plurality of sentences. The method further includes determining, by a multimodal summarization model, a subset of key frames of the plurality of frames and a subset of key sentences of the plurality of sentences. The method further includes providing a summary of the video input and a summary of the text transcription based on the subset of key frames and the subset of key sentences.
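The selection step described in this abstract can be sketched in miniature: score every frame and every sentence, then keep the top-k of each as the "key" subsets. The scoring function below is a hypothetical stand-in for the multimodal summarization model; the names and scores are illustrative, not from the patent.

```python
# Hypothetical sketch of the abstract's selection step: a real multimodal
# model would score frames and sentences jointly; here the scores are given.

def select_key_items(items, scores, k):
    """Return the k highest-scoring items, preserving original order."""
    ranked = sorted(range(len(items)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])  # restore chronological order for the summary
    return [items[i] for i in keep]

frames = ["frame_0", "frame_1", "frame_2", "frame_3"]
sentences = ["Intro.", "Key point one.", "An aside.", "Key point two."]

# Stand-in relevance scores (assumed, for illustration only).
frame_scores = [0.1, 0.9, 0.3, 0.8]
sentence_scores = [0.2, 0.95, 0.1, 0.9]

key_frames = select_key_items(frames, frame_scores, k=2)
key_sentences = select_key_items(sentences, sentence_scores, k=2)
print(key_frames)      # ['frame_1', 'frame_3']
print(key_sentences)   # ['Key point one.', 'Key point two.']
```

The video summary and text summary in the claim would then be assembled from `key_frames` and `key_sentences` respectively.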

    CONTROLLED STYLE-CONTENT IMAGE GENERATION BASED ON DISENTANGLING CONTENT AND STYLE

    Publication No.: US20210264236A1

    Publication Date: 2021-08-26

    Application No.: US16802440

    Application Date: 2020-02-26

    Applicant: ADOBE INC.

    Abstract: Embodiments of the present disclosure are directed towards improved models trained using unsupervised domain adaptation. In particular, a style-content adaptation system provides improved translation during unsupervised domain adaptation by controlling the alignment of conditional distributions of a model during training such that content (e.g., a class) from a target domain is correctly mapped to content (e.g., the same class) in a source domain. The style-content adaptation system improves unsupervised domain adaptation using independent control over content (e.g., related to a class) as well as style (e.g., related to a domain) to control alignment when translating between the source and target domain. This independent control over content and style can also allow for images to be generated using the style-content adaptation system that contain desired content and/or style.
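The core idea of independent control can be illustrated with a toy representation: each image is factored into a content code (class-related) and a style code (domain-related), and either factor can be swapped when generating a new image. This is a minimal conceptual sketch, not the patent's model; the class/domain values are assumed examples.

```python
# Toy illustration of content/style disentanglement: an image representation
# carries two independent factors, so translation between domains swaps style
# while content (the class) stays correctly mapped.

from dataclasses import dataclass

@dataclass
class Representation:
    content: str  # class-related factor, e.g. "cat"
    style: str    # domain-related factor, e.g. "photo" or "sketch"

def recombine(content_src: Representation, style_src: Representation) -> Representation:
    """Generate a representation taking content from one input and style from the other."""
    return Representation(content=content_src.content, style=style_src.style)

source = Representation(content="cat", style="photo")   # source-domain image
target = Representation(content="dog", style="sketch")  # target-domain image

# Translate the source image into the target domain: class preserved, domain swapped.
translated = recombine(source, target)
print(translated)  # Representation(content='cat', style='sketch')
```

Keeping the two factors independent is what lets the abstract's system both align conditional distributions during adaptation and generate images with a chosen content and/or style.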

    NEURAL NETWORK FOR IMAGE STYLE TRANSLATION

    Publication No.: US20230070666A1

    Publication Date: 2023-03-09

    Application No.: US17466711

    Application Date: 2021-09-03

    Abstract: Embodiments are disclosed for translating an image from a source visual domain to a target visual domain. In particular, in one or more embodiments, the disclosed systems and methods comprise a training process that includes receiving a training input including a pair of keyframes and an unpaired image. The pair of keyframes represent a visual translation from a first version of an image in a source visual domain to a second version of the image in a target visual domain. The one or more embodiments further include sending the pair of keyframes and the unpaired image to an image translation network to generate a first training image and a second training image. The one or more embodiments further include training the image translation network to translate images from the source visual domain to the target visual domain based on a calculated loss using the first and second training images.
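The supervised part of this training signal can be made concrete with a toy example: a keyframe pair provides a ground-truth translation target, and the network is trained to minimize a loss against it. Everything below is an assumed simplification; the "network" is a single scalar weight, the loss is L1, and the unpaired-image term from the abstract is omitted.

```python
# Toy sketch of training on a keyframe pair: the source keyframe and its
# target-domain counterpart supervise the translation via an L1 loss, and one
# (sub)gradient step on the single-weight "network" reduces that loss.

def translate(pixels, weight):
    return [weight * p for p in pixels]

def l1_loss(pred, target):
    return sum(abs(a - b) for a, b in zip(pred, target)) / len(pred)

def grad(weight, pixels, target):
    # Subgradient of the L1 loss with respect to the single weight.
    signs = [(1 if weight * p > t else -1 if weight * p < t else 0)
             for p, t in zip(pixels, target)]
    return sum(s * p for s, p in zip(signs, pixels)) / len(pixels)

# Keyframe pair: a source-domain frame and its target-domain version.
# (Toy assumption: the target style is a 2x brightening.)
source_key = [1.0, 2.0, 3.0]
target_key = [2.0, 4.0, 6.0]

weight = 1.5  # toy network parameter
print(l1_loss(translate(source_key, weight), target_key))  # 1.0
weight -= 0.25 * grad(weight, source_key, target_key)      # one training step
print(l1_loss(translate(source_key, weight), target_key))  # 0.0
```

In the abstract's full scheme, the unpaired image would contribute an additional unsupervised loss term alongside this paired one, and both would drive the calculated loss used for training.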
