Patent search ap:("Adobe Inc.") AND inv:"Haoran Cai"

1.

Invention application
VIDEO GENERATION USING FRAME-WISE TOKEN EMBEDDINGS (in force)

Publication (announcement) number: US20250119624A1

Publication (announcement) date: 2025-04-10

Application number: US18894443

Application date: 2024-09-24

Applicant: ADOBE INC.

Inventor: Seoung Wug Oh, Mingi Kwon, Joon-Young Lee, Yang Zhou, Difan Liu, Haoran Cai, Baqiao Liu, Feng Liu

IPC: H04N21/81

Abstract: A method, apparatus, non-transitory computer readable medium, and system for generating synthetic videos include obtaining an input prompt describing a video scene. Embodiments then generate a plurality of frame-wise token embeddings corresponding to a sequence of video frames, respectively, based on the input prompt. Subsequently, embodiments generate, using a video generation model, a synthesized video depicting the video scene. The synthesized video includes a plurality of images corresponding to the sequence of video frames.

2.

Granted invention patent
Video segment selection and editing using transcript interactions (in force)

Publication (announcement) number: US12119028B2

Publication (announcement) date: 2024-10-15

Application number: US17967364

Application date: 2022-10-17

Applicant: Adobe Inc.

Inventor: Xue Bai, Justin Jonathan Salamon, Aseem Omprakash Agarwala, Hijung Shin, Haoran Cai, Joel Richard Brandt, Lubomira Assenova Dontcheva, Cristin Ailidh Fraser

IPC: G11B27/036, G06F40/166, G10L15/26, G10L25/57, G11B27/34, G06F3/0482, G06F3/04845, G06F3/0485

CPC classification number: G11B27/036, G06F40/166, G10L15/26, G10L25/57, G11B27/34, G06F3/0482, G06F3/04845, G06F3/0485

Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for identifying candidate boundaries for video segments, video segment selection using those boundaries, and text-based video editing of video segments selected via transcript interactions. In an example implementation, boundaries of detected sentences and words are extracted from a transcript, the boundaries are retimed into an adjacent speech gap to a location where voice or audio activity is a minimum, and the resulting boundaries are stored as candidate boundaries for video segments. As such, a transcript interface presents the transcript, interprets input selecting transcript text as an instruction to select a video segment with corresponding boundaries selected from the candidate boundaries, and interprets commands that are traditionally thought of as text-based operations (e.g., cut, copy, paste) as an instruction to perform a corresponding video editing operation using the selected video segment.
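The retiming step described in this abstract can be illustrated with a minimal sketch. It assumes audio activity is available as a list of per-frame scores and that the adjacent speech gap is given as a frame index range. The function name `retime_boundary`, the score representation, and the tie-breaking rule are hypothetical and are not taken from the patent.

```python
def retime_boundary(activity, gap_start, gap_end):
    """Return the frame index in [gap_start, gap_end) with the lowest activity.

    activity: per-frame audio activity scores (assumed representation).
    gap_start, gap_end: frame range of the speech gap next to the boundary.
    """
    if not 0 <= gap_start < gap_end <= len(activity):
        raise ValueError("gap must be a non-empty range inside the activity track")
    # Pick the quietest frame in the gap; ties resolve to the earliest frame.
    return min(range(gap_start, gap_end), key=lambda i: activity[i])


# Hypothetical scores: one word ends at frame 2, the next starts at frame 6,
# so the speech gap covers frames 2 through 5.
activity = [0.9, 0.8, 0.3, 0.1, 0.05, 0.2, 0.7, 0.9]
candidate_boundary = retime_boundary(activity, 2, 6)
print(candidate_boundary)
```

The returned index would then be stored as a candidate boundary for video segments.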

3.

Granted invention patent
Transcript paragraph segmentation and visualization of transcript paragraphs (in force)

Publication (announcement) number: US12299401B2

Publication (announcement) date: 2025-05-13

Application number: US17967562

Application date: 2022-10-17

Applicant: Adobe Inc.

Inventor: Hanieh Deilamsalehy, Aseem Omprakash Agarwala, Haoran Cai, Hijung Shin, Joel Richard Brandt, Lubomira Assenova Dontcheva

IPC: G10L15/04, G06F40/205, G06F40/30, H04N5/93, G06V20/40, G10L25/78, G10L25/87

Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for segmenting a transcript into paragraphs. In an example embodiment, a transcript is segmented to start a new paragraph whenever there is a change in speaker and/or a long pause in speech. If any remaining paragraphs are longer than a designated length or duration (e.g., 50 or 100 words), each of those paragraphs is segmented using dynamic programming to minimize a cost function that penalizes candidate paragraphs based on divergence from a target paragraph length and/or that rewards candidate paragraphs that group semantically similar sentences. As such, the transcript is visualized, segmented at the identified paragraphs.
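The dynamic-programming step in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented method: the cost is assumed to be the squared difference between each paragraph's word count and a target length, the semantic-similarity reward mentioned in the abstract is omitted, and the name `segment_paragraphs` is hypothetical.

```python
def segment_paragraphs(sentence_lengths, target):
    """Split sentences into contiguous paragraphs that minimize
    sum((paragraph_word_count - target) ** 2)  (assumed cost function).

    Returns a list of (start, end) sentence index pairs, end exclusive.
    """
    n = len(sentence_lengths)
    if n == 0:
        return []

    # prefix[i] holds the total word count of sentences[0:i].
    prefix = [0]
    for length in sentence_lengths:
        prefix.append(prefix[-1] + length)

    best = [float("inf")] * (n + 1)  # best[j]: minimum cost for sentences[0:j]
    back = [0] * (n + 1)             # back[j]: start of the last paragraph
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(j):
            words = prefix[j] - prefix[i]
            cost = best[i] + (words - target) ** 2
            if cost < best[j]:
                best[j] = cost
                back[j] = i

    # Follow the back-pointers from the end to recover the paragraph spans.
    spans = []
    j = n
    while j > 0:
        spans.append((back[j], j))
        j = back[j]
    spans.reverse()
    return spans


# Four 10-word sentences with a 20-word target split into two paragraphs.
print(segment_paragraphs([10, 10, 10, 10], target=20))
```

A reward for grouping semantically similar sentences could be subtracted from `cost` inside the inner loop; how the patent computes that term is not specified in the abstract.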

4.

Granted invention patent
Face-aware speaker diarization for transcripts and text-based video editing (in force)

Publication (announcement) number: US12125501B2

Publication (announcement) date: 2024-10-22

Application number: US17967399

Application date: 2022-10-17

Applicant: Adobe Inc.

Inventor: Fabian David Caba Heilbron, Xue Bai, Aseem Omprakash Agarwala, Haoran Cai, Lubomira Assenova Dontcheva

IPC: G11B27/031, G06V20/40

CPC classification number: G11B27/031, G06V20/41

Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for face-aware speaker diarization. In an example embodiment, an audio-only speaker diarization technique is applied to generate an audio-only speaker diarization of a video, an audio-visual speaker diarization technique is applied to generate a face-aware speaker diarization of the video, and the audio-only speaker diarization is refined using the face-aware speaker diarization to generate a hybrid speaker diarization that links detected faces to detected voices. In some embodiments, to accommodate videos with small faces that appear pixelated, a cropped image of any given face is extracted from each frame of the video, and the size of the cropped image is used to select a corresponding active speaker detection model to predict an active speaker score for the face in the cropped image.
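The size-based model selection mentioned at the end of this abstract might look like the sketch below. Everything specific here is an assumption for illustration: the 64-pixel threshold, the model names, the use of the shorter crop side as the size measure, and the function name `select_model` do not come from the patent.

```python
def select_model(crop_width, crop_height, models):
    """Pick the active speaker detection model for a cropped face image.

    models: list of (min_side_pixels, model_name) pairs (hypothetical layout).
    The model with the largest threshold not exceeding the crop's shorter
    side is chosen.
    """
    side = min(crop_width, crop_height)
    chosen = None
    for min_side, name in sorted(models):
        if side >= min_side:
            chosen = name
    if chosen is None:
        raise ValueError("no model covers a crop of this size")
    return chosen


# Hypothetical registry: one model for small, pixelated faces, one for larger faces.
models = [(0, "small_face_model"), (64, "large_face_model")]
print(select_model(40, 48, models))
print(select_model(128, 100, models))
```

The selected model would then predict the active speaker score for the face in that crop.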
