-
1.
Publication No.: US20240355119A1
Publication Date: 2024-10-24
Application No.: US18305587
Filing Date: 2023-04-24
Applicant: ADOBE INC.
Inventor: Ioana Croitoru , Trung Huu Bui , Zhaowen Wang , Seunghyun Yoon , Franck Dernoncourt , Hailin Jin
CPC classification number: G06V20/41 , G06V10/774 , G06V20/49 , G06V20/70 , G10L15/04 , G10L15/1815 , G10L15/22 , G10L25/57 , G10L15/16
Abstract: One or more aspects of the method, apparatus, and non-transitory computer readable medium include receiving a query relating to a long video; generating a segment of the long video corresponding to the query using a machine learning model trained to identify relevant segments from long videos; and responding to the query based on the generated segment.
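A minimal sketch of the query-to-segment retrieval idea in this abstract: embed the query and each candidate segment, score them, and return the best-scoring segment. The stub encoders, dimensions, and cosine scoring below are illustrative assumptions, not the patented model.

```python
import numpy as np

RNG = np.random.default_rng(0)

def encode_text(query: str, dim: int = 64) -> np.ndarray:
    # Placeholder text encoder: a deterministic pseudo-embedding of the string.
    seed = abs(hash(query)) % (2**32)
    return np.random.default_rng(seed).normal(size=dim)

def encode_segment(frames: np.ndarray, dim: int = 64) -> np.ndarray:
    # Placeholder segment encoder: mean-pool frame features and project.
    proj = np.random.default_rng(1).normal(size=(frames.shape[1], dim))
    return frames.mean(axis=0) @ proj

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def best_segment(query: str, segments: list) -> int:
    q = encode_text(query)
    scores = [cosine(q, encode_segment(seg)) for seg in segments]
    return int(np.argmax(scores))

# Toy usage: three candidate segments of 10 frames x 128-dim frame features.
segments = [RNG.normal(size=(10, 128)) for _ in range(3)]
print(best_segment("how do I crop the image?", segments))
```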
-
2.
Publication No.: US12124508B2
Publication Date: 2024-10-22
Application No.: US17811963
Filing Date: 2022-07-12
Applicant: ADOBE INC.
Inventor: Adyasha Maharana , Quan Hung Tran , Seunghyun Yoon , Franck Dernoncourt , Trung Huu Bui , Walter W. Chang
IPC: G06F16/73 , G06F16/738 , G06F16/783 , G06F40/284 , G10L13/08
CPC classification number: G06F16/739 , G06F16/7844 , G06F40/284 , G10L13/08
Abstract: Systems and methods for intent discovery and video summarization are described. Embodiments of the present disclosure receive a video and a transcript of the video, encode the video to obtain a sequence of video encodings, encode the transcript to obtain a sequence of text encodings, apply a visual gate to the sequence of text encodings based on the sequence of video encodings to obtain gated text encodings, and generate an intent label for the transcript based on the gated text encodings.
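One way to picture the visual gate is a sigmoid gate computed from each text encoding and a pooled video context, used to scale the text encodings. The sketch below is an illustrative PyTorch stand-in; the dimensions and mean-pooling are assumptions, not the architecture claimed in the patent.

```python
import torch
import torch.nn as nn

class VisualGate(nn.Module):
    def __init__(self, text_dim: int, video_dim: int):
        super().__init__()
        self.gate = nn.Linear(text_dim + video_dim, text_dim)

    def forward(self, text_enc: torch.Tensor, video_enc: torch.Tensor) -> torch.Tensor:
        # text_enc: (num_tokens, text_dim); video_enc: (num_frames, video_dim)
        video_ctx = video_enc.mean(dim=0, keepdim=True).expand(text_enc.size(0), -1)
        g = torch.sigmoid(self.gate(torch.cat([text_enc, video_ctx], dim=-1)))
        return g * text_enc  # gated text encodings

gated = VisualGate(256, 512)(torch.randn(20, 256), torch.randn(30, 512))
print(gated.shape)  # torch.Size([20, 256])
```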
-
3.
Publication No.: US20240320886A1
Publication Date: 2024-09-26
Application No.: US18125889
Filing Date: 2023-03-24
Applicant: Adobe Inc.
Inventor: Cuong Nguyen , Trung Huu Bui , Jennifer Healey , Jane Elizabeth Hoffswell , Chen Chen
CPC classification number: G06T11/60 , G06F3/012 , G06F3/017 , G06N3/08 , G06T2200/24
Abstract: In some examples, an augmented reality (AR) rendering server receives instructional data to be rendered in AR. The AR rendering server extracts multiple instruction steps from the instructional data and determines multiple spatial identifiers associated with the multiple instruction steps respectively. The multiple spatial identifiers correspond to multiple spatial objects in a real-world environment. The AR rendering server then generates AR rendering data for displaying the multiple instruction steps on an AR device at selected locations associated with the multiple spatial objects in the real-world environment. The AR rendering data is then transmitted to the AR device.
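As a rough illustration of the kind of mapping the AR rendering server produces, the sketch below pairs each instruction step with the location of its associated spatial object; the field names and location lookup are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class InstructionStep:
    text: str
    spatial_id: str          # identifier of a real-world object, e.g. "printer"

@dataclass
class ARRenderingData:
    steps: list = field(default_factory=list)   # (step text, anchor location) pairs

def build_rendering_data(steps: list, object_locations: dict) -> ARRenderingData:
    data = ARRenderingData()
    for step in steps:
        # Anchor each step at the location of its associated spatial object.
        location = object_locations.get(step.spatial_id, (0.0, 0.0, 0.0))
        data.steps.append((step.text, location))
    return data

rendering = build_rendering_data(
    [InstructionStep("Open the paper tray", "printer"),
     InstructionStep("Place paper face down", "paper_tray")],
    {"printer": (1.0, 0.2, 0.5), "paper_tray": (1.1, 0.1, 0.4)})
print(rendering.steps)
```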
-
4.
Publication No.: US20230136527A1
Publication Date: 2023-05-04
Application No.: US17453562
Filing Date: 2021-11-04
Applicant: ADOBE INC.
Inventor: Jianguo Zhang , Trung Huu Bui , Seunghyun Yoon , Xiang Chen , Quan Hung Tran , Walter W. Chang
IPC: G06F40/40 , G06F40/30 , G06F40/284 , G06V30/19
Abstract: Systems and methods for natural language processing are described. One or more aspects of a method, apparatus, and non-transitory computer readable medium include receiving a text phrase; encoding the text phrase using an encoder to obtain a hidden representation of the text phrase, wherein the encoder is trained during a first training phase using self-supervised learning based on a first contrastive loss and during a second training phase using supervised learning based on a second contrastive loss; identifying an intent of the text phrase from a predetermined set of intent labels using a classification network, wherein the classification network is jointly trained with the encoder in the second training phase; and generating a response to the text phrase based on the intent.
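A supervised contrastive loss of the kind referenced for the second training phase treats samples that share an intent label as positives for one another. The sketch below is a generic formulation (the temperature and masking choices are my assumptions), not necessarily the exact loss used in this patent.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    z = F.normalize(embeddings, dim=-1)                  # (N, D) unit vectors
    sim = z @ z.t() / temperature                        # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))      # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(~pos_mask, 0.0)      # keep positive pairs only
    loss = -log_prob.sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)
    return loss[pos_mask.any(dim=1)].mean()

emb = torch.randn(6, 32)
labels = torch.tensor([0, 0, 1, 1, 2, 2])                # toy intent labels
print(supervised_contrastive_loss(emb, labels).item())
```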
-
5.
Publication No.: US20200320329A1
Publication Date: 2020-10-08
Application No.: US16904881
Filing Date: 2020-06-18
Applicant: ADOBE INC.
Inventor: Trung Huu Bui , Hung Hai Bui , Shawn Alan Gaither , Walter Wei-Tuh Chang , Michael Frank Kraley , Pranjal Daga
Abstract: The present invention is directed towards providing automated workflows for the identification of a reading order from text segments extracted from a document. Ordering the text segments is based on trained natural language models. In some embodiments, the workflows are enabled to perform a method for identifying a sequence associated with a portable document. The method includes iteratively generating a probabilistic language model, receiving the portable document, and selectively extracting features (such as but not limited to text segments) from the document. The method may generate pairs of features (feature pairs) from the extracted features. The method may further generate a score for each of the pairs based on the probabilistic language model and determine an order of the features based on the scores. The method may provide the extracted features in the determined order.
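The pairwise scoring step can be pictured with a toy stand-in: score how plausibly one extracted segment follows another, then pick the ordering with the highest total score. The word-overlap scorer and exhaustive search below are placeholders for the probabilistic language model and ordering procedure described above.

```python
from itertools import permutations

def pair_score(a: str, b: str) -> float:
    # Placeholder scorer: reward word overlap between the end of a and the start of b.
    return len(set(a.lower().split()[-3:]) & set(b.lower().split()[:3]))

def best_order(segments: list) -> list:
    # Exhaustive search is fine for a handful of segments; real documents
    # would need a greedy or beam-search strategy.
    def total(order):
        return sum(pair_score(order[i], order[i + 1]) for i in range(len(order) - 1))
    return list(max(permutations(segments), key=total))

print(best_order(["the quick brown fox", "brown fox jumps over", "jumps over the dog"]))
```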
-
6.
Publication No.: US12293577B2
Publication Date: 2025-05-06
Application No.: US17651771
Filing Date: 2022-02-18
Applicant: Adobe Inc.
Inventor: Seunghyun Yoon , Trung Huu Bui , Franck Dernoncourt , Hyounghun Kim , Doo Soon Kim
Abstract: Embodiments of the disclosure provide a machine learning model for generating a predicted executable command for an image. The machine learning model includes an interface configured to obtain an utterance indicating a request associated with the image, an utterance sub-model, a visual sub-model, an attention network, and a selection gate. The machine learning model generates a segment of the predicted executable command from weighted probabilities of each candidate token in a predetermined vocabulary, determined based on the visual features, concept features, current command features, and utterance features extracted from the utterance or the image.
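A hedged sketch of one decoding step in this style: attention carries utterance features over the visual features, and a selection gate mixes two candidate token distributions into the weighted probabilities mentioned above. The layer sizes, pooling, and two-head mixture are my assumptions, not the patented architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedCommandDecoder(nn.Module):
    def __init__(self, feat_dim: int, vocab_size: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        self.gate = nn.Linear(2 * feat_dim, 1)
        self.text_head = nn.Linear(feat_dim, vocab_size)
        self.visual_head = nn.Linear(feat_dim, vocab_size)

    def forward(self, utterance_feats, visual_feats):
        # Attend from utterance features to visual features.
        ctx, _ = self.attn(utterance_feats, visual_feats, visual_feats)
        u = utterance_feats.mean(dim=1)      # pooled utterance representation
        v = ctx.mean(dim=1)                  # pooled visually attended representation
        g = torch.sigmoid(self.gate(torch.cat([u, v], dim=-1)))   # selection gate
        probs = g * F.softmax(self.text_head(u), -1) + (1 - g) * F.softmax(self.visual_head(v), -1)
        return probs                         # weighted token probabilities

decoder = GatedCommandDecoder(feat_dim=64, vocab_size=100)
probs = decoder(torch.randn(1, 12, 64), torch.randn(1, 49, 64))
print(probs.shape, probs.sum().item())       # (1, 100), ~1.0
```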
-
7.
Publication No.: US12277767B2
Publication Date: 2025-04-15
Application No.: US17804656
Filing Date: 2022-05-31
Applicant: ADOBE INC.
Inventor: Hailin Jin , Jielin Qiu , Zhaowen Wang , Trung Huu Bui , Franck Dernoncourt
IPC: G06V20/00 , G06F16/34 , G06F16/683 , G06V10/774 , G06V20/40
Abstract: Systems and methods for video segmentation and summarization are described. Embodiments of the present disclosure receive a video and a transcript of the video; generate visual features representing frames of the video using an image encoder; generate language features representing the transcript using a text encoder, wherein the image encoder and the text encoder are trained based on a correlation between training visual features and training language features; and segment the video into a plurality of video segments based on the visual features and the language features.
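A simplified view of the segmentation step: concatenate per-frame visual and language features, and cut wherever the similarity between neighbouring frames drops below a threshold. The threshold, the assumption of frame-aligned language features, and the change-point rule are illustrative only, not the method claimed here.

```python
import numpy as np

def segment_boundaries(visual: np.ndarray, language: np.ndarray,
                       threshold: float = 0.85) -> list:
    feats = np.concatenate([visual, language], axis=1)           # (T, Dv + Dl)
    feats = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    sims = (feats[:-1] * feats[1:]).sum(axis=1)                  # cosine of neighbours
    return [t + 1 for t, s in enumerate(sims) if s < threshold]  # cut points

rng = np.random.default_rng(0)
visual = rng.normal(size=(50, 128))     # stand-in frame features
language = rng.normal(size=(50, 64))    # stand-in frame-aligned transcript features
print(segment_boundaries(visual, language))
```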
-
8.
Publication No.: US20230259718A1
Publication Date: 2023-08-17
Application No.: US17651555
Filing Date: 2022-02-17
Applicant: Adobe Inc.
Inventor: Cesa Salaam , Seunghyun Yoon , Trung Huu Bui , Franck Dernoncourt
CPC classification number: G06F40/58 , G06F40/47 , G06N3/0454 , G06N3/08
Abstract: Techniques for training a language model for code-switching content are disclosed. Such techniques include, in some embodiments, generating a dataset, which includes identifying one or more portions within textual content in a first language, the identified one or more portions each including one or more of offensive content or non-offensive content; translating the identified one or more portions into a second language; and reintegrating the translated one or more portions into the textual content to generate code-switched textual content. In some cases, the textual content in the first language includes offensive content and non-offensive content, the identified one or more portions include the offensive content, and the translated one or more portions include a translated version of the offensive content. In some embodiments, the code-switched textual content is at least part of a synthetic dataset usable to train a language model, such as a multilingual classification model.
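The data-generation loop can be sketched as: detect the targeted span, translate only that span, and splice the translation back into the sentence. The keyword lexicon and dictionary "translator" below are toy stand-ins for the detector and machine-translation component, and the vocabulary is purely illustrative.

```python
OFFENSIVE_WORDS = {"idiot", "stupid"}          # toy lexicon, illustration only

def translate(text: str, target_lang: str) -> str:
    # Placeholder translator; a real pipeline would call an MT model or service.
    fake_dictionary = {"idiot": "idiota", "stupid": "estupido"}
    return fake_dictionary.get(text, text)

def code_switch(sentence: str, target_lang: str = "es") -> str:
    out = []
    for word in sentence.split():
        core = word.strip(".,!?")
        if core.lower() in OFFENSIVE_WORDS:
            # Swap only the flagged span into the second language.
            out.append(word.replace(core, translate(core.lower(), target_lang)))
        else:
            out.append(word)
    return " ".join(out)

print(code_switch("you are such an idiot, really"))
# -> "you are such an idiota, really"
```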
-
9.
Publication No.: US20230237093A1
Publication Date: 2023-07-27
Application No.: US17649091
Filing Date: 2022-01-27
Applicant: ADOBE INC.
Inventor: Yifan Li , Trung Huu Bui , Timothy Jeewun Ganter , David Fox
IPC: G06F16/735 , G06N3/08
CPC classification number: G06F16/735 , G06N3/08
Abstract: Systems and methods for item recommendation are described. Embodiments of the present disclosure receive input indicating a relationship between a user and a first content item; generate a knowledge graph based on the input, wherein the knowledge graph comprises relationship information between the user and a plurality of content items; generate a first feature embedding representing the user and a second feature embedding representing a second content item of the plurality of content items based on the knowledge graph, wherein the second feature embedding is generated using a first modality for a query vector of an attention mechanism and a second modality for a key vector and a value vector of the attention mechanism; compare the first feature embedding to the second feature embedding to obtain a similarity score; and recommend the second content item for the user based on the similarity score.
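The cross-modal attention detail (query vector from one modality, key and value vectors from another) can be sketched as below; the feature shapes, pooling, and cosine scoring are illustrative assumptions rather than the patented design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalItemEncoder(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, text_feats: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        # Query from the text modality; key and value from the image modality.
        fused, _ = self.attn(text_feats, image_feats, image_feats)
        return fused.mean(dim=1)             # one embedding per content item

encoder = CrossModalItemEncoder(dim=64)
user_emb = torch.randn(1, 64)                              # user embedding (from the graph)
item_emb = encoder(torch.randn(1, 8, 64), torch.randn(1, 16, 64))
score = F.cosine_similarity(user_emb, item_emb)            # similarity used for recommendation
print(score.item())
```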
-
10.
Publication No.: US11538463B2
Publication Date: 2022-12-27
Application No.: US16383312
Filing Date: 2019-04-12
Applicant: ADOBE INC.
Inventor: Trung Huu Bui , Subhadeep Dey , Franck Dernoncourt
Abstract: Methods and systems are provided for generating a customized speech recognition neural network system comprised of an adapted automatic speech recognition neural network and an adapted language model neural network. The automatic speech recognition neural network is first trained in a generic domain and then adapted to a target domain. The language model neural network is first trained in a generic domain and then adapted to a target domain. Such a customized speech recognition neural network system can be used to understand input vocal commands.
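The two-stage recipe (train in a generic domain, then adapt to a target domain) can be illustrated with a toy training loop; the stand-in network, random data, and learning rates below are placeholders, not the patent's speech-recognition or language-model configuration.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(40, 128), nn.ReLU(), nn.Linear(128, 30))  # toy acoustic model

def train(model, features, targets, lr, steps):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(features), targets)
        loss.backward()
        opt.step()
    return loss.item()

# Phase 1: generic-domain training (large corpus, normal learning rate).
generic_x, generic_y = torch.randn(256, 40), torch.randint(0, 30, (256,))
train(model, generic_x, generic_y, lr=1e-3, steps=50)

# Phase 2: adaptation to the target domain (small corpus, reduced learning rate).
target_x, target_y = torch.randn(32, 40), torch.randint(0, 30, (32,))
print(train(model, target_x, target_y, lr=1e-4, steps=20))
```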
-