-
1.
Publication number: US20240161529A1
Publication date: 2024-05-16
Application number: US18055752
Filing date: 2022-11-15
Applicant: Adobe Inc.
Inventor: Vlad Morariu , Puneet Mathur , Rajiv Jain , Ashutosh Mehra , Jiuxiang Gu , Franck Dernoncourt , Anandhavelu N , Quan Tran , Verena Kaynig-Fittkau , Nedim Lipka , Ani Nenkova
IPC: G06V30/413 , G06V10/82
CPC classification number: G06V30/413 , G06V10/82
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate a digital document hierarchy comprising layers of parent-child element relationships from visual elements of a digital document image. For example, for a layer of the layers, the disclosed systems determine, from the visual elements, candidate parent visual elements and child visual elements. In addition, for the layer of the layers, the disclosed systems generate, from the feature embeddings utilizing a neural network, element classifications for the candidate parent visual elements and parent-child element link probabilities for the candidate parent visual elements and the child visual elements. Moreover, for the layer, the disclosed systems select parent visual elements from the candidate parent visual elements based on the parent-child element link probabilities. Further, the disclosed systems utilize the digital document hierarchy to generate an interactive digital document from the digital document image.
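Below is a minimal, illustrative sketch of the layer-wise parent selection the abstract describes: a classifier over candidate parent elements plus pairwise parent-child link probabilities, with each child attached to its highest-probability parent. The module names, dimensions, and the bilinear scoring head are assumptions for illustration, not the patented implementation.

```python
# Minimal sketch of layer-wise parent-child link scoring (illustrative only).
# Assumes precomputed feature embeddings for candidate parents and children.
import torch
import torch.nn as nn

class LinkScorer(nn.Module):
    """Classifies candidate parent elements and scores parent-child links."""
    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Linear(dim, num_classes)   # element classification head
        self.link_head = nn.Bilinear(dim, dim, 1)       # pairwise link logit

    def forward(self, parent_emb: torch.Tensor, child_emb: torch.Tensor):
        cls_logits = self.classifier(parent_emb)        # (P, num_classes)
        # Pairwise link probabilities between every candidate parent and child.
        P, C, D = parent_emb.size(0), child_emb.size(0), parent_emb.size(1)
        pairs_p = parent_emb.unsqueeze(1).expand(P, C, D).reshape(-1, D)
        pairs_c = child_emb.unsqueeze(0).expand(P, C, D).reshape(-1, D)
        link_logits = self.link_head(pairs_p, pairs_c).view(P, C)
        return cls_logits, torch.sigmoid(link_logits)   # (P, num_classes), (P, C)

# For one layer of the hierarchy: attach each child to its most probable parent.
scorer = LinkScorer(dim=64, num_classes=5)
parents, children = torch.randn(4, 64), torch.randn(10, 64)
cls_logits, link_probs = scorer(parents, children)
selected_parent_per_child = link_probs.argmax(dim=0)    # (C,) parent index per child
```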
-
2.
Publication number: US20230153943A1
Publication date: 2023-05-18
Application number: US17455134
Filing date: 2021-11-16
Applicant: ADOBE INC.
Inventor: Jason Kuen , Jiuxiang Gu , Zhe Lin
CPC classification number: G06T3/4046 , G06K9/6202 , G06N3/08 , G06N3/0454
Abstract: Systems and methods for image processing are described. The systems and methods include receiving a low-resolution image; generating a feature map based on the low-resolution image using an encoder of a student network, wherein the encoder of the student network is trained based on comparing a predicted feature map from the encoder of the student network and a fused feature map from a teacher network, and wherein the fused feature map represents a combination of a first feature map from a high-resolution encoder of the teacher network and a second feature map from a low-resolution encoder of the teacher network; and decoding the feature map to obtain prediction information for the low-resolution image.
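A rough sketch of the distillation objective implied by the abstract, assuming stand-in convolutional encoders and a simple resize-and-average for the teacher's feature fusion; the real architectures, fusion, and loss are not specified here.

```python
# Illustrative sketch: student encoder trained to match a fused teacher feature map.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_encoder():
    # Stand-in convolutional encoder; the actual networks are unspecified.
    return nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 32, 3, padding=1))

teacher_hr, teacher_lr, student = make_encoder(), make_encoder(), make_encoder()

def distillation_step(low_res: torch.Tensor, high_res: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        f_hr = teacher_hr(high_res)                    # teacher high-resolution branch
        f_lr = teacher_lr(low_res)                     # teacher low-resolution branch
        # Fuse: resize and average here; the actual fusion is an assumption.
        f_hr = F.interpolate(f_hr, size=f_lr.shape[-2:], mode="bilinear",
                             align_corners=False)
        fused = 0.5 * (f_hr + f_lr)
    predicted = student(low_res)                       # student's predicted feature map
    return F.mse_loss(predicted, fused)                # compare against fused teacher map

loss = distillation_step(torch.randn(2, 3, 32, 32), torch.randn(2, 3, 128, 128))
loss.backward()
```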
-
3.
Publication number: US20220382975A1
Publication date: 2022-12-01
Application number: US17333892
Filing date: 2021-05-28
Applicant: Adobe Inc.
Inventor: Jiuxiang Gu , Vlad Morariu , Varun Manjunatha , Tong Sun , Rajiv Jain , Peizhao Li , Jason Kuen , Handong Zhao
IPC: G06F40/279 , G06N3/04 , G06N3/08 , G06F16/93 , G06F40/30 , G06F40/205
Abstract: One example method involves operations for a processing device that include receiving, by a machine learning model trained to generate a search result, a search query for a text input. The machine learning model is trained by receiving pre-training data that includes multiple documents. Pre-training the machine learning model includes generating, using an encoder, feature embeddings for each of the documents included in the pre-training data. The feature embeddings are generated by applying a masking function to visual and textual features in the documents. Training the machine learning model also includes generating, using the feature embeddings, output features for the documents by concatenating the feature embeddings and applying a non-linear mapping to the feature embeddings. Training the machine learning model further includes applying a linear classifier to the output features. Additionally, the operations include generating, for display, a search result using the machine learning model based on the text input.
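The pre-training flow described above (mask visual and textual features, encode, concatenate, apply a non-linear mapping, then a linear classifier) might look roughly like the sketch below; the masking function, encoder, and dimensions are illustrative assumptions.

```python
# Illustrative sketch of masked multimodal pre-training: mask, encode, concatenate,
# apply a non-linear mapping, then a linear classifier over the output features.
import torch
import torch.nn as nn

def mask_features(x: torch.Tensor, mask_prob: float = 0.15) -> torch.Tensor:
    """Randomly zero out feature positions (a simple stand-in masking function)."""
    keep = (torch.rand(x.shape[:-1]) > mask_prob).float().unsqueeze(-1)
    return x * keep

dim = 128
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True), num_layers=2)
nonlinear_map = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU())
classifier = nn.Linear(dim, 10)                        # linear classifier over outputs

text_feats = mask_features(torch.randn(4, 32, dim))    # (batch, tokens, dim)
visual_feats = mask_features(torch.randn(4, 32, dim))  # (batch, regions, dim)

text_emb = encoder(text_feats)
visual_emb = encoder(visual_feats)
output = nonlinear_map(torch.cat([text_emb, visual_emb], dim=-1))  # concat then map
logits = classifier(output)                            # per-position predictions
```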
-
4.
Publication number: US11373390B2
Publication date: 2022-06-28
Application number: US16448473
Filing date: 2019-06-21
Applicant: Adobe Inc.
Inventor: Handong Zhao , Zhe Lin , Sheng Li , Mingyang Ling , Jiuxiang Gu
IPC: G06V10/26 , G06K9/62 , G06N3/04 , G06N3/08 , G06V10/426
Abstract: Methods, systems, and non-transitory computer readable storage media are disclosed for generating semantic scene graphs for digital images using an external knowledgebase for feature refinement. For example, the disclosed system can determine object proposals and subgraph proposals for a digital image to indicate candidate relationships between objects in the digital image. The disclosed system can then extract relationships from an external knowledgebase for refining features of the object proposals and the subgraph proposals. Additionally, the disclosed system can generate a semantic scene graph for the digital image based on the refined features of the object/subgraph proposals. Furthermore, the disclosed system can update/train a semantic scene graph generation network based on the generated semantic scene graph. The disclosed system can also reconstruct the image using object labels based on the refined features to further update/train the semantic scene graph generation network.
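As an illustration only, feature refinement from an external knowledgebase could be sketched as retrieving relationship embeddings for a proposal's label and folding them into the proposal feature with a recurrent fusion cell; the toy knowledgebase and GRU-based fusion below are assumptions, not the disclosed network.

```python
# Illustrative sketch: refine an object-proposal feature with relationship embeddings
# retrieved from an external knowledgebase (the KB contents and fusion are stand-ins).
import torch
import torch.nn as nn

# Hypothetical knowledgebase: maps an object label to related-relationship embeddings.
knowledgebase = {"person": torch.randn(3, 64), "bicycle": torch.randn(2, 64)}

fuse = nn.GRUCell(input_size=64, hidden_size=64)  # folds KB facts into the feature

def refine(proposal_feature: torch.Tensor, label: str) -> torch.Tensor:
    """Iteratively fold retrieved knowledgebase embeddings into the proposal feature."""
    h = proposal_feature.unsqueeze(0)             # (1, 64) running refined feature
    for fact in knowledgebase.get(label, []):
        h = fuse(fact.unsqueeze(0), h)
    return h.squeeze(0)

refined = refine(torch.randn(64), "person")       # refined object-proposal feature
```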
-
5.
Publication number: US20250013866A1
Publication date: 2025-01-09
Application number: US18347877
Filing date: 2023-07-06
Applicant: ADOBE INC.
Inventor: Handong Zhao , Yue Bai , Zhe Lin , Ajinkya Gorakhnath Kale , Jiuxiang Gu , Tong Yu , Sungchul Kim
Abstract: Systems and methods for reducing inference time of vision-language models, as well as for multimodal search, are described herein. Embodiments are configured to obtain an embedding neural network. The embedding neural network is pretrained to embed inputs from a plurality of modalities into a multimodal embedding space. Embodiments are further configured to perform a first progressive pruning stage, where the first progressive pruning stage includes a first pruning of the embedding neural network and a first fine-tuning of the embedding neural network. Embodiments then perform a second progressive pruning stage based on an output of the first progressive pruning stage, where the second progressive pruning stage includes a second pruning of the embedding neural network and a second fine-tuning of the embedding neural network.
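A minimal sketch of the two-stage progressive pruning loop (prune, fine-tune, then prune and fine-tune again on the stage-one output), using PyTorch's magnitude pruning utility as a stand-in; the pruning criterion, ratios, and fine-tuning objective are assumptions.

```python
# Illustrative two-stage progressive pruning loop (model, ratios, and tuning assumed).
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 256))

def prune_stage(net: nn.Module, amount: float) -> None:
    """Magnitude-prune every linear layer by the given fraction of its weights."""
    for module in net.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)

def fine_tune(net: nn.Module) -> None:
    """Placeholder for fine-tuning the pruned embedding network."""
    pass  # e.g., a few epochs of contrastive training on multimodal pairs

# Stage 1: prune, then fine-tune; stage 2 repeats on the stage-1 output.
prune_stage(model, amount=0.3)
fine_tune(model)
prune_stage(model, amount=0.3)
fine_tune(model)
```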
-
6.
Publication number: US20230376687A1
Publication date: 2023-11-23
Application number: US17746779
Filing date: 2022-05-17
Applicant: ADOBE INC.
Inventor: Vlad Ion Morariu , Tong Sun , Nikolaos Barmpalios , Zilong Wang , Jiuxiang Gu , Ani Nenkova Nenkova , Christopher Tensmeyer
IPC: G06F40/279 , G06N5/02
CPC classification number: G06F40/279 , G06N5/022
Abstract: Embodiments are provided for facilitating multimodal extraction across multiple granularities. In one implementation, a set of features of a document for a plurality of granularities of the document is obtained. Via a machine learning model, the set of features of the document is modified to generate a set of modified features, using a set of self-attention values to determine relationships within a first type of feature and a set of cross-attention values to determine relationships between the first type of feature and a second type of feature. Thereafter, the set of modified features is provided to a second machine learning model to perform a classification task.
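The self-attention/cross-attention step could be sketched as below, with standard multi-head attention modules standing in for the machine learning model; the feature types, shapes, and classification head are illustrative assumptions.

```python
# Illustrative sketch: self-attention within one feature type plus cross-attention
# to a second feature type, followed by a downstream classifier (shapes assumed).
import torch
import torch.nn as nn

dim = 128
self_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
classifier = nn.Linear(dim, 7)                   # stand-in second model / task head

text_feats = torch.randn(2, 40, dim)             # first feature type (e.g., tokens)
layout_feats = torch.randn(2, 12, dim)           # second feature type (e.g., regions)

# Relationships within the first feature type.
modified, _ = self_attn(text_feats, text_feats, text_feats)
# Relationships between the first and second feature types.
modified, _ = cross_attn(modified, layout_feats, layout_feats)
logits = classifier(modified)                    # classification task per token
```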
-
7.
Publication number: US20230230198A1
Publication date: 2023-07-20
Application number: US17576091
Filing date: 2022-01-14
Applicant: Adobe Inc.
Inventor: Ruiyi Zhang , Yufan Zhou , Christopher Tensmeyer , Jiuxiang Gu , Tong Yu , Tong Sun
CPC classification number: G06T3/0056 , G06T11/00 , G10L15/22 , G10L15/26 , G06N3/04 , G10L2015/223
Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods that implement a neural network framework for interactive multi-round image generation from natural language inputs. Specifically, the disclosed systems provide an intelligent framework (i.e., a text-based interactive image generation model) that facilitates a multi-round image generation and editing workflow that comports with arbitrary input text and synchronous interaction. In particular embodiments, the disclosed systems utilize natural language feedback for conditioning a generative neural network that performs text-to-image generation and text-guided image modification. For example, the disclosed systems utilize a trained model to inject textual features from natural language feedback into a unified joint embedding space for generating text-informed style vectors. In turn, the disclosed systems can generate an image with semantically meaningful features that map to the natural language feedback. Moreover, the disclosed systems can persist these semantically meaningful features throughout a refinement process and across generated images.
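As a loose illustration of conditioning a style-based generator on text feedback, the sketch below maps encoded text into a style vector and carries that vector across rounds; every module here is a toy stand-in, and the feature-persistence rule (averaging style vectors) is an assumption.

```python
# Illustrative sketch: condition a style-based generator on natural language feedback
# by mapping text features into style vectors; all modules here are toy stand-ins.
import torch
import torch.nn as nn

embedding = nn.Embedding(10000, 256)                 # toy token embedding
text_lstm = nn.LSTM(256, 256, batch_first=True)      # toy text encoder
style_mapper = nn.Linear(256, 512)                   # joint embedding -> style vector
generator = nn.Sequential(nn.Linear(512, 3 * 64 * 64), nn.Tanh())  # toy image decoder

def generate(token_ids, prev_style=None):
    """One generation round; reusing prev_style persists features across rounds."""
    _, (h, _) = text_lstm(embedding(token_ids))
    style = style_mapper(h[-1])                      # text-informed style vector
    if prev_style is not None:
        style = 0.5 * (style + prev_style)           # assumed persistence rule
    image = generator(style).view(-1, 3, 64, 64)
    return image, style

img1, s1 = generate(torch.randint(0, 10000, (1, 8)))      # initial text prompt
img2, s2 = generate(torch.randint(0, 10000, (1, 5)), s1)  # feedback round reuses style
```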
-
8.
Publication number: US20230153531A1
Publication date: 2023-05-18
Application number: US17528972
Filing date: 2021-11-17
Applicant: ADOBE INC.
Inventor: Shijie Geng , Christopher Tensmeyer , Curtis Michael Wigington , Jiuxiang Gu
IPC: G06F40/284 , G06N3/04 , G06F16/2452
CPC classification number: G06F40/284 , G06F16/24526 , G06N3/04
Abstract: Systems and methods for performing Document Visual Question Answering tasks are described. A document and query are received. The document encodes document tokens and the query encodes query tokens. The document is segmented into nested document sections, lines, and tokens. A nested structure of tokens is generated based on the segmented document. A feature vector for each token is generated. A graph structure is generated based on the nested structure of tokens. Each graph node corresponds to the query, a document section, a line, or a token. The node connections correspond to the nested structure. Each node is associated with the feature vector for the corresponding object. A graph attention network is employed to generate another embedding for each node. These embeddings are employed to identify a portion of the document that includes a response to the query. An indication of the identified portion of the document is provided.
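A small illustrative sketch of one graph-attention pass over query/section/line/token nodes, using a masked multi-head attention layer restricted to the nesting edges; the toy node set, edge list, and answer scoring are assumptions rather than the described system.

```python
# Illustrative sketch: one graph-attention pass over query/section/line/token nodes,
# restricted to edges of the nested structure (node set and edges are toy examples).
import torch
import torch.nn as nn

dim = 64
nodes = torch.randn(1, 5, dim)   # [query, section, line, token1, token2] features
edges = {(0, 1), (1, 2), (2, 3), (2, 4)}  # nesting: query-section, section-line, line-tokens

# Attention mask so each node attends only to itself and its graph neighbors.
mask = torch.full((5, 5), float("-inf"))
for i in range(5):
    mask[i, i] = 0.0
for i, j in edges:
    mask[i, j] = 0.0
    mask[j, i] = 0.0

gat_layer = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
node_embeddings, _ = gat_layer(nodes, nodes, nodes, attn_mask=mask)

# Score token nodes against the query node to locate the answering portion.
query, tokens = node_embeddings[0, 0], node_embeddings[0, 3:]
scores = tokens @ query                      # higher score = more likely answer token
answer_node = scores.argmax().item() + 3     # index of the best-matching token node
```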
-
9.
Publication number: US11610393B2
Publication date: 2023-03-21
Application number: US17062157
Filing date: 2020-10-02
Applicant: Adobe Inc.
Inventor: Jason Wen Yong Kuen , Zhe Lin , Jiuxiang Gu
IPC: G06V10/778 , G06K9/62 , G06N3/04 , G06T3/60 , G06T3/40 , G06V10/774
Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately and efficiently learning parameters of a distilled neural network from parameters of a source neural network utilizing multiple augmentation strategies. For example, the disclosed systems can generate lightly augmented digital images and heavily augmented digital images. The disclosed systems can further learn parameters for a source neural network from the lightly augmented digital images. Moreover, the disclosed systems can learn parameters for a distilled neural network from the parameters learned for the source neural network. For example, the disclosed systems can compare classifications of heavily augmented digital images generated by the source neural network and the distilled neural network to transfer learned parameters from the source neural network to the distilled neural network via a knowledge distillation loss function.
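The augmentation-based distillation objective might be sketched as follows: the source network trains on lightly augmented images, and the distilled network is trained to match the source network's predictions on heavily augmented images via a KL-divergence knowledge distillation loss. The tiny networks and augmentations below are stand-ins.

```python
# Illustrative sketch of distillation with light vs. heavy augmentation strategies.
import torch
import torch.nn as nn
import torch.nn.functional as F

def light_aug(x):
    """Light augmentation: a random horizontal flip."""
    return torch.flip(x, dims=[-1]) if torch.rand(1).item() < 0.5 else x

def heavy_aug(x):
    """Heavy augmentation: flip plus strong additive noise (a crude stand-in)."""
    return light_aug(x) + 0.3 * torch.randn_like(x)

source = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
distilled = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

images = torch.rand(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))

# Source network: supervised loss on lightly augmented images.
source_loss = F.cross_entropy(source(light_aug(images)), labels)

# Distilled network: match the source network's predictions on heavy augmentations.
heavy = heavy_aug(images)
with torch.no_grad():
    teacher_probs = F.softmax(source(heavy), dim=-1)
distill_loss = F.kl_div(F.log_softmax(distilled(heavy), dim=-1),
                        teacher_probs, reduction="batchmean")
```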
-
10.
Publication number: US20240386621A1
Publication date: 2024-11-21
Application number: US18318921
Filing date: 2023-05-17
Applicant: Adobe Inc.
Inventor: Ruiyi Zhang , Yufan Zhou , Tong Yu , Tong Sun , Rajiv Jain , Jiuxiang Gu , Christopher Alan Tensmeyer
IPC: G06T11/00 , G06F40/40 , G06V10/74 , G06V10/774 , G06V10/82
Abstract: Techniques and systems for training and/or implementing a text-to-image generation model are provided. A pre-trained multimodal model is leveraged for avoiding slower and more labor-intensive methodologies for training a text-to-image generation model. Accordingly, images without associated text (i.e., bare images) are provided to the pre-trained multimodal model so that it can produce generated text-image pairs. The generated text-image pairs are provided to the text-to-image generation model for training and/or implementing the text-to-image generation model.
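The data-generation loop could be sketched as captioning bare images with a pretrained multimodal model and training the text-to-image generator on the resulting pairs; both models below are placeholders, and the embedding and reconstruction loss are assumptions.

```python
# Illustrative sketch: caption bare images with a stand-in multimodal model, then
# train a stand-in text-to-image generator on the resulting text-image pairs.
import torch
import torch.nn as nn

class PlaceholderCaptioner(nn.Module):
    """Stand-in for a pretrained multimodal model that produces text for an image."""
    def forward(self, image: torch.Tensor) -> str:
        return "a generated caption"           # a real model would describe the image

class PlaceholderText2Image(nn.Module):
    """Stand-in text-to-image generator trained on the generated pairs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(16, 3 * 32 * 32)
    def forward(self, text_embedding: torch.Tensor) -> torch.Tensor:
        return self.net(text_embedding).view(-1, 3, 32, 32)

captioner, generator = PlaceholderCaptioner(), PlaceholderText2Image()
optimizer = torch.optim.Adam(generator.parameters())

bare_images = torch.rand(4, 3, 32, 32)         # images without associated text
for image in bare_images:
    caption = captioner(image)                 # generated text for the bare image
    text_emb = torch.randn(1, 16)              # stand-in embedding of the caption
    predicted = generator(text_emb)            # train the generator on the new pair
    loss = nn.functional.mse_loss(predicted, image.unsqueeze(0))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```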