Patent search: ap:("Adobe Inc.") AND inv:"Jiuxiang Gu"

1.

Invention publication
EXTRACTING DOCUMENT HIERARCHY USING A MULTIMODAL, LAYER-WISE LINK PREDICTION NEURAL NETWORK (Status: under examination, published)

Publication (announcement) No.: US20240161529A1

Publication (announcement) date: 2024-05-16

Application No.: US18055752

Application date: 2022-11-15

Applicant: Adobe Inc.

Inventors: Vlad Morariu, Puneet Mathur, Rajiv Jain, Ashutosh Mehra, Jiuxiang Gu, Franck Dernoncourt, Anandhavelu N, Quan Tran, Verena Kaynig-Fittkau, Nedim Lipka, Ani Nenkova

IPC: G06V30/413, G06V10/82

CPC classification number: G06V30/413, G06V10/82

Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate a digital document hierarchy comprising layers of parent-child element relationships from the visual elements. For example, for a layer of the layers, the disclosed systems determine, from the visual elements, candidate parent visual elements and child visual elements. In addition, for the layer of the layers, the disclosed systems generate, from the feature embeddings utilizing a neural network, element classifications for the candidate parent visual elements and parent-child element link probabilities for the candidate parent visual elements and the child visual elements. Moreover, for the layer, the disclosed systems select parent visual elements from the candidate parent visual elements based on the parent-child element link probabilities. Further, the disclosed systems utilize the digital document hierarchy to generate an interactive digital document from the digital document image.
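A minimal sketch of the per-layer parent selection step, assuming the network has already produced a link probability for every (child, candidate parent) pair. The dictionary layout, the `select_parents` name, and the 0.5 cutoff are hypothetical illustrations, not the claimed method.

```python
from typing import Dict, List, Tuple

# Hypothetical input: link_probs[child][candidate_parent] is a parent-child
# link probability, assumed to be produced by the network for one layer.
LinkProbs = Dict[str, Dict[str, float]]


def select_parents(
    link_probs: LinkProbs, threshold: float = 0.5
) -> Tuple[Dict[str, str], List[str]]:
    """Link each child to its most probable candidate parent for one layer.

    Children whose best link probability is below `threshold` (an assumed
    cutoff) stay unlinked. Returns the child->parent links and the sorted
    list of candidate parents that were selected as actual parents.
    """
    links: Dict[str, str] = {}
    for child, probs in link_probs.items():
        if not probs:
            continue
        parent, prob = max(probs.items(), key=lambda kv: kv[1])
        if prob >= threshold:
            links[child] = parent
    selected = sorted(set(links.values()))
    return links, selected


layer = {
    "para_1": {"section_A": 0.9, "section_B": 0.2},
    "para_2": {"section_A": 0.3, "section_B": 0.7},
    "caption": {"section_A": 0.4, "section_B": 0.1},
}
links, parents = select_parents(layer)
print(links)    # {'para_1': 'section_A', 'para_2': 'section_B'}
print(parents)  # ['section_A', 'section_B']
```

Running `select_parents` once per layer, with the selected parents of one layer supplying candidates for the next, would yield the layered parent-child hierarchy the abstract describes.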

2.

Invention publication
MULTI-SCALE DISTILLATION FOR LOW-RESOLUTION DETECTION (Status: under examination, published)

Publication (announcement) No.: US20230153943A1

Publication (announcement) date: 2023-05-18

Application No.: US17455134

Application date: 2021-11-16

Applicant: ADOBE INC.

Inventors: Jason Kuen, Jiuxiang Gu, Zhe Lin

IPC: G06T3/40, G06K9/62, G06N3/08, G06N3/04

CPC classification number: G06T3/4046, G06K9/6202, G06N3/08, G06N3/0454

Abstract: Systems and methods for image processing are described. The systems and methods include receiving a low-resolution image; generating a feature map based on the low-resolution image using an encoder of a student network, wherein the encoder of the student network is trained based on comparing a predicted feature map from the encoder of the student network and a fused feature map from a teacher network, and wherein the fused feature map represents a combination of a first feature map from a high-resolution encoder of the teacher network and a second feature map from a low-resolution encoder of the teacher network; and decoding the feature map to obtain prediction information for the low-resolution image.
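A minimal sketch of the training signal described above, assuming the "combination" is an element-wise weighted average (weight `alpha`) of two equally sized, flattened teacher feature maps and that the comparison is mean squared error. Both choices, and the function names, are assumptions for illustration; in practice the two maps would first need to be brought to a common resolution.

```python
from typing import List


def fuse(high: List[float], low: List[float], alpha: float = 0.5) -> List[float]:
    """Assumed fusion: element-wise weighted average of two teacher feature maps."""
    if len(high) != len(low):
        raise ValueError("feature maps must have the same shape")
    return [alpha * h + (1 - alpha) * l for h, l in zip(high, low)]


def distillation_loss(student: List[float], fused: List[float]) -> float:
    """Assumed comparison: mean squared error between student and fused maps."""
    if len(student) != len(fused):
        raise ValueError("feature maps must have the same shape")
    return sum((s - f) ** 2 for s, f in zip(student, fused)) / len(fused)


teacher_high = [1.0, 2.0, 3.0, 4.0]  # hypothetical high-resolution encoder output
teacher_low = [3.0, 2.0, 1.0, 0.0]   # hypothetical low-resolution encoder output
fused = fuse(teacher_high, teacher_low)
print(fused)  # [2.0, 2.0, 2.0, 2.0]

student = [2.0, 2.0, 2.0, 3.0]       # hypothetical student encoder output
print(distillation_loss(student, fused))  # 0.25
```

Minimizing this loss pushes the student's features toward a target that blends what the teacher sees at both resolutions.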

3.

Invention application
SELF-SUPERVISED DOCUMENT REPRESENTATION LEARNING (Status: in force)

Publication (announcement) No.: US20220382975A1

Publication (announcement) date: 2022-12-01

Application No.: US17333892

Application date: 2021-05-28

Applicant: Adobe Inc.

Inventors: Jiuxiang Gu, Vlad Morariu, Varun Manjunatha, Tong Sun, Rajiv Jain, Peizhao Li, Jason Kuen, Handong Zhao

IPC: G06F40/279, G06N3/04, G06N3/08, G06F16/93, G06F40/30, G06F40/205

Abstract: One example method involves operations for a processing device that include receiving, by a machine learning model trained to generate a search result, a search query for a text input. The machine learning model is trained by receiving pre-training data that includes multiple documents. The machine learning model is pre-trained by generating, using an encoder, feature embeddings for each of the documents included in the pre-training data. The feature embeddings are generated by applying a masking function to visual and textual features in the documents. Training the machine learning model also includes generating, using the feature embeddings, output features for the documents by concatenating the feature embeddings and applying a non-linear mapping to the feature embeddings. Training the machine learning model further includes applying a linear classifier to the output features. Additionally, operations include generating, for display, a search result using the machine learning model based on the input.

4.

Granted invention patent
Generating scene graphs from digital images using external knowledge and image reconstruction (Status: in force)

Publication (announcement) No.: US11373390B2

Publication (announcement) date: 2022-06-28

Application No.: US16448473

Application date: 2019-06-21

Applicant: Adobe Inc.

Inventors: Handong Zhao, Zhe Lin, Sheng Li, Mingyang Ling, Jiuxiang Gu

IPC: G06V10/26, G06K9/62, G06N3/04, G06N3/08, G06V10/426

Abstract: Methods, systems, and non-transitory computer readable storage media are disclosed for generating semantic scene graphs for digital images using an external knowledgebase for feature refinement. For example, the disclosed system can determine object proposals and subgraph proposals for a digital image to indicate candidate relationships between objects in the digital image. The disclosed system can then extract relationships from an external knowledgebase for refining features of the object proposals and the subgraph proposals. Additionally, the disclosed system can generate a semantic scene graph for the digital image based on the refined features of the object/subgraph proposals. Furthermore, the disclosed system can update/train a semantic scene graph generation network based on the generated semantic scene graph. The disclosed system can also reconstruct the image using object labels based on the refined features to further update/train the semantic scene graph generation network.

5.

Invention application
EFFICIENT VISION-LANGUAGE RETRIEVAL USING STRUCTURAL PRUNING (Status: in force)

Publication (announcement) No.: US20250013866A1

Publication (announcement) date: 2025-01-09

Application No.: US18347877

Application date: 2023-07-06

Applicant: ADOBE INC.

Inventors: Handong Zhao, Yue Bai, Zhe Lin, Ajinkya Gorakhnath Kale, Jiuxiang Gu, Tong Yu, Sungchul Kim

IPC: G06N3/082, G06N3/04

Abstract: Systems and methods for reducing inference time of vision-language models, as well as for multimodal search, are described herein. Embodiments are configured to obtain an embedding neural network. The embedding neural network is pretrained to embed inputs from a plurality of modalities into a multimodal embedding space. Embodiments are further configured to perform a first progressive pruning stage, where the first progressive pruning stage includes a first pruning of the embedding neural network and a first fine-tuning of the embedding neural network. Embodiments then perform a second progressive pruning stage based on an output of the first progressive pruning stage, where the second progressive pruning stage includes a second pruning of the embedding neural network and a second fine-tuning of the embedding neural network.
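A minimal sketch of two chained prune-then-fine-tune stages, where the second stage starts from the output of the first. It assumes structured pruning that drops whole units (for example neurons or attention heads) ranked by L1 weight norm, and uses a no-op placeholder for fine-tuning; the ranking criterion, keep ratios, and all function names are hypothetical.

```python
from typing import Callable, List

Unit = List[float]  # weights of one prunable structure (e.g. a neuron or head)


def prune_stage(units: List[Unit], keep_ratio: float) -> List[Unit]:
    """Keep the `keep_ratio` fraction of units with the largest L1 norm,
    preserving their original order (structured pruning of whole units)."""
    n_keep = max(1, round(len(units) * keep_ratio))
    scores = [sum(abs(w) for w in u) for u in units]
    ranked = sorted(range(len(units)), key=lambda i: scores[i], reverse=True)
    return [units[i] for i in sorted(ranked[:n_keep])]


def progressive_prune(
    units: List[Unit],
    keep_ratios: List[float],
    fine_tune: Callable[[List[Unit]], List[Unit]],
) -> List[Unit]:
    """Run one prune-then-fine-tune stage per entry in `keep_ratios`;
    each stage operates on the output of the previous one."""
    for ratio in keep_ratios:
        units = prune_stage(units, ratio)
        units = fine_tune(units)
    return units


# Placeholder fine-tuning step; a real one would retrain the embedding network.
def no_op_fine_tune(units: List[Unit]) -> List[Unit]:
    return units


weights = [[0.1, -0.1], [2.0, 1.0], [0.5, 0.5], [-3.0, 0.5]]
pruned = progressive_prune(weights, keep_ratios=[0.5, 0.5], fine_tune=no_op_fine_tune)
print(pruned)  # [[-3.0, 0.5]]
```

Splitting the reduction into stages with fine-tuning in between lets the network recover accuracy before more structure is removed, rather than pruning to the final size in one step.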

6.

Invention publication
MULTIMODAL EXTRACTION ACROSS MULTIPLE GRANULARITIES (Status: under examination, published)

Publication (announcement) No.: US20230376687A1

Publication (announcement) date: 2023-11-23

Application No.: US17746779

Application date: 2022-05-17

Applicant: ADOBE INC.

Inventors: Vlad Ion Morariu, Tong Sun, Nikolaos Barmpalios, Zilong Wang, Jiuxiang Gu, Ani Nenkova Nenkova, Christopher Tensmeyer

IPC: G06F40/279, G06N5/02

CPC classification number: G06F40/279, G06N5/022

Abstract: Embodiments are provided for facilitating multimodal extraction across multiple granularities. In one implementation, a set of features of a document for a plurality of granularities of the document is obtained. Via a machine learning model, the set of features of the document is modified to generate a set of modified features using a set of self-attention values to determine relationships within a first type of feature and a set of cross-attention values to determine relationships between the first type of feature and a second type of feature. Thereafter, the set of modified features is provided to a second machine learning model to perform a classification task.

7.

Invention publication
UTILIZING A GENERATIVE NEURAL NETWORK TO INTERACTIVELY CREATE AND MODIFY DIGITAL IMAGES BASED ON NATURAL LANGUAGE FEEDBACK (Status: under examination, published)

Publication (announcement) No.: US20230230198A1

Publication (announcement) date: 2023-07-20

Application No.: US17576091

Application date: 2022-01-14

Applicant: Adobe Inc.

Inventors: Ruiyi Zhang, Yufan Zhou, Christopher Tensmeyer, Jiuxiang Gu, Tong Yu, Tong Sun

IPC: G06T3/00, G06T11/00, G10L15/22, G10L15/26, G06N3/04

CPC classification number: G06T3/0056, G06T11/00, G10L15/22, G10L15/26, G06N3/04, G10L2015/223

Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods that implement a neural network framework for interactive multi-round image generation from natural language inputs. Specifically, the disclosed systems provide an intelligent framework (i.e., a text-based interactive image generation model) that facilitates a multi-round image generation and editing workflow that comports with arbitrary input text and synchronous interaction. In particular embodiments, the disclosed systems utilize natural language feedback for conditioning a generative neural network that performs text-to-image generation and text-guided image modification. For example, the disclosed systems utilize a trained model to inject textual features from natural language feedback into a unified joint embedding space for generating text-informed style vectors. In turn, the disclosed systems can generate an image with semantically meaningful features that map to the natural language feedback. Moreover, the disclosed systems can persist these semantically meaningful features throughout a refinement process and across generated images.

8.

Invention publication
ENHANCED DOCUMENT VISUAL QUESTION ANSWERING SYSTEM VIA HIERARCHICAL ATTENTION (Status: under examination, published)

Publication (announcement) No.: US20230153531A1

Publication (announcement) date: 2023-05-18

Application No.: US17528972

Application date: 2021-11-17

Applicant: ADOBE INC.

Inventors: Shijie Geng, Christopher Tensmeyer, Curtis Michael Wigington, Jiuxiang Gu

IPC: G06F40/284, G06N3/04, G06F16/2452

CPC classification number: G06F40/284, G06F16/24526, G06N3/04

Abstract: Systems and methods for performing Document Visual Question Answering tasks are described. A document and query are received. The document encodes document tokens and the query encodes query tokens. The document is segmented into nested document sections, lines, and tokens. A nested structure of tokens is generated based on the segmented document. A feature vector for each token is generated. A graph structure is generated based on the nested structure of tokens. Each graph node corresponds to the query, a document section, a line, or a token. The node connections correspond to the nested structure. Each node is associated with the feature vector for the corresponding object. A graph attention network is employed to generate another embedding for each node. These embeddings are employed to identify a portion of the document that includes a response to the query. An indication of the identified portion of the document is provided.
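A minimal sketch of only the graph-construction step: nodes for the query, sections, lines, and tokens, with edges that mirror the nesting. The dictionary layout and `build_graph` name are hypothetical, and linking the query node to every section is an assumption, since the abstract does not say how the query node is connected. Feature vectors and the graph attention network are not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical nested structure: section -> line -> token ids.
Nested = Dict[str, Dict[str, List[str]]]


def build_graph(doc: Nested) -> List[Tuple[str, str]]:
    """Return undirected edges (parent, child) mirroring the nesting.

    Assumption: the query node is linked to every section node; the
    abstract only says connections follow the nested structure.
    """
    edges: List[Tuple[str, str]] = []
    for section, lines in doc.items():
        edges.append(("query", section))
        for line, tokens in lines.items():
            edges.append((section, line))
            for token in tokens:
                edges.append((line, token))
    return edges


doc = {"sec1": {"line1": ["tok1", "tok2"], "line2": ["tok3"]}}
for edge in build_graph(doc):
    print(edge)
```

Because each token reaches the query only through its line and section nodes, attention over this graph can pass information along the document's hierarchy rather than across a flat token sequence.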

9.

Granted invention patent
Knowledge distillation for neural networks using multiple augmentation strategies (Status: in force)

Publication (announcement) No.: US11610393B2

Publication (announcement) date: 2023-03-21

Application No.: US17062157

Application date: 2020-10-02

Applicant: Adobe Inc.

Inventors: Jason Wen Yong Kuen, Zhe Lin, Jiuxiang Gu

IPC: G06V10/778, G06K9/62, G06N3/04, G06T3/60, G06T3/40, G06V10/774

Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately and efficiently learning parameters of a distilled neural network from parameters of a source neural network utilizing multiple augmentation strategies. For example, the disclosed systems can generate lightly augmented digital images and heavily augmented digital images. The disclosed systems can further learn parameters for a source neural network from the lightly augmented digital images. Moreover, the disclosed systems can learn parameters for a distilled neural network from the parameters learned for the source neural network. For example, the disclosed systems can compare classifications of heavily augmented digital images generated by the source neural network and the distilled neural network to transfer learned parameters from the source neural network to the distilled neural network via a knowledge distillation loss function.
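A minimal sketch of a knowledge distillation loss that compares the source and distilled networks' classifications of the same heavily augmented images. It swaps in the common temperature-softened KL-divergence formulation (scaled by T squared); the abstract does not specify the loss, so the formula, temperature, and sample logits are all assumptions.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits: List[float], student_logits: List[float],
            temperature: float = 4.0) -> float:
    """KL(teacher || student) on softened class distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits for two heavily augmented images.
teacher_batch = [[2.0, 0.5, -1.0], [0.0, 1.5, 0.2]]  # source network
student_batch = [[1.8, 0.7, -0.9], [0.1, 1.2, 0.4]]  # distilled network
batch_loss = sum(kd_loss(t, s) for t, s in zip(teacher_batch, student_batch)) / len(teacher_batch)
print(f"mean KD loss: {batch_loss:.4f}")
```

In the setup the abstract describes, the source network would first be trained on lightly augmented images; only the comparison on heavily augmented images is sketched here.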

10.

Invention application
TEXT-TO-IMAGE SYSTEM AND METHOD (Status: in force)

Publication (announcement) No.: US20240386621A1

Publication (announcement) date: 2024-11-21

Application No.: US18318921

Application date: 2023-05-17

Applicant: Adobe Inc.

Inventors: Ruiyi Zhang, Yufan Zhou, Tong Yu, Tong Sun, Rajiv Jain, Jiuxiang Gu, Christopher Alan Tensmeyer

IPC: G06T11/00, G06F40/40, G06V10/74, G06V10/774, G06V10/82

Abstract: Techniques and systems for training and/or implementing a text-to-image generation model are provided. A pre-trained multimodal model is leveraged to avoid slower, more labor-intensive methods of training a text-to-image generation model. Accordingly, images without associated text (i.e., bare images) are provided to the pre-trained multimodal model so that it can produce generated text-image pairs. The generated text-image pairs are provided to the text-to-image generation model for training and/or implementing the text-to-image generation model.
