MULTI-SCALE DISTILLATION FOR LOW-RESOLUTION DETECTION

    公开(公告)号:US20230153943A1

    公开(公告)日:2023-05-18

    申请号:US17455134

    申请日:2021-11-16

    Applicant: ADOBE INC.

    CPC classification number: G06T3/4046 G06K9/6202 G06N3/08 G06N3/0454

    Abstract: Systems and methods for image processing are described. The systems and methods include receiving a low-resolution image; generating a feature map based on the low-resolution image using an encoder of a student network, wherein the encoder of the student network is trained based on comparing a predicted feature map from the encoder of the student network and a fused feature map from a teacher network, and wherein the fused feature map represents a combination of first feature map from a high-resolution encoder of the teacher network and a second feature map from a low-resolution encoder of the teacher network; and decoding the feature map to obtain prediction information for the low-resolution image.

    SELF-SUPERVISED DOCUMENT REPRESENTATION LEARNING

    公开(公告)号:US20220382975A1

    公开(公告)日:2022-12-01

    申请号:US17333892

    申请日:2021-05-28

    Applicant: Adobe Inc.

    Abstract: One example method involves operations for a processing device that include receiving, by a machine learning model trained to generate a search result, a search query for a text input. The machine learning model is trained by receiving pre-training data that includes multiple documents. Pre-training the machine learning model by generating, using an encoder, feature embeddings for each of the documents included in the pre-training data. The feature embeddings are generated by applying a masking function to visual and textual features in the documents. Training the machine learning model also includes generating, using the feature embeddings, output features for the documents by concatenating the feature embeddings and applying a non-linear mapping to the feature embeddings. Training the machine learning model further includes applying a linear classifier to the output features. Additionally, operations include generating, for display, a search result using the machine learning model based on the input.

    Generating scene graphs from digital images using external knowledge and image reconstruction

    公开(公告)号:US11373390B2

    公开(公告)日:2022-06-28

    申请号:US16448473

    申请日:2019-06-21

    Applicant: Adobe Inc.

    Abstract: Methods, systems, and non-transitory computer readable storage media are disclosed for generating semantic scene graphs for digital images using an external knowledgebase for feature refinement. For example, the disclosed system can determine object proposals and subgraph proposals for a digital image to indicate candidate relationships between objects in the digital image. The disclosed system can then extract relationships from an external knowledgebase for refining features of the object proposals and the subgraph proposals. Additionally, the disclosed system can generate a semantic scene graph for the digital image based on the refined features of the object/subgraph proposals. Furthermore, the disclosed system can update/train a semantic scene graph generation network based on the generated semantic scene graph. The disclosed system can also reconstruct the image using object labels based on the refined features to further update/train the semantic scene graph generation network.

    EFFICIENT VISION-LANGUAGE RETRIEVAL USING STRUCTURAL PRUNING

    公开(公告)号:US20250013866A1

    公开(公告)日:2025-01-09

    申请号:US18347877

    申请日:2023-07-06

    Applicant: ADOBE INC.

    Abstract: Systems and methods for reducing inference time of vision-language models, as well as for multimodal search, are described herein. Embodiments are configured to obtain an embedding neural network. The embedding neural network is pretrained to embed inputs from a plurality of modalities into a multimodal embedding space. Embodiments are further configured to perform a first progressive pruning stage, where the first progressive pruning stage includes a first pruning of the embedding neural network and a first fine-tuning of the embedding neural network. Embodiments then perform a second progressive pruning stage based on an output of the first progressive pruning stage, where the second progressive pruning stage includes a second pruning of the embedding neural network and a second fine-tuning of the embedding neural network.

    UTILIZING A GENERATIVE NEURAL NETWORK TO INTERACTIVELY CREATE AND MODIFY DIGITAL IMAGES BASED ON NATURAL LANGUAGE FEEDBACK

    公开(公告)号:US20230230198A1

    公开(公告)日:2023-07-20

    申请号:US17576091

    申请日:2022-01-14

    Applicant: Adobe Inc.

    Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods that implement a neural network framework for interactive multi-round image generation from natural language inputs. Specifically, the disclosed systems provide an intelligent framework (i.e., a text-based interactive image generation model) that facilitates a multi-round image generation and editing workflow that comports with arbitrary input text and synchronous interaction. In particular embodiments, the disclosed systems utilize natural language feedback for conditioning a generative neural network that performs text-to-image generation and text-guided image modification. For example, the disclosed systems utilize a trained model to inject textual features from natural language feedback into a unified joint embedding space for generating text-informed style vectors. In turn, the disclosed systems can generate an image with semantically meaningful features that map to the natural language feedback. Moreover, the disclosed systems can persist these semantically meaningful features throughout a refinement process and across generated images.

    ENHANCED DOCUMENT VISUAL QUESTION ANSWERING SYSTEM VIA HIERARCHICAL ATTENTION

    公开(公告)号:US20230153531A1

    公开(公告)日:2023-05-18

    申请号:US17528972

    申请日:2021-11-17

    Applicant: ADOBE INC.

    CPC classification number: G06F40/284 G06F16/24526 G06N3/04

    Abstract: Systems and methods for performing Document Visual Question Answering tasks are described. A document and query are received. The document encodes document tokens and the query encodes query tokens. The document is segmented into nested document sections, lines, and tokens. A nested structure of tokens is generated based on the segmented document. A feature vector for each token is generated. A graph structure is generated based on the nested structure of tokens. Each graph node corresponds to the query, a document section, a line, or a token. The node connections correspond to the nested structure. Each node is associated with the feature vector for the corresponding object. A graph attention network is employed to generate another embedding for each node. These embeddings are employed to identify a portion of the document that includes a response to the query. An indication of the identified portion of the document is be provided.

    Knowledge distillation for neural networks using multiple augmentation strategies

    公开(公告)号:US11610393B2

    公开(公告)日:2023-03-21

    申请号:US17062157

    申请日:2020-10-02

    Applicant: Adobe Inc.

    Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately and efficiently learning parameters of a distilled neural network from parameters of a source neural network utilizing multiple augmentation strategies. For example, the disclosed systems can generate lightly augmented digital images and heavily augmented digital images. The disclosed systems can further learn parameters for a source neural network from the lightly augmented digital images. Moreover, the disclosed systems can learn parameters for a distilled neural network from the parameters learned for the source neural network. For example, the disclosed systems can compare classifications of heavily augmented digital images generated by the source neural network and the distilled neural network to transfer learned parameters from the source neural network to the distilled neural network via a knowledge distillation loss function.

Patent Agency Ranking