Patent search ap:("Adobe Inc.") AND inv:"Aishwarya Agarwal" Page 1

1.

Patent application
VISUAL GROUNDING OF SELF-SUPERVISED REPRESENTATIONS FOR MACHINE LEARNING MODELS UTILIZING DIFFERENCE ATTENTION (Status: Active)

Publication number: US20240420447A1

Publication date: 2024-12-19

Application number: US18336423

Filing date: 2023-06-16

Applicant: Adobe Inc.

Inventor: Aishwarya Agarwal, Srikrishna Karanam, Balaji Vasan Srinivasan

IPC: G06V10/75, G06V10/80

Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods for utilizing difference attention to evaluate and/or train machine learning models. In particular, in some embodiments, the disclosed systems generate, utilizing a machine learning model, a first feature vector from a digital image. In one or more implementations, the disclosed systems generate a masked digital image by masking a region from the digital image. Additionally, in some embodiments, the disclosed systems generate, utilizing the machine learning model, a second feature vector from the masked digital image. Moreover, in some implementations, the disclosed systems determine a difference feature vector between the first feature vector and the second feature vector. Furthermore, in some embodiments, the disclosed systems generate, from the difference feature vector, a difference attention map reflecting a visual grounding of the machine learning model relative to the region.
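The steps in this abstract (encode, mask, encode again, subtract, build a map) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: `toy_features` is a hypothetical stand-in for the machine learning model, the feature vector is taken to be a global average pool, and the map is computed as a clipped dot product between the difference vector and each location's features. The abstract does not specify any of these choices.

```python
from typing import List, Tuple

Image = List[List[float]]   # grayscale image as rows of pixel values
Vector = List[float]


def toy_features(image: Image) -> List[List[Vector]]:
    """Hypothetical per-pixel features [v, v*v]; a real model would be a trained network."""
    return [[[v, v * v] for v in row] for row in image]


def pooled(features: List[List[Vector]]) -> Vector:
    """Global average pool of the spatial features into a single feature vector."""
    cells = [f for row in features for f in row]
    dim = len(cells[0])
    return [sum(f[i] for f in cells) / len(cells) for i in range(dim)]


def mask_region(image: Image, box: Tuple[int, int, int, int], fill: float = 0.0) -> Image:
    """Return a copy of the image with the (top, left, bottom, right) box set to `fill`."""
    top, left, bottom, right = box
    return [
        [fill if top <= r < bottom and left <= c < right else v for c, v in enumerate(row)]
        for r, row in enumerate(image)
    ]


def difference_attention(image: Image, box: Tuple[int, int, int, int]):
    """Assumed recipe: project the difference vector onto each location's features."""
    feats = toy_features(image)
    first = pooled(feats)                                      # first feature vector
    second = pooled(toy_features(mask_region(image, box)))     # second feature vector
    diff = [a - b for a, b in zip(first, second)]              # difference feature vector
    raw = [[max(0.0, sum(d * x for d, x in zip(diff, f))) for f in row] for row in feats]
    peak = max(max(row) for row in raw)
    amap = [[v / peak if peak > 0 else 0.0 for v in row] for row in raw]
    return diff, amap


image = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
diff, amap = difference_attention(image, box=(1, 1, 2, 2))
```

In this toy case the only nonzero pixel sits inside the masked box, so the normalized map peaks at that location and is zero elsewhere.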

2.

Patent application
MODALITY ADAPTIVE INFORMATION RETRIEVAL (Status: Active)

Publication number: US20220230061A1

Publication date: 2022-07-21

Application number: US17153130

Filing date: 2021-01-20

Applicant: Adobe Inc.

Inventor: Hrituraj Singh, Jatin Lamba, Denil Pareshbhai Mehta, Balaji Vasan Srinivasan, Anshul Nasery, Aishwarya Agarwal

IPC: G06N3/08, G06F16/242, G06N3/04, G06K9/62, G06K9/00, G06F40/20

Abstract: In some embodiments, a multimodal computing system receives a query and identifies, from source documents, text passages and images that are relevant to the query. The multimodal computing system accesses a multimodal question-answering model that includes a textual stream of language models and a visual stream of language models. Each of the textual stream and the visual stream contains a set of transformer-based models and each transformer-based model includes a cross-attention layer using data generated by both the textual stream and visual stream of language models as an input. The multimodal computing system identifies text relevant to the query by applying the textual stream to the text passages and computes, using the visual stream, relevance scores of the images to the query, respectively. The multimodal computing system further generates a response to the query by including the text and/or an image according to the relevance scores.
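The last step of this abstract, composing a response from text and/or an image according to relevance scores, might look roughly like the sketch below. The `Response` type, the `build_response` helper, and the fixed `threshold` are hypothetical; the abstract does not say how the scores are turned into a decision, and the transformer streams that would produce the text and scores are not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Response:
    text: Optional[str]       # text identified by the textual stream (assumed given)
    image_id: Optional[str]   # image chosen from the visual stream's scores, if any


def build_response(
    text: Optional[str],
    image_scores: List[Tuple[str, float]],
    threshold: float = 0.5,   # assumed cutoff; not taken from the source
) -> Response:
    """Attach the highest-scoring image only if its relevance score clears the threshold."""
    best = max(image_scores, key=lambda pair: pair[1], default=None)
    image_id = best[0] if best is not None and best[1] >= threshold else None
    return Response(text=text, image_id=image_id)


with_image = build_response("Answer text", [("fig1", 0.2), ("fig2", 0.9)])
text_only = build_response("Answer text", [("fig1", 0.2)])
```

A thresholded maximum is only one possible policy; a system could equally return several images or an image without text.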

3.

Granted patent
Modality adaptive information retrieval (Status: Active)

Publication number: US12198048B2

Publication date: 2025-01-14

Application number: US17153130

Filing date: 2021-01-20

Applicant: Adobe Inc.

Inventor: Hrituraj Singh, Jatin Lamba, Denil Pareshbhai Mehta, Balaji Vasan Srinivasan, Anshul Nasery, Aishwarya Agarwal

IPC: G06F16/24, G06F16/242, G06F18/214, G06F18/22, G06F40/20, G06N3/045, G06N3/08, G06V30/40

Abstract: In some embodiments, a multimodal computing system receives a query and identifies, from source documents, text passages and images that are relevant to the query. The multimodal computing system accesses a multimodal question-answering model that includes a textual stream of language models and a visual stream of language models. Each of the textual stream and the visual stream contains a set of transformer-based models and each transformer-based model includes a cross-attention layer using data generated by both the textual stream and visual stream of language models as an input. The multimodal computing system identifies text relevant to the query by applying the textual stream to the text passages and computes, using the visual stream, relevance scores of the images to the query, respectively. The multimodal computing system further generates a response to the query by including the text and/or an image according to the relevance scores.

4.

Patent application
TEXT-TO-IMAGE SYNTHESIS UTILIZING DIFFUSION MODELS WITH TEST-TIME ATTENTION SEGREGATION AND RETENTION OPTIMIZATION (Status: Active)

Publication number: US20240428468A1

Publication date: 2024-12-26

Application number: US18337634

Filing date: 2023-06-20

Applicant: Adobe Inc.

Inventor: Aishwarya Agarwal, Srikrishna Karanam, Joseph Koonthanam Jose, Apoorv Umang Saxena, Koustava Goswami, Balaji Vasan Srinivasan

IPC: G06T11/00, G06N3/0455

Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that utilize attention segregation loss and/or attention retention loss at inference time of a diffusion neural network to generate a text-conditioned image. In particular, in some embodiments, the disclosed systems utilize the attention segregation loss to reduce overlap between concepts by comparing attention maps for multiple concepts of a text query corresponding to a denoising step. Further, in some embodiments, the disclosed systems utilize the attention retention loss to improve information retention for concepts across denoising steps by comparing attention maps between different denoising steps. Accordingly, in some embodiments, by utilizing the attention segregation loss and the attention retention loss, the disclosed systems accurately maintain multiple concepts from a text query when generating a text-conditioned image.
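The two losses named in this abstract can be sketched under stated assumptions. The abstract gives no formulas, so the soft overlap measure used here (sum of element-wise minima over sum of element-wise maxima) and both function definitions are illustrative choices, not the patented method. Attention maps are assumed to be non-negative grids keyed by concept token, and the gradient step that would update the latent at inference time is omitted.

```python
from itertools import combinations
from typing import Dict, List

Map = List[List[float]]  # one non-negative attention map per concept token


def _flat(m: Map) -> List[float]:
    return [v for row in m for v in row]


def overlap(a: Map, b: Map) -> float:
    """Soft overlap in [0, 1]: sum of element-wise minima over sum of element-wise maxima."""
    fa, fb = _flat(a), _flat(b)
    inter = sum(min(x, y) for x, y in zip(fa, fb))
    union = sum(max(x, y) for x, y in zip(fa, fb))
    return inter / union if union > 0 else 0.0


def segregation_loss(maps: Dict[str, Map]) -> float:
    """Assumed form: total pairwise overlap between concepts at one denoising step."""
    return sum(overlap(maps[p], maps[q]) for p, q in combinations(sorted(maps), 2))


def retention_loss(prev: Dict[str, Map], curr: Dict[str, Map]) -> float:
    """Assumed form: per-concept drift between two denoising steps (1 - overlap)."""
    return sum(1.0 - overlap(prev[k], curr[k]) for k in curr)


step_t = {"cat": [[1.0, 0.0], [0.0, 0.0]], "dog": [[0.0, 0.0], [0.0, 1.0]]}
step_t_next = {"cat": [[1.0, 0.0], [0.0, 0.0]], "dog": [[0.0, 1.0], [0.0, 0.0]]}

seg = segregation_loss(step_t)             # concepts occupy disjoint regions
ret = retention_loss(step_t, step_t_next)  # "dog" moved between steps, "cat" did not
```

Lower values are better for both quantities in this sketch: the first penalizes concepts attending to the same pixels, and the second penalizes a concept's attention moving away from where it was at the earlier step.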
