METHOD AND SYSTEM FOR PERSONALIZED MULTI-SUBJECT TEXT TO IMAGE GENERATION

    Publication No.: US20250166134A1

    Publication Date: 2025-05-22

    Application No.: US18903256

    Filing Date: 2024-10-01

    Abstract: Text-to-image models generate images from text prompts. Existing text-to-image models often produce unclear images in which subjects exhibit hybrid characteristics, i.e., each subject in the image exhibits characteristics of multiple subjects. The present disclosure provides a method and a system for personalized multi-subject text-to-image generation. The system first fine-tunes an existing text-to-image diffusion model using a plurality of images of target subjects. Then, the system performs image generation based on local text prompts using the fine-tuned text-to-image diffusion model. In particular, the fine-tuned text-to-image diffusion model uses a composite diffusion algorithm for generating subject images. Thereafter, the system computes a subject-aware segmentation loss for the generated images, which is then used to correct the subject appearance in the generated images. Finally, the system applies a global diffuser to the generated images to create a harmonized image based on a global text prompt.

    VISION-BASED GENERATION OF NAVIGATION WORKFLOW FOR AUTOMATICALLY FILLING APPLICATION FORMS USING LARGE LANGUAGE MODELS

    Publication No.: US20250131185A1

    Publication Date: 2025-04-24

    Application No.: US18883765

    Filing Date: 2024-09-12

    Abstract: Robotic Process Automation (RPA) systems face challenges in handling complex processes and diverse screen layouts that require advanced human-like decision-making capabilities. These systems typically rely on pixel-level encoding through drag-and-drop or automation frameworks such as Selenium to create navigation workflows, rather than on visual understanding of screen elements. The present disclosure provides systems and methods that implement large language models (LLMs) coupled with deep-learning-based image understanding, which adapt to new scenarios, including changes in the user interface and variations in input data, without the need for human intervention. The system of the present disclosure uses computer vision and natural language processing to perceive visible elements on a graphical user interface (GUI) and convert them into a textual representation. This information is then used by the LLMs to generate one or more navigation workflows, each comprising a sequence of actions that are executed by a scripting engine/code to complete an assigned task from a task request.

    METHOD AND SYSTEM FOR TABLE STRUCTURE RECOGNITION VIA DEEP SPATIAL ASSOCIATION OF WORDS

    Publication No.: US20230055391A1

    Publication Date: 2023-02-23

    Application No.: US17807215

    Filing Date: 2022-06-16

    Abstract: State-of-the-art techniques that use spatial-association-based Table Structure Recognition (TSR) are limited in selecting a minimal but most informative set of word pairs to generate a digital table representation. Embodiments herein provide a method and system for TSR from a table image via deep spatial association of words using an optimal number of word pairs, analyzed by a single classifier to determine word association. The optimal number of word pairs is identified using an immediate-left-neighbor and immediate-top-neighbor approach followed by redundant word pair elimination, thus enabling accurate capture of the structural features of even complex table images via minimal word pairs. The reduced number of word pairs, in combination with the single classifier trained to assign word associations to the classes same cell, same row, same column, and unrelated, provides a TSR pipeline with reduced computational complexity that consumes fewer resources while still generating a more accurate digital representation of complex tables.
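
    The word-pair selection described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not define how neighbors or redundancy are determined, so the `Word` type, the overlap test, and the rule of treating a repeated unordered pair as redundant are all assumptions made for illustration. Coordinates are assumed to be image coordinates with y increasing downward.

```python
from typing import List, NamedTuple, Optional, Tuple


class Word(NamedTuple):
    """Hypothetical word box: text plus (x0, y0, x1, y1) in image coordinates."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def _overlaps(a0: float, a1: float, b0: float, b1: float) -> bool:
    """True if the 1-D intervals [a0, a1] and [b0, b1] share positive length."""
    return min(a1, b1) - max(a0, b0) > 0


def immediate_left(i: int, words: List[Word]) -> Optional[int]:
    """Index of the nearest word lying left of words[i] on an overlapping row band."""
    w, best = words[i], None
    for j, c in enumerate(words):
        if j != i and c.x1 <= w.x0 and _overlaps(c.y0, c.y1, w.y0, w.y1):
            if best is None or c.x1 > words[best].x1:
                best = j
    return best


def immediate_top(i: int, words: List[Word]) -> Optional[int]:
    """Index of the nearest word lying above words[i] on an overlapping column band."""
    w, best = words[i], None
    for j, c in enumerate(words):
        if j != i and c.y1 <= w.y0 and _overlaps(c.x0, c.x1, w.x0, w.x1):
            if best is None or c.y1 > words[best].y1:
                best = j
    return best


def candidate_pairs(words: List[Word]) -> List[Tuple[int, int]]:
    """Collect left/top neighbor pairs; the set drops repeated unordered pairs."""
    pairs = set()
    for i in range(len(words)):
        for j in (immediate_left(i, words), immediate_top(i, words)):
            if j is not None:
                pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


# A 2x2 grid of words: four candidate pairs instead of all six possible pairs.
grid = [
    Word("A", 0, 0, 10, 10), Word("B", 20, 0, 30, 10),
    Word("C", 0, 20, 10, 30), Word("D", 20, 20, 30, 30),
]
pairs = candidate_pairs(grid)
```

    In this sketch each word contributes at most two pairs, so the number of candidates grows linearly with the word count rather than quadratically. Each resulting pair would then be passed to the single classifier the abstract mentions (same cell, same row, same column, or unrelated), which is not shown here.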

    INTELLIGENT VISUAL REASONING OVER GRAPHICAL ILLUSTRATIONS USING A MAC UNIT

    Publication No.: US20220222956A1

    Publication Date: 2022-07-14

    Application No.: US17594578

    Filing Date: 2020-05-28

    Abstract: This disclosure relates generally to intelligent visual reasoning over graphical illustrations using a MAC unit. Prior art uses visual attention to map particular words in a question to specific areas in an image and memorize the corresponding answers, which limits it to answering questions of a specific type. The present disclosure incorporates the MAC unit to enable reasoning capabilities and accordingly attend to an area in the image to find the answer. The present disclosure therefore allows generalization over a possible set of questions of varying complexity, so that an unseen question can also be answered correctly based on the learned reasoning methods. The system and method of the present disclosure can be used to understand visual information when processing documents containing charts, such as business reports, research papers, consensus reports, etc., and to reduce the time spent on manual analysis.
