Publication No.: US20250166134A1
Publication Date: 2025-05-22
Application No.: US18903256
Application Date: 2024-10-01
Applicant: Tata Consultancy Services Limited
Inventor: ARUSHI JAIN, SHUBHAM SINGH PALIWAL, MONIKA SHARMA, VIKRAM JAMWAL, LOVEKESH VIG
IPC: G06T5/60, G06T5/50, G06T5/70, G06T7/11, G06T7/194, G06T11/60, G06V10/764, G06V10/776
Abstract: Text-to-image models generate images from text prompts. Existing text-to-image models often produce unclear images in which subjects exhibit hybrid characteristics, i.e., each subject in the image exhibits characteristics of multiple subjects. The present disclosure provides a method and a system for personalized multi-subject text-to-image generation. The system first fine-tunes an existing text-to-image diffusion model using a plurality of images of target subjects. The system then generates images based on local text prompts using the fine-tuned text-to-image diffusion model. In particular, the fine-tuned model uses a composite diffusion algorithm to generate subject images. Thereafter, the system computes a subject-aware segmentation loss for the generated images, which is then used to correct the subject appearance in the generated images. Finally, the system applies a global diffuser to the generated images to create a harmonized image based on a global text prompt.
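As an illustration only, two of the steps named in the abstract can be sketched on toy data: merging per-subject outputs by region mask (the general idea behind composite diffusion) and scoring how well a generated subject stays inside its intended region. The abstract does not give the actual formulas, so the function names `composite` and `segmentation_loss`, the mask-weighted average, and the soft-IoU form of the loss are all assumptions made for this sketch, not the method claimed in the application.

```python
from typing import List

Grid = List[List[float]]  # a tiny 2-D "latent" or mask, as nested lists


def composite(latents: List[Grid], masks: List[Grid]) -> Grid:
    """Merge per-subject latents using region masks (hypothetical helper).

    Each pixel becomes the mask-weighted average of the per-subject values,
    so a subject only contributes inside its own region.
    """
    height, width = len(latents[0]), len(latents[0][0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = sum(m[y][x] for m in masks)
            if total == 0:
                continue  # pixel covered by no subject region; left at 0.0
            weighted = sum(l[y][x] * m[y][x] for l, m in zip(latents, masks))
            out[y][x] = weighted / total
    return out


def segmentation_loss(pred_mask: Grid, target_mask: Grid) -> float:
    """Assumed loss form: 1 - soft IoU between where a subject actually
    appears (pred_mask) and where it was supposed to appear (target_mask)."""
    inter = 0.0
    union = 0.0
    for pred_row, target_row in zip(pred_mask, target_mask):
        for p, t in zip(pred_row, target_row):
            inter += p * t
            union += p + t - p * t
    return 1.0 - inter / union if union else 0.0


# Toy usage: two subjects, one assigned to each half of a 2x2 grid.
left_mask = [[1, 0], [1, 0]]
right_mask = [[0, 1], [0, 1]]
subject_a = [[1.0, 1.0], [1.0, 1.0]]  # stand-in for subject A's output
subject_b = [[2.0, 2.0], [2.0, 2.0]]  # stand-in for subject B's output

merged = composite([subject_a, subject_b], [left_mask, right_mask])
loss_match = segmentation_loss(left_mask, left_mask)      # subject in place
loss_mismatch = segmentation_loss(left_mask, right_mask)  # subject misplaced
```

In this toy setup a subject that lands exactly in its region yields a loss of zero and one that lands entirely in the other region yields the maximum loss of one, which is the kind of signal that could be used to correct subject appearance. A real system would operate on diffusion latents and learned segmentation outputs rather than hand-written grids.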