TEXT-TO-IMAGE SYNTHESIS UTILIZING DIFFUSION MODELS WITH TEST-TIME ATTENTION SEGREGATION AND RETENTION OPTIMIZATION

    Publication No.: US20240428468A1

    Publication Date: 2024-12-26

    Application No.: US18337634

    Filing Date: 2023-06-20

    Applicant: Adobe Inc.

    Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that utilize an attention segregation loss and/or an attention retention loss at inference time of a diffusion neural network to generate a text-conditioned image. In particular, in some embodiments, the disclosed systems utilize the attention segregation loss to reduce overlap between concepts by comparing, at a given denoising step, the attention maps for multiple concepts of a text query. Further, in some embodiments, the disclosed systems utilize the attention retention loss to improve information retention for concepts across denoising steps by comparing attention maps between different denoising steps. Accordingly, in some embodiments, by utilizing the attention segregation loss and the attention retention loss, the disclosed systems accurately maintain multiple concepts from a text query when generating a text-conditioned image.
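
    The two comparisons described in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss formulas, so everything below is an assumption made for illustration: the soft intersection-over-union overlap measure, the binarization threshold, and the function names `segregation_loss` and `retention_loss` are all hypothetical, and attention maps are taken to be 2-D arrays with values in [0, 1].

```python
import numpy as np

EPS = 1e-8  # guards against division by zero for all-zero maps


def soft_iou(a, b):
    """Soft intersection-over-union of two non-negative attention maps."""
    inter = np.minimum(a, b).sum()
    union = np.maximum(a, b).sum()
    return inter / (union + EPS)


def segregation_loss(attn_maps):
    """Assumed form: mean pairwise overlap between the attention maps of
    different concepts at a single denoising step (lower = less overlap)."""
    maps = [np.asarray(m, dtype=float) for m in attn_maps]
    total, pairs = 0.0, 0
    for i in range(len(maps)):
        for j in range(i + 1, len(maps)):
            total += soft_iou(maps[i], maps[j])
            pairs += 1
    return total / max(pairs, 1)


def retention_loss(attn_now, attn_prev, threshold=0.5):
    """Assumed form: 1 - overlap between one concept's current attention map
    and a binary mask taken from the same concept's map at an earlier step
    (lower = more of the earlier region is retained)."""
    attn_now = np.asarray(attn_now, dtype=float)
    mask = (np.asarray(attn_prev, dtype=float) > threshold).astype(float)
    return 1.0 - soft_iou(attn_now, mask)


# Toy 2x2 maps: concept A attends top-left, concept B attends bottom-right.
map_a = np.array([[1.0, 0.0], [0.0, 0.0]])
map_b = np.array([[0.0, 0.0], [0.0, 1.0]])
seg_disjoint = segregation_loss([map_a, map_b])   # no overlap
seg_identical = segregation_loss([map_a, map_a])  # full overlap
ret_kept = retention_loss(map_a, map_a)           # region retained
ret_lost = retention_loss(map_b, map_a)           # region lost
```

    In a test-time setting such as the one the abstract describes, values like these would presumably be combined into an objective evaluated at inference rather than during training; how that objective is applied to the diffusion process is not stated in the abstract and is not shown here.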
