HARDWARE-AWARE EFFICIENT ARCHITECTURES FOR TEXT-TO-IMAGE DIFFUSION MODELS

    公开(公告)号:US20250131606A1

    公开(公告)日:2025-04-24

    申请号:US18492572

    申请日:2023-10-23

    Abstract: A processor-implemented method includes receiving a text-semantic input at a first stage of a neural network, including a first convolutional block and no attention layers. The method receives, at a second stage, a first output from the first stage. The second stage comprises a first down sampling block including a first attention layer and a second convolutional block. The method receives, at a third stage, a second output from the second stage. The third stage comprises a first up sampling block including a second attention layer and a first set of convolutional blocks. The method receives, at a fourth stage, the first output from the first stage and a third output from the third stage. The fourth stage comprises a second up sampling block including no attention layers and a second set of convolutional blocks. The method generates an image at the fourth stage, based on the text-semantic input.

Patent Agency Ranking