MONOCULAR IMAGE DEPTH ESTIMATION WITH ATTENTION

    Publication number: US20240303841A1

    Publication date: 2024-09-12

    Application number: US18538869

    Application date: 2023-12-13

    CPC classification number: G06T7/50 G06T7/246 G06T11/60 G06V10/44 G06V10/62

    Abstract: Disclosed are systems and techniques for capturing images (e.g., using a monocular image sensor) and detecting depth information. According to some aspects, a computing system or device can generate a feature representation of a current image and update accumulated feature information for storage in a memory based on a feature representation of a previous image and optical flow information of the previous image. The accumulated feature information can include accumulated image feature information associated with a plurality of previous images and accumulated optical flow information associated with the plurality of previous images. The computing system or device can obtain information associated with relative motion of the current image based on the accumulated feature information and the feature representation of the current image. The computing system or device can estimate depth information for the current image based on the information associated with the relative motion and the accumulated feature information.

    EFFICIENT COST VOLUME PROCESSING WITHIN ITERATIVE PROCESS

    Publication number: US20240070812A1

    Publication date: 2024-02-29

    Application number: US18358857

    Application date: 2023-07-25

    CPC classification number: G06T3/4053 G06T7/579

    Abstract: A processor-implemented method comprises processing a single level cost volume across multiple processing stages by varying a receptive field across each of the processing stages. The method also includes performing a learning-based correspondence estimation task based on the processing. The varying may include processing a different resolution of the cost volume at each processing stage while maintaining a same neighborhood sampling radius. The resolution may increase from a first processing stage to a later processing stage. The varying may also include varying a neighborhood sampling radius at each of the processing stages while maintaining a same resolution. The task may be optical flow estimation or stereo estimation.
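
The first variant described in this abstract (a different resolution of one cost volume at each stage, with the same neighborhood sampling radius) can be illustrated with a minimal one-dimensional sketch. Everything here is an assumption made for illustration and is not taken from the application: the function names, the use of average pooling to obtain coarser resolutions, and edge clamping in the lookup.

```python
def build_cost_volume(feat1, feat2):
    """Single-level all-pairs correlation: cost[i][j] = dot(feat1[i], feat2[j])."""
    return [[sum(a * b for a, b in zip(f1, f2)) for f2 in feat2] for f1 in feat1]


def pool_row(row, factor):
    """Average-pool one cost row by `factor` to get a coarser resolution (assumed pooling)."""
    chunks = [row[k:k + factor] for k in range(0, len(row), factor)]
    return [sum(c) / len(c) for c in chunks]


def lookup(row, center, radius):
    """Sample a fixed-radius neighborhood around `center`, clamping at the edges."""
    last = len(row) - 1
    return [row[min(max(center + d, 0), last)] for d in range(-radius, radius + 1)]


def staged_lookups(row, center, factors, radius):
    """One lookup per stage; `factors` runs coarse to fine while the radius stays fixed."""
    samples = []
    for f in factors:
        coarse = pool_row(row, f)           # same cost volume, lower resolution
        samples.append(lookup(coarse, center // f, radius))
    return samples
```

With `factors=[4, 2, 1]` and `radius=1`, each stage reads three samples, but those samples span roughly 12, 6, and 3 full-resolution positions respectively, so the receptive field shrinks as the resolution increases.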

    EFFICIENT DIFFUSION MACHINE LEARNING MODELS

    Publication number: US20250124301A1

    Publication date: 2025-04-17

    Application number: US18488779

    Application date: 2023-10-17

    Abstract: Certain aspects of the present disclosure provide techniques and apparatus for improved machine learning. During a first iteration of processing data using a denoising backbone of a diffusion machine learning model, a first latent tensor is generated using a lower resolution block of the denoising backbone, and a first feature tensor is generated based on processing the first latent tensor using a higher resolution block of the denoising backbone, the higher resolution block using a higher resolution than the lower resolution block. A second latent tensor is generated based on processing the first latent tensor using an adapter block of the denoising backbone. During a second iteration of processing the data using the denoising backbone, a second feature tensor is generated based on processing the second latent tensor using the higher resolution block.
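
The data flow across the two iterations can be traced with a toy sketch. The three block functions below are hypothetical integer stand-ins, not the actual network layers, and the call counter is added only to make the flow visible. The point illustrated is that the second iteration feeds the higher resolution block from the adapter output rather than running the lower resolution block again.

```python
def low_res_block(x):
    """Hypothetical stand-in for the lower resolution block."""
    return [3 * v for v in x]


def adapter_block(latent):
    """Hypothetical stand-in for the adapter block."""
    return [v + 1 for v in latent]


def high_res_block(latent):
    """Hypothetical stand-in for the higher resolution block."""
    return [2 * v for v in latent]


def run_two_iterations(x):
    calls = {"low_res": 0, "adapter": 0, "high_res": 0}

    # First iteration: lower resolution block, then higher resolution block.
    latent_1 = low_res_block(x)
    calls["low_res"] += 1
    feature_1 = high_res_block(latent_1)
    calls["high_res"] += 1

    # The adapter derives the second latent tensor from the first one.
    latent_2 = adapter_block(latent_1)
    calls["adapter"] += 1

    # Second iteration: only the higher resolution block runs.
    feature_2 = high_res_block(latent_2)
    calls["high_res"] += 1

    return feature_1, feature_2, calls
```

In this sketch the lower resolution block is called once for two iterations, which is where any saving would come from if the adapter is cheaper than the block it replaces.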

    UNIFIED SIMULTANEOUS OPTICAL FLOW AND DEPTH ESTIMATION

    Publication number: US20250095182A1

    Publication date: 2025-03-20

    Application number: US18468656

    Application date: 2023-09-15

    Abstract: Techniques and systems are provided for image processing. For instance, a process can include correlating a first set of features from a first viewpoint with a second set of features from a second viewpoint at a first time period to generate a first disparity cost volume; correlating a third set of features from the first viewpoint at a second time period with the first set of features to generate a first optical flow cost volume; gating the first disparity cost volume to generate first intermediate disparity information; gating the first optical flow cost volume to generate first intermediate optical flow information; correlating the first set of features, the second set of features, and the first intermediate optical flow information to generate disparity information for output; and correlating the third set of features, the first set of features, and the first intermediate disparity information to generate optical flow information for output.
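
The two correlation steps that open this process can be sketched in one dimension. This is a rough illustration under stated assumptions: features are plain lists of vectors, out-of-range positions score zero, and the `gate` function (value times its own sigmoid) is a hypothetical choice, since the abstract does not say how gating is performed.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def disparity_cost_volume(first, second, max_disp):
    """cost[x][d] = dot(first[x], second[x - d]); zero when x - d is out of range."""
    return [[dot(first[x], second[x - d]) if x - d >= 0 else 0
             for d in range(max_disp)]
            for x in range(len(first))]


def flow_cost_volume(third, first, radius):
    """cost[x][u + radius] = dot(third[x], first[x + u]) for u in [-radius, radius]."""
    n = len(first)
    return [[dot(third[x], first[x + u]) if 0 <= x + u < n else 0
             for u in range(-radius, radius + 1)]
            for x in range(len(third))]


def gate(volume):
    """Hypothetical gating: scale each cost by its own sigmoid."""
    return [[v / (1.0 + math.exp(-v)) for v in row] for row in volume]
```

Here `first` and `second` play the role of the two viewpoints at the first time period, and `third` is the first viewpoint at the second time period. Both cost volumes share `first`, which is what lets the later steps exchange intermediate disparity and optical flow information.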

    HARDWARE-AWARE EFFICIENT ARCHITECTURES FOR TEXT-TO-IMAGE DIFFUSION MODELS

    Publication number: US20250131606A1

    Publication date: 2025-04-24

    Application number: US18492572

    Application date: 2023-10-23

    Abstract: A processor-implemented method includes receiving a text-semantic input at a first stage of a neural network, including a first convolutional block and no attention layers. The method receives, at a second stage, a first output from the first stage. The second stage comprises a first down sampling block including a first attention layer and a second convolutional block. The method receives, at a third stage, a second output from the second stage. The third stage comprises a first up sampling block including a second attention layer and a first set of convolutional blocks. The method receives, at a fourth stage, the first output from the first stage and a third output from the third stage. The fourth stage comprises a second up sampling block including no attention layers and a second set of convolutional blocks. The method generates an image at the fourth stage, based on the text-semantic input.
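
The four-stage layout can be written down as a small configuration sketch. The `Stage` class, its field names, and the default of two convolutional blocks per "set" are assumptions for illustration; the abstract fixes only which stages contain attention layers and which outputs feed which stage.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    name: str
    kind: str                 # "conv", "down", or "up"
    attention_layers: int
    conv_blocks: int
    inputs: list = field(default_factory=list)


def build_stages(convs_per_set=2):
    """Four stages as described; `convs_per_set` is a hypothetical size for each set."""
    return [
        Stage("stage1", "conv", 0, 1, ["text_semantic_input"]),
        Stage("stage2", "down", 1, 1, ["stage1"]),
        Stage("stage3", "up", 1, convs_per_set, ["stage2"]),
        # The fourth stage receives both the first and the third stage outputs.
        Stage("stage4", "up", 0, convs_per_set, ["stage1", "stage3"]),
    ]


def attention_free(stages):
    """Names of the stages that contain no attention layers."""
    return [s.name for s in stages if s.attention_layers == 0]
```

Calling `attention_free(build_stages())` lists the first and fourth stages, which are the ones the abstract describes as having no attention layers.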
