USING GROUNDED RATIONALES TO IMPROVE VISUAL REASONING

    Publication (Announcement) No.: US20240386712A1

    Publication (Announcement) Date: 2024-11-21

    Application No.: US18500986

    Filing Date: 2023-11-02

    Abstract: A processor-implemented method for generating grounded rationales for visual reasoning tasks includes receiving, by a first artificial neural network (ANN), an interleaved sequence of images and textual information. The first ANN extracts grid features of the images of the interleaved sequence of the images and the textual information to generate a representation of the interleaved sequence of the images and the textual information based on the grid features. A second ANN maps the grid features to a textual domain. The second ANN extracts visual information of the interleaved sequence of the images and the textual information based on the grid features in the textual domain. The second ANN determines a rationale based on the visual information. The visual information comprises one or more lower-level surrogate tasks.
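The data flow described in the abstract can be illustrated with a toy sketch. It is not the claimed implementation: both "networks" are replaced by simple hand-written functions, and every name here (`extract_grid_features`, `map_to_text`, `determine_rationale`, the 2x2 cell size, the 0.5 threshold, the "bright"/"dark" labels) is a hypothetical choice made only for illustration. Per-cell brightness labeling stands in for a lower-level surrogate task.

```python
from typing import List, Union

Image = List[List[float]]   # toy grayscale image, values assumed in [0, 1]
Item = Union[str, Image]    # an interleaved sequence mixes text and images


def extract_grid_features(image: Image, cell: int = 2) -> List[List[float]]:
    """Stand-in for the first ANN: mean intensity of each cell x cell patch."""
    rows, cols = len(image), len(image[0])
    grid = []
    for r in range(0, rows, cell):
        grid_row = []
        for c in range(0, cols, cell):
            patch = [image[i][j]
                     for i in range(r, min(r + cell, rows))
                     for j in range(c, min(c + cell, cols))]
            grid_row.append(sum(patch) / len(patch))
        grid.append(grid_row)
    return grid


def map_to_text(grid: List[List[float]], threshold: float = 0.5) -> List[str]:
    """Stand-in for the second ANN mapping grid features to a textual domain.

    Each grid cell becomes a short text token (a toy surrogate task).
    """
    tokens = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            label = "bright" if value >= threshold else "dark"
            tokens.append(f"cell({r},{c})={label}")
    return tokens


def determine_rationale(sequence: List[Item]) -> str:
    """Walk the interleaved sequence and build a text rationale from it."""
    parts = []
    for item in sequence:
        if isinstance(item, str):
            parts.append(f"text: {item}")
        else:
            tokens = map_to_text(extract_grid_features(item))
            parts.append("image: " + ", ".join(tokens))
    return " | ".join(parts)


if __name__ == "__main__":
    checkerboard = [
        [1.0, 1.0, 0.0, 0.0],
        [1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0],
        [0.0, 0.0, 1.0, 1.0],
    ]
    print(determine_rationale(["Which corners are bright?", checkerboard]))
```

In this sketch the rationale is grounded in the sense that each statement about the image can be traced back to a specific grid cell. A real system would presumably use learned feature extractors and a language model in place of the two helper functions.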
