LEARNED B-FRAME CODING USING P-FRAME CODING SYSTEM

    Publication No.: US20240022761A1

    Publication Date: 2024-01-18

    Application No.: US18343618

    Filing Date: 2023-06-28

    CPC classification number: H04N19/59 G06N3/063 G06N3/088 G06N3/045

    Abstract: Techniques are described for processing video data, such as by performing learned bidirectional coding using a unidirectional coding system and an interpolated reference frame. For example, a process can include obtaining a first reference frame and a second reference frame. The process can include generating a third reference frame at least in part by performing interpolation between the first reference frame and the second reference frame. The process can include performing unidirectional inter-prediction on an input frame based on the third reference frame, such as by estimating motion between the input frame and the third reference frame, and generating a warped frame at least in part by warping one or more pixels of the third reference frame based on the estimated motion. The process can include generating, based on the warped frame and a predicted residual, a reconstructed frame representing the input frame, the reconstructed frame being a bidirectionally-predicted frame.
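The interpolate-then-warp pipeline described in the abstract can be sketched with minimal, non-learned stand-ins: a linear blend in place of learned frame interpolation, and an integer-displacement warp in place of learned motion compensation. All function names here are illustrative, not from the patent:

```python
import numpy as np

def interpolate_reference(ref1, ref2, t=0.5):
    # Linear blend as a stand-in for learned frame interpolation
    return (1 - t) * ref1 + t * ref2

def warp(frame, flow):
    # Shift each pixel by an integer flow vector (dy, dx); a real codec
    # would use bilinear warping with a learned optical-flow field
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(ys - flow[..., 0], 0, h - 1).astype(int)
    src_x = np.clip(xs - flow[..., 1], 0, w - 1).astype(int)
    return frame[src_y, src_x]

def reconstruct(ref1, ref2, flow, residual):
    ref3 = interpolate_reference(ref1, ref2)  # third (interpolated) reference
    warped = warp(ref3, flow)                 # unidirectional inter-prediction
    return warped + residual                  # bidirectionally-predicted frame
```

Because the third reference is synthesized from both neighbors, a single unidirectional prediction step against it yields a frame that is effectively bidirectionally predicted.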

    VIDEO COMPRESSION USING RECURRENT-BASED MACHINE LEARNING SYSTEMS

    Publication No.: US20210281867A1

    Publication Date: 2021-09-09

    Application No.: US17091570

    Filing Date: 2020-11-06

    Abstract: Techniques are described herein for coding video content using recurrent-based machine learning tools. A device can include a neural network system including encoder and decoder portions. The encoder portion can generate output data for the current time step of operation of the neural network system based on an input video frame for a current time step of operation of the neural network system, reconstructed motion estimation data from a previous time step of operation, reconstructed residual data from the previous time step of operation, and recurrent state data from at least one recurrent layer of a decoder portion of the neural network system from the previous time step of operation. A decoder portion of the neural network system can generate, based on the output data and recurrent state data from the previous time step of operation, a reconstructed video frame for the current time step of operation.
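The per-time-step data flow of such a recurrent codec can be sketched with scalars standing in for tensors and simple arithmetic standing in for learned layers (all names and operations are illustrative, not from the patent):

```python
def encoder_step(frame, prev_motion, prev_residual, prev_recurrent):
    # Encoder portion: form a prediction from data the decoder also has
    # (previous reconstructed motion, residual, and recurrent state) and
    # transmit only the mismatch as the output data for this time step.
    prediction = prev_recurrent + prev_motion + prev_residual
    return frame - prediction

def decoder_step(output_data, prev_motion, prev_residual, prev_recurrent):
    # Decoder portion: rebuild the same prediction, add the received output
    # data, and update the recurrent state carried to the next time step.
    prediction = prev_recurrent + prev_motion + prev_residual
    recon = prediction + output_data
    new_recurrent = 0.5 * (prev_recurrent + recon)
    return recon, new_recurrent

def code_sequence(frames):
    motion = residual = recurrent = 0.0  # initial states for time step 0
    recons = []
    for frame in frames:
        out = encoder_step(frame, motion, residual, recurrent)
        recon, recurrent = decoder_step(out, motion, residual, recurrent)
        # In the full system, reconstructed motion/residual for the next
        # step come from dedicated decoder branches; simple proxies here.
        motion, residual = 0.0, out
        recons.append(recon)
    return recons
```

The key structural point the sketch preserves is that the encoder conditions on *reconstructed* (decoder-side) quantities from the previous step, so encoder and decoder never drift apart.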

    MULTI-SCALE OPTICAL FLOW FOR LEARNED VIDEO COMPRESSION

    Publication No.: US20220303568A1

    Publication Date: 2022-09-22

    Application No.: US17207244

    Filing Date: 2021-03-19

    Abstract: Systems and techniques are described for encoding and/or decoding data based on motion estimation that applies variable-scale warping. An encoding device can receive an input frame and a reference frame that depict a scene at different times. The encoding device can generate an optical flow identifying movements in the scene between the two frames. The encoding device can generate a weight map identifying how finely or coarsely the reference frame can be warped for input frame prediction. The encoding device can generate encoded video data based on the optical flow and the weight map. A decoding device can generate a reconstructed optical flow and a reconstructed weight map from the encoded data. A decoding device can generate a prediction frame by warping the reference frame based on the reconstructed optical flow and the reconstructed weight map. The decoding device can generate a reconstructed input frame based on the prediction frame.
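One simple reading of variable-scale warping is a per-pixel blend between a fine warp (the full-resolution flow) and a coarse warp (here, the frame-mean flow), with the weight map choosing how finely each region is warped. This is a minimal sketch under that assumption; the actual multi-scale construction in the patent may differ:

```python
import numpy as np

def warp(frame, flow):
    # Integer-displacement warp; a real codec uses bilinear sampling
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    sy = np.clip(ys - flow[..., 0], 0, h - 1).astype(int)
    sx = np.clip(xs - flow[..., 1], 0, w - 1).astype(int)
    return frame[sy, sx]

def variable_scale_warp(ref, flow, weight):
    # weight in [0, 1]: 1 -> fine per-pixel flow, 0 -> coarse frame-mean flow
    fine = warp(ref, flow)
    coarse_flow = np.broadcast_to(flow.reshape(-1, 2).mean(axis=0), flow.shape)
    coarse = warp(ref, coarse_flow)
    return weight * fine + (1 - weight) * coarse
```

On the decoder side, the same function would be applied to the reconstructed flow and reconstructed weight map to produce the prediction frame.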

    USING GROUNDED RATIONALES TO IMPROVE VISUAL REASONING

    Publication No.: US20240386712A1

    Publication Date: 2024-11-21

    Application No.: US18500986

    Filing Date: 2023-11-02

    Abstract: A processor-implemented method for generating grounded rationales for visual reasoning tasks includes receiving, by a first artificial neural network (ANN), an interleaved sequence of images and textual information. The first ANN extracts grid features of the images and generates a representation of the interleaved sequence based on those grid features. A second ANN maps the grid features to a textual domain, extracts visual information from the interleaved sequence based on the grid features in the textual domain, and determines a rationale based on the visual information. The visual information comprises one or more lower-level surrogate tasks.
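The two-stage pipeline (grid features extracted by one network, then mapped into the textual domain and turned into a rationale by another) can be illustrated with trivial stand-ins: mean-pooling in place of the first ANN and a thresholded labeler in place of the second. Every function and threshold below is a hypothetical placeholder, not the patent's method:

```python
def extract_grid_features(image, grid=2):
    # First-stage stand-in: mean-pool the image into grid x grid region features
    h, w = len(image), len(image[0])
    gh, gw = h // grid, w // grid
    feats = []
    for gy in range(grid):
        for gx in range(grid):
            cells = [image[y][x]
                     for y in range(gy * gh, (gy + 1) * gh)
                     for x in range(gx * gw, (gx + 1) * gw)]
            feats.append(sum(cells) / len(cells))
    return feats

def map_features_to_text(feats):
    # Second-stage stand-in: project each grid feature into the textual domain
    return ["bright" if f >= 0.5 else "dark" for f in feats]

def generate_rationale(tokens):
    # The rationale stays grounded in the per-region visual evidence
    return "grid regions are: " + ", ".join(tokens)
```

The point of the sketch is the grounding: the rationale is assembled from region-level visual evidence (a lower-level surrogate task) rather than generated from text alone.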
