Abstract:
A video encoder generates a first network abstraction layer (NAL) unit. The first NAL unit contains a first fragment of a parameter set associated with video data. The video encoder also generates a second NAL unit. The second NAL unit contains a second fragment of the parameter set. A video decoder may receive a bitstream that includes the first and second NAL units. The video decoder decodes, based at least in part on the parameter set, one or more coded pictures of the video data.
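The following is a minimal illustrative sketch in Python of fragmenting a parameter set across two NAL-like units and reassembling it at the decoder; it is not the disclosed method. The one-byte type value PS_FRAGMENT_NAL_TYPE, the two-byte header layout, and the first/last flags are hypothetical stand-ins, not an actual H.264/HEVC NAL header.

```python
PS_FRAGMENT_NAL_TYPE = 48  # hypothetical NAL unit type for a parameter-set fragment

def make_fragment_nal(fragment: bytes, first: bool, last: bool) -> bytes:
    """Wrap one fragment of a parameter set in a NAL-like unit."""
    flags = (first << 1) | last                      # 2-bit first/last fragment flags
    header = bytes([PS_FRAGMENT_NAL_TYPE, flags])
    return header + fragment

def fragment_parameter_set(ps: bytes, max_payload: int) -> list[bytes]:
    """Encoder side: emit one NAL unit per fragment of the parameter set."""
    chunks = [ps[i:i + max_payload] for i in range(0, len(ps), max_payload)]
    return [
        make_fragment_nal(c, first=(i == 0), last=(i == len(chunks) - 1))
        for i, c in enumerate(chunks)
    ]

def reassemble_parameter_set(nals: list[bytes]) -> bytes:
    """Decoder side: concatenate fragment payloads back into the parameter set."""
    return b"".join(nal[2:] for nal in nals)

ps = bytes(range(100))                               # dummy parameter-set payload
nals = fragment_parameter_set(ps, max_payload=64)
assert len(nals) == 2                                # first and second NAL units
assert reassemble_parameter_set(nals) == ps          # decoder recovers the parameter set
```

The decoder can then parse the reassembled parameter set and decode coded pictures that refer to it.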
Abstract:
Techniques described herein for coding video data include techniques for coding pictures partitioned into tiles, in which each of the plurality of tiles in a picture is assigned to one of a plurality of tile groups. One example method for coding video data comprising a picture that is partitioned into a plurality of tiles comprises coding video data in a bitstream, and coding, in the bitstream, information that indicates one of a plurality of tile groups to which each of the plurality of tiles is assigned. The techniques for grouping tiles described herein may facilitate improved parallel processing for both encoding and decoding of video bitstreams, improved error resilience, and more flexible region of interest (ROI) coding.
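A hedged sketch of one possible tile-to-tile-group mapping and its signaling follows; the round-robin assignment and one-byte-per-tile signaling are illustrative choices, not the actual bitstream syntax.

```python
def assign_tile_groups(num_tile_cols: int, num_tile_rows: int, num_groups: int) -> list[int]:
    """Example assignment: round-robin over tiles in raster order,
    which interleaves groups and lets each group be processed in parallel."""
    num_tiles = num_tile_cols * num_tile_rows
    return [t % num_groups for t in range(num_tiles)]

def write_tile_group_info(groups: list[int]) -> bytes:
    """Toy signaling: a tile count followed by one group id per tile."""
    return bytes([len(groups)]) + bytes(groups)

def read_tile_group_info(data: bytes) -> list[int]:
    """Parse the toy signaling back into a tile-index -> group-id mapping."""
    num_tiles = data[0]
    return list(data[1:1 + num_tiles])

groups = assign_tile_groups(num_tile_cols=4, num_tile_rows=2, num_groups=2)
assert read_tile_group_info(write_tile_group_info(groups)) == groups
# Tiles in group 0 can be decoded by one thread while group 1 is decoded by another.
```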
Abstract:
Techniques for coding video data include coding a plurality of blocks of video data, wherein at least one block of the plurality of blocks of video data is coded using a coding mode that is one of an intra pulse code modulation (IPCM) coding mode and a lossless coding mode. In some examples, the lossless coding mode may use prediction. The techniques further include assigning a non-zero quantization parameter (QP) value for the at least one block coded using the coding mode. The techniques also include performing deblocking filtering on one or more of the plurality of blocks of video data based on the coding mode used to code the at least one block and the assigned non-zero QP value for the at least one block.
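The sketch below illustrates, under stated assumptions, how an assigned non-zero QP can drive a deblocking decision at an IPCM or lossless block boundary; it is not the claimed method. Real codecs derive filter thresholds (beta, tC) from QP lookup tables, for which a toy linear threshold stands in here.

```python
from dataclasses import dataclass

@dataclass
class Block:
    mode: str            # "inter", "ipcm", or "lossless"
    qp: int              # QP used for threshold derivation

def assigned_qp(block: Block, signaled_qp: int) -> int:
    # IPCM/lossless blocks carry no quantization error, but are still
    # assigned a non-zero QP so boundary filter strength is well defined.
    if block.mode in ("ipcm", "lossless"):
        return max(1, signaled_qp)
    return block.qp

def deblock_edge(left: Block, right: Block, signaled_qp: int) -> bool:
    """Return True if the shared edge between two blocks should be filtered."""
    avg_qp = (assigned_qp(left, signaled_qp) + assigned_qp(right, signaled_qp)) // 2
    threshold = avg_qp - 16            # toy stand-in for a beta/tC table lookup
    return threshold > 0

# An IPCM block with no QP of its own still participates in the decision.
print(deblock_edge(Block("ipcm", 0), Block("inter", 30), signaled_qp=30))
```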
Abstract:
This disclosure proposes techniques to allow more flexibility in filtering chroma components in the adaptive loop filter. In one example, a method for adaptive loop filtering includes performing luma adaptive loop filtering for luma components of a block of pixels, and performing chroma adaptive loop filtering for chroma components of the block of pixels, wherein filter coefficients for both the luma adaptive loop filtering and chroma adaptive loop filtering are derived from a block-based mode or a region-based mode. The method may further include determining to perform luma adaptive loop filtering on the block of pixels, and determining to perform chroma adaptive loop filtering on the block of pixels, wherein the determining to perform chroma adaptive loop filtering is performed independently of determining to perform luma adaptive loop filtering.
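An illustrative sketch follows of per-plane loop filtering where the chroma on/off decision is made independently of luma. The 3x3 filter and the MSE-based decision are toy stand-ins for real ALF coefficient derivation and rate-distortion decisions.

```python
import numpy as np

def alf_filter(plane: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
    """Apply a 3x3 loop filter with edge replication at the borders."""
    padded = np.pad(plane, 1, mode="edge")
    out = np.zeros_like(plane, dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += coeffs[dy, dx] * padded[dy:dy + plane.shape[0],
                                           dx:dx + plane.shape[1]]
    return out

def filter_improves(plane, reference, coeffs) -> bool:
    """Toy decision: enable the filter only if it reduces MSE vs. a reference."""
    err_before = np.mean((plane - reference) ** 2)
    err_after = np.mean((alf_filter(plane, coeffs) - reference) ** 2)
    return err_after < err_before

rng = np.random.default_rng(0)
ramp = np.linspace(0.0, 1.0, 16)
ref = np.outer(ramp, ramp)                        # smooth original signal
luma = ref + 0.10 * rng.standard_normal((16, 16))  # noisy reconstructions
chroma = ref + 0.05 * rng.standard_normal((16, 16))
smooth = np.full((3, 3), 1 / 9)                   # shared low-pass coefficients

luma_alf_on = filter_improves(luma, ref, smooth)
chroma_alf_on = filter_improves(chroma, ref, smooth)  # decided independently of luma
```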
Abstract:
A method of processing video data includes receiving a picture; and filtering a current block of the picture, through a neural network and based on local correlations of proximate samples and distant, non-local correlations of non-proximate samples relative to the current block, to generate a filtered current block. The neural network comprises one or more backbone blocks and one or more transformer blocks. Each of the one or more transformer blocks is associated with a backbone block of the one or more backbone blocks. At least one of the backbone blocks is configured to capture the local correlations between the current block and the proximate samples of the current block, and at least one of the transformer blocks is configured to generate features, based on applying an attention mechanism, that capture the distant, non-local correlations between the current block and the non-proximate samples in the picture.
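A hedged PyTorch sketch of such an architecture follows; it is not the disclosed network. Convolutional backbone blocks model local correlations (a 3x3 kernel sees only nearby samples), while a paired transformer block with self-attention relates every position to every other, capturing non-local correlations. Channel counts, head counts, and block depth are arbitrary choices.

```python
import torch
import torch.nn as nn

class BackboneBlock(nn.Module):
    """Local modeling: 3x3 convolutions see only proximate samples."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)                      # residual connection

class TransformerBlock(nn.Module):
    """Non-local modeling: attention relates every position to every other."""
    def __init__(self, ch, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)
        self.norm = nn.LayerNorm(ch)
    def forward(self, x):
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)        # (B, H*W, C) token sequence
        attended, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attended)
        return tokens.transpose(1, 2).reshape(b, c, h, w)

class FilterNet(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.stem = nn.Conv2d(1, ch, 3, padding=1)
        self.backbone = BackboneBlock(ch)
        self.transformer = TransformerBlock(ch)      # paired with the backbone block
        self.head = nn.Conv2d(ch, 1, 3, padding=1)
    def forward(self, block):
        x = self.stem(block)
        x = self.transformer(self.backbone(x))
        return block + self.head(x)                  # filtered current block

filtered = FilterNet()(torch.randn(1, 1, 16, 16))
```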
Abstract:
A method of encoding or decoding video data comprises: for each respective intra prediction mode of a plurality of intra prediction modes in a most-probable mode (MPM) list: generating, based on reference samples for a template region and using the respective intra prediction mode, prediction samples for the template region, and determining a cost for the respective intra prediction mode; determining a first intra prediction mode and a second intra prediction mode in the MPM list having the lowest costs; determining a preliminary prediction block for the first intra prediction mode and a preliminary prediction block for the second intra prediction mode; and generating a prediction block based on a fusion of the preliminary prediction blocks weighted according to a weight for the first intra prediction mode and a weight for the second intra prediction mode.
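The following sketch illustrates the flow of template-based mode selection and fusion under simplifying assumptions: three toy modes (DC, horizontal, vertical) stand in for the MPM list, a SAD over the template stands in for the cost, and cost-inverse weights stand in for the claimed weighting.

```python
import numpy as np

def predict(mode, ref_left, ref_top, h, w):
    """Toy intra predictors standing in for real angular/planar modes."""
    if mode == "dc":
        return np.full((h, w), (ref_left.mean() + ref_top.mean()) / 2)
    if mode == "hor":
        return np.tile(ref_left[:h, None], (1, w))
    if mode == "ver":
        return np.tile(ref_top[None, :w], (h, 1))

def template_cost(mode, template, ref_left, ref_top):
    """Cost of a mode = SAD between its prediction of the template region
    and the template's reconstructed samples."""
    pred = predict(mode, ref_left, ref_top, *template.shape)
    return np.abs(pred - template).sum()

rng = np.random.default_rng(1)
ref_left, ref_top = rng.random(8), rng.random(8)     # template reference samples
template = rng.random((2, 8))                        # reconstructed rows above the block

mpm_list = ["dc", "hor", "ver"]
costs = {m: template_cost(m, template, ref_left, ref_top) for m in mpm_list}
(m1, c1), (m2, c2) = sorted(costs.items(), key=lambda kv: kv[1])[:2]

# Weight each preliminary prediction block inversely to its template cost.
w1, w2 = c2 / (c1 + c2), c1 / (c1 + c2)
p1 = predict(m1, ref_left, ref_top, 8, 8)            # preliminary block, first mode
p2 = predict(m2, ref_left, ref_top, 8, 8)            # preliminary block, second mode
prediction_block = w1 * p1 + w2 * p2                 # fused prediction
```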
Abstract:
A device for decoding encoded video data is configured to determine that a chroma block of the encoded video data is coded in a cross-component prediction (CCP) mode; generate a merge candidate list for the chroma block, wherein the merge candidate list includes at least two prediction candidates generated by different CCP modes and a third prediction candidate, wherein the third prediction candidate comprises a fusion prediction candidate; receive, in the encoded video data, a syntax element set to a value; select a prediction candidate from the merge candidate list based on the value of the syntax element; determine a prediction block for the chroma block based on the selected prediction candidate; determine a decoded block of video data based on the prediction block for the chroma block; and output a decoded picture of video data that includes the decoded block of video data.
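An illustrative sketch of such a merge list follows; it is not the claimed syntax. Each candidate predicts chroma from co-located luma through a linear model (a CCLM-like stand-in with hypothetical alpha/beta values), the third candidate fuses the first two, and a parsed index selects one.

```python
import numpy as np

def ccp_candidate(alpha: float, beta: float):
    """A linear cross-component predictor: chroma = alpha * luma + beta."""
    return lambda luma: alpha * luma + beta

def fusion_candidate(c0, c1, w: float = 0.5):
    """Fusion prediction candidate: weighted blend of two CCP candidates."""
    return lambda luma: w * c0(luma) + (1 - w) * c1(luma)

c0 = ccp_candidate(alpha=0.9, beta=4.0)    # e.g., model derived from a left template
c1 = ccp_candidate(alpha=1.1, beta=-2.0)   # e.g., model derived from an above template
merge_list = [c0, c1, fusion_candidate(c0, c1)]   # two CCP modes + fusion candidate

merge_idx = 2                              # value of the syntax element from the bitstream
luma = np.arange(16, dtype=np.float64).reshape(4, 4)   # co-located luma samples
chroma_prediction = merge_list[merge_idx](luma)        # prediction block for the chroma block
```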
Abstract:
A video decoder may be configured to receive a block of video data that was encoded using a coding mode that includes a search process in one or more reference frames. The video decoder may prefetch reference samples in a fixed search region of at least one reference frame of the one or more reference frames, and decode the block of video data using the coding mode, including performing the search process for the coding mode using the prefetched reference samples.
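A simplified sketch follows of a decoder-side search restricted to a fixed, prefetchable window: one memory fetch covers every candidate position the search may test. The window radius and the SAD cost are illustrative choices, not the disclosed design.

```python
import numpy as np

def prefetch(ref_frame, cx, cy, block_shape, radius):
    """Fetch one fixed region covering every candidate the search may test."""
    h, w = block_shape
    y0, x0 = max(0, cy - radius), max(0, cx - radius)
    y1 = min(ref_frame.shape[0], cy + h + radius)
    x1 = min(ref_frame.shape[1], cx + w + radius)
    return ref_frame[y0:y1, x0:x1], (y0, x0)

def search(region, origin, cur_block, cx, cy, radius):
    """SAD search over the prefetched region only (no further memory traffic)."""
    h, w = cur_block.shape
    best, best_mv = float("inf"), (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = cy + dy - origin[0], cx + dx - origin[1]
            if y < 0 or x < 0 or y + h > region.shape[0] or x + w > region.shape[1]:
                continue                     # candidate falls outside the fixed region
            sad = np.abs(region[y:y + h, x:x + w] - cur_block).sum()
            if sad < best:
                best, best_mv = sad, (dx, dy)
    return best_mv

rng = np.random.default_rng(2)
ref = rng.random((64, 64))
cur = ref[20:28, 20:28].copy()               # current block, co-located at (20, 20)
region, origin = prefetch(ref, 20, 20, cur.shape, radius=4)
assert search(region, origin, cur, 20, 20, radius=4) == (0, 0)
```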
Abstract:
A video decoder can be configured to determine that a current block of the video data is coded in a bi-prediction inter mode; receive a first syntax element identifying a motion vector predictor from a first candidate list of motion vector predictors; receive a second syntax element identifying a motion vector difference; determine a first motion vector for the current block based on the motion vector predictor and the motion vector difference; determine a second motion vector for the current block from a second list of candidate motion vector predictors based on bilateral matching; and determine a prediction block for the current block using the first motion vector and the second motion vector.
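A hedged sketch follows: the first motion vector comes from a signaled predictor index plus a signaled difference, while the second is derived at the decoder by bilateral matching, here a brute-force SAD minimization over a small search range. Block sizes, the search radius, and the matching cost are illustrative.

```python
import numpy as np

def bilateral_match(ref0_block, ref1, cx, cy, size, radius=2):
    """Derive the list-1 MV whose block best matches the list-0 prediction."""
    best, best_mv = float("inf"), (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cand = ref1[cy + dy:cy + dy + size, cx + dx:cx + dx + size]
            sad = np.abs(cand - ref0_block).sum()
            if sad < best:
                best, best_mv = sad, (dx, dy)
    return best_mv

rng = np.random.default_rng(3)
ref0, ref1 = rng.random((32, 32)), rng.random((32, 32))
ref1[10:18, 10:18] = ref0[8:16, 8:16]        # plant a matching block in ref1

# First motion vector: signaled predictor plus signaled difference.
mvp_list = [(0, 0), (4, -2)]                 # first candidate list of MV predictors
mvp_idx, mvd = 0, (0, 0)                     # values parsed from the bitstream
mv0 = (mvp_list[mvp_idx][0] + mvd[0], mvp_list[mvp_idx][1] + mvd[1])

# Second motion vector: derived by bilateral matching, so no MVD is signaled.
cx, cy, size = 8, 8, 8
block0 = ref0[cy + mv0[1]:cy + mv0[1] + size, cx + mv0[0]:cx + mv0[0] + size]
mv1 = bilateral_match(block0, ref1, cx, cy, size)        # finds (2, 2) here
block1 = ref1[cy + mv1[1]:cy + mv1[1] + size, cx + mv1[0]:cx + mv1[0] + size]
prediction = (block0 + block1) / 2           # bi-prediction average
```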
Abstract:
An example method of encoding a point cloud includes determining that residual values for all components except one component of an attribute of a point in the point cloud are equal to zero; based on the determination that the residual values for all components except the one component of the attribute are equal to zero, determining a value for the one component that is equal to a magnitude of a residual value of the one component of the attribute minus an offset; encoding the value of the one component; and signaling the encoded value in a bitstream.
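A minimal sketch of the described magnitude-minus-offset coding follows, assuming an offset of 1: when every other component's residual is zero, the remaining residual must itself be non-zero (otherwise an all-zero case would apply), so its magnitude is at least 1 and the offset can be subtracted without loss. The separate sign flag and the encode/decode helper names are illustrative.

```python
OFFSET = 1

def encode_single_nonzero(residuals: tuple[int, ...]):
    """residuals: per-component attribute residuals, e.g. (R, G, B)."""
    nonzero = [(i, r) for i, r in enumerate(residuals) if r != 0]
    assert len(nonzero) == 1, "pattern applies only when exactly one residual is non-zero"
    idx, r = nonzero[0]
    # Value written to the bitstream: magnitude minus the offset, plus a sign.
    return idx, abs(r) - OFFSET, r < 0

def decode_single_nonzero(num_components: int, idx: int, value: int, negative: bool):
    """Invert the encoding: add the offset back and restore the sign."""
    r = -(value + OFFSET) if negative else (value + OFFSET)
    return tuple(r if i == idx else 0 for i in range(num_components))

coded = encode_single_nonzero((0, -5, 0))    # only the second component is non-zero
assert coded == (1, 4, True)                 # magnitude 5 coded as 5 - OFFSET = 4
assert decode_single_nonzero(3, *coded) == (0, -5, 0)
```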