Abstract:
In general, techniques are described for separately coding depth and texture components of video data, that is, for processing video data including a view component comprised of a depth component and a texture component. A video coding device configured to code video data may perform the techniques. The video coding device may comprise a decoded picture buffer and a processor configured to store a depth component in the decoded picture buffer, analyze a view dependency to determine whether the depth component is used for inter-view prediction, and remove the depth component from the decoded picture buffer in response to determining that the depth component is not used for inter-view prediction.
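A minimal Python sketch of the buffer-management logic this abstract describes; the class, method, and field names (DecodedPictureBuffer, evict_unused_depth, view_dependencies) are illustrative assumptions, not structures taken from the source.

```python
class DecodedPictureBuffer:
    """Illustrative DPB that tracks depth components separately."""

    def __init__(self, view_dependencies):
        # view_dependencies: view_id -> set of view IDs whose depth
        # components that view uses for inter-view prediction (assumed).
        self.view_dependencies = view_dependencies
        self.depth_components = {}  # view_id -> decoded depth component

    def store_depth(self, view_id, depth_component):
        self.depth_components[view_id] = depth_component

    def evict_unused_depth(self, views_left_to_decode):
        """Analyze the view dependency and remove any stored depth
        component that no remaining view uses for inter-view prediction."""
        for view_id in list(self.depth_components):
            needed = any(view_id in self.view_dependencies.get(v, set())
                         for v in views_left_to_decode)
            if not needed:
                del self.depth_components[view_id]

# Example: view 1 uses view 0's depth; view 2 uses no other view's depth.
dpb = DecodedPictureBuffer({1: {0}})
dpb.store_depth(0, "depth_v0")
dpb.evict_unused_depth(views_left_to_decode=[2])  # view 0's depth now unused
assert dpb.depth_components == {}
```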
Abstract:
As one example, techniques for decoding video data include receiving a bitstream that includes one or more pictures of a coded video sequence (CVS), decoding a first picture according to a decoding order, wherein the first picture is a random access point (RAP) picture that is not an instantaneous decoding refresh (IDR) picture, and decoding at least one other picture following the first picture according to the decoding order based on the decoded first picture. As another example, techniques for encoding video data include generating a bitstream that includes one or more pictures of a CVS, wherein a first picture according to the decoding order is a RAP picture that is not an IDR picture, and avoiding including in the bitstream at least one other picture, other than the first picture, that corresponds to a leading picture associated with the first picture.
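A hedged Python sketch of the decoder-side behavior: decoding may start at a RAP picture that is not an IDR picture, in which case the leading pictures associated with that RAP picture are skipped (or, per the encoding example, never placed in the bitstream). The NAL unit type labels and the Picture structure are assumptions of this sketch.

```python
from dataclasses import dataclass

IDR_TYPES = {"IDR_W_RADL", "IDR_N_LP"}           # assumed NAL type labels
RAP_TYPES = IDR_TYPES | {"CRA_NUT", "BLA_W_LP"}  # CRA/BLA: RAP but not IDR

@dataclass
class Picture:
    nal_type: str
    is_leading: bool  # leading picture associated with the preceding RAP

def decode_from_random_access(pictures):
    """Decode a CVS whose first picture in decoding order is a RAP
    picture that need not be an IDR picture."""
    first = pictures[0]
    assert first.nal_type in RAP_TYPES
    decoded = [first]  # stand-in for actually decoding the picture
    for pic in pictures[1:]:
        # Leading pictures of a non-IDR RAP may reference pictures that
        # precede the RAP in decoding order and are unavailable after a
        # random access, so they are skipped here (an encoder, as in the
        # second example, can simply not include them in the bitstream).
        if pic.is_leading and first.nal_type not in IDR_TYPES:
            continue
        decoded.append(pic)
    return decoded

cvs = [Picture("CRA_NUT", False), Picture("RASL_R", True),
       Picture("TRAIL_R", False)]
print([p.nal_type for p in decode_from_random_access(cvs)])
# ['CRA_NUT', 'TRAIL_R'] -- the leading picture is skipped
```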
Abstract:
A video encoder generates a first network abstraction layer (NAL) unit. The first NAL unit contains a first fragment of a parameter set associated with video data. The video encoder also generates a second NAL unit. The second NAL unit contains a second fragment of the parameter set. A video decoder may receive a bitstream that includes the first and second NAL units. The video decoder decodes, based at least in part on the parameter set, one or more coded pictures of the video data.
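A small Python sketch of the fragmentation idea: the encoder splits one parameter set into fragments, each of which would be carried in its own NAL unit, and the decoder concatenates the fragments before parsing. The byte-level framing here is an assumption; NAL unit headers and emulation prevention are omitted.

```python
def fragment_parameter_set(payload: bytes, max_fragment_size: int):
    """Encoder side: split a parameter-set payload into fragments, each
    of which would be carried in its own NAL unit."""
    return [payload[i:i + max_fragment_size]
            for i in range(0, len(payload), max_fragment_size)]

def reassemble_parameter_set(fragments):
    """Decoder side: concatenate the received fragments back into the
    complete parameter set before parsing it."""
    return b"".join(fragments)

sps_payload = bytes(range(10))
fragments = fragment_parameter_set(sps_payload, max_fragment_size=4)
assert len(fragments) == 3  # first and second NAL units, plus a remainder
assert reassemble_parameter_set(fragments) == sps_payload
```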
Abstract:
Techniques described herein for coding video data include techniques for coding pictures partitioned into tiles, in which each of the plurality of tiles in a picture is assigned to one of a plurality of tile groups. One example method for coding video data comprising a picture that is partitioned into a plurality of tiles comprises coding video data in a bitstream, and coding, in the bitstream, information that indicates one of a plurality of tile groups to which each of the plurality of tiles is assigned. The techniques for grouping tiles described herein may facilitate improved parallel processing for both encoding and decoding of video bitstreams, improved error resilience, and more flexible region of interest (ROI) coding.
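A brief Python sketch of what the signaled tile-group information enables: given a per-tile group ID, a decoder can collect each group's tiles, e.g., to hand each group to its own decoding thread or to extract an ROI group. The names and structures are illustrative assumptions.

```python
from collections import defaultdict

def group_tiles(tile_group_ids):
    """tile_group_ids[i] is the signaled group ID of tile i (raster order).
    Returns each group's tile indices, e.g., for parallel decoding."""
    groups = defaultdict(list)
    for tile_index, group_id in enumerate(tile_group_ids):
        groups[group_id].append(tile_index)
    return dict(groups)

# Example: six tiles assigned to two groups (an ROI group and background).
print(group_tiles([0, 0, 1, 1, 0, 1]))  # {0: [0, 1, 4], 1: [2, 3, 5]}
```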
Abstract:
In one implementation, an apparatus is provided for encoding or decoding video information. The apparatus comprises a memory configured to store inter-layer reference pictures associated with a current picture that is being coded. The apparatus further comprises a processor operationally coupled to the memory. In one embodiment, the processor is configured to indicate a number of inter-layer reference pictures to use to predict the current picture using inter-layer prediction. The processor is also configured to indicate which of the inter-layer reference pictures to use to predict the current picture using inter-layer prediction. The processor is also configured to determine an inter-layer reference picture set associated with the current picture using the indication of the number of inter-layer reference pictures and the indication of which of the inter-layer reference pictures to use to predict the current picture using inter-layer prediction.
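A minimal Python sketch of the derivation described above, assuming the two indications arrive as a count and a list of indices into the stored inter-layer reference pictures; the exact signaling is not taken from the source.

```python
def inter_layer_reference_set(stored_refs, num_active, active_indices):
    """Derive the inter-layer reference picture set for the current picture
    from (a) the indicated number of inter-layer reference pictures and
    (b) the indication of which of them to use for inter-layer prediction."""
    assert len(active_indices) == num_active
    return [stored_refs[i] for i in active_indices]

# Example: three stored inter-layer references, two signaled as active.
refs = ["layer0_pic", "layer1_pic", "layer2_pic"]
print(inter_layer_reference_set(refs, 2, [0, 2]))
# ['layer0_pic', 'layer2_pic']
```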
Abstract:
According to certain aspects, an apparatus for coding video information includes a memory and a processor configured to determine whether a first syntax element is present in a bitstream, the first syntax element associated with a sequence parameter set (SPS) and a first flag indicative of whether a temporal identifier (ID) of a reference picture for pictures that refer to the SPS can be nested; and in response to determining that the first syntax element is not present in the bitstream: obtain a second syntax element indicative of a maximum number of temporal sub-layers in a particular layer of the plurality of layers; and determine whether to set the first flag equal to a second flag indicative of whether a temporal ID of a reference picture for any pictures can be nested based at least in part on a value of the second syntax element.
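A hedged Python sketch of the inference logic: when the SPS-level syntax element is absent, the flag is derived from the maximum number of temporal sub-layers and, depending on that value, from the sequence-wide flag. The exact fallback rule shown (inherit the second flag when more than one sub-layer exists, otherwise 1) is an assumption of this sketch.

```python
def infer_temporal_nesting_flag(sps_flag_present, sps_flag,
                                max_sub_layers, vps_flag):
    """Return the effective SPS-level temporal-ID nesting flag."""
    if sps_flag_present:
        return sps_flag
    # Syntax element absent: fall back on the maximum number of temporal
    # sub-layers; with more than one sub-layer, inherit the sequence-wide
    # flag, otherwise nesting holds trivially (assumed rule).
    return vps_flag if max_sub_layers > 1 else 1

assert infer_temporal_nesting_flag(True, 0, 3, 1) == 0   # element present
assert infer_temporal_nesting_flag(False, 0, 3, 1) == 1  # inherit 2nd flag
assert infer_temporal_nesting_flag(False, 0, 1, 0) == 1  # single sub-layer
```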
Abstract:
Techniques for encapsulating video streams containing multiple coded views in a media file are described herein. In one example, a method includes parsing a track of multiview video data, wherein the track includes at least one depth view. The method further includes parsing information to determine a spatial resolution associated with the depth view, wherein decoding the spatial resolution does not require parsing of a sequence parameter set of the depth view. Another example method includes composing a track of multiview video data, wherein the track includes one or more views, including at least one depth view. The example method further includes composing information to indicate a spatial resolution associated with the depth view, wherein decoding the spatial resolution does not require parsing of a sequence parameter set of the depth view.
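A small Python sketch of the compose/parse pair, assuming the spatial resolution is carried as a fixed-layout record of two 16-bit fields so that a file reader never has to parse the depth view's SPS; the layout is illustrative, not the file-format structure defined by the source.

```python
import struct

def compose_depth_resolution(width, height):
    """Compose the depth view's spatial resolution as a small fixed-layout
    record (two big-endian 16-bit fields; an assumed layout)."""
    return struct.pack(">HH", width, height)

def parse_depth_resolution(record):
    """Parse the spatial resolution directly from the record, without
    touching the depth view's sequence parameter set."""
    width, height = struct.unpack(">HH", record)
    return width, height

assert parse_depth_resolution(compose_depth_resolution(640, 480)) == (640, 480)
```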
Abstract:
A device includes, in a first track of a file, a first end of sequence (EOS) network abstraction layer (NAL) unit for a coded video sequence of a bitstream. The first EOS NAL unit is in a first access unit of the coded video sequence. The device also includes, in a second track of the file, a second EOS NAL unit for the coded video sequence. The second EOS NAL unit is in a second access unit of the coded video sequence, the second EOS NAL unit being different from the first EOS NAL unit. The device may perform similar actions for end of bitstream (EOB) NAL units.
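A minimal Python sketch of the placement described above, assuming dict-based track and access-unit structures: each track receives its own EOS NAL unit, appended to the access unit where that track's portion of the coded video sequence ends.

```python
def append_eos_nal_units(tracks, make_eos_nal_unit):
    """Place one EOS NAL unit per track, in the access unit that ends
    that track's portion of the coded video sequence."""
    for track in tracks:
        last_au = track["access_units"][-1]
        last_au["nal_units"].append(make_eos_nal_unit())

tracks = [
    {"access_units": [{"nal_units": ["slice_a0"]}]},
    {"access_units": [{"nal_units": ["slice_b0"]},
                      {"nal_units": ["slice_b1"]}]},
]
append_eos_nal_units(tracks, lambda: "EOS_NUT")
# The two EOS NAL units land in different access units of the CVS.
assert tracks[0]["access_units"][-1]["nal_units"][-1] == "EOS_NUT"
assert tracks[1]["access_units"][-1]["nal_units"][-1] == "EOS_NUT"
```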
Abstract:
Provided are systems, methods, and computer-readable media for including, with a 360-degree video, parameters that describe the fisheye images in the 360-degree video. The 360-degree video can then be stored and/or transmitted as captured by the omnidirectional camera, without transforming the fisheye images into some other format. The parameters can later be used to map the fisheye images to an intermediate format, such as an equirectangular format. The intermediate format can be used to store, transmit, and/or display the 360-degree video. The parameters can alternatively or additionally be used to map the fisheye images directly to a format that can be displayed in a 360-degree video presentation, such as a spherical format.
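An illustrative Python sketch of parameters that might travel with the fisheye video; the field set (circle center, radius, field of view) is a plausible assumption rather than the parameter list defined by the source, and a later mapping stage would consume these values.

```python
from dataclasses import dataclass

@dataclass
class FisheyeParams:
    """Assumed per-lens parameters carried with the 360-degree video."""
    center_x: float       # image-plane center of the circular fisheye region
    center_y: float
    radius: float         # radius of the fisheye circle, in pixels
    field_of_view: float  # lens field of view, in degrees

# Two back-to-back lenses of a typical omnidirectional camera; a mapping
# stage could use these to produce, e.g., an equirectangular projection.
video_metadata = {
    "fisheye_params": [
        FisheyeParams(960.0, 960.0, 940.0, 195.0),
        FisheyeParams(2880.0, 960.0, 940.0, 195.0),
    ]
}
```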
Abstract:
Techniques and systems are provided for processing video data. For example, 360-degree video data can be obtained for processing by an encoding device or a decoding device. The 360-degree video data includes pictures divided into motion-constrained tiles. The 360-degree video data can be used to generate a media file including several tracks. Each of the tracks contains a set of at least one of the motion-constrained tiles. The set of at least one of the motion-constrained tiles corresponds to at least one of several viewports of the 360-degree video data. A first tile representation can be generated for the media file. The first tile representation encapsulates a first track among the several tracks, and the first track includes a first set of at least one of the motion-constrained tiles at a first tile location in the pictures of the 360-degree video data. The first set of at least one of the motion-constrained tiles corresponds to a viewport of the 360-degree video data.
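A brief Python sketch of the file layout described above, assuming dict-based structures: motion-constrained tiles are grouped into one track per viewport, and each track is wrapped in a tile representation.

```python
def build_tile_tracks(tiles_by_viewport):
    """Group motion-constrained tiles into one track per viewport and
    wrap each track in a tile representation for the media file."""
    media_file = {"tracks": [], "tile_representations": []}
    for viewport, tile_locations in tiles_by_viewport.items():
        track = {"viewport": viewport, "tiles": tile_locations}
        media_file["tracks"].append(track)
        media_file["tile_representations"].append({"tile_track": track})
    return media_file

# Example: (row, col) tile locations covering two viewports.
print(build_tile_tracks({"front": [(0, 0), (0, 1)],
                         "back": [(1, 0), (1, 1)]}))
```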