Abstract:
Techniques and systems are provided for tracking objects in one or more video frames. For example, a first set of one or more bounding regions are determined for a video frame based on a trained classification network applied to the video frame. The first set of one or more bounding regions are associated with one or more objects in the video frame. One or more blobs can be detected for the video frame. A blob includes pixels of at least a portion of an object in the video frame. A second set of one or more bounding regions are determined for the video frame that are associated with the one or more blobs. A final set of one or more bounding regions is determined for the video frame using the first set of one or more bounding regions and the second set of one or more bounding regions. Object tracking can then be performed for the video frame using the final set of one or more bounding regions.
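As a concrete illustration, the following Python sketch shows one plausible way to merge the two sets of bounding regions into a final set, assuming boxes are (x0, y0, x1, y1) tuples; the IoU matching rule, the threshold, and the keep-unmatched-blobs policy are assumptions for the sketch, not the disclosed method.

    # Hypothetical merge of detector boxes (from the classification network)
    # and blob boxes (from blob detection) into one final set of regions.

    def iou(a, b):
        """Intersection-over-union of two axis-aligned (x0, y0, x1, y1) boxes."""
        ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
        ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        union = area_a + area_b - inter
        return inter / union if union else 0.0

    def merge_regions(detector_boxes, blob_boxes, iou_thresh=0.5):
        """Keep all detector boxes; add blob boxes no detector box overlaps."""
        final = list(detector_boxes)
        for blob in blob_boxes:
            if all(iou(blob, det) < iou_thresh for det in detector_boxes):
                final.append(blob)
        return final

The final set returned by merge_regions would then feed the object tracker for the frame.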
Abstract:
Techniques and systems are provided for classifying objects in one or more video frames. For example, a plurality of object trackers maintained for a current video frame can be obtained. A plurality of classification requests can also be obtained. The classification requests are associated with a subset of object trackers from the plurality of object trackers, and are generated based on one or more characteristics associated with the subset of object trackers. Based on the obtained plurality of classification requests, at least one object tracker is selected from the subset of object trackers for object classification. For example, the at least one object tracker can be selected from the subset of object trackers based on priorities assigned to the subset of object trackers. The object classification can then be performed for the selected at least one object tracker.
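A minimal sketch of priority-based tracker selection, assuming each classification request carries a (priority, tracker_id) pair kept in a heap; the representation and field names are illustrative, not from the disclosure.

    import heapq

    def select_tracker(requests):
        """Pop the highest-priority classification request.
        `requests` is a heap of (priority, tracker_id); lower number = higher priority."""
        priority, tracker_id = heapq.heappop(requests)
        return tracker_id

    requests = []
    heapq.heappush(requests, (2, "tracker_7"))  # e.g. tracker with a stale label
    heapq.heappush(requests, (1, "tracker_3"))  # e.g. newly created tracker
    assert select_tracker(requests) == "tracker_3"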
Abstract:
Techniques and systems are provided for prioritizing objects for object recognition in one or more video frames. For example, a current video frame is obtained, and objects are detected in the current video frame. State information associated with the objects is determined. Priorities for the objects can also be determined. For example, a priority can be determined for an object based on state information associated with the object. Object recognition is performed for at least one object from the objects based on priorities determined for the at least one object. For instance, object recognition can be performed for objects having higher priorities before objects having lower priorities.
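One hypothetical priority rule derived from per-object state is sketched below; the state fields (is_new, frames_since_recognition) are placeholders standing in for whatever state information is tracked, not fields named by the disclosure.

    def priority(state):
        """Higher score = recognize sooner."""
        p = 0
        if state.get("is_new"):
            p += 10  # never-recognized objects come first
        p += state.get("frames_since_recognition", 0)  # staler results rank higher
        return p

    objects = [
        {"id": 1, "is_new": True,  "frames_since_recognition": 0},
        {"id": 2, "is_new": False, "frames_since_recognition": 4},
    ]
    # Recognize higher-priority objects before lower-priority ones.
    for obj in sorted(objects, key=priority, reverse=True):
        print("recognize object", obj["id"])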
Abstract:
An apparatus for coding video information according to certain aspects includes a processor configured to determine a value of a flag associated with a current picture of a current layer to be decoded, the flag indicating whether pictures in a decoded picture buffer (DPB) should be output, wherein the current picture is an intra random access point (IRAP) picture that starts a new coded video sequence (CVS) and wherein the determination of the value of the flag is based on at least one of: (1) the chroma format of the current picture and the chroma format of the preceding picture, (2) the bit depth of the luma samples of the current picture and the bit depth of the luma samples of the preceding picture, or (3) the bit depth of the chroma samples of the current picture and the bit depth of the chroma samples of the preceding picture.
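The flag derivation can be pictured as below, assuming simple picture records with chroma_format, luma_bit_depth, and chroma_bit_depth fields, and assuming the flag indicates that prior DPB pictures should not be output when any compared parameter changes; this is a sketch, not the normative decoder process.

    def infer_no_output_of_prior_pics(current, preceding):
        """Return True (don't output DPB pictures) if any format parameter
        changed at this IRAP picture that starts a new CVS."""
        return (current["chroma_format"]    != preceding["chroma_format"] or
                current["luma_bit_depth"]   != preceding["luma_bit_depth"] or
                current["chroma_bit_depth"] != preceding["chroma_bit_depth"])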
Abstract:
An apparatus configured to code video information includes a memory unit and a processor in communication with the memory unit. The memory unit is configured to store video information associated with a first video layer having a first picture. The processor is configured to process picture order count (POC) derivation information associated with the first picture, and determine, based on the POC derivation information associated with the first picture, a POC value of at least one other picture in the first video layer that precedes the first picture in decoding order. The processor may encode or decode the video information.
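A toy illustration of recovering a preceding picture's POC value from derivation information carried with the first picture; the fields (poc_msb_cycle, max_poc_lsb) are assumptions standing in for whatever the bitstream actually signals.

    def derive_preceding_poc(poc_msb_cycle, max_poc_lsb, preceding_poc_lsb):
        """Combine a signaled MSB cycle with a stored LSB to recover the
        full POC value of a picture that precedes the current one."""
        return poc_msb_cycle * max_poc_lsb + preceding_poc_lsb

    # e.g. MSB cycle 2, 16 LSB values per cycle, stored LSB 5 -> POC 37
    assert derive_preceding_poc(2, 16, 5) == 37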
Abstract:
A method of coding video data includes receiving one or more layers of video information. Each layer may include at least one picture. The method can include determining a number of active reference layer pictures associated with at least one picture of the one or more layers. The method can further include determining a number of direct reference layers associated with at least one of the one or more layers. Based on the number of direct reference layers equaling the number of active reference layer pictures, the method can further include refraining from further signaling inter-layer reference picture information in any video slice associated with at least one of a video parameter set (VPS), a sequence parameter set (SPS), or a picture parameter set (PPS). Additionally or alternatively, based on the number of direct reference layers equaling the number of active reference layer pictures, the method can include adding to the inter-layer reference picture set all direct reference layer pictures for any video slice associated with at least one of a video parameter set (VPS), a sequence parameter set (SPS), or a picture parameter set (PPS).
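The signaling shortcut can be sketched as follows, assuming the two counts are already derived; the function and structure names are illustrative.

    def needs_explicit_ilrp_signaling(num_direct_ref_layers, num_active_ref_layer_pics):
        """Explicit per-slice signaling is needed only when the counts differ."""
        return num_direct_ref_layers != num_active_ref_layer_pics

    def build_inter_layer_rps(direct_ref_layer_pics, num_active_ref_layer_pics):
        if not needs_explicit_ilrp_signaling(len(direct_ref_layer_pics),
                                             num_active_ref_layer_pics):
            # Counts match: add all direct reference layer pictures to the
            # inter-layer reference picture set without spending any bits.
            return list(direct_ref_layer_pics)
        # Otherwise the slice header would carry explicit reference indices.
        raise NotImplementedError("explicit signaling path omitted in this sketch")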
Abstract:
A video parameter set (VPS) is associated with one or more coded video sequences (CVSs). The VPS includes a VPS extension for a video coding extension. The VPS extension includes a syntax element that indicates whether a video coding tool associated with the video coding extension is enabled for a set of applicable layers of a bitstream. When the syntax element indicates that the coding tool is enabled for the applicable layers, at least a portion of the video data that is associated with the CVSs and that is associated with the applicable layers is coded using the coding tool. When the syntax element indicates that the coding tool is not enabled for the applicable layers, the video data that is associated with the CVSs and that is associated with the applicable layers is not coded using the coding tool.
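A minimal sketch of gating a coding tool on the VPS-extension syntax element, with tool_enabled_flag as a placeholder name for that element and a dict standing in for the parsed VPS extension.

    def apply_tool(block, vps_extension, layer_is_applicable, tool, base):
        """Use the extension's coding tool only when the VPS-extension flag
        enables it for the applicable layers; otherwise use the base process."""
        if vps_extension.get("tool_enabled_flag") and layer_is_applicable:
            return tool(block)
        return base(block)

    # Toy usage: the "tool" doubles samples, the base process copies them.
    vps_ext = {"tool_enabled_flag": 1}
    print(apply_tool([1, 2], vps_ext, True, lambda b: [2 * s for s in b], list))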
Abstract:
This disclosure describes techniques for coding layer dependencies for a block of video data. According to these techniques, a video encoder generates layer dependencies associated with a given layer. The video encoder also generates a type of prediction associated with one or more of the layer dependencies. In some examples, the video encoder generates a first syntax element to signal layer dependencies and a second syntax element to signal a type of prediction associated with one or more of the layer dependencies. A video decoder may obtain the layer dependencies associated with a given layer and the type of prediction associated with one or more of the layer dependencies.
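The two-syntax-element scheme might be sketched as below, with a plain list standing in for the bitstream and the prediction-type codes chosen arbitrarily for illustration.

    SAMPLE_PRED, MOTION_PRED = 0, 1

    def encode_dependencies(stream, dependencies):
        """dependencies: list of (ref_layer_id, prediction_type) pairs.
        The first syntax element signals the dependency, the second its
        prediction type."""
        for ref_layer_id, pred_type in dependencies:
            stream.append(ref_layer_id)
            stream.append(pred_type)

    def decode_dependencies(stream, count):
        it = iter(stream)
        return [(next(it), next(it)) for _ in range(count)]

    deps = [(0, SAMPLE_PRED), (1, MOTION_PRED)]
    stream = []
    encode_dependencies(stream, deps)
    assert decode_dependencies(stream, len(deps)) == deps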
Abstract:
Aspects of this disclosure relate to, in an example, a method that includes identifying a first block of video data in a first temporal location from a first view, wherein the first block is associated with a first disparity motion vector. The method also includes determining a motion vector predictor for a second motion vector associated with a second block of video data, wherein the motion vector predictor is based on the first disparity motion vector. When the second motion vector comprises a disparity motion vector, determining the motion vector predictor comprises scaling the first disparity motion vector to generate a scaled motion vector predictor, wherein scaling the first disparity motion vector comprises applying, to the first disparity motion vector, a scaling factor comprising a view distance of the second disparity motion vector divided by a view distance of the first disparity motion vector.
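A worked sketch of the scaling step, taking view distance to be the difference between view identifiers (an assumption made here for illustration):

    def scale_disparity_mv(mv, first_view_distance, second_view_distance):
        """Scale by (view distance of second vector) / (view distance of first)."""
        scale = second_view_distance / first_view_distance
        return (mv[0] * scale, mv[1] * scale)

    # First DMV spans views 0->2 (distance 2); the target vector spans
    # views 0->1 (distance 1), so the predictor is halved.
    assert scale_disparity_mv((8, 0), 2, 1) == (4.0, 0.0)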
Abstract:
An apparatus for processing video data includes a processor configured to associate, in a minimum processing unit (MPU), one pixel of a depth image of a reference picture with one or more pixels of a first chroma component of a texture image of the reference picture, associate, in the MPU, the one pixel of the depth image with one or more pixels of a second chroma component of the texture image, and associate, in the MPU, the one pixel of the depth image with a plurality of pixels of a luma component of the texture image. The number of the pixels of the luma component is different than the number of the one or more pixels of the first chroma component and the number of the one or more pixels of the second chroma component.
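For a concrete picture of the association, the sketch below assumes 4:2:0 texture with a quarter-resolution depth map, so one depth pixel maps to one pixel of each chroma component and a 2x2 block of luma pixels; this layout is one example consistent with the abstract, not the only possibility.

    def mpu_for_depth_pixel(dx, dy):
        """Pixels of each texture component associated with depth pixel (dx, dy)."""
        luma = [(2 * dx + i, 2 * dy + j) for j in range(2) for i in range(2)]
        cb = [(dx, dy)]  # first chroma component
        cr = [(dx, dy)]  # second chroma component
        return {"luma": luma, "cb": cb, "cr": cr}

    mpu = mpu_for_depth_pixel(3, 1)
    # Luma count (4) differs from each chroma count (1), as the abstract states.
    assert len(mpu["luma"]) == 4 and len(mpu["cb"]) == 1 and len(mpu["cr"]) == 1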