Abstract:
In general, this disclosure describes techniques for improved inter-view residual prediction (IVRP) in three-dimensional video coding. These techniques include determining IVRP availability based on coded block flags and coding modes of residual reference blocks, disallowing IVRP coding when a block is inter-view predicted, using picture order count (POC) values to determine whether IVRP is permitted, applying IVRP to prediction units (PUs) rather than coding units (CUs), inferring values of IVRP flags when a block is skip or merge mode coded, using an IVRP flag of a neighboring block to determine context for coding an IVRP flag of a current block, and avoiding resetting of samples of a residual reference block to zeros during generation.
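Two of the listed techniques — gating IVRP availability on the residual reference block's coded block flag and coding mode, and disallowing IVRP when the current block is inter-view predicted — can be illustrated with a minimal sketch. The function name, parameters, and the specific mode check (intra vs. inter) are hypothetical simplifications, not the disclosed signaling:

```python
def ivrp_available(ref_block_cbf_nonzero, ref_block_is_intra,
                   current_block_inter_view_predicted):
    """Hypothetical sketch: IVRP is considered available only when the
    residual reference block actually carries residual data (coded
    block flag nonzero) and its coding mode permits reuse (assumed
    here: not intra coded), and IVRP coding is disallowed when the
    current block is itself inter-view predicted."""
    if current_block_inter_view_predicted:
        return False  # IVRP disallowed for inter-view predicted blocks
    return ref_block_cbf_nonzero and not ref_block_is_intra
```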
Abstract:
In one example, a device includes a video coder (e.g., a video encoder or a video decoder) configured to determine that a block of video data is to be coded in accordance with a three-dimensional extension of High Efficiency Video Coding (HEVC), and, based on the determination that the block is to be coded in accordance with the three-dimensional extension of HEVC, disable temporal motion vector prediction for coding the block. The video coder may be further configured to, when the block comprises a bi-predicted block (B-block), determine that the B-block refers to a predetermined pair of pictures in a first reference picture list and a second reference picture list, and, based on the determination that the B-block refers to the predetermined pair, equally weight contributions from the pair of pictures when calculating a predictive block for the block.
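The equal weighting described above can be sketched for integer samples; the function name is hypothetical, and the rounding offset is the conventional one for integer-sample averaging rather than anything specified by the abstract:

```python
def bi_predict_equal_weight(pred0, pred1):
    """Hypothetical sketch: when the B-block refers to the
    predetermined picture pair, each sample of the predictive block is
    the equally weighted combination of the two per-list predictions.
    (a + b + 1) >> 1 averages two integer samples with round-half-up,
    as is conventional in integer-sample bi-prediction."""
    assert len(pred0) == len(pred1)
    return [(a + b + 1) >> 1 for a, b in zip(pred0, pred1)]
```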
Abstract:
Techniques and systems are provided for detecting false positive faces in one or more video frames. For example, a video frame of a scene can be obtained. The video frame includes a face of a user associated with at least one characteristic feature. The face of the user is determined to match a representative face from stored representative data. The representative face is associated with the at least one characteristic feature. The face of the user is determined to match the representative face based on the at least one characteristic feature. The face of the user can then be determined to be a false positive face based on the face of the user matching the representative face.
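The matching step above can be sketched as a feature comparison against stored representative faces. The feature representation (a vector compared by cosine similarity) and the threshold are assumptions for illustration, not the disclosed characteristic feature:

```python
import math

def is_false_positive_face(face_feature, representative_features,
                           threshold=0.9):
    """Hypothetical sketch: a detected face is flagged as a false
    positive when its characteristic feature matches a stored
    representative face closely enough. Features are modeled here as
    vectors compared by cosine similarity; threshold is illustrative."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu and nv else 0.0
    return any(cosine(face_feature, rep) >= threshold
               for rep in representative_features)
```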
Abstract:
Methods, apparatuses, and computer-readable media are provided for splitting one or more merged blobs for one or more video frames. A blob detected for a current video frame is identified. The identified blob includes pixels of at least a portion of a foreground object in the current video frame. The identified blob is determined to be associated with two or more blob trackers from a plurality of blob trackers. The plurality of blob trackers are received from an object tracking operation performed for a previous video frame. It is then determined whether one or more splitting conditions are met. The splitting conditions can be based on a spatial relationship between bounding regions of the two or more blob trackers and a bounding region of the identified blob. The identified blob can be split into a first blob and a second blob in response to determining the one or more splitting conditions are met. If the identified blob is split, the first blob and the second blob are output for object tracking for the current frame by an object tracking system. In some cases, the identified blob is not output for object tracking for the current frame.
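One plausible form of the spatial splitting condition above can be sketched with bounding boxes: split only when every associated tracker's bounding region overlaps the merged blob's bounding region substantially. The overlap-ratio criterion and threshold are assumptions for illustration, not the disclosed conditions:

```python
def overlap_area(a, b):
    """Intersection area of two boxes given as (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def should_split(blob_box, tracker_boxes, min_overlap_ratio=0.5):
    """Hypothetical splitting condition: the merged blob is split only
    when it is associated with two or more trackers and each tracker's
    bounding box overlaps the blob's bounding box by at least a
    fraction of the tracker's own area (threshold is illustrative)."""
    if len(tracker_boxes) < 2:
        return False
    for t in tracker_boxes:
        t_area = (t[2] - t[0]) * (t[3] - t[1])
        if t_area == 0 or overlap_area(blob_box, t) / t_area < min_overlap_ratio:
            return False
    return True
```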
Abstract:
This disclosure describes techniques for simplifying delta DC residual coding in a 3D video coding process, such as 3D-HEVC. In some examples, the techniques may modify binarization and/or context modeling processes to reduce the complexity of entropy coding of one or more syntax elements used to represent delta DC residual values.
Abstract:
An apparatus configured to code video information includes a memory unit and a processor in communication with the memory unit. The memory unit is configured to store video information associated with a video layer having a picture. The processor is configured to determine whether the picture is a non-picture-order-count (POC)-anchor picture, and based on the determination of whether the picture is a non-POC-anchor picture, perform one of (1) refraining from indicating a POC reset in connection with the picture, or (2) indicating the POC reset in connection with the picture. The processor may encode or decode the video information.
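The two-way decision in this abstract can be sketched directly; the function and return values are hypothetical names for the two behaviors (1) and (2):

```python
def handle_poc_reset(is_non_poc_anchor):
    """Hypothetical sketch of the decision described in the abstract:
    when the picture is a non-POC-anchor picture, refrain from
    indicating a POC reset in connection with it; otherwise a POC
    reset is indicated in connection with the picture."""
    if is_non_poc_anchor:
        return "refrain"   # (1) no POC reset indicated
    return "indicate"      # (2) POC reset indicated
```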
Abstract:
An apparatus for decoding video information according to certain aspects includes a memory unit and a processor operationally coupled to the memory unit. The memory unit is configured to store at least one reference picture list of an enhancement layer, the at least one reference picture list comprising residual prediction reference picture information. The processor is configured to: decode signaled information about residual prediction reference picture generation; generate a residual prediction reference picture based on an enhancement layer reference picture and the decoded signaled information such that the generated residual prediction reference picture has the same motion field and the same picture order count (POC) as the enhancement layer reference picture from which it is generated; and store the generated residual prediction reference picture in the at least one reference picture list of the enhancement layer.
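The generation constraint above — the residual prediction reference picture inherits the POC and motion field of the enhancement layer reference picture it is generated from — can be sketched as follows. The `RefPicture` structure and function name are hypothetical, and the sample derivation from the signaled information is elided:

```python
from dataclasses import dataclass

@dataclass
class RefPicture:
    """Hypothetical minimal reference picture: POC, per-block motion
    field, and sample values."""
    poc: int
    motion_field: list
    samples: list

def generate_rp_ref_picture(el_ref, rp_samples):
    """Hypothetical sketch: the generated residual prediction
    reference picture has the same POC and the same motion field as
    the enhancement layer reference picture it is generated from; only
    the samples differ (here passed in, standing for the values
    derived from the decoded signaled information)."""
    return RefPicture(poc=el_ref.poc,
                      motion_field=list(el_ref.motion_field),
                      samples=rp_samples)
```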
Abstract:
An apparatus for coding video information according to certain aspects includes a memory unit and a processor in communication with the memory unit. The memory unit is configured to store video information associated with a first layer having a first spatial resolution and a corresponding second layer having a second spatial resolution, wherein the first spatial resolution is less than the second spatial resolution. The video information includes at least motion field information associated with the first layer. The processor upsamples the motion field information associated with the first layer. The processor further adds an inter-layer reference picture including the upsampled motion field information in association with an upsampled texture picture of the first layer to a reference picture list to be used for inter prediction. The processor may encode or decode the video information.
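The motion field upsampling step can be sketched with a nearest-neighbor scheme: replicate each base-layer motion vector over the corresponding upsampled grid positions and scale it by the spatial ratio. The nearest-neighbor choice and integer scale factor are assumptions for illustration:

```python
def upsample_motion_field(mf, scale):
    """Hypothetical nearest-neighbor sketch: upsample a first-layer
    motion field (a 2-D grid of (mvx, mvy) tuples) by an integer
    spatial scale factor. Each motion vector is replicated scale x
    scale times and scaled by the same factor to match the second
    layer's resolution."""
    out = []
    for row in mf:
        up_row = []
        for (mvx, mvy) in row:
            up_row.extend([(mvx * scale, mvy * scale)] * scale)
        for _ in range(scale):
            out.append(list(up_row))
    return out
```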
Abstract:
A first reference index value indicates a position, within a reference picture list associated with a current prediction unit (PU) of a current picture, of a first reference picture. A reference index of a co-located PU of a co-located picture indicates a position, within a reference picture list associated with the co-located PU of the co-located picture, of a second reference picture. When the first reference picture and the second reference picture belong to different reference picture types, a video coder sets a reference index of a temporal merging candidate to a second reference index value. The second reference index value is different than the first reference index value.
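The index selection rule above reduces to a comparison of reference picture types; in HEVC these would typically be short-term versus long-term, though the abstract does not fix the types. The function name and the choice of second index value are hypothetical:

```python
def temporal_merge_ref_idx(ref_type_current, ref_type_colocated,
                           first_ref_idx, second_ref_idx):
    """Hypothetical sketch: when the first reference picture (indexed
    by the current PU's list) and the second reference picture
    (indexed by the co-located PU's list) belong to different
    reference picture types (e.g., short-term vs. long-term), the
    temporal merging candidate's reference index is set to the second,
    different value; otherwise the first value is used."""
    if ref_type_current != ref_type_colocated:
        return second_ref_idx
    return first_ref_idx
```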
Abstract:
An example method of decoding video data includes determining a header parameter set that includes one or more syntax elements specified individually by each of one or more slice headers, the header parameter set being associated with a header parameter set identifier (HPS ID), and determining one or more slice headers that reference the header parameter set to inherit at least one of the syntax elements included in the header parameter set, where the slice headers are each associated with a slice of the encoded video data, and where the slice headers each reference the header parameter set using the HPS ID.
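The inheritance mechanism above can be sketched as a lookup-and-merge: each slice header references an HPS by its HPS ID and inherits any syntax element it does not carry itself. Modeling parameter sets as dictionaries, and the override behavior for locally present elements, are assumptions for illustration:

```python
def resolve_slice_header(slice_header, header_parameter_sets):
    """Hypothetical sketch: a slice header references a header
    parameter set (HPS) by HPS ID and inherits the syntax elements it
    does not specify itself; elements present in the slice header
    are assumed here to override the inherited values."""
    hps = header_parameter_sets[slice_header["hps_id"]]
    resolved = dict(hps)  # start from the inherited HPS elements
    resolved.update({k: v for k, v in slice_header.items()
                     if k != "hps_id"})
    return resolved
```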