Abstract:
Methods and systems for frame-based image segmentation are provided. For example, a method for feature-object tracking between frames of video data is provided. The method comprises receiving a first frame of video data, extracting an initial mask for each of one or more objects of the first frame, adjusting the first frame by applying each initial mask and a corresponding identification to the respective object of the first frame, and outputting the adjusted first frame. The method further comprises tracking the one or more objects in one or more consecutive frames. The tracking comprises extracting a mask for each of the one or more objects in a consecutive frame, adjusting the consecutive frame by applying each mask and corresponding identification to the respective object of the consecutive frame, and outputting the adjusted consecutive frame.
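The mask-and-identification step above can be sketched as follows. The label-image representation and the `apply_masks` helper are illustrative assumptions, not the patented implementation:

```python
import numpy as np

def apply_masks(frame, masks):
    """Overlay each object's mask onto a label image keyed by object ID.
    `masks` maps an integer object ID to a boolean mask of the frame."""
    labels = np.zeros(frame.shape[:2], dtype=np.int32)
    for obj_id, mask in masks.items():
        labels[mask] = obj_id  # pixels under the mask carry the object's ID
    return labels

# Toy 4x4 frame with two disjoint object masks.
frame = np.zeros((4, 4, 3), dtype=np.uint8)
mask1 = np.zeros((4, 4), dtype=bool); mask1[:2, :2] = True
mask2 = np.zeros((4, 4), dtype=bool); mask2[2:, 2:] = True
labels = apply_masks(frame, {1: mask1, 2: mask2})
```

Repeating the same step per consecutive frame, with IDs carried over, yields the tracking behavior the abstract describes.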
Abstract:
Methods, systems, and devices for object recognition are described. Generally, the described techniques provide a compact and efficient convolutional neural network (CNN) model for facial recognition. The proposed techniques relate to a lightweight model with a set of convolution layers and one fully connected layer for feature representation. A new building block for each convolution layer is proposed. A maximum feature map (MFM) operation may be employed to reduce channels (e.g., by combining two or more channels via maximum feature selection within the channels). Depth-wise separable convolution may be employed to reduce computation (e.g., the cost of the convolutions). Batch normalization may be applied to normalize the output of the convolution layers and the fully connected layer (e.g., to prevent overfitting). The described techniques provide a compact and efficient CNN model which can be used for efficient and effective face recognition.
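The MFM channel-reduction operation can be sketched as below; splitting the channels into two halves and taking an elementwise maximum is the common formulation, assumed here as one concrete instance of "maximum feature selection within the channels":

```python
import numpy as np

def max_feature_map(x):
    """MFM: split the channel dimension in half and take the elementwise
    maximum, halving the channel count (NCHW layout assumed)."""
    c = x.shape[1]
    assert c % 2 == 0, "MFM needs an even channel count"
    a, b = x[:, :c // 2], x[:, c // 2:]
    return np.maximum(a, b)

# Two samples with 4 channels each; MFM reduces them to 2 channels.
x = np.arange(2 * 4 * 1 * 1, dtype=np.float32).reshape(2, 4, 1, 1)
y = max_feature_map(x)
```

Because MFM keeps only the stronger of each channel pair, it acts both as a channel reducer and as a competitive (maxout-like) activation.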
Abstract:
A method performed by an electronic device is described. The method includes incrementally adding a current node to a graph. The method also includes incrementally determining a respective adaptive edge threshold for each candidate edge between the current node and one or more candidate neighbor nodes. The method further includes determining whether to accept or reject each candidate edge based on each respective adaptive edge threshold. The method additionally includes performing refining based on the graph to produce refined data. The method also includes producing a three-dimensional (3D) model based on the refined data.
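The accept-or-reject decision for candidate edges can be illustrated with a minimal sketch. The mean-based threshold rule and the `scale` parameter are assumptions for illustration; the abstract does not fix a formula for the adaptive threshold:

```python
import numpy as np

def accept_edges(current, candidates, scale=1.5):
    """For each candidate neighbor of the current node, accept the edge if
    its length is below an adaptive threshold: `scale` times the mean
    candidate edge length (an illustrative adaptive rule)."""
    dists = [np.linalg.norm(np.asarray(current) - np.asarray(c))
             for c in candidates]
    thresh = scale * (sum(dists) / len(dists))
    return [bool(d <= thresh) for d in dists]

# The far-away candidate's edge is rejected by the adaptive threshold.
decisions = accept_edges((0.0, 0.0), [(1.0, 0.0), (0.0, 2.0), (10.0, 0.0)])
```

Incrementally repeating this as each node is added yields a graph whose accepted edges can then feed the refinement and 3D-model stages.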
Abstract:
A method is described. The method includes determining normalized radiance of an image sequence based on a camera response function (CRF). The method also includes determining one or more reliability images of the image sequence based on a reliability function corresponding to the CRF. The method further includes extracting features based on the normalized radiance of the image sequence. The method additionally includes optimizing a model based on the extracted features and the reliability images.
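The radiance-normalization and reliability steps can be sketched as follows. A simple gamma curve stands in for the calibrated CRF, and the hat-shaped reliability function is an illustrative choice; both are assumptions, not the described method's actual calibration:

```python
def normalized_radiance(pixel, exposure, gamma=2.2):
    """Invert a hypothetical gamma-curve CRF to recover scene radiance,
    normalized by exposure time. A real CRF is calibrated per camera."""
    return (pixel ** gamma) / exposure

def reliability(pixel):
    """Hat-shaped reliability: pixels near 0 or 1 (under/over-exposed) are
    unreliable; mid-tones are most reliable (illustrative choice)."""
    return 1.0 - abs(2.0 * pixel - 1.0)

r = normalized_radiance(1.0, 2.0)   # same scene point, double exposure
```

Features extracted from the normalized radiance become comparable across exposures, and the reliability images let the model optimization down-weight saturated pixels.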
Abstract:
In various implementations, object tracking in a video content analysis system can be augmented with an image-based object re-identification system (e.g., for person re-identification or re-identification of other objects) to improve object tracking results for objects moving in a scene. The object re-identification system can use image recognition principles, which can be enhanced by considering data provided by the object trackers output by an object tracking system. In a testing stage, the object re-identification system can selectively test object trackers against object models. For most input video frames, not all object trackers need to be tested against all object models. Additionally, different types of object trackers can be tested differently, so that the context provided by each object tracker can be considered. In a training stage, object models can also be selectively updated.
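One way to realize "not all object trackers need to be tested against all object models" is a simple scheduling policy; the age-based rule and `period` parameter below are hypothetical, offered only to make the selective-testing idea concrete:

```python
def should_test(tracker_age, frame_index, period=10):
    """Selectively test a tracker against the object models: young trackers
    (still unconfirmed) are tested every frame, while established trackers
    are re-tested only every `period` frames (illustrative policy)."""
    return tracker_age < period or frame_index % period == 0
```

Under such a policy, the per-frame re-identification cost stays bounded even as the number of established trackers grows.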
Abstract:
A method for determining a region of an image is described. The method includes presenting an image of a scene including one or more objects. The method also includes receiving an input selecting a single point on the image corresponding to a target object. The method further includes obtaining a motion mask based on the image. The motion mask indicates a local motion section and a global motion section of the image. The method further includes determining a region in the image based on the selected point and the motion mask.
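The step of turning a single selected point plus a motion mask into a region can be sketched as a connected-component grow from that point. The 4-connected flood fill is an illustrative choice of region-growing rule, not necessarily the described method's:

```python
import numpy as np
from collections import deque

def region_from_point(motion_mask, point):
    """Grow a region from the selected point over the connected
    local-motion component containing it (4-connected flood fill)."""
    h, w = motion_mask.shape
    region = np.zeros_like(motion_mask)
    if not motion_mask[point]:
        return region  # selected point is not in a local-motion section
    q = deque([point])
    region[point] = True
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w \
                    and motion_mask[nr, nc] and not region[nr, nc]:
                region[nr, nc] = True
                q.append((nr, nc))
    return region

# Two local-motion components; only the one under the point is returned.
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
mask[3, 3] = True
region = region_from_point(mask, (0, 0))
```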
Abstract:
An electronic device is described. The electronic device includes a processor. The processor is configured to obtain a plurality of images. The processor is also configured to obtain global motion information indicating global motion between at least two of the plurality of images. The processor is further configured to obtain object tracking information indicating motion of a tracked object between the at least two of the plurality of images. The processor is additionally configured to perform automatic zoom based on the global motion information and the object tracking information. Performing automatic zoom produces a zoom region including the tracked object. The processor is configured to determine a motion response speed for the zoom region based on a location of the tracked object within the zoom region.
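The final step, a motion response speed that depends on where the tracked object sits inside the zoom region, can be sketched in one dimension. The linear ramp and the `v_min`/`v_max` bounds are illustrative assumptions:

```python
def response_speed(obj_center, zoom_center, zoom_half_width,
                   v_min=0.1, v_max=1.0):
    """Scale the zoom region's motion response with the tracked object's
    offset from the region center: slow near the center (for stability),
    fast near the border so the object is not lost."""
    offset = abs(obj_center - zoom_center) / zoom_half_width
    offset = min(max(offset, 0.0), 1.0)  # clamp to [0, 1]
    return v_min + (v_max - v_min) * offset

speed = response_speed(5.0, 0.0, 10.0)  # object halfway to the border
```

A slow response near the center avoids jittery zoom motion; a fast response near the border keeps the tracked object inside the zoom region.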
Abstract:
Apparatus and methods for facial detection are disclosed. A plurality of images of an observed face is received for identification. Based at least on two or more selected images of the plurality of images, a template of the observed face is generated. In some embodiments, the template is a subspace generated based on feature vectors of the plurality of received images. A database of identities and corresponding facial data of known persons is searched based at least on the template of the observed face and the facial data of the known persons. One or more identities of the known persons are selected based at least on the search.
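The subspace-template embodiment can be sketched with an SVD: the template is a low-rank basis for the observed face's feature vectors, and matching scores a known identity by how close its feature vector lies to that subspace. The SVD construction and residual metric are illustrative assumptions:

```python
import numpy as np

def build_template(feature_vectors, rank=2):
    """Stack per-image feature vectors and keep the top-`rank` right
    singular vectors as an orthonormal basis for the face's subspace."""
    X = np.stack(feature_vectors)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:rank]

def subspace_residual(basis, v):
    """Distance from a probe feature vector to the template subspace:
    the norm of the component of `v` not explained by the basis."""
    proj = basis.T @ (basis @ v)
    return float(np.linalg.norm(v - proj))

# Three observations spanning a 2D subspace of a 3D feature space.
template = build_template([np.array([1.0, 0.0, 0.0]),
                           np.array([0.0, 1.0, 0.0]),
                           np.array([1.0, 1.0, 0.0])])
```

Searching the database then amounts to ranking known identities by this residual and selecting the smallest.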
Abstract:
A method for three-dimensional face generation is described. An inverse depth map is calculated based on a depth map and an inverted first matrix. The inverted first matrix is generated from two images in which pixels are aligned vertically and differ horizontally. The inverse depth map is normalized to correct for distortions in the depth map caused by image rectification. A three-dimensional face model is generated based on the inverse depth map and one of the two images.
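The inverse-depth-map normalization can be sketched as below; min-max rescaling is an illustrative choice of normalization, not necessarily the described method's correction:

```python
import numpy as np

def normalized_inverse_depth(depth_map, eps=1e-6):
    """Invert a depth map (near points get large values) and rescale to
    [0, 1] to compensate for scale distortions introduced by
    rectification. `eps` guards against division by zero."""
    inv = 1.0 / (depth_map + eps)
    return (inv - inv.min()) / (inv.max() - inv.min())

# Nearest pixel maps to 1.0, farthest to 0.0 after normalization.
depth = np.array([[1.0, 2.0],
                  [4.0, 8.0]])
r = normalized_inverse_depth(depth)
```

The normalized inverse depth can then be combined with one of the rectified images to texture the 3D face model.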
Abstract:
Embodiments include methods and systems for context-adaptive pixel processing based, in part, on a respective weighting-value for each pixel or a group of pixels. The weighting-values provide an indication as to which pixels are more pertinent to pixel processing computations. Computational resources and effort can be focused on pixels with higher weights, which are generally more pertinent for certain pixel processing determinations.
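The weighting idea can be made concrete with a small sketch: computation is restricted to pixels whose weight clears a threshold, and the survivors are combined by weight. The threshold rule and `w_thresh` parameter are illustrative assumptions:

```python
import numpy as np

def weighted_pixel_mean(pixels, weights, w_thresh=0.5):
    """Focus computation on pertinent pixels: drop pixels whose weight
    falls below `w_thresh`, then take the weighted average of the rest."""
    keep = weights >= w_thresh
    return float(np.average(pixels[keep], weights=weights[keep]))

# The low-weight outlier pixel (100) is excluded from the computation.
m = weighted_pixel_mean(np.array([10.0, 20.0, 100.0]),
                        np.array([1.0, 1.0, 0.1]))
```

Skipping low-weight pixels is what lets computational effort concentrate where it most affects the result.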