Abstract:
Embodiments include methods and systems for context-adaptive pixel processing based, in part, on a respective weighting-value for each pixel or group of pixels. The weighting-values indicate which pixels are more pertinent to pixel processing computations. Computational resources and effort can then be focused on pixels with higher weights, which are generally more relevant to certain pixel processing determinations.
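A minimal sketch of the idea, assuming the weighting-values are already available as a per-pixel array (the threshold and the placeholder statistic are illustrative, not from the abstract):

```python
import numpy as np

def weighted_pixel_statistic(frame, weights, threshold=0.5):
    """Concentrate a per-pixel computation on high-weight pixels only."""
    # Select the pixels deemed most pertinent by their weighting-values.
    pertinent = weights >= threshold
    # Run the (placeholder) per-pixel computation only on that subset.
    return frame[pertinent].mean() if pertinent.any() else 0.0

# Toy usage: in practice the weights might come from saliency, motion,
# or edge strength; random values here are purely illustrative.
frame = np.random.rand(480, 640)
weights = np.random.rand(480, 640)
print(weighted_pixel_statistic(frame, weights))
```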
Abstract:
A method for object classification by an electronic device is described. The method includes obtaining an image frame that includes an object. The method also includes determining samples from the image frame. Each of the samples represents a multidimensional feature vector. The method further includes adding the samples to a training set for the image frame. The method additionally includes pruning one or more samples from the training set to produce a pruned training set. One or more non-support vector negative samples are pruned first. One or more non-support vector positive samples are pruned second if necessary to avoid exceeding a sample number threshold. One or more support vector samples are pruned third if necessary to avoid exceeding the sample number threshold. The method also includes updating classifier model weights based on the pruned training set.
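A sketch of the three-stage pruning order described above; the `Sample` structure and function names are illustrative stand-ins, not the patented implementation:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Sample:
    features: list        # multidimensional feature vector
    positive: bool        # positive or negative label
    support_vector: bool  # whether the sample is a current support vector

def prune_training_set(samples: List[Sample], max_samples: int) -> List[Sample]:
    # Stage 1: drop non-support-vector negatives first.
    # Stage 2: drop non-support-vector positives if still over budget.
    # Stage 3: drop support vectors only as a last resort.
    stages = [
        lambda s: not s.support_vector and not s.positive,
        lambda s: not s.support_vector and s.positive,
        lambda s: s.support_vector,
    ]
    pruned = list(samples)
    for should_prune in stages:
        i = 0
        while len(pruned) > max_samples and i < len(pruned):
            if should_prune(pruned[i]):
                pruned.pop(i)
            else:
                i += 1
    return pruned
```

The classifier model weights would then be updated on the returned pruned set.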
Abstract:
A method for memory utilization by an electronic device is described. The method includes transferring a first portion of a first decision tree and a second portion of a second decision tree from a first memory to a cache memory. Each portion is stored contiguously in the first memory. The first decision tree and the second decision tree are each associated with a different feature of an object detection algorithm. The method also includes reducing cache misses by traversing the first portion of the first decision tree and the second portion of the second decision tree in the cache memory based on an order of execution of the object detection algorithm.
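A sketch of the layout idea, assuming a flat node encoding; the `(feature_id, threshold, left, right)` node format is an assumption for illustration:

```python
import numpy as np

def pack_tree_portions(tree_portions):
    """Flatten the early portion of each feature's decision tree into one
    contiguous array, ordered the way the detector actually visits them,
    so consecutive lookups stay within nearby cache lines.

    tree_portions: list of lists of (feature_id, threshold, left, right),
    given in the detector's order of execution.
    """
    packed = []
    offsets = []                        # start index of each portion
    for portion in tree_portions:
        offsets.append(len(packed))
        packed.extend(portion)          # each portion stays contiguous
    return np.array(packed, dtype=np.float32), offsets
```

Traversing the packed array in that same execution order is what yields the cache-miss reduction the abstract describes.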
Abstract:
A method includes capturing an image of a scene that includes a diagram. The method further includes applying functional block recognition rules to image data of the image to recognize functional blocks of the diagram. The functional blocks include at least a first functional block associated with a first computer operation. The method further includes determining whether the functional blocks comply with functional block syntax rules. A functional graph is computer-generated based on the functional blocks complying with the functional block syntax rules. The functional graph corresponds to the diagram, and the functional graph includes the functional blocks.
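A sketch of the recognize-validate-generate flow; the block types and the syntax rule used here (one start block, every operation block connected on both sides) are hypothetical stand-ins for the rules the abstract describes, and real recognition would operate on image data:

```python
def complies(blocks, edges):
    """Hypothetical syntax check over recognized functional blocks."""
    starts = [b for b, t in blocks.items() if t == "start"]
    if len(starts) != 1:
        return False
    incoming = {b: 0 for b in blocks}
    outgoing = {b: 0 for b in blocks}
    for src, dst in edges:
        outgoing[src] += 1
        incoming[dst] += 1
    for b, t in blocks.items():
        if t == "operation" and (incoming[b] == 0 or outgoing[b] == 0):
            return False
    return True

def build_functional_graph(blocks, edges):
    # Generate the functional graph only when the syntax rules are satisfied.
    if not complies(blocks, edges):
        raise ValueError("functional blocks violate syntax rules")
    graph = {b: [] for b in blocks}
    for src, dst in edges:
        graph[src].append(dst)
    return graph

blocks = {"A": "start", "B": "operation", "C": "end"}
edges = [("A", "B"), ("B", "C")]
print(build_functional_graph(blocks, edges))  # {'A': ['B'], 'B': ['C'], 'C': []}
```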
Abstract:
Systems and techniques are provided for performing semantic image segmentation using a machine learning system (e.g., including one or more cross-attention transformer layers). For instance, a process can include generating one or more input image features for a frame of image data and generating one or more input depth features for a frame of depth data. One or more fused image features can be determined, at least in part, by fusing the one or more input depth features with the one or more input image features, using a first cross-attention transformer network. One or more segmentation masks can be generated for the frame of image data based on the one or more fused image features.
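A minimal sketch of depth-to-image cross-attention fusion in PyTorch; the dimensions, layer choices, and residual connection are assumptions, not the described architecture. Image features form the queries and depth features the keys/values:

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, image_feats, depth_feats):
        # image_feats, depth_feats: (batch, tokens, dim)
        fused, _ = self.attn(query=image_feats, key=depth_feats,
                             value=depth_feats)
        return self.norm(image_feats + fused)  # residual keeps image content

img = torch.randn(1, 1024, 256)    # e.g., 32x32 spatial tokens
depth = torch.randn(1, 1024, 256)
fused = CrossAttentionFusion()(img, depth)  # would feed a segmentation head
```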
Abstract:
Methods and systems of frame-based image segmentation are provided. For example, a method for feature object tracking between frames of video data is provided. The method comprises receiving a first frame of video data, extracting a mask feature for each of one or more objects of the first frame, adjusting the first frame by applying an initial mask and a corresponding identification to each respective object of the first frame, and outputting the adjusted first frame. The method further comprises tracking the one or more objects in one or more consecutive frames. The tracking comprises extracting a mask feature for each of the one or more objects in a consecutive frame, adjusting the consecutive frame by applying the corresponding mask and identification to each respective object of the consecutive frame, and outputting the adjusted consecutive frame.
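A sketch of the identification-propagation step: each object's mask feature in the new frame is matched to the nearest feature from the previous frame so the same ID carries over. A tracker of the kind the abstract describes would also handle objects appearing and disappearing, which this snippet omits:

```python
import numpy as np

def match_ids(prev_feats, prev_ids, cur_feats):
    """prev_feats, cur_feats: (n, d) arrays of per-object mask features;
    prev_ids: list of identifications for the previous frame's objects."""
    assigned = []
    for feat in cur_feats:
        # Nearest previous feature (Euclidean) keeps its identification.
        dists = np.linalg.norm(prev_feats - feat, axis=1)
        assigned.append(prev_ids[int(np.argmin(dists))])
    return assigned
```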
Abstract:
A method performed by an electronic device is described. The method includes receiving first optical data and first depth data corresponding to a first frame. The method also includes registering the first depth data to a first canonical model. The method further includes fitting a three-dimensional (3D) morphable model to the first optical data. The method additionally includes registering the 3D morphable model to a second canonical model. The method also includes producing a 3D object reconstruction based on the registered first depth data and the registered 3D morphable model.
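A sketch of one registration step: rigidly aligning depth points to canonical-model points with known correspondences via the Kabsch/Procrustes method. This is a common registration technique standing in for the abstract's step, not the patented method, and the morphable-model fitting is beyond this snippet:

```python
import numpy as np

def rigid_register(src, dst):
    """src, dst: (n, 3) corresponding point sets; returns R, t such that
    dst is approximately R @ src + t."""
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst.mean(0) - R @ src.mean(0)
    return R, t
```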
Abstract:
A method performed by an electronic device is described. The method includes obtaining a two-dimensional (2D) depth image. The method also includes extracting a 2D subset of the depth image. The 2D subset includes a center pixel and a set of neighboring pixels. The method further includes calculating a normal corresponding to the center pixel by calculating a covariance matrix based on the 2D subset.
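A sketch of the covariance-based normal computation, assuming the center pixel and its neighbors have already been backprojected to 3D points: the normal is the eigenvector of the covariance matrix with the smallest eigenvalue (plane fitting by PCA):

```python
import numpy as np

def normal_from_patch(points):
    """points: (k, 3) 3D points from the center pixel and its neighbors."""
    cov = np.cov(points.T)                  # 3x3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, 0]                    # direction of least variance

# Toy usage: a noisy, mostly planar 3x3 neighborhood.
patch = np.random.rand(9, 3)
patch[:, 2] *= 0.01                         # nearly flat in z
print(normal_from_patch(patch))             # approx. (0, 0, +/-1)
```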
Abstract:
Methods, systems, and devices for personalized (e.g., user-specific) eye openness estimation are described. A network model (e.g., a convolutional neural network) may be trained using a set of synthetic eye openness image data (e.g., synthetic face images with known degrees or percentages of eye openness) and a set of real eye openness image data (e.g., facial images of real persons that are annotated as either open-eyed or closed-eyed). A device may estimate, using the network model, a multi-stage eye openness level (e.g., a percentage or degree to which an eye is open) of a user based on captured real-time eye openness image data. The degree of eye openness estimated by the network model may then be compared to an eye size of the user (e.g., a user-specific maximum eye size), and a user-specific eye openness level may be estimated based on the comparison.
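A sketch of how the final comparison step might look, assuming the generic estimate and the user's maximum eye size are expressed in consistent units (the scaling scheme here is an assumption, not the described method):

```python
def personalized_openness(model_estimate, user_max_eye_size,
                          population_max=1.0):
    """Map a generic openness estimate to a user-specific level in [0, 1]
    by rescaling against the user's own maximum eye size (hypothetical)."""
    scale = population_max / max(user_max_eye_size, 1e-6)
    return min(model_estimate * scale, 1.0)

# A user whose eyes open to 0.8 of the population maximum, measured at 0.6,
# is treated as 75% open on their own personalized scale.
print(personalized_openness(0.6, user_max_eye_size=0.8))  # -> 0.75
```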