Abstract:
A method of generating metadata includes using at least one digital image to select a plurality of objects, wherein the at least one digital image depicts the plurality of objects in relation to a physical space. The method also includes, by at least one processor and based on information indicating positions of the selected objects in a location space, producing metadata that identifies one among a plurality of candidate geometrical arrangements of the selected objects.
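As a rough illustration of the metadata step, the sketch below assumes that each candidate geometrical arrangement is simply "which pair of selected objects is closest together". That candidate set, the function name `closest_pair_arrangement`, and the metadata dictionary layout are hypothetical choices made for this example; the abstract does not specify them.

```python
from itertools import combinations
from math import dist


def closest_pair_arrangement(positions):
    """Return metadata naming one candidate arrangement.

    positions maps an object label to its (x, y) position in the location
    space. Each candidate is a pair of labels; the pair with the smallest
    distance is selected (an assumed criterion, for illustration only).
    """
    labels = sorted(positions)
    candidates = list(combinations(labels, 2))
    best = min(candidates, key=lambda pair: dist(positions[pair[0]], positions[pair[1]]))
    return {"objects": labels, "arrangement": f"closest:{best[0]}-{best[1]}"}


positions = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (5.0, 5.0)}
metadata = closest_pair_arrangement(positions)
```

Here `metadata["arrangement"]` names the A-B pair, since those two objects are nearest to each other.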
Abstract:
Systems and techniques are provided for performing scene segmentation and object tracking. For example, a method for processing one or more frames is provided. The method may include determining first one or more features from a first frame. The first frame includes a target object. The method may include obtaining a first mask associated with the first frame. The first mask includes an indication of the target object. The method may further include generating, based on the first mask and the first one or more features, a representation of a foreground and a background of the first frame. The method may include determining second one or more features from a second frame and determining, based on the representation of the foreground and the background of the first frame and the second one or more features, a location of the target object in the second frame.
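One simple way to picture this flow is sketched below. It assumes a single scalar feature per pixel and represents the foreground and background by their mean feature values (prototypes); both are simplifying assumptions, and the helper names `prototypes` and `locate` are hypothetical. The abstract does not state how the features or the representation are actually computed.

```python
def prototypes(features, mask):
    """Mean feature of masked (foreground) and unmasked (background) pixels.

    Assumes the mask marks at least one foreground and one background pixel.
    """
    fg = [f for row_f, row_m in zip(features, mask) for f, m in zip(row_f, row_m) if m]
    bg = [f for row_f, row_m in zip(features, mask) for f, m in zip(row_f, row_m) if not m]
    return sum(fg) / len(fg), sum(bg) / len(bg)


def locate(features, fg_proto, bg_proto):
    """Centroid (row, col) of pixels closer to the foreground prototype, or None."""
    hits = [
        (r, c)
        for r, row in enumerate(features)
        for c, f in enumerate(row)
        if abs(f - fg_proto) < abs(f - bg_proto)
    ]
    if not hits:
        return None
    return (sum(r for r, _ in hits) / len(hits), sum(c for _, c in hits) / len(hits))


# First frame: the target sits at the centre and the mask marks it.
frame1 = [[0.1, 0.1, 0.1], [0.1, 0.9, 0.1], [0.1, 0.1, 0.1]]
mask1 = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
fg_proto, bg_proto = prototypes(frame1, mask1)

# Second frame: the target has moved to the top-right corner.
frame2 = [[0.1, 0.1, 0.8], [0.1, 0.1, 0.1], [0.1, 0.1, 0.1]]
location = locate(frame2, fg_proto, bg_proto)
```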
Abstract:
A method for picture processing is described. A first tracking area is obtained. A second tracking area is also obtained. The method includes beginning to track the first tracking area and the second tracking area. Picture processing is performed once a portion of the first tracking area overlapping the second tracking area passes a threshold.
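The overlap test can be sketched as follows, assuming axis-aligned rectangular tracking areas given as `(x, y, width, height)` and a threshold expressed as a fraction of the first area. The box format, the default threshold of 0.5, and the function names are assumptions made for this example.

```python
def overlap_fraction(first, second):
    """Fraction of the first tracking area that is covered by the second."""
    ax, ay, aw, ah = first
    bx, by, bw, bh = second
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    area = aw * ah
    return (inter_w * inter_h) / area if area else 0.0


def should_process(first, second, threshold=0.5):
    """True once the overlapping portion of the first area passes the threshold."""
    return overlap_fraction(first, second) > threshold


first_area = (0, 0, 10, 10)
barely = should_process(first_area, (5, 0, 10, 10))   # exactly half overlaps
clearly = should_process(first_area, (2, 0, 10, 10))  # 80% overlaps
```

With a strict comparison, an overlap of exactly 50% does not trigger processing, while 80% does.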
Abstract:
A method of image retrieval includes obtaining information identifying a plurality of selected objects and selecting one among a plurality of candidate geometrical arrangements. This method also includes, by at least one processor, and in response to the selecting, identifying at least one digital image, among a plurality of digital images, that depicts the plurality of selected objects arranged according to the selected candidate geometrical arrangement.
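A minimal sketch of the retrieval step is given below. It assumes each digital image has already been indexed with metadata listing its objects and a string naming their arrangement; the index layout, the arrangement labels, and the function name `find_images` are hypothetical.

```python
def find_images(index, selected_objects, arrangement):
    """Names of images that depict all selected objects in the chosen arrangement."""
    wanted = set(selected_objects)
    return [
        name
        for name, meta in index.items()
        if wanted <= set(meta["objects"]) and meta["arrangement"] == arrangement
    ]


index = {
    "img_001": {"objects": ["A", "B", "C"], "arrangement": "closest:A-B"},
    "img_002": {"objects": ["A", "B", "C"], "arrangement": "closest:B-C"},
    "img_003": {"objects": ["A", "B"], "arrangement": "closest:A-B"},
}
matches = find_images(index, ["A", "B", "C"], "closest:A-B")
```

Only `img_001` matches: `img_002` shows a different arrangement, and `img_003` does not depict all three selected objects.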
Abstract:
A method for determining a region of an image is described. The method includes presenting an image of a scene including one or more objects. The method also includes receiving an input selecting a single point on the image corresponding to a target object. The method further includes obtaining a motion mask based on the image. The motion mask indicates a local motion section and a global motion section of the image. The method further includes determining a region in the image based on the selected point and the motion mask.
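One way to combine the selected point with the motion mask is a flood fill, sketched below. It assumes the mask is a 2-D grid in which a truthy cell marks the local motion section and a falsy cell marks the global motion section, and that the region is the bounding box of the connected local-motion cells around the point. The abstract does not name a region-growing technique, so the flood fill and the function name are assumptions.

```python
from collections import deque


def region_from_point(motion_mask, point):
    """Bounding box (top, left, bottom, right) of local motion connected to point.

    Returns None when the selected point lies in the global motion section.
    """
    rows, cols = len(motion_mask), len(motion_mask[0])
    r0, c0 = point
    if not motion_mask[r0][c0]:
        return None
    seen = {(r0, c0)}
    queue = deque([(r0, c0)])
    while queue:
        r, c = queue.popleft()
        # Visit the four direct neighbours that are also local motion.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and motion_mask[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    rs = [r for r, _ in seen]
    cs = [c for _, c in seen]
    return (min(rs), min(cs), max(rs), max(cs))


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
region = region_from_point(mask, (1, 1))
```

The isolated cell at the bottom-right is not 4-connected to the selected blob, so it stays outside the returned region.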
Abstract:
Systems, methods, and computer readable media are described for providing automatic zoom based adaptive video streaming. In some examples, a tracking video stream and a target video stream are obtained and are processed. The tracking video stream has a first resolution, and the target video stream has a second resolution that is higher than the first resolution. The tracking video stream is processed to define regions of interest for frames of the tracking video stream. The target video stream is processed to generate zoomed-in regions of frames of the target video stream. A zoomed-in region of the target video stream corresponds to a region of interest defined using the tracking video stream. The zoomed-in regions of the frames of the target video stream are then provided for display on a client device.
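The correspondence between the two streams can be sketched as a coordinate scaling followed by a crop. The example assumes a region of interest given as `(x, y, width, height)` in tracking-stream pixels and frames stored as lists of rows; the helper names and the resolutions used are illustrative assumptions.

```python
def scale_roi(roi, tracking_size, target_size):
    """Map an ROI from tracking-stream pixels to target-stream pixels."""
    x, y, w, h = roi
    sx = target_size[0] / tracking_size[0]
    sy = target_size[1] / tracking_size[1]
    return (round(x * sx), round(y * sy), round(w * sx), round(h * sy))


def crop(frame, region):
    """Cut the zoomed-in region out of a frame stored as a list of rows."""
    x, y, w, h = region
    return [row[x:x + w] for row in frame[y:y + h]]


# ROI found on a 640x360 tracking frame, applied to a 1920x1080 target frame.
zoom_region = scale_roi((100, 50, 200, 100), (640, 360), (1920, 1080))

# Tiny stand-in frame to show the crop itself.
small_frame = [[r * 10 + c for c in range(4)] for r in range(4)]
zoomed = crop(small_frame, (1, 1, 2, 2))
```

Running the tracker on the low-resolution stream keeps the per-frame cost down, while the crop is taken from the high-resolution stream so the zoomed-in region retains detail.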
Abstract:
An electronic device is described. The electronic device includes a processor. The processor is configured to obtain a plurality of images. The processor is also configured to obtain global motion information indicating global motion between at least two of the plurality of images. The processor is further configured to obtain object tracking information indicating motion of a tracked object between the at least two of the plurality of images. The processor is additionally configured to perform automatic zoom based on the global motion information and the object tracking information. Performing automatic zoom produces a zoom region including the tracked object. The processor is configured to determine a motion response speed for the zoom region based on a location of the tracked object within the zoom region.
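The final step, choosing a motion response speed from the tracked object's location within the zoom region, might look like the sketch below. It assumes the speed grows linearly from a slow value at the centre of the zoom region to a fast value at its border; that mapping, the `slow` and `fast` defaults, and the function name are assumptions, since the abstract only says the speed depends on the location.

```python
def response_speed(zoom_region, object_center, slow=0.1, fast=0.9):
    """Speed at which the zoom region follows the object.

    zoom_region is (x, y, width, height); object_center is (cx, cy).
    The normalised offset is 0 at the region centre and 1 at (or beyond) its border.
    """
    zx, zy, zw, zh = zoom_region
    cx, cy = object_center
    dx = abs(cx - (zx + zw / 2)) / (zw / 2)
    dy = abs(cy - (zy + zh / 2)) / (zh / 2)
    offset = min(1.0, max(dx, dy))
    return slow + (fast - slow) * offset


zoom = (0, 0, 100, 100)
centered = response_speed(zoom, (50, 50))   # object in the middle: slow response
halfway = response_speed(zoom, (75, 50))    # object halfway to the border
at_edge = response_speed(zoom, (100, 50))   # object at the border: fast response
```

A slow response near the centre avoids jitter from small object movements, while a fast response near the border helps keep the object inside the zoom region.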
Abstract:
A method of generating a temporal saliency map is disclosed. In a particular embodiment, the method includes receiving an object bounding box from an object tracker. The method includes cropping a video frame based at least in part on the object bounding box to generate a cropped image. The method further includes performing spatial dual segmentation on the cropped image to generate an initial mask and performing temporal mask refinement on the initial mask to generate a refined mask. The method also includes generating a temporal saliency map based at least in part on the refined mask.
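The pipeline can be outlined as below. Note the substitutions: a plain intensity threshold stands in for the spatial dual segmentation, and blending with the previous frame's mask stands in for the temporal mask refinement. Neither is described in the abstract; they are placeholders chosen to keep the sketch self-contained, and all function names are hypothetical.

```python
def crop(frame, box):
    """Crop a frame (list of rows) to box = (top, left, bottom, right), end-exclusive."""
    top, left, bottom, right = box
    return [row[left:right] for row in frame[top:bottom]]


def initial_mask(cropped, threshold=0.5):
    """Stand-in for spatial dual segmentation: a simple intensity threshold."""
    return [[1.0 if v > threshold else 0.0 for v in row] for row in cropped]


def refine(mask, previous, alpha=0.5):
    """Stand-in for temporal refinement: blend with the previous frame's mask."""
    if previous is None:
        return mask
    return [
        [alpha * m + (1 - alpha) * p for m, p in zip(row_m, row_p)]
        for row_m, row_p in zip(mask, previous)
    ]


frame = [[0.9, 0.2, 0.0], [0.8, 0.7, 0.0], [0.0, 0.0, 0.0]]
box = (0, 0, 2, 2)                      # bounding box from the object tracker
previous_mask = [[1.0, 1.0], [1.0, 0.0]]

mask = initial_mask(crop(frame, box))
saliency = refine(mask, previous_mask)  # refined mask used as the saliency map
```

Pixels that are foreground in both frames keep a saliency of 1.0, while pixels that changed between frames receive an intermediate value.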
Abstract:
A method includes receiving a user input (e.g., a one-touch user input), performing segmentation to generate multiple candidate regions of interest (ROIs) in response to the user input, and performing ROI fusion to generate a final ROI (e.g., for a computer vision application). In some cases, the segmentation may include motion-based segmentation, color-based segmentation, or a combination thereof. Further, in some cases, the ROI fusion may include intraframe (or spatial) ROI fusion, temporal ROI fusion, or a combination thereof.
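A minimal sketch of the two fusion stages follows. It assumes two candidate ROIs per frame (one from motion-based and one from color-based segmentation) given as `(x1, y1, x2, y2)` boxes, averages them when they agree, and otherwise falls back to the motion ROI; temporal fusion blends the result with the previous frame's ROI. The agreement test, the fallback rule, and the blending weight are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def fuse_spatial(motion_roi, color_roi, min_iou=0.3):
    """Intraframe fusion: average the candidates if they overlap enough."""
    if iou(motion_roi, color_roi) >= min_iou:
        return tuple((m + c) / 2 for m, c in zip(motion_roi, color_roi))
    return tuple(float(v) for v in motion_roi)  # assumed fallback: trust motion


def fuse_temporal(current, previous, alpha=0.5):
    """Temporal fusion: blend the current ROI with the previous frame's ROI."""
    if previous is None:
        return current
    return tuple(alpha * c + (1 - alpha) * p for c, p in zip(current, previous))


spatial = fuse_spatial((0, 0, 10, 10), (2, 2, 12, 12))
final_roi = fuse_temporal(spatial, (3, 3, 13, 13))
```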