Abstract:
The subject disclosure relates to face recognition in video. Face detection data in frames of input data are used to generate face galleries, which are labeled and used in recognizing faces throughout the video. Metadata that associates the video frame and the face are generated and maintained for subsequent identification. Faces other than those found by face detection may be found by face tracking, in which facial landmarks found by the face detection are used to track a face over previous and/or subsequent video frames. Once generated, the maintained metadata may be accessed to efficiently determine the identity of a person corresponding to a viewer-selected face.
Abstract:
Various examples are disclosed herein that relate to staged element classification. For example, one disclosed example provides a method of classifying elements by forming elements for classification into a plurality of first-level sets in a first stage, generating primary groups within the first-level sets based on element similarity, forming a plurality of second-level sets from the first-level sets in a second stage, generating secondary groups within the second-level sets based on element similarity, and merging a plurality of the primary and/or secondary groups based on element similarity.
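One possible reading of the staged flow above can be sketched as follows. This is only an illustration under stated assumptions: elements are plain numbers, "similarity" is taken to mean closeness of group means, and the names `merge_similar`, `chunk`, and `staged_classify`, as well as the `set_size`, `sets_per_stage`, and `threshold` parameters, are hypothetical and do not come from the disclosure.

```python
def mean(group):
    return sum(group) / len(group)

def merge_similar(groups, threshold):
    """Greedily merge groups whose means differ by at most `threshold`.
    (Assumed similarity measure; the abstract does not specify one.)"""
    merged = []
    for group in groups:
        for target in merged:
            if abs(mean(target) - mean(group)) <= threshold:
                target.extend(group)
                break
        else:
            merged.append(list(group))  # copy so inputs are not mutated
    return merged

def chunk(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def staged_classify(elements, set_size=4, sets_per_stage=2, threshold=1.0):
    # Stage 1: form first-level sets and build primary groups inside each set.
    first_level_sets = chunk(elements, set_size)
    primary_by_set = [
        merge_similar([[e] for e in s], threshold) for s in first_level_sets
    ]
    # Stage 2: form second-level sets from several first-level sets and
    # build secondary groups by merging the primary groups they contain.
    secondary_by_set = []
    for bundle in chunk(primary_by_set, sets_per_stage):
        pooled = [g for groups in bundle for g in groups]
        secondary_by_set.append(merge_similar(pooled, threshold))
    # Final step: merge groups across all second-level sets.
    all_groups = [g for groups in secondary_by_set for g in groups]
    return merge_similar(all_groups, threshold)

groups = staged_classify([1.0, 1.2, 5.0, 5.1, 0.9, 5.3, 9.0, 1.1])
```

Grouping inside small sets first keeps each comparison pass short; only the resulting groups, not every element pair, are compared in the later stages.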
Abstract:
Various embodiments related to the generation and provision of media metadata are disclosed. For example, one disclosed embodiment provides a computing device having a logic subsystem configured to execute instructions, and a data holding subsystem comprising instructions stored thereon that are executable by the logic subsystem to receive an input of a video and/or audio content item, and to compare the content item to one or more object descriptors, each representing an object to be located within the content item, in order to locate instances of one or more of the objects in the content item. The instructions are further executable to generate metadata for each object located in the video content item, and to receive a validating user input related to whether the metadata generated for a selected object is correct.
Abstract:
A video encoder uses previously calculated motion information for inter-frame coding to achieve faster computation speed for video compression. In a multi-bit-rate application, motion information produced by motion estimation for inter-frame coding of a compressed video bit stream at one bit rate is passed on to a subsequent encoding of the video at a lower bit rate. The video encoder chooses to use the previously calculated motion information for inter-frame coding at the lower bit rate if the video resolution is unchanged. A multi-core motion information pre-calculation stage produces motion information prior to encoding by dividing the motion estimation of each inter frame among separate CPU cores.
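The reuse rule described above can be sketched as a small planning function. This is a minimal sketch, not the encoder's actual logic: the `EncodePass` type and `plan_motion_reuse` function are hypothetical names, and passes are assumed to be ordered from the highest bit rate to the lowest.

```python
from dataclasses import dataclass

@dataclass
class EncodePass:
    bitrate_kbps: int
    resolution: tuple  # (width, height)

def plan_motion_reuse(passes):
    """For each pass, decide whether to reuse motion information from the
    preceding (higher bit rate) pass or to run motion estimation again.
    Reuse is chosen only when the resolution is unchanged."""
    plan = []
    previous = None
    for current in passes:
        reuse = previous is not None and current.resolution == previous.resolution
        plan.append((current.bitrate_kbps, "reuse" if reuse else "estimate"))
        previous = current
    return plan

plan = plan_motion_reuse([
    EncodePass(3000, (1280, 720)),
    EncodePass(1500, (1280, 720)),
    EncodePass(500, (640, 360)),
    EncodePass(300, (640, 360)),
])
```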
Abstract:
A semantic object tracking method tracks general semantic objects with multiple non-rigid motions, disconnected components and multiple colors throughout a vector image sequence. The method accurately tracks these general semantic objects by spatially segmenting image regions from a current frame and then classifying each region according to the semantic object in the previous frame from which it originated. To classify each region, the method performs a region-based motion estimation between each spatially segmented region and the previous frame to compute the position of a predicted region in the previous frame. The method then classifies each region in the current frame as being part of a semantic object based on which semantic object in the previous frame contains the most overlapping points of the predicted region. Using this method, each region in the current image is tracked to one semantic object from the previous frame, with no gaps or overlaps. The method propagates few or no errors because it projects regions into a frame where the semantic object boundaries have already been computed, rather than trying to project and adjust a boundary in a frame where the object's boundary is unknown.
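The classification step above can be illustrated with a short sketch. It assumes that each region and each semantic object is a set of (x, y) points, that the motion estimate for a region is already available, and that the motion is a pure translation; the abstract does not restrict the motion model this way. The names `project_region` and `classify_region` are hypothetical.

```python
def project_region(region, motion):
    """Shift a current-frame region into the previous frame.
    `motion` is an assumed (dx, dy) translation from motion estimation."""
    dx, dy = motion
    return {(x + dx, y + dy) for (x, y) in region}

def classify_region(region, motion, previous_objects):
    """Return the name of the previous-frame semantic object that contains
    the most points of the predicted region."""
    predicted = project_region(region, motion)
    overlaps = {
        name: len(predicted & mask) for name, mask in previous_objects.items()
    }
    return max(overlaps, key=overlaps.get)

# Two semantic objects in the previous frame, stored as point sets.
previous_objects = {
    "person": {(x, y) for x in range(0, 4) for y in range(0, 4)},
    "background": {(x, y) for x in range(4, 8) for y in range(0, 4)},
}
# A region segmented from the current frame.
region = {(x, y) for x in range(3, 6) for y in range(0, 2)}

label_moved = classify_region(region, (-2, 0), previous_objects)
label_static = classify_region(region, (0, 0), previous_objects)
```

Because every region receives exactly one label, the union of labeled regions covers the current frame without gaps or overlaps, which is the property the abstract emphasizes.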
Abstract:
A semantic video object extraction system uses mathematical morphology and perspective motion modeling. A user indicates a rough outline around an image feature of interest for a first frame in a video sequence. Without further user assistance, the rough outline is processed by a morphological segmentation tool to snap the rough outline into a precise boundary surrounding the image feature. Motion modeling is performed on the image feature to track its movement into a subsequent video frame. The motion model is applied to the precise boundary to warp it into a new rough outline for the image feature in the subsequent video frame. This new rough outline is then snapped to locate a new precise boundary. Automatic processing is repeated for subsequent video frames.
Abstract:
Image processing in mobile devices is optimized by combining at least two of the color conversion, rotation, and scaling operations. Received images, such as still images or frames of a video stream, are subjected to a combined transformation after decoding, in which each pixel is color converted (e.g., from YUV to RGB), rotated, and scaled as needed. Combining two or three of the processes into one reduces read/write operations, which consume significant processing and memory resources, enabling processing of higher-resolution images and/or savings in power and processing resources.
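The combined transformation can be sketched as a single pass over the output pixels, in which each output pixel is mapped back to its source pixel and converted on the spot, so that no intermediate image is written. The sketch below makes several assumptions not stated in the abstract: the rotation is fixed at 90 degrees clockwise, scaling uses nearest-neighbor sampling, the color conversion uses the full-range BT.601 (JFIF) coefficients, and the image is a list of rows of (Y, U, V) tuples. The function names are hypothetical.

```python
def yuv_to_rgb(y, u, v):
    """Full-range BT.601 (JFIF) YUV to RGB, clamped to 0..255."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)

    def clamp(value):
        return max(0, min(255, int(round(value))))

    return clamp(r), clamp(g), clamp(b)

def convert_rotate_scale(src, out_h, out_w):
    """Color convert, rotate 90 degrees clockwise, and scale in one pass."""
    src_h, src_w = len(src), len(src[0])
    rot_h, rot_w = src_w, src_h  # dimensions after the rotation
    out = []
    for dr in range(out_h):
        r = dr * rot_h // out_h          # nearest row in the rotated image
        row = []
        for dc in range(out_w):
            c = dc * rot_w // out_w      # nearest column in the rotated image
            # Pixel (r, c) of the rotated image comes from the source
            # pixel at row src_h - 1 - c, column r.
            y, u, v = src[src_h - 1 - c][r]
            row.append(yuv_to_rgb(y, u, v))
        out.append(row)
    return out

# A 1x2 source image: black on the left, white on the right.
src = [[(0, 128, 128), (255, 128, 128)]]
result = convert_rotate_scale(src, 4, 2)
```

Each source pixel is read only when an output pixel needs it, and each output pixel is written once, which is the source of the read/write savings the abstract describes.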
Abstract:
A video encoding system encodes video streams for multiple bit rate video streaming using an approach that permits the encoded resolution to vary based, at least in part, on motion complexity. The video encoding system dynamically decides an encoding resolution for segments of the multiple bit rate video streams that varies with video complexity so as to achieve a better visual experience for multiple bit rate streaming. Motion complexity may be considered separately, or along with spatial complexity, in making the resolution decision.
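A resolution decision driven by motion complexity might look like the sketch below. Everything specific here is an assumption made for illustration: the abstract does not define how motion complexity is measured or in which direction it moves the resolution. The sketch measures complexity as the mean absolute motion vector magnitude of a segment and assumes that higher complexity selects a lower resolution; `motion_complexity`, `choose_resolution`, and the threshold values are hypothetical.

```python
def motion_complexity(motion_vectors):
    """Mean of |dx| + |dy| over a segment's motion vectors (assumed metric)."""
    if not motion_vectors:
        return 0.0
    return sum(abs(dx) + abs(dy) for dx, dy in motion_vectors) / len(motion_vectors)

def choose_resolution(motion_vectors, candidates, thresholds):
    """Pick a resolution for one segment.
    `candidates` is ordered from highest to lowest resolution and
    `thresholds` is ascending; each threshold exceeded steps down one
    candidate (assumed policy, not taken from the abstract)."""
    complexity = motion_complexity(motion_vectors)
    exceeded = sum(1 for t in thresholds if complexity > t)
    return candidates[min(exceeded, len(candidates) - 1)]

candidates = [(1280, 720), (960, 540), (640, 360)]
thresholds = [2.0, 6.0]

calm = choose_resolution([(0, 1), (1, 0)], candidates, thresholds)
busy = choose_resolution([(4, 4), (3, 5)], candidates, thresholds)
```

A spatial complexity term could be combined with the motion term before the threshold comparison, matching the abstract's note that the two may be considered together.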