Abstract:
A method of generating congruous metadata is provided. The method includes receiving a similarity measure between at least two multimedia objects. Each multimedia object has associated metadata. If the at least two multimedia objects are similar based on the similarity measure and a similarity threshold, the associated metadata of each of the multimedia objects are compared. Then, based on the comparison of the associated metadata of each of the at least two multimedia objects, the method further includes generating congruous metadata. Metadata may be tags, for example.
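The threshold-then-compare flow described above can be sketched as follows. This is a minimal illustration, not the claimed method: the function name `congruous_tags`, the default threshold of 0.8, and the choice to treat the union of both tag sets as the "congruous" result are all assumptions made for the example.

```python
def congruous_tags(tags_a, tags_b, similarity, threshold=0.8):
    """Return one shared tag list for two objects, or None if they are not similar.

    Assumption: metadata is a collection of string tags, and "congruous"
    metadata is modelled here as the union of both tag sets.
    """
    if similarity < threshold:
        # Objects are not similar enough; their metadata is left alone.
        return None
    a, b = set(tags_a), set(tags_b)
    if a == b:
        # Comparison found no difference; the metadata is already congruous.
        return sorted(a)
    # Comparison found differences; propagate every tag to both objects.
    return sorted(a | b)


merged = congruous_tags(["beach", "sunset"], ["sunset", "sea"], similarity=0.9)
print(merged)
```

Other policies, such as keeping only the tags both objects share, would fit the same structure by changing the final set operation.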
Abstract:
Systems and methods are provided for automatically generating a thumbnail for a video on an online shopping site. The disclosed technology automatically generates a thumbnail for a video, where the thumbnail represents an item but not necessarily the content of the video. A thumbnail generator receives a video that describes the item and an ordered list of item images associated with the item used in an item listing. The thumbnail generator extracts video frames from the video based on sampling rules and determines similarity scores for the sampled video frames. A similarity score indicates a degree of similarity between the content of a video frame and an item image. The thumbnail generator determines weighted similarity scores based on the item images and occurrences of the sampled video frames in the video. The disclosed technology generates a thumbnail for the video by selecting a sampled video frame based on the weighted similarity scores.
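The frame-selection step can be illustrated with a short sketch. The abstract does not give a weighting formula, so the one used here is an assumption: an item image at position `i` in the ordered list gets weight `1 / (i + 1)`, and the best weighted score of a frame is multiplied by how often that frame occurs in the video. The function name `select_thumbnail` and the precomputed similarity matrix are likewise hypothetical.

```python
def select_thumbnail(similarity, occurrences):
    """Pick the sampled frame with the highest weighted similarity score.

    similarity[f][i] is the similarity of sampled frame f to item image i,
    with item images given in listing order.
    occurrences[f] is how many times frame f (or a near-duplicate) appears.
    Returns (frame_index, weighted_score).
    """
    best_frame, best_score = None, float("-inf")
    for f, row in enumerate(similarity):
        # Assumed weighting: earlier item images count more (1, 1/2, 1/3, ...).
        image_weighted = max(s / (i + 1) for i, s in enumerate(row))
        score = image_weighted * occurrences[f]
        if score > best_score:
            best_frame, best_score = f, score
    return best_frame, best_score


similarity = [
    [0.9, 0.2],   # frame 0 resembles the first item image
    [0.5, 0.95],  # frame 1 resembles the second item image
    [0.6, 0.1],   # frame 2 is a weaker match but appears twice
]
frame, score = select_thumbnail(similarity, occurrences=[1, 1, 2])
print(frame, score)
```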
Abstract:
The overall architecture and details of a scalable video fingerprinting and identification system that is robust with respect to many classes of video distortions are described. In this system, a fingerprint for a piece of multimedia content is composed of a number of compact signatures, along with traversal hash signatures and associated metadata. Numerical descriptors are generated for features found in a multimedia clip, signatures are generated from these descriptors, and a reference signature database is constructed from these signatures. Query signatures are also generated for a query multimedia clip. These query signatures are searched against the reference database using a fast similarity search procedure to produce a candidate list of matching signatures. This candidate list is further analyzed to find the most likely reference matches. Signature correlation is performed between the likely reference matches and the query clip to improve detection accuracy.
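The search stage can be sketched in a few lines. Everything concrete here is an assumption made for illustration: signatures are 16-bit integers, the "traversal hash" is modelled as the top 4 bits of a signature, similarity is Hamming distance, and candidates are ranked by a simple vote count rather than by the signature correlation the abstract mentions.

```python
from collections import Counter, defaultdict

SIG_BITS = 16   # assumed signature width
HASH_BITS = 4   # assumed width of the hash used to pick a bucket


def bucket(sig):
    """Use the top HASH_BITS of a signature as its bucket key."""
    return sig >> (SIG_BITS - HASH_BITS)


def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


def build_index(reference):
    """Map bucket key -> list of (video_id, signature) for all reference clips."""
    index = defaultdict(list)
    for video_id, signatures in reference.items():
        for sig in signatures:
            index[bucket(sig)].append((video_id, sig))
    return index


def search(index, query_sigs, max_distance=2):
    """Return reference clips ranked by how many query signatures they match."""
    votes = Counter()
    for q in query_sigs:
        # Only signatures in the same bucket are compared (the fast path).
        for video_id, ref in index.get(bucket(q), []):
            if hamming(q, ref) <= max_distance:
                votes[video_id] += 1
    return votes.most_common()


reference = {"clipA": [0xA3F1, 0x5C22], "clipB": [0xA0FF, 0x1234]}
index = build_index(reference)
# Query signatures are slightly distorted copies of clipA's signatures.
print(search(index, [0xA3F0, 0x5C26]))
```

A distortion that flips one of the top bits would land the query in a different bucket and be missed, which is the usual trade-off of hash-bucketed search.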
Abstract:
Systems and methods for full motion video search are provided. In one aspect, a method includes receiving one or more search terms. The search terms include one or more of a characterization of the amount of man-made features in a video image and a characterization of the amount of natural features in the video image. The method further includes searching a full motion video database based on the one or more search terms.
Abstract:
A system of this invention is a video processing system for outputting additional information to be added to a video content. This video processing system includes a frame feature extractor that extracts a frame feature of a frame included in an arbitrary video content, a video content extractor that extracts a video content group having a scene formed from a series of a plurality of frames in the arbitrary video content by comparing frame features of the arbitrary video content extracted by the frame feature extractor with frame features of other video contents, the video content group including an original video content with the scene unaltered and one or more derivative video contents with the scene altered, and an additional information extractor that extracts additional information added to the scene of the extracted video content group. With this arrangement, additional information added to a video content group including an identical scene can be referred to from one video content.
Abstract:
Semantic indexing and retrieval of multimedia content requires that the content is sufficiently annotated. However, the great volumes of multimedia data and diversity of labels make annotation a difficult and costly process. Disclosed is an annotation framework in which supervised training with partially labeled data is facilitated using active learning. The system trains a classifier with a small set of labeled data and subsequently updates the classifier by selecting a subset of the available data-set according to optimization criteria. The process results in propagation of labels to unlabeled data and greatly facilitates the user in annotating large amounts of multimedia content.
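The train, select, update loop can be illustrated with a toy example. The abstract does not name the classifier or the optimization criteria, so this sketch swaps in assumptions of its own: one-dimensional feature values, a nearest-centroid classifier, and margin-based uncertainty sampling as the selection rule. The `oracle` callable stands in for the human annotator, and the initial labeled set is assumed to contain at least two classes.

```python
def centroids(labeled):
    """Mean feature value per label, from (value, label) pairs."""
    sums = {}
    for x, y in labeled:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}


def predict(cents, x):
    """Label of the nearest centroid."""
    return min(cents, key=lambda y: abs(x - cents[y]))


def margin(cents, x):
    """Gap between the two nearest centroids; a small gap means an uncertain sample."""
    distances = sorted(abs(x - c) for c in cents.values())
    return distances[1] - distances[0]


def active_learn(labeled, unlabeled, oracle, rounds):
    """Query the oracle on the most uncertain samples, then label the rest."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(min(rounds, len(pool))):
        cents = centroids(labeled)          # retrain on current labels
        query = min(pool, key=lambda x: margin(cents, x))
        pool.remove(query)
        labeled.append((query, oracle(query)))
    cents = centroids(labeled)
    # Propagate labels to everything the annotator never saw.
    return {x: predict(cents, x) for x in pool}


propagated = active_learn(
    labeled=[(0.0, "a"), (10.0, "b")],
    unlabeled=[1.0, 4.0, 5.5, 9.0],
    oracle=lambda x: "a" if x < 6 else "b",
    rounds=2,
)
print(propagated)
```

In this run the annotator is asked about only the two samples nearest the decision boundary, and the remaining samples receive labels from the updated classifier.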
Abstract:
Physical objects, including still and moving images, sound/audio and text, are transformed into more compact forms for identification and other purposes using a method unrelated to existing image-matching systems which rely on feature extraction. An auxiliary construct, preferably a warp grid, is associated with an object, and a series of transformations are imposed to generate a unique visual key for identification, comparisons, and other operations. Search methods are also disclosed for matching an unknown image to one previously represented in a visual key database. Broadly, a preferred search method examines candidate database images for their closeness of match in a sequential order determined by their a priori match probability. Thus, the most likely match candidate is examined first, the next most likely second, and so forth. With respect to the recognition of video sequences and other information streams, inventive holotropic stream recognition principles are deployed, wherein the statistics of the spatial distribution of warp grid points are used to generate index keys. The invention is applicable to various fields of endeavor, including governmental, scientific, industrial, commercial, and recreational object identification and information retrieval. Extensions of the technology are also disclosed to achieve a uniform distribution of objects over the database search, a consideration which is central to scalability. In particular, a generalized method has been developed based on reticle projection, which greatly enhances the uniformity of object distributions in the collected data. Thus, whereas statistical criteria are used with respect to particular embodiments in transforming a construct associated with an image, audio, text or other representation, a reticle projection may alternatively be used in attribute transformation according to alternative embodiments of the invention.
Abstract:
An image indexing system comprising a video frame store, an averager which provides a first signal indicative of the average brightness level of each frame stored, an image splitter and averager which divides each frame into contiguous blocks of pixels and provides, for each block, a second signal indicative of its average brightness level, and a comparator which compares each of the second signals with the first signal so as to produce, in respect of each block, a binary signal indicative of whether its brightness level, as indicated by the second signal, is greater or less than the average brightness level for the frame as indicated by the first signal, thereby to produce for each frame an index signal comprising one binary bit for each block, which serves to identify each frame for indexing purposes.
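The index signal described above is simple enough to compute directly. The sketch below assumes a frame is a 2-D list of pixel brightness values whose dimensions divide evenly into the requested block grid; the function name `frame_index` and the convention that a bit is 1 when the block is brighter than the frame average are choices made for the example.

```python
def frame_index(frame, block_rows, block_cols):
    """Return one bit per block: 1 if the block is brighter than the frame average.

    frame is a 2-D list of brightness values. Assumes the frame height and
    width are exact multiples of block_rows and block_cols.
    """
    height, width = len(frame), len(frame[0])
    # "First signal": average brightness of the whole frame.
    frame_mean = sum(map(sum, frame)) / (height * width)
    bh, bw = height // block_rows, width // block_cols
    bits = []
    for br in range(block_rows):
        for bc in range(block_cols):
            block = [
                frame[r][c]
                for r in range(br * bh, (br + 1) * bh)
                for c in range(bc * bw, (bc + 1) * bw)
            ]
            # "Second signal": average brightness of this block.
            block_mean = sum(block) / len(block)
            bits.append(1 if block_mean > frame_mean else 0)
    return bits


frame = [
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [10, 10, 90, 90],
    [10, 10, 90, 90],
]
print(frame_index(frame, block_rows=2, block_cols=2))  # [1, 0, 0, 1]
```

Because each bit is relative to the frame's own average, a uniform brightness change applied to the whole frame leaves the index unchanged.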