Abstract:
An identity verification method and an identity verification apparatus based on a voiceprint are provided. The identity verification method based on a voiceprint includes: receiving an unknown voice; extracting a voiceprint of the unknown voice using a neural network-based voiceprint extractor which is obtained through pre-training; concatenating the extracted voiceprint with a pre-stored voiceprint to obtain a concatenated voiceprint; and performing judgment on the concatenated voiceprint using a pre-trained classification model, to verify whether the extracted voiceprint and the pre-stored voiceprint are from a same person. With the identity verification method and the identity verification apparatus, a holographic voiceprint of the speaker can be extracted from a short voice segment, such that the verification result is more robust.
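A minimal sketch of this verification flow in PyTorch, assuming a toy extractor architecture and a binary classifier head (both are illustrative placeholders, not the patented models):

    import torch
    import torch.nn as nn

    class VoiceprintExtractor(nn.Module):
        # Hypothetical stand-in for the neural-network voiceprint extractor.
        def __init__(self, feat_dim=40, emb_dim=128):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                     nn.Linear(256, emb_dim))

        def forward(self, frames):                  # frames: (T, feat_dim) acoustic features
            return self.net(frames).mean(dim=0)     # average over time -> voiceprint (emb_dim,)

    def verify(unknown_frames, stored_voiceprint, extractor, classifier):
        # classifier is assumed to be a binary head, e.g. nn.Linear(2 * emb_dim, 1).
        with torch.no_grad():
            emb = extractor(unknown_frames)                     # extract voiceprint of the unknown voice
            pair = torch.cat([emb, stored_voiceprint], dim=-1)  # concatenate with the pre-stored voiceprint
            score = torch.sigmoid(classifier(pair))             # judge same person / different person
        return bool(score.item() > 0.5)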
Abstract:
An image similarity determining device and method and an image feature acquiring device and method are provided. The image similarity determining device comprises a preprocessing unit for extracting feature points of each input image region of an input image and each image region to be matched of a data source image; a matched feature point set determining unit for determining one-to-one matched feature point pairs between input image regions and image regions to be matched, to determine matched feature point sets; a geometry similarity determining unit for determining a geometry similarity between the input image region and the image region to be matched based on the distribution of the respective feature points in the matched feature point sets; and an image similarity determining unit for determining a similarity between the input image and the data source image based on the geometry similarities between the input image regions and the corresponding image regions to be matched.
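A rough sketch of this pipeline using OpenCV ORB features as a stand-in for the preprocessing unit; the geometry-similarity formula below (comparing the spread of matched points in the two regions) and the simple averaging across regions are only illustrative choices, not the patented ones:

    import cv2
    import numpy as np

    def match_regions(region_a, region_b):
        # Extract feature points/descriptors and find one-to-one matches between the regions.
        orb = cv2.ORB_create()
        kp_a, des_a = orb.detectAndCompute(region_a, None)
        kp_b, des_b = orb.detectAndCompute(region_b, None)
        if des_a is None or des_b is None:
            return [], kp_a, kp_b
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        return matcher.match(des_a, des_b), kp_a, kp_b

    def geometry_similarity(matches, kp_a, kp_b):
        # Compare the spatial distribution of the matched feature points in both regions.
        if len(matches) < 3:
            return 0.0
        pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches])
        pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches])
        spread_a = pts_a.std(axis=0).mean() + 1e-6
        spread_b = pts_b.std(axis=0).mean() + 1e-6
        return float(min(spread_a, spread_b) / max(spread_a, spread_b))

    def image_similarity(region_pairs):
        # Aggregate region-level geometry similarities into an image-level similarity.
        scores = [geometry_similarity(*match_regions(a, b)) for a, b in region_pairs]
        return float(np.mean(scores)) if scores else 0.0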
Abstract:
Embodiments describe an image retrieval apparatus. The image retrieval apparatus includes an unlabelled image selector for selecting one or more unlabelled image(s) from an image database; and a main learner for training in each feedback round of the image retrieval, estimating relevance of images in the image database and a user's intention, and determining retrieval results, wherein the main learner makes use of the unlabelled image(s) selected by the unlabelled image selector in the estimation. In addition, the image retrieval apparatus may also include an active selector for selecting, in each feedback round and according to estimation results of the main learner, one or more unlabelled image(s) from the image database for the user to label.
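A hypothetical sketch of one feedback round with scikit-learn, assuming image features are already extracted into a NumPy matrix; the concrete learners here (logistic regression with pseudo-labelling, uncertainty-based querying) are stand-ins for the main learner, the unlabelled image selector and the active selector:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def feedback_round(features, labelled_idx, labels, n_unlabelled=50, n_query=5):
        # labels must contain both relevant (1) and irrelevant (0) examples.
        unlabelled_idx = np.setdiff1d(np.arange(len(features)), labelled_idx)

        # Unlabelled image selector: pick some unlabelled images to exploit.
        chosen = np.random.choice(unlabelled_idx,
                                  size=min(n_unlabelled, len(unlabelled_idx)),
                                  replace=False)

        # Main learner: train on labelled data, then refine with the pseudo-labelled images.
        clf = LogisticRegression(max_iter=1000)
        clf.fit(features[labelled_idx], labels)
        pseudo = clf.predict(features[chosen])
        clf.fit(np.vstack([features[labelled_idx], features[chosen]]),
                np.concatenate([labels, pseudo]))

        # Relevance estimate over the whole database -> ranked retrieval results.
        relevance = clf.predict_proba(features)[:, 1]
        ranking = np.argsort(-relevance)

        # Active selector: ask the user to label the most uncertain unlabelled images.
        uncertainty = np.abs(clf.decision_function(features[unlabelled_idx]))
        to_label = unlabelled_idx[np.argsort(uncertainty)[:n_query]]
        return ranking, to_label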
Abstract:
The present disclosure relates to a method, device and storage medium for improving multi-object tracking. According to an embodiment, the method comprises: performing a split operation on a tracklet provided for one object by a multi-object tracking model. The split operation comprises: determining an appearance feature sequence of the tracklet; determining a clustering label set of the appearance feature sequence; determining an image block label sequence; determining a fragment label sequence corresponding to continuous fragments in the image block label sequence that have the same clustering labels; in a case where the length of the fragment label sequence is greater than the number of types of clustering labels in the clustering label set, updating the image block label sequence and the fragment label sequence by performing an update operation; and splitting the tracklet based on the updated image block label sequence. The method may further comprise a merge operation.
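A sketch of the split operation, assuming the appearance feature sequence of a tracklet is given as a NumPy array (one row per image block); k-means with two clusters stands in for the clustering step, and the update operation used here (absorbing the shortest fragment into its neighbour) is only one plausible reading of the abstract:

    import numpy as np
    from sklearn.cluster import KMeans

    def fragments(labels):
        # Run-length encode: one (label, start, end) triple per run of equal labels.
        frags, start = [], 0
        for i in range(1, len(labels) + 1):
            if i == len(labels) or labels[i] != labels[start]:
                frags.append((labels[start], start, i))
                start = i
        return frags

    def split_tracklet(appearance_feats, n_clusters=2):
        block_labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(appearance_feats)
        frags = fragments(block_labels)
        # Update until the number of fragments no longer exceeds the number of cluster labels.
        while len(frags) > n_clusters:
            _, s, e = min(frags, key=lambda f: f[2] - f[1])        # shortest fragment
            neighbour = block_labels[s - 1] if s > 0 else block_labels[e]
            block_labels[s:e] = neighbour                          # absorb it into a neighbour
            frags = fragments(block_labels)
        # Split the tracklet at the remaining fragment boundaries.
        return [(s, e) for _, s, e in frags]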
Abstract:
The embodiments of the present disclosure provide an apparatus for identifying items, a method for identifying items and an electronic device. The apparatus includes: a detector configured to detect one or more items in a reference area in one or more image frames in video data; a tracker configured to track an item detected in multiple image frames, wherein a multi-hierarchy decision is performed on the item in the multiple image frames by using different time windows; and a classifier configured to identify the item according to a decision result of the tracker. Thereby, even if an item is moved briefly in some scenarios, the item will not be identified as two different items, which reduces cases in which the item is identified repeatedly and improves the accuracy and robustness of item detection.
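An illustrative sketch of the multi-window decision, assuming per-frame detections have already been associated to a track and classified frame by frame; the two window lengths and the voting rule are assumptions, not the patented decision logic:

    from collections import Counter, deque

    class MultiWindowDecider:
        def __init__(self, short_win=5, long_win=30):
            self.short = deque(maxlen=short_win)   # recent per-frame class votes
            self.long = deque(maxlen=long_win)     # longer history of class votes

        def update(self, frame_class):
            self.short.append(frame_class)
            self.long.append(frame_class)

        def decide(self):
            # The short window reacts quickly; the long window suppresses brief moves
            # or occlusions, so a briefly moved item keeps a single identity.
            if not self.short:
                return None
            short_vote = Counter(self.short).most_common(1)[0][0]
            long_vote = Counter(self.long).most_common(1)[0][0]
            return long_vote if len(self.long) == self.long.maxlen else short_vote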
Abstract:
A method of training a model, a device for training a model, and an information processing method are provided. The method of training a model comprises: determining a subsample set sequence composed of N subsample sets of a total training sample set; and iteratively training the model in a sequence of N stages based on the subsample set sequence; wherein a stage training sample set of a y-th stage, from a second stage to an N-th stage of the N stages, comprises a y-th subsample set in the subsample set sequence and a downsampled pre-subsample set of a pre-subsample set composed of all subsample sets before the y-th subsample set; and each single-class sample quantity of the downsampled pre-subsample set is close to or falls into the single-class sample quantity distribution interval of the y-th subsample set.
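A sketch of how the stage training set for stage y might be assembled, assuming samples are (x, label) pairs grouped into N subsample sets; the concrete downsampling rule (capping each class of the earlier sets at the median per-class count of the y-th set) is only an illustrative reading of "close to or falls into the single-class sample quantity distribution interval":

    import random
    from collections import defaultdict

    def stage_training_set(subsample_sets, y):
        current = subsample_sets[y]
        pre = [s for subset in subsample_sets[:y] for s in subset]   # pre-subsample set

        # Per-class sample quantities of the y-th subsample set.
        counts = defaultdict(int)
        for _, label in current:
            counts[label] += 1
        target = sorted(counts.values())[len(counts) // 2] if counts else 0  # median count

        # Downsample every class of the pre-subsample set toward that target.
        by_class = defaultdict(list)
        for x, label in pre:
            by_class[label].append((x, label))
        downsampled = []
        for samples in by_class.values():
            random.shuffle(samples)
            downsampled.extend(samples[:target])

        return current + downsampled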
Abstract:
A device and a method for classification using a pre-trained classification model and a computer readable storage medium are provided. The device is configured to extract, for each of multiple images in a target image group to be classified, a feature of the image using a feature extraction layer of the pre-trained classification model; calculate, for each of the multiple images, a contribution of the image to a classification result of the target image group using a contribution calculation layer of the pre-trained classification model; aggregate extracted features of the multiple images based on calculated contributions of the multiple images, to obtain an aggregated feature as a feature of the target image group; and classify the target image group based on the feature of the target image group.
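A minimal PyTorch sketch of this flow, assuming a shared backbone; the concrete layers are placeholders, and only the overall structure (per-image features and contributions, contribution-weighted aggregation, group-level classification) follows the description:

    import torch
    import torch.nn as nn

    class GroupClassifier(nn.Module):
        def __init__(self, feat_dim=512, n_classes=10):
            super().__init__()
            self.backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                          nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                          nn.Linear(16, feat_dim))    # feature extraction layer
            self.contribution = nn.Linear(feat_dim, 1)                # contribution calculation layer
            self.head = nn.Linear(feat_dim, n_classes)                # group-level classifier

        def forward(self, images):                    # images: (n_images, 3, H, W), one group
            feats = self.backbone(images)             # per-image features
            weights = torch.softmax(self.contribution(feats), dim=0)  # per-image contributions
            group_feat = (weights * feats).sum(dim=0) # contribution-weighted aggregated feature
            return self.head(group_feat)              # classify the target image group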
Abstract:
A method and a device for detecting a hand action are provided. The method includes: identifying an area including hands of a person in one frame image of a video; dividing the area into multiple blocks and calculating a motion vector for each of the blocks; clustering the multiple resulting motion vectors into a first cluster and a second cluster, wherein multiple first blocks corresponding to the first cluster of motion vectors correspond to one of a left hand and a right hand, and multiple second blocks corresponding to the second cluster of motion vectors correspond to the other one of the left hand and the right hand; identifying movements of the hands to which the first cluster and the second cluster correspond in a frame image subsequent to the one frame image; and matching the identified movements with a predetermined action mode to determine an action of the hands.
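A sketch of the per-frame step, assuming the hand area has already been located and that dense Farneback optical flow averaged over fixed-size blocks is an acceptable stand-in for the block-wise motion vectors; the left/right assignment rule is likewise an assumption:

    import cv2
    import numpy as np
    from sklearn.cluster import KMeans

    def block_motion_vectors(prev_gray, cur_gray, block=16):
        # Average dense optical flow inside each block of the hand area.
        flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        h, w = flow.shape[:2]
        vectors, centers = [], []
        for y in range(0, h - block + 1, block):
            for x in range(0, w - block + 1, block):
                vectors.append(flow[y:y + block, x:x + block].reshape(-1, 2).mean(axis=0))
                centers.append((x + block // 2, y + block // 2))
        return np.array(vectors), np.array(centers)

    def split_left_right(vectors, centers):
        # Cluster block motion vectors into two groups, one per hand.
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(vectors)
        # Simple assumption: the cluster whose blocks lie further left is the left hand.
        left = int(centers[labels == 0][:, 0].mean() > centers[labels == 1][:, 0].mean())
        return labels == left, labels != left     # boolean masks for left / right blocks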
Abstract:
An information processing method includes: inputting a sample image into a machine learning architecture to obtain a first feature, and causing a first classifier to calculate a first classification loss; calculating a second feature based on the first feature and a predetermined first mask, and inputting the second feature into the first classifier to calculate an entropy loss; calculating a second mask based on the first mask and the entropy loss to maximize the entropy loss; obtaining an adversarial feature based on the first feature and the second mask, where the adversarial feature is complementary to the second feature; causing a second classifier to calculate a second classification loss based on the adversarial feature, by training the first classifier and the second classifier in association with each other; and adjusting parameters of the machine learning architecture, the first classifier and the second classifier, to obtain a trained machine learning architecture.
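One illustrative training step in PyTorch; the backbone, classifiers, mask shape, the choice of the second mask as the complement of the first, and the joint objective are all assumptions used to make the described flow concrete (the abstract does not specify how the entropy loss enters the overall objective):

    import torch
    import torch.nn.functional as F

    def entropy(logits):
        p = F.softmax(logits, dim=-1)
        return -(p * p.log().clamp(min=-20)).sum(dim=-1).mean()

    def train_step(backbone, clf1, clf2, images, targets, mask1, opt):
        feat1 = backbone(images)                               # first feature
        loss_cls1 = F.cross_entropy(clf1(feat1), targets)      # first classification loss

        feat2 = feat1 * mask1                                  # second feature (first mask applied)
        ent_loss = entropy(clf1(feat2))                        # entropy loss of the first classifier

        # Second mask: here simply the channels dropped by the first mask, one way of
        # obtaining an adversarial feature complementary to the second feature.
        mask2 = 1.0 - mask1
        adv_feat = feat1 * mask2                               # adversarial feature

        loss_cls2 = F.cross_entropy(clf2(adv_feat), targets)   # second classification loss

        loss = loss_cls1 + loss_cls2                           # assumed joint objective
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item(), ent_loss.item()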
Abstract:
Embodiments provide a multimodality-based image tagging apparatus and a method for the same. The image tagging apparatus includes: a score generating unit configured to generate, for an inquiry image, multiple groups of first scores about all tags in a tagging dictionary by using a training image and multiple modalities of an image; a late-fusion unit configured to fuse the obtained multiple groups of first scores to obtain final scores about all the tags; and a tag selecting unit configured to select one or more tag(s) with relatively large tag scores as tag(s) of the inquiry image according to the final scores about all the tags. With the embodiments, multiple modalities may be effectively fused, and a more robust and accurate image tagging result may be obtained.
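A sketch of the tagging flow with NumPy, assuming each modality already yields one score vector over the tagging dictionary for the inquiry image (how those per-modality scores are produced from the training data is left abstract here):

    import numpy as np

    def tag_image(per_modality_scores, tag_dictionary, fusion_weights=None, top_k=3):
        scores = np.asarray(per_modality_scores)          # shape: (n_modalities, n_tags)
        if fusion_weights is None:
            fusion_weights = np.ones(len(scores)) / len(scores)

        # Late fusion: weighted combination of the per-modality score vectors.
        final_scores = fusion_weights @ scores            # shape: (n_tags,)

        # Tag selection: keep the tags with the largest fused scores.
        top = np.argsort(-final_scores)[:top_k]
        return [(tag_dictionary[i], float(final_scores[i])) for i in top]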