Abstract:
The present invention provides methods and systems for binary classification of items. Methods and systems are provided for constructing a machine learning-based and pairwise ranking method-based classification model for binary classification of items as positive or negative with regard to a single class, based on training using a training set of examples including positive examples and unlabelled examples. The model includes only one hyperparameter and only one threshold parameter, which are selected to optimize the model with regard to constraining positive items to be classified as positive while minimizing a number of unlabelled items classified as positive.
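The selection rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented method: the names `choose_threshold` and `select_model` are hypothetical, the scores are assumed to come from some already-trained ranking model evaluated on held-out data, and an item is assumed to be classified as positive when its score is at or above the threshold.

```python
import numpy as np

def choose_threshold(pos_scores, unl_scores):
    """Pick the largest threshold that still classifies every known positive
    as positive, then count how many unlabelled items it also flags."""
    pos = np.asarray(pos_scores, dtype=float)
    unl = np.asarray(unl_scores, dtype=float)
    threshold = float(pos.min())          # every positive scores >= threshold
    flagged = int(np.sum(unl >= threshold))
    return threshold, flagged

def select_model(candidates):
    """candidates maps a hyperparameter value to (pos_scores, unl_scores).
    Returns (hyperparameter, threshold, flagged) with the fewest flagged
    unlabelled items. The candidate grid itself is an assumption."""
    best = None
    for hp, (pos, unl) in candidates.items():
        threshold, flagged = choose_threshold(pos, unl)
        if best is None or flagged < best[2]:
            best = (hp, threshold, flagged)
    return best
```

Raising the threshold can only reduce the number of unlabelled items classified as positive, so the lowest positive score is the tightest threshold that satisfies the constraint on positives.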
Abstract:
A method and a system for summarizing a concept are provided. A query corresponding to a concept is received from a user. A plurality of images and corresponding descriptive information may be collected based on the query. The images and the descriptive information may be processed to form feature vectors and processed descriptive information, respectively. Further, one or more topics, corresponding to the processed descriptive information, may be identified for the plurality of images. Each of the plurality of images may be assigned one or more topic distribution values corresponding to the one or more topics. A sparse set of images may be determined based on the feature vectors and the assigned topic distribution values, to summarize the concept. Also, a target summary may be built from the summarized concept by regularizing one or more distribution constraints.
Abstract:
A taxonomy model is determined with a reduced number of weights. For example, the taxonomy model is a tangible representation of a hierarchy of nodes that represents a hierarchy of classes and that, when labeled with a representation of a combination of weights, is usable to classify documents having known features but unknown class. For each node of the taxonomy, the training example documents are processed to determine the features for which there are a sufficient number of training example documents having a class label corresponding to at least one of the leaf nodes of the subtree having that node as its root. For each node of the taxonomy, a sparse weight vector is determined, which includes setting to zero the weights, for that node, of those features determined not to appear at least a minimum number of times in a given set of leaf nodes in the subtree rooted at that node. The sparse weight vectors can be learned by solving an optimization problem using a maximum entropy classifier, or a large margin classifier with a sequential dual method (SDM) with margin or slack rescaling. The determined sparse weight vectors are tangibly embodied in a computer-readable medium in association with the tangible representation of the nodes of the taxonomy.
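The per-node feature filtering can be illustrated with a small sketch. It rests on several assumptions that the abstract does not fix: the taxonomy is given as a hypothetical `children` mapping, each training document is a pair of a leaf label and a set of features, and "sufficient" means a document count of at least `min_count` across the leaves under the node. The function names are illustrative only.

```python
from collections import Counter

def leaves_under(node, children):
    """Return the set of leaf nodes in the subtree rooted at `node`."""
    kids = children.get(node, [])
    if not kids:
        return {node}
    leaves = set()
    for kid in kids:
        leaves |= leaves_under(kid, children)
    return leaves

def active_features(node, children, docs, min_count):
    """Features that appear in at least `min_count` training documents
    whose label is a leaf under `node`."""
    leaves = leaves_under(node, children)
    counts = Counter()
    for label, features in docs:
        if label in leaves:
            counts.update(features)   # each document counts a feature once
    return {f for f, c in counts.items() if c >= min_count}

def sparsify(weights, active):
    """Drop (that is, set to zero) every weight whose feature is not active."""
    return {f: w for f, w in weights.items() if f in active}
```

Storing the weights as a dictionary keyed by feature makes the zero weights implicit, which is one simple way to realize a sparse weight vector per node.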
Abstract:
A system and method are described for large-scale entity-specific classification of each entity-specific set of candidates in a collection of candidates for each specific entity in a collection of entities. The collection of entities may comprise a specific category or domain of entities (e.g. schools, restaurants, manufacturers, products, events, people). Candidates may comprise webpages or other resources with resource identifiers. Entity-specific sets of candidates may be found by leveraging search engine query results, and user interaction therewith, for queries based on entity-specific attributes. The relationship(s) or class(es) for which candidate resources are classified relative to a specific entity may comprise an authoritative, official home page (OHP) or another class (e.g. fan page, review, aggregator). A feature generator generates entity-specific features for candidates. One or more classifiers rank each candidate, in accordance with its features, for a specific class for a specific entity.
Abstract:
A computer-implemented method of generating a model of a sparse GP classifier includes performing basis vector selection and adding a thus-selected basis vector to a basis vector set, including performing a margin-based method that accounts for the predictive mean and variance associated with all the candidate basis vectors at that iteration. Hyperparameter optimization is performed. The basis vector selection and hyperparameter optimization steps are performed alternately until a specified termination criterion is met. The selected basis vectors and optimized hyperparameters are stored in at least one tangible computer-readable medium, organized in a manner to be usable as the model of the sparse GP classifier. In one example, the basis vector selection includes use of an adaptive-sampling technique that accounts for probability characteristics associated with the candidate basis vectors. Performing the hyperparameter optimization and/or basis vector selection using the adaptive-sampling technique may include considering a weighted negative-log predictive (NLP) loss measure for each example.
Abstract:
Generally, the present invention provides a method and computerized system for generating a classifier model, wherein the classifier model is operative to classify web content. The method and computerized system include a first step of defining a plurality of predictive performance measures, based on leave-one-out (LOO) cross-validation, in terms of selectable model parameters. Exemplary predictive performance measures include smoothened predictive measures such as the F-measure, the weighted error rate measure, and the area under curve measure. The method and computerized system further include deriving efficient analytical expressions for the predictive performance measures to compute the LOO predictive performance and its derivatives. A classifier model is then selected based on the LOO predictive performance.
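As an illustration of what an analytical LOO expression looks like, the sketch below uses a ridge-regularized least-squares classifier, which is an assumption made here for simplicity and is not named in the abstract. For that model the LOO prediction for example i follows in closed form from the hat matrix H as (ŷ_i − h_ii·y_i) / (1 − h_ii), so no retraining is needed. The smoothed error rate (a sigmoid of the margin), the grid search, and all function names are likewise hypothetical choices; the abstract also mentions derivatives, which this sketch does not use.

```python
import numpy as np

def loo_scores(X, y, lam):
    """Closed-form leave-one-out decision values for ridge-regularized
    least squares with labels y in {-1, +1} (no intercept term)."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    H = X @ np.linalg.solve(A, X.T)      # hat matrix: fitted = H @ y
    h = np.diag(H)
    fitted = H @ y
    return (fitted - h * y) / (1.0 - h)

def smoothed_loo_error(X, y, lam, sharpness=5.0):
    """Sigmoid-smoothed LOO error rate: near 0 for confidently correct
    LOO predictions, near 1 for confidently wrong ones."""
    margins = y * loo_scores(X, y, lam)
    return float(np.mean(1.0 / (1.0 + np.exp(sharpness * margins))))

def select_lambda(X, y, grid):
    """Pick the regularization value with the lowest smoothed LOO error."""
    return min(grid, key=lambda lam: smoothed_loo_error(X, y, lam))
```

Because the smoothed measure is differentiable in the model parameters, a gradient-based search could replace the grid; the grid is used here only to keep the example short.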
Abstract:
A method of classifying documents includes: specifying multiple documents and classes, wherein each document includes a plurality of features and each document corresponds to one of the classes; determining reduced document vectors for the classes from the documents, wherein the reduced document vectors include features that satisfy threshold conditions corresponding to the classes; determining reduced weight vectors for relating the documents to the classes by comparing combinations of the reduced weight vectors and the reduced document vectors and separating the corresponding classes; and saving one or more values for the reduced weight vectors and the classes. Specific embodiments are directed to formulations for determining the reduced weight vectors including one-versus-rest classifiers, maximum entropy classifiers, and direct multiclass Support Vector Machines.
Abstract:
An improved system and method are provided for sparse Gaussian process regression using predictive measures. A Gaussian process regressor model may be constructed by interleaving basis vector set selection and hyper-parameter optimization until the chosen predictive measure stabilizes. One of various LOO-CV based predictive measures may be used to find an optimal set of active basis vectors for building a sparse Gaussian process regression model by sequentially adding basis vectors selected using the chosen predictive measure. In a given iteration, the predictive measure is computed for each of the basis vectors in a candidate set of basis vectors, and the basis vector with the best predictive measure is selected. The iterative addition of basis vectors may stop when the predictive performance of the model degrades or no significant performance improvement is seen.
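The greedy selection loop can be sketched as follows. This is a simplified stand-in rather than the method of the abstract: it approximates the sparse model by ridge regression on RBF features centred at the chosen basis vectors, uses LOO mean squared error as the predictive measure, keeps the kernel width fixed (so the interleaved hyper-parameter optimization is omitted), and treats every training point as a candidate. The names `rbf`, `loo_mse`, and `greedy_select`, and the parameters `gamma`, `ridge`, and `tol`, are all assumptions.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def loo_mse(Phi, y, ridge=1e-3):
    """Closed-form LOO mean squared error for ridge regression on Phi."""
    A = Phi.T @ Phi + ridge * np.eye(Phi.shape[1])
    H = Phi @ np.linalg.solve(A, Phi.T)
    loo_residuals = (y - H @ y) / (1.0 - np.diag(H))
    return float(np.mean(loo_residuals ** 2))

def greedy_select(X, y, max_basis=10, tol=1e-6):
    """Add, one at a time, the candidate whose inclusion gives the lowest
    LOO error; stop when the improvement falls below `tol`."""
    chosen, best = [], np.inf
    while len(chosen) < max_basis:
        scores = {j: loo_mse(rbf(X, X[chosen + [j]]), y)
                  for j in range(len(X)) if j not in chosen}
        if not scores:
            break
        j, score = min(scores.items(), key=lambda kv: kv[1])
        if best - score < tol:           # degraded or no significant gain
            break
        chosen.append(j)
        best = score
    return chosen, best
```

For example, `greedy_select(X, y, max_basis=5)` on a one-dimensional sine curve returns the indices of up to five training points used as basis vectors, together with the final LOO error.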