PAIRWISE RANKING-BASED CLASSIFIER
    1.
    Invention application
    Status: In force

    Publication No.: US20110099131A1

    Publication date: 2011-04-28

    Application No.: US12603763

    Filing date: 2009-10-22

    IPC classification: G06F15/18 G06N5/02

    CPC classification: G06N99/005 G06F17/30707

    Abstract: The present invention provides methods and systems for binary classification of items. Methods and systems are provided for constructing a machine learning-based, pairwise ranking-based classification model for binary classification of items as positive or negative with regard to a single class, trained on a set of examples that includes positive examples and unlabelled examples. The model includes only one hyperparameter and only one threshold parameter, which are selected to optimize the model with regard to constraining positive items to be classified as positive while minimizing the number of unlabelled items classified as positive.
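
The threshold selection described in this abstract can be illustrated with a minimal sketch. It assumes the ranking model has already produced a real-valued score per item; the function names `pick_threshold` and `pairwise_violations` and the `min_pos_recall` parameter are hypothetical and do not come from the patent. The sketch picks the highest threshold that still keeps the required share of known positives above it, which in turn minimizes the number of unlabelled items classified as positive.

```python
import math


def pick_threshold(pos_scores, unl_scores, min_pos_recall=1.0):
    """Return (threshold, number of unlabelled items flagged positive).

    Items with score >= threshold are classified as positive. The threshold
    is the highest value that keeps at least `min_pos_recall` of the known
    positives on the positive side (an assumed constraint for illustration).
    """
    if not pos_scores:
        raise ValueError("need at least one positive example")
    ranked = sorted(pos_scores, reverse=True)
    # Number of positives that must score at or above the threshold.
    need = max(1, math.ceil(min_pos_recall * len(ranked)))
    threshold = ranked[need - 1]
    flagged = sum(1 for s in unl_scores if s >= threshold)
    return threshold, flagged


def pairwise_violations(pos_scores, unl_scores):
    """Count (positive, unlabelled) pairs ranked in the wrong order,
    the quantity a pairwise ranking objective tries to drive down."""
    return sum(1 for p in pos_scores for u in unl_scores if u >= p)


positives = [2.0, 1.5, 0.9]
unlabelled = [1.0, 0.2, -0.3, 1.7]
threshold, flagged = pick_threshold(positives, unlabelled)
```

Lowering `min_pos_recall` trades missed positives for fewer flagged unlabelled items; the abstract does not say how the patent balances the two, so that knob is purely illustrative.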

Method and system for concept summarization
    2.

    Publication No.: US09870376B2

    Publication date: 2018-01-16

    Application No.: US13077995

    Filing date: 2011-04-01

    IPC classification: G06F17/30

    CPC classification: G06F17/30265

    Abstract: A method and a system for summarizing a concept are provided. A query corresponding to a concept is received from a user. A plurality of images and corresponding descriptive information may be collected based on the query. The plurality of images and the descriptive information may be processed to form feature vectors and processed descriptive information, respectively. Further, one or more topics may be identified for the plurality of images. Each of the plurality of images may be assigned one or more topic distribution values corresponding to the one or more topics. The one or more topics correspond to the processed descriptive information. A sparse set of images may be determined based on the feature vectors and the assigned topic distribution values, to summarize the concept. Also, a target summary may be built from the summarized concept by regularizing one or more distribution constraints.

EFFICIENTLY BUILDING COMPACT MODELS FOR LARGE TAXONOMY TEXT CLASSIFICATION
    3.
    Invention application
    Status: Under examination (published)

    Publication No.: US20100161527A1

    Publication date: 2010-06-24

    Application No.: US12342750

    Filing date: 2008-12-23

    IPC classification: G06F15/18

    CPC classification: G06F16/58 G06F16/51

    Abstract: A taxonomy model is determined with a reduced number of weights. For example, the taxonomy model is a tangible representation of a hierarchy of nodes that represents a hierarchy of classes that, when labeled with a representation of a combination of weights, is usable to classify documents having known features but unknown class. For each node of the taxonomy, the training example documents are processed to determine the features for which there are a sufficient number of training example documents having a class label corresponding to at least one of the leaf nodes of a subtree having that node as a root node. For each node of the taxonomy, a sparse weight vector is determined for that node, including setting to zero, for that node, the weights of those features determined not to appear at least a minimum number of times in a given set of leaf nodes in the subtree with that node as a root node. The sparse weight vectors can be learned by solving an optimization problem using a maximum entropy classifier, or a large margin classifier with a sequential dual method (SDM) with margin or slack rescaling. The determined sparse weight vectors are tangibly embodied in a computer-readable medium in association with the tangible representation of the nodes of the taxonomy.
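
The per-node feature filtering described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the taxonomy is a plain dict mapping each internal node to its children, each training document is a `(leaf_label, features)` pair, and the names `subtree_leaves`, `active_features`, and `min_count` are hypothetical. Features outside the returned set for a node would have their weight fixed at zero for that node.

```python
from collections import Counter


def subtree_leaves(tree, node):
    """Return the leaf nodes of the subtree rooted at `node`."""
    children = tree.get(node, [])
    if not children:
        return [node]
    leaves = []
    for child in children:
        leaves.extend(subtree_leaves(tree, child))
    return leaves


def active_features(tree, docs, min_count):
    """Map each node to the features seen in at least `min_count`
    training documents labelled with a leaf of that node's subtree."""
    per_leaf = {}
    for label, feats in docs:
        # Count each feature once per document.
        per_leaf.setdefault(label, Counter()).update(set(feats))
    nodes = set(tree) | {c for kids in tree.values() for c in kids}
    kept = {}
    for node in nodes:
        total = Counter()
        for leaf in subtree_leaves(tree, node):
            total.update(per_leaf.get(leaf, Counter()))
        kept[node] = {f for f, n in total.items() if n >= min_count}
    return kept


taxonomy = {"root": ["sports", "tech"], "sports": ["soccer", "tennis"]}
training_docs = [
    ("soccer", {"ball", "goal"}),
    ("soccer", {"ball"}),
    ("tennis", {"ball", "racket"}),
    ("tennis", {"racket"}),
    ("tech", {"chip", "ball"}),
]
kept = active_features(taxonomy, training_docs, min_count=2)
```

Nodes higher in the taxonomy pool documents from more leaves, so they tend to keep more features, while leaf-level weight vectors stay short.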

LARGE SCALE ENTITY-SPECIFIC RESOURCE CLASSIFICATION
    4.
    Invention application
    Status: In force

    Publication No.: US20110264651A1

    Publication date: 2011-10-27

    Application No.: US12764694

    Filing date: 2010-04-21

    IPC classification: G06F17/30 G06F15/18

    CPC classification: G06F17/30867

    Abstract: A system and method are described for large scale entity-specific classification of each entity-specific set of candidates in a collection of candidates for each specific entity in a collection of entities. The collection of entities may comprise a specific category or domain of entities (e.g., schools, restaurants, manufacturers, products, events, people). Candidates may comprise webpages or other resources with resource identifiers. Entity-specific sets of candidates may be found by leveraging search engine query results and user interaction therewith for queries based on entity-specific attributes. The relationship(s) or class(es) for which candidate resources are being classified relative to a specific entity may comprise an authoritative, official home page (OHP), or other class (e.g., fan page, review, aggregator) relative to a specific entity. A feature generator generates entity-specific features for candidates. In accordance with its features, one or more classifiers rank each candidate for a specific class for a specific entity.

PREDICTIVE GAUSSIAN PROCESS CLASSIFICATION WITH REDUCED COMPLEXITY
    5.
    Invention application
    Status: Under examination (published)

    Publication No.: US20100161534A1

    Publication date: 2010-06-24

    Application No.: US12338098

    Filing date: 2008-12-18

    IPC classification: G06N5/02

    CPC classification: G06N20/00

    Abstract: A computer-implemented method of generating a model of a sparse GP classifier includes performing basis vector selection and adding a thus-selected basis vector to a basis vector set, including performing a margin-based method that accounts for predictive mean and variance associated with all the candidate basis vectors at that iteration. Hyperparameter optimization is performed. The basis vector selection and hyperparameter optimization steps are performed alternately until a specified termination criterion is met. The selected basis vectors and optimized hyperparameters are stored in at least one tangible computer-readable medium organized in a manner to be usable as the model of the sparse GP classifier. In one example, the basis vector selection includes use of an adaptive-sampling technique that accounts for probability characteristics associated with the candidate basis vectors. Performing the hyperparameter optimization and/or basis vector selection using the adaptive sampling technique may include considering a weighted negative-log predictive (NLP) loss measure for each example.
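
A weighted negative-log predictive (NLP) loss of the kind mentioned in this abstract can be sketched as below. The sketch assumes a probit likelihood, for which the predictive probability of label y in {-1, +1} given a Gaussian latent prediction with mean mu and variance var is Phi(y * mu / sqrt(1 + var)). The abstract does not state the likelihood or how the weights are set, so the probit choice, the normalization by the weight sum, and the names `probit` and `weighted_nlp` are all assumptions.

```python
import math


def probit(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def weighted_nlp(labels, means, variances, weights):
    """Weighted average of -log p(y_i | x_i) under a probit likelihood.

    labels: +1 or -1; means, variances: predictive mean and variance of
    the latent function at each example; weights: per-example weights.
    """
    total = 0.0
    for y, mu, var, w in zip(labels, means, variances, weights):
        p = probit(y * mu / math.sqrt(1.0 + var))
        # Guard against log(0) for extremely confident wrong predictions.
        total += -w * math.log(max(p, 1e-300))
    return total / sum(weights)


loss = weighted_nlp([1, -1], [1.2, -0.4], [0.5, 2.0], [1.0, 3.0])
```

Because the variance enters the denominator, an uncertain prediction is pulled toward probability 0.5, which is how a measure like this accounts for both predictive mean and variance.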

Large scale entity-specific resource classification
    6.
    Granted patent
    Status: In force

    Publication No.: US09317613B2

    Publication date: 2016-04-19

    Application No.: US12764694

    Filing date: 2010-04-21

    IPC classification: G06F17/30

    CPC classification: G06F17/30867

    Abstract: A system and method are described for large scale entity-specific classification of each entity-specific set of candidates in a collection of candidates for each specific entity in a collection of entities. The collection of entities may comprise a specific category or domain of entities (e.g., schools, restaurants, manufacturers, products, events, people). Candidates may comprise webpages or other resources with resource identifiers. Entity-specific sets of candidates may be found by leveraging search engine query results and user interaction therewith for queries based on entity-specific attributes. The relationship(s) or class(es) for which candidate resources are being classified relative to a specific entity may comprise an authoritative, official home page (OHP), or other class (e.g., fan page, review, aggregator) relative to a specific entity. A feature generator generates entity-specific features for candidates. In accordance with its features, one or more classifiers rank each candidate for a specific class for a specific entity.

System and method for generating a classifier model for classifying web content
    7.
    Granted patent
    Status: In force

    Publication No.: US07949622B2

    Publication date: 2011-05-24

    Application No.: US11955965

    Filing date: 2007-12-13

    IPC classification: G06F15/18 G06N5/00

    CPC classification: G06N99/005

    Abstract: Generally, the present invention provides a method and computerized system for generating a classifier model, wherein the classifier model is operative to classify web content. The method and computerized system include a first step of defining a plurality of predictive performance measures based on leave-one-out (LOO) cross-validation in terms of selectable model parameters. Exemplary predictive performance measures include smoothed measures such as the F-measure, the weighted error rate, and the area under the curve. The method and computerized system further include deriving efficient analytical expressions for the predictive performance measures to compute the LOO predictive performance measures and their derivatives. The next step is selecting a classifier model based on the LOO predictive performance.
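
The idea of an analytical expression for LOO performance can be illustrated on a much simpler model than the one in the patent. The sketch below swaps in one-dimensional ridge regression without an intercept, where the LOO residual of example i equals the ordinary residual divided by (1 - h_i), with h_i = x_i^2 / (sum of x_j^2 + lam). The model choice and the function names are assumptions for illustration; the abstract does not specify the classifier or its expressions.

```python
def ridge_fit(xs, ys, lam):
    """Closed-form weight of 1-D ridge regression without intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def loo_residuals_bruteforce(xs, ys, lam):
    """Refit the model n times, leaving one example out each time."""
    out = []
    for i in range(len(xs)):
        w = ridge_fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], lam)
        out.append(ys[i] - w * xs[i])
    return out


def loo_residuals_analytic(xs, ys, lam):
    """Same residuals from a single fit, using the leverage h_i.

    Requires lam > 0 so that 1 - h_i is never zero.
    """
    s = sum(x * x for x in xs) + lam
    w = ridge_fit(xs, ys, lam)
    return [(y - w * x) / (1.0 - x * x / s) for x, y in zip(xs, ys)]


xs = [0.5, 1.0, 1.5, 2.0, 3.0]
ys = [0.4, 1.1, 1.3, 2.2, 2.9]
fast = loo_residuals_analytic(xs, ys, lam=0.1)
slow = loo_residuals_bruteforce(xs, ys, lam=0.1)
```

The analytic version costs one fit instead of n, and because it is a smooth function of `lam` it can be differentiated, which is what makes gradient-based selection of model parameters on a LOO measure practical.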

METHOD FOR EFFICIENTLY BUILDING COMPACT MODELS FOR LARGE MULTI-CLASS TEXT CLASSIFICATION
    8.
    Invention application
    Status: Under examination (published)

    Publication No.: US20090274376A1

    Publication date: 2009-11-05

    Application No.: US12115486

    Filing date: 2008-05-05

    IPC classification: G06K9/62

    CPC classification: G06K9/6269 G06K9/00442

    Abstract: A method of classifying documents includes: specifying multiple documents and classes, wherein each document includes a plurality of features and each document corresponds to one of the classes; determining reduced document vectors for the classes from the documents, wherein the reduced document vectors include features that satisfy threshold conditions corresponding to the classes; determining reduced weight vectors for relating the documents to the classes by comparing combinations of the reduced weight vectors and the reduced document vectors and separating the corresponding classes; and saving one or more values for the reduced weight vectors and the classes. Specific embodiments are directed to formulations for determining the reduced weight vectors including one-versus-rest classifiers, maximum entropy classifiers, and direct multiclass Support Vector Machines.

System and method for sparse Gaussian process regression using predictive measures
    9.
    Invention application
    Status: Under examination (published)

    Publication No.: US20090150126A1

    Publication date: 2009-06-11

    Application No.: US12001958

    Filing date: 2007-12-10

    IPC classification: G06N7/00

    CPC classification: G06N7/005

    Abstract: An improved system and method is provided for sparse Gaussian process regression using predictive measures. A Gaussian process regressor model may be constructed by interleaving basis vector set selection and hyper-parameter optimization until the chosen predictive measure stabilizes. One of various LOO-CV based predictive measures may be used to find an optimal set of active basis vectors for building a sparse Gaussian process regression model by sequentially adding basis vectors selected using a chosen predictive measure. In a given iteration, a predictive measure is computed for each of the basis vectors in a candidate set of basis vectors and the basis vector with the best predictive measure is selected. The iterative addition of basis vectors may stop when predictive performance of the model degrades or no significant performance improvement is seen.
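
The greedy loop described in this abstract (score every candidate, add the best one, stop when the measure stops improving) can be sketched with a deliberately simple stand-in measure. The measure used here, the mean squared error when each training point is predicted by the target of its nearest selected basis point, is not the LOO-CV Gaussian process measure of the patent, and hyper-parameter optimization is omitted. The names `nearest_basis_mse`, `greedy_select`, `max_basis`, and `tol` are hypothetical.

```python
def nearest_basis_mse(xs, ys, basis):
    """Stand-in predictive measure (lower is better): predict each point
    with the target of its nearest basis point and average squared error."""
    err = 0.0
    for x, y in zip(xs, ys):
        j = min(basis, key=lambda b: abs(xs[b] - x))
        err += (y - ys[j]) ** 2
    return err / len(xs)


def greedy_select(xs, ys, max_basis, tol=1e-6):
    """Add one basis index per iteration, choosing the candidate with the
    best measure; stop when the improvement is not larger than `tol`."""
    basis, best = [], float("inf")
    candidates = set(range(len(xs)))
    while candidates and len(basis) < max_basis:
        score, pick = min(
            (nearest_basis_mse(xs, ys, basis + [c]), c)
            for c in sorted(candidates)
        )
        if best - score <= tol:
            break  # performance degraded or did not improve significantly
        basis.append(pick)
        candidates.remove(pick)
        best = score
    return basis, best


xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
ys = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
basis, score = greedy_select(xs, ys, max_basis=5)
```

On this toy data the loop keeps one point from each cluster and then stops, since a third basis point cannot lower the measure any further.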

METHOD AND SYSTEM FOR CONCEPT SUMMARIZATION
    10.
    Invention application
    Status: In force

    Publication No.: US20120254191A1

    Publication date: 2012-10-04

    Application No.: US13077995

    Filing date: 2011-04-01

    IPC classification: G06F17/30

    CPC classification: G06F17/30265

    Abstract: A method and a system for summarizing a concept are provided. A query corresponding to a concept is received from a user. A plurality of images and corresponding descriptive information may be collected based on the query. The plurality of images and the descriptive information may be processed to form feature vectors and processed descriptive information, respectively. Further, one or more topics may be identified for the plurality of images. Each of the plurality of images may be assigned one or more topic distribution values corresponding to the one or more topics. The one or more topics correspond to the processed descriptive information. A sparse set of images may be determined based on the feature vectors and the assigned topic distribution values, to summarize the concept. Also, a target summary may be built from the summarized concept by regularizing one or more distribution constraints.
