Kernels for identifying patterns in datasets containing noise or transformation invariances
    1.
    Type: granted invention patent
    Status: in force

    Publication No.: US08209269B2

    Publication date: 2012-06-26

    Application No.: US12868658

    Filing date: 2010-08-25

    IPC classification: G06F15/18 G06F17/00 G06N5/00

    Abstract: Learning machines, such as support vector machines, are used to analyze datasets to recognize patterns within the dataset using kernels that are selected according to the nature of the data to be analyzed. Where the datasets include an invariance transformation or noise, tangent vectors are defined to identify relationships between the invariance or noise and the training data points. A covariance matrix is formed using the tangent vectors and is then used in generating the kernel, which may be based on a kernel PCA map.
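The tangent-vector step in this abstract can be illustrated with a minimal linear-case sketch. It is not the patented method: the finite-difference tangent vectors, the blending parameter `gamma`, the form `M = (1 - gamma) * I + gamma * C`, and the helper `shift_first_axis` are all assumptions made for illustration, and the kernel PCA map mentioned in the abstract is not covered.

```python
import numpy as np

def tangent_vectors(X, transform, eps=1e-3):
    """Finite-difference tangent vectors: (transform(x, eps) - x) / eps per row."""
    return (transform(X, eps) - X) / eps

def tangent_covariance(T):
    """Covariance-style matrix of the tangent vectors, C = (1/n) * sum_i t_i t_i^T."""
    return T.T @ T / T.shape[0]

def invariant_linear_kernel(X, Y, C, gamma=0.9):
    """Linear kernel k(x, y) = x^T M^{-1} y with M = (1 - gamma) * I + gamma * C.

    gamma = 0 gives the plain dot product; larger gamma gives the tangent
    covariance more influence on the metric (an assumed blending rule).
    """
    d = C.shape[0]
    M = (1.0 - gamma) * np.eye(d) + gamma * C
    return X @ np.linalg.solve(M, Y.T)

def shift_first_axis(X, eps):
    """Hypothetical invariance: a small translation along the first coordinate."""
    shifted = X.copy()
    shifted[:, 0] += eps
    return shifted

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))          # toy training points
T = tangent_vectors(X, shift_first_axis)
C = tangent_covariance(T)
K = invariant_linear_kernel(X, X, C, gamma=0.9)   # Gram matrix for an SVM
```

The resulting Gram matrix `K` could be passed to any learner that accepts a precomputed kernel; the choice of `gamma` would normally be tuned on held-out data.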

    System and method for training a multi-class support vector machine to select a common subset of features for classifying objects
    2.
    Type: granted invention patent
    Status: in force

    Publication No.: US07836000B2

    Publication date: 2010-11-16

    Application No.: US12001932

    Filing date: 2007-12-10

    CPC classification: G06K9/6249 G06K9/6269

    Abstract: An improved system and method are provided for training a multi-class support vector machine to select a common subset of features for classifying objects. A multi-class support vector machine generator may be provided for learning classification functions to classify sets of objects into classes and may include a sparse support vector machine modeling engine for training a multi-class support vector machine using scaling factors by simultaneously selecting a common subset of features iteratively for all classes from sets of features representing each of the classes. An objective function using scaling factors to ensure sparsity of features may be iteratively minimized, and features may be retained and added until a small set of features stabilizes. Alternatively, a common subset of features may be found by iteratively removing at least one feature simultaneously for all classes from an active set of features initialized to represent the entire set of training features.

    Selection of features predictive of biological conditions using protein mass spectrographic data
    3.
    Type: granted invention patent
    Status: lapsed

    Publication No.: US07676442B2

    Publication date: 2010-03-09

    Application No.: US11929169

    Filing date: 2007-10-30

    IPC classification: G06N5/00

    Abstract: Support vector machines are used to classify data contained within a structured dataset such as a plurality of signals generated by a spectral analyzer. The signals are pre-processed to ensure alignment of peaks across the spectra. Similarity measures are constructed to provide a basis for comparison of pairs of samples of the signal. A support vector machine is trained to discriminate between different classes of the samples and to identify the most predictive features within the spectra. In a preferred embodiment, feature selection is performed to reduce the number of features that must be considered.

    GRADIENT BASED OPTIMIZATION OF A RANKING MEASURE
    4.
    Type: invention patent application
    Status: in force

    Publication No.: US20090089274A1

    Publication date: 2009-04-02

    Application No.: US11863453

    Filing date: 2007-09-28

    Applicant: Olivier Chapelle

    Inventor: Olivier Chapelle

    IPC classification: G06F7/00 G06F17/15

    CPC classification: G06F17/30675 G06F17/30864

    Abstract: Methods, systems, and apparatuses for generating relevance functions for ranking documents obtained in searches are provided. One or more features to be used as predictor variables in the construction of a relevance function are determined. The relevance function is parameterized by one or more coefficients. A query error is defined that measures a difference between a relevance ranking generated by the relevance function and a training set relevance ranking based on a query and a set of scored documents associated with the query. The query error is a continuous function of the coefficients and aims at approximating error measures commonly used in Information Retrieval. Values for the coefficients of the relevance function are determined that substantially minimize an objective function that depends on the defined query error.
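The idea of a query error that is continuous in the coefficients can be sketched as follows. This is an illustration under stated assumptions, not the claimed method: the linear relevance function `X @ w`, the Gaussian soft assignment of documents to rank positions, the DCG-style gains and discounts, the finite-difference gradient, and the step-halving rule are all choices made here for the example.

```python
import numpy as np

def discounts(n):
    """DCG position discounts 1 / log2(rank + 1) for ranks 1..n."""
    return 1.0 / np.log2(np.arange(n) + 2.0)

def exact_dcg(scores, labels):
    """Standard DCG of the ranking induced by sorting scores in descending order."""
    order = np.argsort(-scores)
    return float((2.0 ** labels[order] - 1.0) @ discounts(len(scores)))

def smooth_dcg(scores, labels, sigma=1.0):
    """DCG with each rank position softly shared among documents of similar score."""
    order = np.argsort(-scores)
    # gap[i, j]: squared score difference between doc i and the doc at rank j
    gap = (scores[:, None] - scores[order][None, :]) ** 2
    h = np.exp(-gap / (2.0 * sigma ** 2))
    h /= h.sum(axis=0, keepdims=True)      # each rank's weights sum to 1
    return float((2.0 ** labels - 1.0) @ h @ discounts(len(scores)))

def query_error(w, X, labels, sigma=1.0):
    """One minus smoothed DCG over ideal DCG; varies continuously with w."""
    ideal = exact_dcg(labels.astype(float), labels)
    return 1.0 - smooth_dcg(X @ w, labels, sigma) / ideal

def numerical_gradient(f, w, eps=1e-5):
    """Central finite differences, used here only to keep the sketch short."""
    grad = np.zeros_like(w)
    for k in range(len(w)):
        step = np.zeros_like(w)
        step[k] = eps
        grad[k] = (f(w + step) - f(w - step)) / (2.0 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                  # toy features for one query
labels = np.array([3, 2, 2, 1, 1, 0, 0, 0])  # toy relevance grades
f = lambda v: query_error(v, X, labels, sigma=1.0)

w = rng.normal(size=3)
initial_error = f(w)
lr = 0.5
for _ in range(100):
    candidate = w - lr * numerical_gradient(f, w)
    if f(candidate) < f(w):
        w = candidate        # accept only steps that lower the error
    else:
        lr *= 0.5            # otherwise shrink the step size
final_error = f(w)
```

As `sigma` shrinks, `smooth_dcg` approaches `exact_dcg` whenever the scores are distinct, which is what makes it a usable stand-in for the non-differentiable measure.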

    KERNELS AND METHODS FOR SELECTING KERNELS FOR USE IN LEARNING MACHINES
    5.
    Type: invention patent application
    Status: lapsed

    Publication No.: US20080301070A1

    Publication date: 2008-12-04

    Application No.: US11929354

    Filing date: 2007-10-30

    IPC classification: G06F15/18

    Abstract: Learning machines, such as support vector machines, are used to analyze datasets to recognize patterns within the dataset using kernels that are selected according to the nature of the data to be analyzed. Where a dataset possesses structural characteristics, locational kernels can be utilized to provide measures of similarity among data points within the dataset. The locational kernels are then combined to generate a decision function, or kernel, that can be used to analyze the dataset. Where an invariance transformation or noise is present, tangent vectors are defined to identify relationships between the invariance or noise and the data points. A covariance matrix is formed using the tangent vectors and is then used in generating the kernel.

    Method and system for distributed machine learning

    Publication No.: US09633315B2

    Publication date: 2017-04-25

    Application No.: US13458545

    Filing date: 2012-04-27

    IPC classification: G06N99/00 G06F15/18

    CPC classification: G06N99/005 G06F15/18

    Abstract: Method, system, and programs for distributed machine learning on a cluster including a plurality of nodes are disclosed. A machine learning process is performed in each of the plurality of nodes based on a respective subset of training data to calculate a local parameter. The training data is partitioned over the plurality of nodes. A plurality of operation nodes are determined from the plurality of nodes based on a status of the machine learning process performed in each of the plurality of nodes. The plurality of operation nodes are connected to form a network topology. An aggregated parameter is generated by merging local parameters calculated in each of the plurality of operation nodes in accordance with the network topology.
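The flow described in this abstract (local training per partition, selection of operation nodes by status, merging along a topology) can be simulated in a single process. The sketch below rests on assumptions not stated in the abstract: least squares as the local learner, a hypothetical `status` list, a binary-tree topology, and sample-count-weighted averaging as the merge rule.

```python
import numpy as np

def local_fit(X, y):
    """Least-squares weights on one node's partition (stand-in for any local learner)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def tree_merge(params, counts):
    """Merge pairwise up a binary tree; each merge is a sample-count-weighted average."""
    params, counts = list(params), list(counts)
    while len(params) > 1:
        next_params, next_counts = [], []
        for i in range(0, len(params) - 1, 2):
            n = counts[i] + counts[i + 1]
            merged = (counts[i] * params[i] + counts[i + 1] * params[i + 1]) / n
            next_params.append(merged)
            next_counts.append(n)
        if len(params) % 2 == 1:
            # An unpaired node passes its parameter up to the next tree level.
            next_params.append(params[-1])
            next_counts.append(counts[-1])
        params, counts = next_params, next_counts
    return params[0]

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(600, 3))
y = X @ true_w + 0.01 * rng.normal(size=600)

# Training data partitioned over five simulated nodes.
partitions = np.array_split(np.arange(600), 5)

# Hypothetical per-node status; only nodes that finished become operation nodes.
status = ["done", "done", "failed", "done", "done"]
operation_nodes = [k for k, s in enumerate(status) if s == "done"]

local_params = [local_fit(X[partitions[k]], y[partitions[k]]) for k in operation_nodes]
counts = [len(partitions[k]) for k in operation_nodes]
aggregated = tree_merge(local_params, counts)
```

Weighting by sample count makes the tree result equal to the flat weighted mean of the local parameters, so the topology affects communication cost rather than the merged value in this simplified setting.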

    CLICK MODEL FOR SEARCH RANKINGS
    7.
    Type: invention patent application
    Status: in force

    Publication No.: US20100125570A1

    Publication date: 2010-05-20

    Application No.: US12273425

    Filing date: 2008-11-18

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: Approaches and techniques are discussed for ranking the documents indicated in search results for a query based on click-through information collected for the query in previous query sessions. According to an embodiment of the invention, when calculating a relevance score for a particular document, one may overcome positional bias by utilizing click-through information about other documents previously returned in the same search results as the particular document. According to an embodiment, one may utilize a Dynamic Bayesian Network, based on said click-through information, to model relevance. According to an embodiment of the invention, one may utilize click-through information to generate targets for learning a ranking function.

    Hierarchical Recognition Through Semantic Embedding
    8.
    Type: invention patent application
    Status: under examination (published)

    Publication No.: US20090271339A1

    Publication date: 2009-10-29

    Application No.: US12111500

    Filing date: 2008-04-29

    IPC classification: G06F15/18

    CPC classification: G06N20/00

    Abstract: Computer-implemented systems and methods, including servers, perform structure-based recognition processes that include matching and classification. Preprocessing subsystems and sub-methods embed a set of classes on which a loss function is defined into a semantic space and learn an input mapping between an input space and the semantic space. Recognition subsystems and methods accept a test object, representable in the input space, and apply the input mapping to the test object as part of a recognition process.

    Data Mining Unlearnable Data Sets
    9.
    Type: invention patent application
    Status: under examination (published)

    Publication No.: US20080027886A1

    Publication date: 2008-01-31

    Application No.: US11572193

    Filing date: 2005-07-18

    IPC classification: G06G7/00

    Abstract: This invention concerns data mining, that is, the extraction of information from “unlearnable” data sets. In particular, it concerns an apparatus and a method for this purpose. The invention involves creating a finite training sample from the data set (14), then training (50) a learning device (32) using a supervised learning algorithm to predict labels for each item of the training sample. Other data from the data set are then processed with the trained learning device to predict labels, and it is determined whether the predicted labels are better (learnable) or worse (anti-learnable) than random guessing (52). If they are worse (anti-learnable), a reverser (34) is used to apply negative weighting to the predicted labels (54).
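The reverser step can be illustrated with a small sketch for binary labels in {-1, +1}. The function name `reverser_weight`, the 0.5 accuracy threshold for "random guessing", and the toy prediction arrays are assumptions made for illustration; the abstract does not specify how the comparison or the negative weighting is computed.

```python
import numpy as np

def reverser_weight(predicted, actual):
    """Return +1 if predictions beat random guessing on a binary task, else -1."""
    accuracy = float(np.mean(predicted == actual))
    return 1.0 if accuracy >= 0.5 else -1.0

# Held-out labels and a hypothetical learner output that is wrong on 8 of 10 items.
actual_val = np.array([1, -1, 1, 1, -1, -1, 1, -1, 1, -1])
predicted_val = -actual_val.copy()
predicted_val[:2] = actual_val[:2]

weight = reverser_weight(predicted_val, actual_val)

# Apply the same weight to predictions on further data from the data set.
new_predictions = np.array([1, 1, -1])
final_labels = weight * new_predictions
```

With a learner that is consistently worse than chance, negating its output turns a 20% hit rate into an 80% one, which is the practical point of detecting anti-learnable behavior.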

    GLOBAL AND TOPICAL RANKING OF SEARCH RESULTS USING USER CLICKS
    10.
    Type: invention patent application
    Status: under examination (published)

    Publication No.: US20110029517A1

    Publication date: 2011-02-03

    Application No.: US12533564

    Filing date: 2009-07-31

    IPC classification: G06F17/30

    CPC classification: G06F16/951

    Abstract: To estimate, or predict, the relevance of items, or documents, in a set of search results, relevance information is extracted from user click data, and relational information among the documents as manifested by an aggregation of user clicks is determined from the click data. A supervised approach uses judgment information, such as human judgment information, as part of the training data used to generate a relevance predictor model, which minimizes the inherent noisiness of the click data collected from a commercial search engine.
