-
Publication No.: US08843427B1
Publication date: 2014-09-23
Application No.: US13195349
Filing date: 2011-08-01
Applicant: Wei-Hao Lin, Travis H. K. Green, Robert Kaplow, Gang Fu, Gideon S. Mann
Inventor(s): Wei-Hao Lin, Travis H. K. Green, Robert Kaplow, Gang Fu, Gideon S. Mann
CPC classification: G06N99/005
Abstract: In general, a method includes receiving a training data set that includes a plurality of examples, wherein each example includes one or more features and an answer, generating a plurality of modified training data sets by applying one or more filters to the training data set, each of the plurality of modified training data sets being based on a different combination of the one or more filters, training a plurality of predictive models, each of the plurality of predictive models being trained using a different modified training data set of the plurality of modified training data sets, determining a respective accuracy for each of the plurality of predictive models, identifying a most accurate predictive model based on the determined accuracies, and specifying an association between the training data set and the combination of filters used to generate the modified training data set that was used to train the most accurate predictive model.
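The filter-selection loop in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the two filters, the toy per-answer centroid "model", the holdout set, and the `associations` dictionary are all hypothetical stand-ins and are not taken from the patent.

```python
from itertools import combinations

def drop_missing(examples):
    """Hypothetical filter: discard examples that contain a missing feature."""
    return [(x, y) for x, y in examples if all(v is not None for v in x)]

def clip_outliers(examples, limit=10.0):
    """Hypothetical filter: clip feature values to the range [-limit, limit]."""
    return [([v if v is None else max(-limit, min(limit, v)) for v in x], y)
            for x, y in examples]

def train(examples):
    """Toy stand-in for a predictive model: mean of the first feature per answer."""
    totals = {}
    for x, y in examples:
        value = 0.0 if x[0] is None else x[0]  # assumption: missing counts as 0.0
        s, n = totals.get(y, (0.0, 0))
        totals[y] = (s + value, n + 1)
    return {y: s / n for y, (s, n) in totals.items()}

def accuracy(model, examples):
    """Fraction of examples whose nearest centroid carries the right answer."""
    def predict(x):
        value = 0.0 if x[0] is None else x[0]
        return min(model, key=lambda y: abs(model[y] - value))
    return sum(predict(x) == y for x, y in examples) / len(examples)

def best_filter_combination(training_set, holdout, filters):
    """Train one model per non-empty filter combination; keep the most accurate."""
    best = None
    names = sorted(filters)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            data = training_set
            for name in combo:
                data = filters[name](data)
            score = accuracy(train(data), holdout)
            if best is None or score > best[1]:
                best = (combo, score)
    return best

FILTERS = {"clip_outliers": clip_outliers, "drop_missing": drop_missing}
training_set = [([1.0], "a"), ([2.0], "a"), ([None], "a"),
                ([8.0], "b"), ([9.0], "b"), ([1000.0], "b")]
holdout = [([1.5], "a"), ([3.0], "a"), ([7.0], "b"), ([9.5], "b")]

combo, score = best_filter_combination(training_set, holdout, FILTERS)
# Record which filter combination worked best for this training data set.
associations = {"dataset-1": combo}
```

In this toy data the single outlier drags the "b" centroid far away unless it is clipped, so the clipping filter ends up associated with the data set.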
-
Publication No.: US08606728B1
Publication date: 2013-12-10
Application No.: US13228365
Filing date: 2011-09-08
Applicant: Wei-Hao Lin, Travis H. K. Green, Robert Kaplow, Gang Fu, Gideon S. Mann
Inventor(s): Wei-Hao Lin, Travis H. K. Green, Robert Kaplow, Gang Fu, Gideon S. Mann
CPC classification: G06N99/005, G06K9/6256
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for suggesting training examples. In one aspect, a method includes receiving a plurality of training examples. A plurality of different types of predictive models are trained using the received training examples, wherein each of the predictive models implements a different machine learning technique. The performance of each trained model is measured. A suggestion score is computed for each training example according to each respective trained model, including weighting each suggestion score by the measured performance of the respective trained model. The computed suggestion scores for each training example are combined to compute an overall suggestion score for each training example, and the training examples are ranked by suggestion scores.
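The weighting and ranking step of this abstract might look like the sketch below. The abstract does not say how a suggestion score is computed or how scores are combined, so the per-model scores are taken as given inputs and the combination is assumed to be a performance-weighted sum; the model names and numbers are hypothetical.

```python
def rank_examples(per_model_scores, model_performance):
    """Combine per-model suggestion scores into one ranking.

    per_model_scores: {model_name: {example_id: suggestion_score}}
    model_performance: {model_name: measured performance, used as a weight}
    Assumption: the overall score is the performance-weighted sum.
    """
    overall = {}
    for model_name, scores in per_model_scores.items():
        weight = model_performance[model_name]
        for example_id, score in scores.items():
            overall[example_id] = overall.get(example_id, 0.0) + weight * score
    # Highest overall suggestion score first.
    return sorted(overall.items(), key=lambda item: item[1], reverse=True)

scores = {
    "tree":   {"e1": 0.25, "e2": 0.5},
    "linear": {"e1": 1.0,  "e2": 0.0},
}
performance = {"tree": 0.5, "linear": 1.0}
ranking = rank_examples(scores, performance)
```

Because the better-performing model carries more weight, its opinion about `e1` dominates the ranking here.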
-
Publication No.: US08533222B2
Publication date: 2013-09-10
Application No.: US13014223
Filing date: 2011-01-26
IPC classification: G06F17/30
CPC classification: G06N99/005
Abstract: Methods, systems, and apparatus, including computer programs encoded on one or more computer storage devices, for training and retraining predictive models. A series of training data sets for predictive modeling can be received, e.g., over a network from a client computing system. The training data included in the training data sets is different from initial training data that was used with multiple training functions to train multiple trained predictive models stored in a predictive model repository. The series of training data sets are used with multiple trained updateable predictive models obtained from the predictive model repository and multiple training functions to generate multiple retrained predictive models. An effectiveness score is generated for each of the retrained predictive models. A first trained predictive model is selected from among the trained predictive models included in the predictive model repository and the retrained predictive models based on their respective effectiveness scores.
-
Publication No.: US08364613B1
Publication date: 2013-01-29
Application No.: US13246596
Filing date: 2011-09-27
Applicant: Wei-Hao Lin, Travis H. Green, Robert Kaplow, Gang Fu, Gideon S. Mann
Inventor(s): Wei-Hao Lin, Travis H. Green, Robert Kaplow, Gang Fu, Gideon S. Mann
CPC classification: G06N7/005
Abstract: Methods include the actions of storing a first predictive model in computer-readable memory, the first predictive model having been defined based on a first training dataset provided by an owner of the first predictive model and being operable to generate an output based on a query, enabling access to the first predictive model based on permissions defined by the owner, while inhibiting access to the first training dataset, receiving a second training dataset from a user, the second training dataset being distinct from the first training dataset, modifying the first predictive model based on the second training dataset to provide a second predictive model, storing the second predictive model in computer-readable memory, and enabling access to the second predictive model.
-
Publication No.: US08311967B1
Publication date: 2012-11-13
Application No.: US13246229
Filing date: 2011-09-27
Applicant: Wei-Hao Lin, Travis H. K. Green, Robert Kaplow, Gang Fu, Gideon S. Mann
Inventor(s): Wei-Hao Lin, Travis H. K. Green, Robert Kaplow, Gang Fu, Gideon S. Mann
CPC classification: G06N99/005
Abstract: Methods, systems, and apparatus, for selecting a trained predictive model. A request is received from a client-subscriber computing system for access to a trained predictive model that can generate a predictive output in response to receiving input data having one or more input types. Information that describes each of the trained predictive models in a predictive model repository can be used to determine that one or more models included in the repository match the request. Determining a match can be based (at least in part) on a comparison of the one or more input types to input types included in the information that describes the trained predictive models. Access to at least one of the models is provided to the client-subscriber computing system. The models that match the request are models that were trained using training data provided by a computing system other than the client-subscriber computing system.
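The matching step described in this abstract could be sketched as below. The `ModelInfo` record, its field names, and the rule that a model matches when it accepts every requested input type are assumptions made for illustration; the abstract only says that matching is based at least in part on comparing input types.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelInfo:
    """Hypothetical descriptive record kept for each model in the repository."""
    name: str
    owner: str               # system that supplied the training data
    input_types: frozenset   # input types the model accepts

def matching_models(repository, requester, requested_types):
    """Return models trained by someone else that accept all requested types."""
    wanted = frozenset(requested_types)
    return [m for m in repository
            if m.owner != requester and wanted <= m.input_types]

repository = [
    ModelInfo("spam-detector", "alice", frozenset({"text"})),
    ModelInfo("churn", "bob", frozenset({"numeric", "categorical"})),
    ModelInfo("own-model", "carol", frozenset({"text"})),
]
matches = matching_models(repository, requester="carol", requested_types=["text"])
```

Here `own-model` is excluded because it was trained on the requester's own data, which mirrors the last sentence of the abstract.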
-
Publication No.: US08595154B2
Publication date: 2013-11-26
Application No.: US13014252
Filing date: 2011-01-26
IPC classification: G06F15/18
CPC classification: G06N99/005
Abstract: Methods, systems, and apparatus, including computer programs encoded on one or more computer storage devices, for training and retraining predictive models. A series of training data sets are received and added to a training data queue. In response to a first condition being satisfied, multiple retrained predictive models are generated using the training data queue, multiple updateable trained predictive models obtained from a repository of trained predictive models, and multiple training functions. In response to a second condition being satisfied, multiple new trained predictive models are generated using the training data queue, at least some training data stored in a training data repository and training functions. The new trained predictive models include static trained predictive models and updateable trained predictive models. The repository of trained predictive models is updated with at least some of the retrained predictive models and new trained predictive models.
-
Publication No.: US08533224B2
Publication date: 2013-09-10
Application No.: US13101048
Filing date: 2011-05-04
Applicant: Wei-Hao Lin, Travis Green, Robert Kaplow, Gang Fu, Gideon S. Mann
Inventor(s): Wei-Hao Lin, Travis Green, Robert Kaplow, Gang Fu, Gideon S. Mann
IPC classification: G06F17/30
CPC classification: G06N3/08, G06N99/005
Abstract: A system includes a computer(s) coupled to a data storage device(s) that stores a training data repository and a predictive model repository. The training data repository includes retained data samples from initial training data and from previously received data sets. The predictive model repository includes at least one updateable trained predictive model that was trained with the initial training data and retrained with the previously received data sets. A new data set is received. A richness score is assigned to each of the data samples in the set and to the retained data samples that indicates how information-rich a data sample is for determining accuracy of the trained predictive model. A set of test data is selected based on ranking the retained data samples and the new data set by richness score. The trained predictive model is accuracy tested using the test data and an accuracy score is determined.
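As a rough illustration of the test-data selection in this abstract, the sketch below ranks retained and newly received samples by a richness score and measures accuracy on the top-ranked ones. The abstract does not define how richness is computed, so the `richness` function here (closeness to an assumed decision threshold of 0.5) and the threshold model are purely hypothetical.

```python
def select_test_data(retained, new, richness, k):
    """Rank retained and new samples together by richness; keep the top k."""
    pool = list(retained) + list(new)
    return sorted(pool, key=richness, reverse=True)[:k]

def accuracy_score(model, test_data):
    """Fraction of test samples for which the model predicts the known answer."""
    return sum(model(x) == y for x, y in test_data) / len(test_data)

# Each sample is (features, answer).
retained = [((0.1,), 0), ((0.9,), 1)]
new = [((0.45,), 0), ((0.55,), 1)]

def richness(sample):
    # Assumption: samples near the 0.5 threshold say more about model accuracy.
    return -abs(sample[0][0] - 0.5)

def model(x):
    # Hypothetical trained model: a fixed threshold on the first feature.
    return int(x[0] > 0.5)

test_data = select_test_data(retained, new, richness, k=2)
score = accuracy_score(model, test_data)
```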
-
Publication No.: US08521664B1
Publication date: 2013-08-27
Application No.: US13172714
Filing date: 2011-06-29
Applicant: Wei-Hao Lin, Travis H. K. Green, Robert Kaplow, Gang Fu, Gideon S. Mann
Inventor(s): Wei-Hao Lin, Travis H. K. Green, Robert Kaplow, Gang Fu, Gideon S. Mann
IPC classification: G06F15/18
CPC classification: G06N99/005
Abstract: Methods, systems, and apparatus, for selecting a trained predictive model. A request is received from a client-subscriber computing system for access to a trained predictive model that can generate a predictive output in response to receiving input data having one or more input types. Information that describes each of the trained predictive models in a predictive model repository can be used to determine that one or more models included in the repository match the request. Determining a match can be based (at least in part) on a comparison of the one or more input types to input types included in the information that describes the trained predictive models. Access to at least one of the models is provided to the client-subscriber computing system. The models that match the request are models that were trained using training data provided by a computing system other than the client-subscriber computing system.
-
Publication No.: US08443013B1
Publication date: 2013-05-14
Application No.: US13246541
Filing date: 2011-09-27
Applicant: Wei-Hao Lin, Travis H. K. Green, Robert Kaplow, Gang Fu, Gideon S. Mann
Inventor(s): Wei-Hao Lin, Travis H. K. Green, Robert Kaplow, Gang Fu, Gideon S. Mann
CPC classification: G06Q10/04, G06F17/30477, G06N99/005
Abstract: A computer-implemented method includes obtaining a database table, the table including multiple rows and multiple columns, in which one or more rows are missing at least one column value, executing a script, using a script engine, in response to obtaining the table, in which executing the script causes one or more values from the rows to be provided as input data to a first predictive model, and processing, using the first predictive model, the input data to obtain output data, the output data including a predicted value for at least one of the missing column values, and populating one or more of the missing column values with the output data to provide a revised database table.
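The populate-missing-values step in this abstract can be illustrated with a minimal sketch. It assumes the table is a list of dictionaries, that a missing value is represented as `None`, and that the predictive model is any callable taking the other column values; the script engine mentioned in the abstract is left out, and the pricing rule is a made-up stand-in for a trained model.

```python
def fill_missing(rows, target, model):
    """Return a revised table in which missing `target` values are predicted.

    rows: list of dicts, one per table row (None marks a missing value)
    target: name of the column to populate
    model: callable mapping the other column values to a predicted value
    """
    revised = []
    for row in rows:
        new_row = dict(row)  # copy so the original table is left unchanged
        if new_row.get(target) is None:
            inputs = {k: v for k, v in row.items() if k != target}
            new_row[target] = model(inputs)
        revised.append(new_row)
    return revised

def price_model(inputs):
    # Hypothetical stand-in for a trained predictive model.
    return inputs["sqft"] * 200

table = [
    {"sqft": 1000, "price": 200000},
    {"sqft": 1500, "price": None},
]
revised_table = fill_missing(table, "price", price_model)
```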
-
Publication No.: US20120284213A1
Publication date: 2012-11-08
Application No.: US13101048
Filing date: 2011-05-04
Applicant: Wei-Hao Lin, Travis Green, Robert Kaplow, Gang Fu, Gideon S. Mann
Inventor(s): Wei-Hao Lin, Travis Green, Robert Kaplow, Gang Fu, Gideon S. Mann
CPC classification: G06N3/08, G06N99/005
Abstract: A system includes a computer(s) coupled to a data storage device(s) that stores a training data repository and a predictive model repository. The training data repository includes retained data samples from initial training data and from previously received data sets. The predictive model repository includes at least one updateable trained predictive model that was trained with the initial training data and retrained with the previously received data sets. A new data set is received. A richness score is assigned to each of the data samples in the set and to the retained data samples that indicates how information-rich a data sample is for determining accuracy of the trained predictive model. A set of test data is selected based on ranking the retained data samples and the new data set by richness score. The trained predictive model is accuracy tested using the test data and an accuracy score is determined.