-
1.
Publication number: US09760595B1
Publication date: 2017-09-12
Application number: US15392039
Filing date: 2016-12-28
Applicant: Google Inc.
Inventor: Kenneth J. Goldman, Tushar Deepak Chandra, Tal Shaked, Yonggang Zhao
CPC classification number: G06F17/30371, G06F9/5066, G06F9/544, G06F17/30321, G06F17/30554, G06F17/30584, G06F17/30917, H04L67/1097
Abstract: Parallel processing of data may include a set of map processes and a set of reduce processes. Each map process may include at least one map thread. Map threads may access distinct input data blocks assigned to the map process, and may apply an application specific map operation to the input data blocks to produce key-value pairs. Each map process may include a multiblock combiner configured to apply a combining operation to values associated with common keys in the key-value pairs to produce combined values, and to output intermediate data including pairs of keys and combined values. Each reduce process may be configured to access the intermediate data output by the multiblock combiners. For each key, an application specific reduce operation may be applied to the combined values associated with the key to produce output data.
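The data flow in this abstract can be illustrated with a minimal sketch. It assumes a word-count workload, an addition combiner, and Python threads standing in for map threads; the names `MultiblockCombiner`, `run_map_process`, and `run_reduce` are hypothetical and do not come from the patent.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import threading


class MultiblockCombiner:
    """Combines values per key across all blocks handled by one map process."""

    def __init__(self, combine):
        self._combine = combine      # assumed associative and commutative
        self._combined = {}
        self._lock = threading.Lock()

    def emit(self, key, value):
        # Map threads call this concurrently, so guard the shared table.
        with self._lock:
            if key in self._combined:
                self._combined[key] = self._combine(self._combined[key], value)
            else:
                self._combined[key] = value

    def intermediate(self):
        return dict(self._combined)


def word_count_map(block, emit):
    # Application-specific map operation (assumed): one pair per word.
    for word in block.split():
        emit(word, 1)


def run_map_process(blocks, map_fn, combine, num_threads=2):
    combiner = MultiblockCombiner(combine)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() waits for every block and re-raises any worker exception.
        list(pool.map(lambda block: map_fn(block, combiner.emit), blocks))
    return combiner.intermediate()


def run_reduce(intermediates, reduce_fn):
    # Group the combined values from every map process by key, then reduce.
    grouped = defaultdict(list)
    for intermediate in intermediates:
        for key, value in intermediate.items():
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


process_a = run_map_process(["a b a", "b c"], word_count_map, lambda x, y: x + y)
process_b = run_map_process(["a c c"], word_count_map, lambda x, y: x + y)
counts = run_reduce([process_a, process_b], sum)
```

Because each map process combines across all of its blocks before emitting, the reduce step here receives at most one value per key per map process.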
-
2.
Publication number: US09390382B2
Publication date: 2016-07-12
Application number: US14142970
Filing date: 2013-12-30
Applicant: Google Inc.
Inventor: Yoram Singer, Tal Shaked, Tushar Deepak Chandra, Tze Way Eugene Ie, James Vincent McFadden, Jeremiah Harmsen, Kristen Riedt LeFevre
IPC: G06N99/00
CPC classification number: G06N99/005
Abstract: Systems and techniques are disclosed for training a machine learning model based on one or more regularization penalties associated with one or more features. A template having a lower regularization penalty may be given preference over a template having a higher regularization penalty. A regularization penalty may be determined based on domain knowledge. A restrictive regularization penalty may be assigned to a template based on determining that a template occurrence is below a stability threshold and may be modified if the template occurrence meets or exceeds the stability threshold.
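One way to picture the stability-threshold rule is the sketch below. The abstract does not state the form of the penalty, so the L1 penalty and the soft-thresholding step are assumptions, as are the function names and all numeric values.

```python
def template_penalty(occurrences, base_penalty, restrictive_penalty,
                     stability_threshold):
    """Pick a template's penalty from how often the template has occurred."""
    if occurrences < stability_threshold:
        # Too few occurrences: keep the restrictive penalty.
        return restrictive_penalty
    # Threshold met or exceeded: switch to the regular penalty.
    return base_penalty


def soft_threshold(weight, penalty):
    """Assumed L1 proximal step: shrink the weight toward zero by the penalty."""
    magnitude = max(abs(weight) - penalty, 0.0)
    return magnitude if weight >= 0 else -magnitude


rare = template_penalty(3, base_penalty=0.25, restrictive_penalty=2.0,
                        stability_threshold=100)
stable = template_penalty(500, base_penalty=0.25, restrictive_penalty=2.0,
                          stability_threshold=100)

rare_weight = soft_threshold(1.0, rare)      # weight is zeroed out
stable_weight = soft_threshold(1.0, stable)  # weight is only mildly shrunk
```

Under these assumptions, a feature from a rarely seen template is suppressed until the template occurs often enough, which is one reading of how a lower-penalty template is "given preference".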
-
3.
Publication number: US20150186795A1
Publication date: 2015-07-02
Application number: US14142977
Filing date: 2013-12-30
Applicant: Google Inc.
Inventor: Tushar Deepak Chandra, Tal Shaked, Yoram Singer, Tze Way Eugene Ie, Joshua Redstone
IPC: G06N99/00
CPC classification number: G06N99/005
Abstract: Implementations of the disclosed subject matter provide methods and systems for using a multistage learner for efficiently boosting large datasets in a machine learning system. A method may include obtaining a first plurality of examples for a machine learning system and selecting a first point in time. Next, a second point in time occurring subsequent to the first point in time may be selected. The machine learning system may be trained using m of the first plurality of examples. Each of the m examples may include a feature initially occurring after the second point in time. In addition, the machine learning system may be trained using n of the first plurality of examples, and each of the n examples may include a feature initially occurring after the first point in time.
-
4.
Publication number: US09805312B1
Publication date: 2017-10-31
Application number: US14105262
Filing date: 2013-12-13
Applicant: Google Inc.
Inventor: Tal Shaked, Tushar Deepak Chandra, Yoram Singer, Tze Way Eugene Ie, Joshua Redstone
IPC: G06N99/00
CPC classification number: G06N99/005, G06F17/30312, H03M7/40
Abstract: Methods and systems for replacing feature values of features in training data with integer values selected based on a ranking of the feature values. The methods and systems are suitable for preprocessing large-scale machine learning training data.
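A minimal sketch of rank-based integer encoding follows. The abstract does not say what the ranking is based on, so ranking by descending frequency (most frequent value gets the smallest integer) is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def build_rank_encoding(values):
    """Map each distinct feature value to its rank in the frequency ordering."""
    counts = Counter(values)
    # Assumed ranking: most frequent first; ties broken by value so the
    # mapping is deterministic.
    ranked = sorted(counts, key=lambda value: (-counts[value], value))
    return {value: rank for rank, value in enumerate(ranked)}


def encode(values, mapping):
    """Replace every feature value with its integer rank."""
    return [mapping[value] for value in values]


column = ["us", "uk", "us", "de", "us", "uk"]
mapping = build_rank_encoding(column)
encoded = encode(column, mapping)
```

With a frequency ranking, common values receive small integers, which tends to suit variable-length integer encodings of the training data.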
-
5.
Publication number: US09536014B1
Publication date: 2017-01-03
Application number: US14922552
Filing date: 2015-10-26
Applicant: Google Inc.
Inventor: Kenneth J. Goldman, Tushar Deepak Chandra, Tal Shaked, Yonggang Zhao
CPC classification number: G06F17/30371, G06F9/5066, G06F9/544, G06F17/30321, G06F17/30554, G06F17/30584, G06F17/30917, H04L67/1097
Abstract: Parallel processing of data may include a set of map processes and a set of reduce processes. Each map process may include at least one map thread. Map threads may access distinct input data blocks assigned to the map process, and may apply an application specific map operation to the input data blocks to produce key-value pairs. Each map process may include a multiblock combiner configured to apply a combining operation to values associated with common keys in the key-value pairs to produce combined values, and to output intermediate data including pairs of keys and combined values. Each reduce process may be configured to access the intermediate data output by the multiblock combiners. For each key, an application specific reduce operation may be applied to the combined values associated with the key to produce output data.
-
6.
Publication number: US20150186794A1
Publication date: 2015-07-02
Application number: US14142970
Filing date: 2013-12-30
Applicant: Google Inc.
Inventor: Yoram Singer, Tal Shaked, Tushar Deepak Chandra, Tze Way Eugene Ie, James Vincent McFadden, Jeremiah Harmsen, Kristen Riedt LeFevre
IPC: G06N99/00
CPC classification number: G06N99/005
Abstract: Systems and techniques are disclosed for training a machine learning model based on one or more regularization penalties associated with one or more features. A template having a lower regularization penalty may be given preference over a template having a higher regularization penalty. A regularization penalty may be determined based on domain knowledge. A restrictive regularization penalty may be assigned to a template based on determining that a template occurrence is below a stability threshold and may be modified if the template occurrence meets or exceeds the stability threshold.
-
7.
Publication number: US09418343B2
Publication date: 2016-08-16
Application number: US14142977
Filing date: 2013-12-30
Applicant: Google Inc.
Inventor: Tushar Deepak Chandra, Tal Shaked, Yoram Singer, Tze Way Eugene Ie, Joshua Redstone
CPC classification number: G06N99/005
Abstract: Implementations of the disclosed subject matter provide methods and systems for using a multistage learner for efficiently boosting large datasets in a machine learning system. A method may include obtaining a first plurality of examples for a machine learning system and selecting a first point in time. Next, a second point in time occurring subsequent to the first point in time may be selected. The machine learning system may be trained using m of the first plurality of examples. Each of the m examples may include a feature initially occurring after the second point in time. In addition, the machine learning system may be trained using n of the first plurality of examples, and each of the n examples may include a feature initially occurring after the first point in time.
-
8.
Publication number: US20200151614A1
Publication date: 2020-05-14
Application number: US14106900
Filing date: 2013-12-16
Applicant: Google Inc.
Inventor: Tal Shaked, Tushar Deepak Chandra, James Vincent McFadden, Yoram Singer, Tze Way Eugene Ie
IPC: G06N99/00
Abstract: Systems and techniques are provided for template exploration in a large-scale machine learning system. A method may include obtaining multiple base templates, each base template comprising multiple features. A template performance score may be obtained for each base template and a first base template may be selected from among the multiple base templates based on the template performance score of the first base template. Multiple cross-templates may be constructed by generating a cross-template of the selected first base template and each of the multiple base templates. Performance of a machine learning model may be tested based on each cross-template to generate a cross-template performance score for each of the cross-templates. A first cross-template may be selected from among the multiple cross-templates based on the cross-template performance score of the cross-template. Accordingly, the first cross-template may be added to the machine learning model.
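The selection loop in this abstract can be sketched roughly as follows. It assumes a template is a tuple of feature names, that a cross-template is the union of two templates' features, and that a scoring callable stands in for actually testing the model; the scores below are made-up illustration values, not results.

```python
def explore_cross_templates(base_templates, score):
    """Pick the best base template, cross it with the others, pick the best cross."""
    best_base = max(base_templates, key=score)

    crosses = []
    for other in base_templates:
        if other == best_base:
            continue  # assumption: crossing a template with itself adds nothing
        # Assumed cross-template: the sorted union of both feature sets.
        crosses.append(tuple(sorted(set(best_base) | set(other))))

    best_cross = max(crosses, key=score)
    return best_base, best_cross


# Hypothetical performance scores, for illustration only.
toy_scores = {
    ("country",): 0.70,
    ("language",): 0.65,
    ("device",): 0.60,
    ("country", "language"): 0.74,
    ("country", "device"): 0.78,
}

bases = [("country",), ("language",), ("device",)]
best_base, best_cross = explore_cross_templates(bases, toy_scores.__getitem__)
model_templates = bases + [best_cross]  # the winning cross joins the model
```

In a real system the `score` callable would train or evaluate the model with the candidate template; here it is only a table lookup.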
-
9.
Publication number: US20170300814A1
Publication date: 2017-10-19
Application number: US15394668
Filing date: 2016-12-29
Applicant: Google Inc.
Inventor: Tal Shaked, Rohan Anil, Hrishikesh Balkrishna Aradhye, Mustafa Ispir, Glen Anderson, Wei Chai, Mehmet Levent Koc, Jeremiah Harmsen, Xiaobing Liu, Gregory Sean Corrado, Tushar Deepak Chandra, Heng-Tze Cheng
CPC classification number: G06N3/08, G06N3/0454, G06N3/0472, G06N3/084
Abstract: A system includes one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the computers to implement a combined machine learning model for processing an input including multiple features to generate a predicted output for the machine learning input. The combined model includes: a deep machine learning model configured to process the features to generate a deep model output; a wide machine learning model configured to process the features to generate a wide model output; and a combining layer configured to process the deep model output generated by the deep machine learning model and the wide model output generated by the wide machine learning model to generate the predicted output, in which the deep model and the wide model have been trained jointly on training data to generate the deep model output and the wide model output.
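The forward pass of such a combined model might look like the sketch below. It assumes a linear wide model, a deep model with a single ReLU hidden layer, and a combining layer that sums both outputs with a bias and applies a sigmoid; the parameter values are arbitrary, and joint training is not shown.

```python
import math


def wide_output(features, weights):
    """Wide model (assumed linear): weighted sum of the features."""
    return sum(w * x for w, x in zip(weights, features))


def deep_output(features, hidden_weights, output_weights):
    """Deep model (assumed): one ReLU hidden layer followed by a linear output."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, features)))
        for row in hidden_weights
    ]
    return sum(w * h for w, h in zip(output_weights, hidden))


def combined_prediction(features, params):
    """Combining layer (assumed): sigmoid of wide output + deep output + bias."""
    logit = (
        wide_output(features, params["wide"])
        + deep_output(features, params["hidden"], params["out"])
        + params["bias"]
    )
    return 1.0 / (1.0 + math.exp(-logit))


# Arbitrary illustration parameters; a jointly trained model would learn these.
params = {
    "wide": [0.5, -0.25],
    "hidden": [[1.0, 0.0], [-1.0, 0.0]],
    "out": [2.0, 3.0],
    "bias": -2.0,
}
prediction = combined_prediction([1.0, 2.0], params)
```

Joint training would backpropagate the loss on `prediction` into both the wide and the deep parameters at once, rather than fitting the two models separately and ensembling them.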
-
10.
Publication number: US09569481B1
Publication date: 2017-02-14
Application number: US14101611
Filing date: 2013-12-10
Applicant: Google Inc.
Inventor: Tushar Deepak Chandra, Tal Shaked, Yoram Singer, Tze Way Eugene Ie, Joshua Redstone
CPC classification number: G06F17/30371
Abstract: The present disclosure provides systems and techniques for efficient locking of datasets in a database when updates to a dataset may be delayed. A method may include accumulating a plurality of updates to a first set of one or more values associated with one or more features. The first set of one or more values may be stored within a first database column. Next, it may be determined that a first database column update aggregation rule is satisfied. A lock assigned to at least a portion of at least a first database column may be acquired. Accordingly, one or more values in the first set within the first database column may be updated based on the plurality of updates. In an implementation, the first set of one or more values may be associated with the first lock.
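The accumulate-then-lock pattern can be sketched as below. It assumes the aggregation rule is "a fixed number of updates have accumulated", that one lock covers the whole column, and that each writer owns its own pending buffer; the class and method names are hypothetical.

```python
import threading


class BufferedColumn:
    """Column of values whose updates are batched before the lock is taken."""

    def __init__(self, size, flush_threshold):
        self.values = [0.0] * size
        self._lock = threading.Lock()            # assumed: one lock per column
        self._pending = []                       # assumed: one buffer per writer
        self._flush_threshold = flush_threshold  # assumed aggregation rule

    def add_update(self, index, delta):
        # Accumulate without touching the lock.
        self._pending.append((index, delta))
        if len(self._pending) >= self._flush_threshold:
            self.flush()

    def flush(self):
        # Acquire the lock once and apply the whole batch.
        with self._lock:
            for index, delta in self._pending:
                self.values[index] += delta
            self._pending.clear()


column = BufferedColumn(size=2, flush_threshold=3)
column.add_update(0, 1.0)
column.add_update(1, 2.0)
before_flush = list(column.values)  # nothing applied yet
column.add_update(0, 0.5)           # third update satisfies the rule
after_flush = list(column.values)
```

The lock is acquired once per batch instead of once per update, which is the saving this kind of design is after when updates can tolerate a delay.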