Consistent randomized record-level splitting of machine learning data

Invention Grant

US10366053B1 Consistent randomized record-level splitting of machine learning data (Active)

Patent Title: Consistent randomized record-level splitting of machine learning data
Application No.: US14950953

Application Date: 2015-11-24
Publication No.: US10366053B1

Publication Date: 2019-07-30
Inventor: Tianming Zheng, Nicolle M. Correa, Leo Parker Dirac, James Joseph Jesensky, Robert Matthias Steele
Applicant: Amazon Technologies, Inc.
Applicant Address: US WA Seattle
Assignee: Amazon Technologies, Inc.
Current Assignee: Amazon Technologies, Inc.
Current Assignee Address: US WA Seattle
Agency: Meyertons, Hood, Kivlin, Kowert & Goetzel, P.C.
Agent: Robert C. Kowert
Main IPC: G06F16/00
IPC: G06F16/00 ; G06F16/13 ; G06N20/00

Abstract:

A request to split a data set comprising observation records located in a group of storage objects is received. With respect to a particular observation record, a token is generated based on an identifier of the record's storage object and a key value of the record. A numeric value is calculated using the token, and the observation record is assigned to a split subset using the numeric value. An indication of the assignment is provided to a destination associated with the split subset.
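The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the SHA-256 hash, the separator character, the optional `seed`, the 80/20 train/test ratio, and all function names are assumptions chosen for illustration. The sketch builds a token from a storage-object identifier and a record key, maps the token to a number in [0, 1), and uses that number to pick a split subset.

```python
import hashlib
from collections import defaultdict

SEPARATOR = "\x1f"  # assumption: a character unlikely to occur in identifiers or keys


def make_token(storage_object_id: str, record_key: str) -> str:
    """Combine the storage object's identifier and the record's key into one token."""
    return f"{storage_object_id}{SEPARATOR}{record_key}"


def token_to_number(token: str, seed: str = "") -> float:
    """Map a token to a repeatable number in [0, 1) (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256(f"{seed}{SEPARATOR}{token}".encode("utf-8")).digest()
    # Keep the top 53 bits so the quotient is exactly representable as a float.
    top53 = int.from_bytes(digest[:8], "big") >> 11
    return top53 / 2**53


def assign_subset(storage_object_id: str, record_key: str,
                  train_fraction: float = 0.8, seed: str = "") -> str:
    """Assign one observation record to a split subset using its numeric value."""
    number = token_to_number(make_token(storage_object_id, record_key), seed)
    return "train" if number < train_fraction else "test"


def split_records(records, train_fraction: float = 0.8, seed: str = ""):
    """Group (storage_object_id, record_key) pairs by their assigned subset."""
    subsets = defaultdict(list)
    for storage_object_id, record_key in records:
        subset = assign_subset(storage_object_id, record_key, train_fraction, seed)
        subsets[subset].append((storage_object_id, record_key))
    return dict(subsets)
```

Because the assignment depends only on the token (and the seed), a record lands in the same subset no matter the order in which records are processed or which worker processes it. For example, `assign_subset("part-0001.csv", "row-17")` returns the same subset on every call; the file name and key here are hypothetical.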

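IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F16/00	Information retrieval; database structures therefor; file system structures therefor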