Invention Grant
- Patent Title: Consistent randomized record-level splitting of machine learning data
- Application No.: US14950953
- Application Date: 2015-11-24
- Publication No.: US10366053B1
- Publication Date: 2019-07-30
- Inventor: Tianming Zheng, Nicolle M. Correa, Leo Parker Dirac, James Joseph Jesensky, Robert Matthias Steele
- Applicant: Amazon Technologies, Inc.
- Applicant Address: US WA Seattle
- Assignee: Amazon Technologies, Inc.
- Current Assignee: Amazon Technologies, Inc.
- Current Assignee Address: US WA Seattle
- Agency: Meyertons, Hood, Kivlin, Kowert & Goetzel, P.C.
- Agent: Robert C. Kowert
- Main IPC: G06F16/00
- IPC: G06F16/00; G06F16/13; G06N20/00

Abstract:
A request to split a data set comprising observation records located in a group of storage objects is received. With respect to a particular observation record, a token is generated based on an identifier of the record's storage object and a key value of the record. A numeric value is calculated using the token, and the observation record is assigned to a split subset using the numeric value. An indication of the assignment is provided to a destination associated with the split subset.
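The flow described in the abstract (token from storage object identifier plus record key, numeric value from the token, subset assignment from the numeric value) can be illustrated with a minimal Python sketch. This is not the patented implementation: the choice of SHA-256, the token layout, the optional seed, and the names `make_token`, `token_to_fraction`, `assign_split`, and `SPLITS` are all assumptions made for illustration only.

```python
import hashlib

# Hypothetical split definition: (subset name, cumulative upper bound in [0, 1]).
SPLITS = (("train", 0.7), ("test", 1.0))


def make_token(storage_object_id: str, record_key: str) -> str:
    """Combine the storage object identifier and the record's key value.

    The unit-separator delimiter is an assumption; any unambiguous
    encoding of the two fields would serve the same purpose.
    """
    return f"{storage_object_id}\x1f{record_key}"


def token_to_fraction(token: str, seed: str = "") -> float:
    """Map a token to a deterministic number in [0, 1).

    SHA-256 is an assumed choice of hash; the seed is a hypothetical
    knob for producing a different but still repeatable split.
    """
    digest = hashlib.sha256((seed + token).encode("utf-8")).digest()
    # Keep the top 53 bits so the result is an exact float below 1.0.
    top_bits = int.from_bytes(digest[:8], "big") >> 11
    return top_bits / float(1 << 53)


def assign_split(storage_object_id: str, record_key: str,
                 splits=SPLITS, seed: str = "") -> str:
    """Assign one observation record to a split subset."""
    value = token_to_fraction(make_token(storage_object_id, record_key), seed)
    for name, upper_bound in splits:
        if value < upper_bound:
            return name
    return splits[-1][0]  # Only reached if the bounds do not end at 1.0.
```

Because the assignment depends only on the token (and the optional seed), a given record lands in the same subset no matter which worker processes it or in what order, which is the consistency property the title refers to. For example, `assign_split("bucket/part-0001.csv", "42")` returns the same subset name on every call.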