Consistent randomized record-level splitting of machine learning data
Abstract:
A request to split a data set comprising observation records located in a group of storage objects is received. With respect to a particular observation record, a token is generated based on an identifier of the record's storage object and a key value of the record. A numeric value is calculated using the token, and the observation record is assigned to a split subset using the numeric value. An indication of the assignment is provided to a destination associated with the split subset.
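The flow described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the token is formed or how the numeric value is calculated from it, so everything concrete here is an assumption made for illustration: the token is a delimited string, the numeric value comes from a SHA-256 digest mapped to the interval [0, 1), and the function names (`make_token`, `token_to_unit_interval`, `assign_split`) and the split fractions are hypothetical.

```python
import hashlib
from bisect import bisect_right
from itertools import accumulate


def make_token(storage_object_id: str, record_key: str) -> str:
    # Assumed token format: the object identifier is length-prefixed so that
    # different (object, key) pairs cannot produce the same string.
    return f"{len(storage_object_id)}:{storage_object_id}:{record_key}"


def token_to_unit_interval(token: str) -> float:
    # Assumed numeric mapping: hash the token, keep the top 53 bits of the
    # digest, and scale them to a float in [0, 1).
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    top_53_bits = int.from_bytes(digest[:8], "big") >> 11
    return top_53_bits / 2**53


def assign_split(storage_object_id: str, record_key: str, splits) -> str:
    # splits is a list of (subset_name, fraction) pairs whose fractions sum to 1.
    names = [name for name, _ in splits]
    upper_bounds = list(accumulate(fraction for _, fraction in splits))
    value = token_to_unit_interval(make_token(storage_object_id, record_key))
    # Pick the first subset whose cumulative upper bound exceeds the value;
    # clamp the index to absorb floating-point error in the summed fractions.
    index = min(bisect_right(upper_bounds, value), len(names) - 1)
    return names[index]


if __name__ == "__main__":
    # Hypothetical 80/20 split and hypothetical object and key names.
    example_splits = [("train", 0.8), ("test", 0.2)]
    for key in ("row-1", "row-2", "row-3"):
        print(key, assign_split("bucket/part-0001.csv", key, example_splits))
```

Because the assignment depends only on the storage object identifier and the record key, repeating the split over the same data yields the same subsets regardless of the order in which records are read, which is the sense in which such a split is both randomized and consistent.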