-
Publication No.: US10540606B2
Publication Date: 2020-01-21
Application No.: US14460314
Application Date: 2014-08-14
Applicant: Amazon Technologies, Inc.
Inventor: Leo Parker Dirac, Jin Li, Tianming Zheng, Donghui Zhuo
IPC: G06N20/00
Abstract: Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.
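The idea in this abstract can be illustrated with a minimal sketch, assuming the consistency metadata is just a dictionary holding a seed and that chunks are identified by integers. The names `split_chunks`, `metadata`, and `test_fraction` are hypothetical and do not come from the patent; the sketch only shows how a seeded pseudo-random source makes the chunk-level train/test selection repeatable.

```python
import random

def split_chunks(chunk_ids, seed, test_fraction=0.25):
    """Return (train_chunks, test_chunks), chosen reproducibly from a seed."""
    rng = random.Random(seed)      # seeded pseudo-random number source
    shuffled = list(chunk_ids)     # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical consistency metadata: only the seed is stored here.
metadata = {"seed": 42}
chunks = list(range(8))

# Two independent calls with the same metadata select the same chunks.
train_a, test_a = split_chunks(chunks, metadata["seed"])
train_b, test_b = split_chunks(chunks, metadata["seed"])
```

Because the seed is stored rather than the selection itself, any worker that reads the same metadata can rebuild the same training and test sets.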
-
Publication No.: US20230126005A1
Publication Date: 2023-04-27
Application No.: US18146075
Application Date: 2022-12-23
Applicant: Amazon Technologies, Inc.
Inventor: Leo Parker Dirac, Jin Li, Tianming Zheng, Donghui Zhuo
IPC: G06N20/00
Abstract: Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.
-
Publication No.: US20200034742A1
Publication Date: 2020-01-30
Application No.: US16591521
Application Date: 2019-10-02
Applicant: Amazon Technologies, Inc.
Inventor: Leo Parker Dirac, Jin Li, Tianming Zheng, Donghui Zhuo
IPC: G06N20/00
Abstract: Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.
-
Publication No.: US10713589B1
Publication Date: 2020-07-14
Application No.: US15060439
Application Date: 2016-03-03
Applicant: Amazon Technologies, Inc.
Inventor: Saman Zarandioon, Nicolle M. Correa, Leo Parker Dirac, Aleksandr Mikhaylovich Ingerman, Steven Andrew Loeppky, Robert Matthias Steele, Tianming Zheng
IPC: G06N20/00
Abstract: A determination that a machine learning data set is to be shuffled is made. Tokens corresponding to the individual observation records are generated based on respective identifiers of the records' storage objects and record key values. Respective representative values are derived from the tokens. The observation records are rearranged based on a result of sorting the representative values and provided to a shuffle result destination.
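One way to picture the shuffle described in this abstract is the sketch below. It assumes each record is a `(storage_object_id, record_key)` pair and that the representative value is derived by hashing the token with SHA-256. The hash function, the `salt` parameter, and the function names are illustrative assumptions; the abstract does not say how the values are derived.

```python
import hashlib

def representative_value(storage_object_id, record_key, salt=""):
    """Derive a number from a token built from the object id and record key."""
    token = f"{salt}|{storage_object_id}|{record_key}"
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")   # first 8 bytes as an integer

def shuffle_records(records, salt=""):
    """Rearrange records by sorting on their representative values."""
    return sorted(records, key=lambda r: representative_value(r[0], r[1], salt))

records = [("obj-a", 0), ("obj-a", 1), ("obj-b", 0), ("obj-b", 1)]
shuffled = shuffle_records(records)
```

Under these assumptions the resulting order depends only on the tokens, not on the order in which the records arrive, so repeated runs over the same data give the same arrangement.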
-
Publication No.: US10366053B1
Publication Date: 2019-07-30
Application No.: US14950953
Application Date: 2015-11-24
Applicant: Amazon Technologies, Inc.
Inventor: Tianming Zheng, Nicolle M. Correa, Leo Parker Dirac, James Joseph Jesensky, Robert Matthias Steele
Abstract: A request to split a data set comprising observation records located in a group of storage objects is received. With respect to a particular observation record, a token is generated based on an identifier of the record's storage object and a key value of the record. A numeric value is calculated using the token, and the observation record is assigned to a split subset using the numeric value. An indication of the assignment is provided to a destination associated with the split subset.
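The assignment step in this abstract can be sketched as follows, assuming a two-way train/test split and a SHA-256 hash as the way to turn the token into a number. The function name `assign_split`, the `train_percent` parameter, and the choice of hash are hypothetical and are not taken from the patent.

```python
import hashlib

def assign_split(storage_object_id, record_key, train_percent=80):
    """Assign a record to 'train' or 'test' from its token alone."""
    token = f"{storage_object_id}|{record_key}"
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100   # integer in 0..99
    return "train" if bucket < train_percent else "test"

# The same record always lands in the same subset.
first = assign_split("obj-a", 7)
second = assign_split("obj-a", 7)
```

Since the decision uses only the record's own token, each record can be assigned independently, without coordinating with the records around it.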
-
Publication No.: US11544623B2
Publication Date: 2023-01-03
Application No.: US16591521
Application Date: 2019-10-02
Applicant: Amazon Technologies, Inc.
Inventor: Leo Parker Dirac, Jin Li, Tianming Zheng, Donghui Zhuo
IPC: G06N20/00
Abstract: Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.
-
Publication No.: US11100420B2
Publication Date: 2021-08-24
Application No.: US14460312
Application Date: 2014-08-14
Applicant: Amazon Technologies, Inc.
Inventor: Leo Parker Dirac, Jin Li, Rakesh Ramakrishnan, Tianming Zheng, Donghui Zhuo
IPC: G06N20/00
Abstract: A record extraction request for a data set is received at a machine learning service. A plan to perform one or more chunk-level operations (such as sampling, shuffling, splitting or partitioning for parallel computation) on chunks of the data set is generated. A set of data transfers that results in a particular chunk being stored in a particular server's memory is initiated to implement the first chunk-level operation of the sequence. A second operation such as another filtering operation or a feature processing operation is performed on a result set of the first chunk-level operation.
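The two-stage flow in this abstract (a chunk-level operation followed by a second operation on its result) might look like the sketch below. It assumes chunks are in-memory lists of integers and uses sampling as the chunk-level step and a filter as the second step. The function names and the data are hypothetical, and the data transfers between servers mentioned in the abstract are not modeled.

```python
import random

def chunk_level_sample(chunks, fraction, seed):
    """Chunk-level operation: pick whole chunks, never individual records."""
    rng = random.Random(seed)
    k = max(1, int(len(chunks) * fraction))
    return rng.sample(chunks, k)

def record_level_filter(chunk, predicate):
    """Second operation: filter the records inside one selected chunk."""
    return [record for record in chunk if predicate(record)]

chunks = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
sampled = chunk_level_sample(chunks, 0.5, seed=7)
result = [r for c in sampled
          for r in record_level_filter(c, lambda r: r % 2 == 0)]
```

In this sketch the record-level filter only ever runs on chunks that survived the chunk-level step.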