STRATIFIED SAMPLING USING ADAPTIVE PARALLEL DATA PROCESSING
    Patent application (legal status: in force)

    Publication No.: US20150186493A1

    Publication date: 2015-07-02

    Application No.: US14141635

    Filing date: 2013-12-27

    Abstract: Stratified sampling of a plurality of records is performed. A plurality of records are partitioned into a plurality of splits, wherein each split includes at least a portion of the plurality of records. At least one split of the plurality of splits is provided to a mapper. The mapper assigns at least a portion of the records of the at least one split to a group based on a stratum of the assigned records, and filters the records of the group based on a comparison of the weights of the records to a local threshold of the mapper. The mapper updates the local threshold of the mapper by communicating with a coordinator. The mapper shuffles the group to a reducer, where the reducer filters the records of the group based on the weights of the records. The reducer provides a stratified sampling of the plurality of records based on the group.
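
    The abstract does not say how weights are assigned or how thresholds are computed. The following is a minimal Python sketch that assumes bottom-k sampling. Under that assumption:

    - Each record carries a distinct random weight.
    - The sample for a stratum is the k records with the lowest weights.
    - A mapper's local threshold is its k-th lowest weight seen so far for that stratum.
    - The coordinator keeps the smallest threshold any mapper has reported. That value is still a safe filter, because any single mapper's k-th lowest weight is at least the global k-th lowest.

    The names Coordinator, mapper, reducer, and stratified_sample, the parameter k, and the record layout are all hypothetical. The mappers also run one after another here rather than in parallel.

```python
import heapq
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical record layout: (stratum, payload, weight); weights assumed distinct.
Record = Tuple[Hashable, str, float]


class Coordinator:
    """Hypothetical coordinator holding the tightest known threshold per stratum."""

    def __init__(self) -> None:
        self._best: Dict[Hashable, float] = {}

    def exchange(self, stratum: Hashable, local_threshold: float) -> float:
        # A mapper's k-th lowest weight bounds the global k-th lowest weight from
        # above, so the minimum reported so far is a safe threshold for all mappers.
        best = min(self._best.get(stratum, float("inf")), local_threshold)
        self._best[stratum] = best
        return best


def mapper(split: Sequence[Record], k: int, coord: Coordinator) -> Dict[Hashable, List[Record]]:
    heaps: Dict[Hashable, list] = defaultdict(list)  # max-heap per stratum via negated weights
    thresholds: Dict[Hashable, float] = defaultdict(lambda: float("inf"))
    for seq, rec in enumerate(split):
        stratum, _, weight = rec  # assign the record to its stratum's group
        if weight >= thresholds[stratum]:
            continue  # filtered locally: cannot be among the k lowest weights
        heap = heaps[stratum]
        heapq.heappush(heap, (-weight, seq, rec))  # seq breaks ties without comparing records
        if len(heap) > k:
            heapq.heappop(heap)  # evict the highest weight
        if len(heap) == k:
            # Update the local threshold by reporting to (and hearing back from) the coordinator.
            thresholds[stratum] = coord.exchange(stratum, -heap[0][0])
    # Emit each group, dropping records that the (possibly tightened) threshold rules out.
    return {s: [r for _, _, r in h if r[2] <= thresholds[s]] for s, h in heaps.items()}


def reducer(group: List[Record], k: int) -> List[Record]:
    # Final filter on weight: the k lowest-weight records of the stratum, ascending.
    return heapq.nsmallest(k, group, key=lambda r: r[2])


def stratified_sample(records: Sequence[Record], num_splits: int, k: int) -> Dict[Hashable, List[Record]]:
    splits = [records[i::num_splits] for i in range(num_splits)]
    coord = Coordinator()
    shuffled: Dict[Hashable, List[Record]] = defaultdict(list)
    for split in splits:  # sequential here; a real deployment runs mappers in parallel
        for stratum, group in mapper(split, k, coord).items():
            shuffled[stratum].extend(group)  # shuffle: route each group to its stratum's reducer
    return {s: reducer(g, k) for s, g in shuffled.items()}


if __name__ == "__main__":
    rng = random.Random(7)
    data = [(rng.choice("ABC"), f"r{i}", rng.random()) for i in range(1000)]
    for stratum, rows in sorted(stratified_sample(data, num_splits=4, k=3).items()):
        print(stratum, [payload for _, payload, _ in rows])
```

    Every record discarded by a mapper has at least k lower-weight records of the same stratum somewhere in the input. For that reason the reducer's output under these assumptions equals a single-pass bottom-k selection over all records.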


    STRATIFIED SAMPLING USING ADAPTIVE PARALLEL DATA PROCESSING
    Patent application (legal status: in force)

    Publication No.: US20160321350A1

    Publication date: 2016-11-03

    Application No.: US15208677

    Filing date: 2016-07-13

    Abstract: A computer-implemented method includes partitioning a plurality of records into a plurality of splits. Each split includes at least a portion of the plurality of records. The method further includes providing at least one split of the plurality of splits to a mapper. The mapper scans the input data set, transforms each input record using a map function, and extracts a grouping key in parallel. The method further includes assigning at least a portion of the records of the at least one split to a group, where each assignment to the group is based on a stratum of the assigned record, and filtering the records of the group, where each filtering is based on a comparison of a weight of a record to a local threshold of the mapper. The method further includes shuffling the group to a reducer and providing a stratified sampling of the plurality of records based on the group.
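
    The following Python sketch is a rough illustration of the map side described above. It performs these steps:

    - Partitions raw input lines into splits.
    - Runs one mapper per split on a thread pool.
    - In each mapper, transforms every record with a user-supplied map function and extracts its grouping key (the stratum).
    - Filters each record against the mapper's local threshold before the shuffle.

    The abstract specifies neither the weights nor the threshold rule. This sketch assumes each record's weight is a pseudo-random number derived from a SHA-256 hash of the raw line, and that the sample per stratum is the k lowest-weight records. The local threshold is then the mapper's k-th lowest weight for that key. The names weight_of, partition, run_mapper, and stratified_sample and the parameter k are hypothetical, and no coordinator is modeled.

```python
import hashlib
import heapq
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Hashable, List, Sequence, Tuple


def weight_of(raw: str) -> float:
    """Hypothetical weight: a reproducible pseudo-random number in [0, 1) from a hash."""
    digest = hashlib.sha256(raw.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def partition(records: Sequence[str], num_splits: int) -> List[Sequence[str]]:
    """Partition the input into contiguous splits of roughly equal size."""
    size = max(1, -(-len(records) // num_splits))  # ceiling division, at least 1
    return [records[i:i + size] for i in range(0, len(records), size)]


def run_mapper(split: Sequence[str], map_fn: Callable[[str], Any],
               key_fn: Callable[[Any], Hashable], k: int) -> Dict[Hashable, List[Tuple[float, Any]]]:
    """Scan one split: transform each record, extract its grouping key, filter by weight."""
    groups: Dict[Hashable, list] = defaultdict(list)  # per-key max-heap via negated weights
    for seq, raw in enumerate(split):
        rec = map_fn(raw)   # transform the input record
        key = key_fn(rec)   # grouping key = stratum
        w = weight_of(raw)
        heap = groups[key]
        if len(heap) == k and w >= -heap[0][0]:
            continue        # above the local threshold (this mapper's k-th lowest weight)
        heapq.heappush(heap, (-w, seq, rec))  # seq keeps records out of comparisons
        if len(heap) > k:
            heapq.heappop(heap)
    return {key: [(-nw, rec) for nw, _, rec in heap] for key, heap in groups.items()}


def stratified_sample(records: Sequence[str], map_fn: Callable[[str], Any],
                      key_fn: Callable[[Any], Hashable], k: int,
                      num_splits: int = 4) -> Dict[Hashable, List[Any]]:
    if k < 1:
        raise ValueError("k must be positive")
    splits = partition(records, num_splits)
    # Threads show the parallel structure; CPU-bound work would use processes instead.
    with ThreadPoolExecutor(max_workers=num_splits) as pool:
        outputs = list(pool.map(lambda s: run_mapper(s, map_fn, key_fn, k), splits))
    shuffled: Dict[Hashable, List[Tuple[float, Any]]] = defaultdict(list)
    for out in outputs:  # shuffle: gather each key's candidates in one place
        for key, pairs in out.items():
            shuffled[key].extend(pairs)
    # Reducer: keep the k lowest-weight records per stratum, ascending by weight.
    return {key: [rec for _, rec in heapq.nsmallest(k, pairs, key=lambda p: p[0])]
            for key, pairs in shuffled.items()}
```

    Consider input lines such as "17,N,51". If map_fn splits each line into a dict and key_fn is lambda r: r["region"], the call returns at most k parsed records per region.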

