Granted invention patent
- Patent title: System and method for joining skewed datasets in a distributed computing environment
- Application number: US16991939; Filing date: 2020-08-12
- Publication number: US11615094B2; Publication date: 2023-03-28
- Inventor: Avnish Kumar Rastogi
- Applicant: HCL TECHNOLOGIES LIMITED
- Applicant address: IN New Delhi
- Patentee: HCL TECHNOLOGIES LIMITED
- Current patentee: HCL TECHNOLOGIES LIMITED
- Current patentee address: IN New Delhi
- Main classification: G06F16/20
- IPC classification: G06F16/20 ; G06F16/2453 ; G06F16/2455 ; G06F16/27 ; G06F16/28 ; G06F16/21
Abstract:
Disclosed are a method and a system for joining datasets in a distributed computing environment. The system comprises a memory 206 and a processor 202. The processor 202 identifies a skewed dataset among two or more datasets to be joined, and identifies a replication parameter from a configuration file. The processor 202 then assigns a random machine number to each chunk of the skewed dataset owned by the nodes/machines involved in the join operation. The processor 202 forms a number of copies of the non-skewed dataset equal to the replication parameter and adds the copy number to each sample of each copy. Further, the processor 202 merges each copy of the non-skewed dataset into the final copy, forming a single non-skewed dataset. The processor 202 repeats these steps for all the non-skewed datasets involved in the join operation, which generates merged copies of all the non-skewed datasets, and then performs the join operation.
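
The steps in the abstract resemble what is often called a salted (replicated) join. The following is a minimal single-process sketch of that idea in Python, not the patented implementation: the function name `salted_join`, the `(key, value)` record layout, and the `seed` argument are hypothetical, and each skewed record is treated as its own chunk for simplicity. In a real cluster the `(key, number)` pair would decide which machine receives the record; here a dictionary lookup stands in for that routing.

```python
import random
from collections import defaultdict


def salted_join(skewed, non_skewed, replication, seed=None):
    """Join two lists of (key, value) pairs on key using a random salt.

    Illustrative assumption: every skewed record is its own chunk, and
    `replication` plays the role of the replication parameter.
    """
    rng = random.Random(seed)

    # Tag each skewed record with a random number in [0, replication).
    salted_skewed = [
        ((key, rng.randrange(replication)), value) for key, value in skewed
    ]

    # Form `replication` copies of the non-skewed dataset, add the copy
    # number to every record, and merge all copies into one dataset.
    merged = [
        ((key, copy_no), value)
        for copy_no in range(replication)
        for key, value in non_skewed
    ]

    # Ordinary hash join on the composite (key, number) pair.
    index = defaultdict(list)
    for salted_key, value in merged:
        index[salted_key].append(value)

    return [
        (salted_key[0], left, right)
        for salted_key, left in salted_skewed
        for right in index.get(salted_key, [])
    ]


skewed = [("a", 1), ("a", 2), ("a", 3), ("b", 4)]
non_skewed = [("a", "x"), ("b", "y")]
result = salted_join(skewed, non_skewed, replication=3, seed=0)
```

Because every copy number appears exactly once per non-skewed record, each skewed record still finds exactly one match per original partner, so the output equals that of a plain join while the records sharing the hot key `"a"` are spread over up to `replication` composite keys.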