Performing database joins in distributed data processing systems

    Publication number: US11687532B2

    Publication date: 2023-06-27

    Application number: US17557883

    Application date: 2021-12-21

    CPC classification number: G06F16/24544 G06F16/2282 G06F16/2456 G06F16/24532

    Abstract: A computer-implemented method for efficiently performing a database join in a distributed data processing system comprising multiple computational nodes, the method comprising: determining a first set of one or more columns of a first database table and a second set of one or more columns of a second database table on which the join is to be performed; estimating a size of the rows of the first table which have a particular combination of values in the first set of columns; computing a salt factor n based on the estimated size of rows and further based on a processing capacity of a computational node of the distributed data processing system; assigning one of n different salt values to each row of the first table having the particular combination of values in the first set of columns; for each row of the second table having the particular combination of values in the second set of columns, expanding the row into n rows and assigning to each expanded row a different one of the n salt values; and performing a join operation on the modified first and second tables, wherein rows of the first and second tables that have the same combination of values in the first and second sets of columns and the same salt value are joined on the same computational node.
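The steps named in the abstract can be illustrated with a minimal single-process sketch. It is not the patented implementation: the function names `salt_factor` and `salted_join`, the ceiling-division rule for n, the round-robin salt assignment, and the use of salt 0 for non-skewed rows are all assumptions made for illustration. Each `(key, salt)` bucket stands in for the partition that one computational node would receive.

```python
import math
from collections import defaultdict


def salt_factor(estimated_bytes, node_capacity_bytes):
    # Assumed rule: enough salt values that each share of the skewed rows
    # fits within one node's capacity. The abstract only says n depends on both.
    return max(1, math.ceil(estimated_bytes / node_capacity_bytes))


def salted_join(first, second, key, hot_value, n):
    # Step 1: give each skewed row of the first table one of n salt values
    # (round-robin here, which is an assumption); other rows keep salt 0.
    salted_first = []
    hot_seen = 0
    for row in first:
        if row[key] == hot_value:
            salted_first.append((row, hot_seen % n))
            hot_seen += 1
        else:
            salted_first.append((row, 0))

    # Step 2: expand each matching row of the second table into n copies,
    # one per salt value, and bucket every row by (key, salt).
    partitions = defaultdict(list)
    for row in second:
        salts = range(n) if row[key] == hot_value else (0,)
        for salt in salts:
            partitions[(row[key], salt)].append(row)

    # Step 3: join rows that agree on both the key and the salt value.
    joined = []
    for row, salt in salted_first:
        for match in partitions.get((row[key], salt), []):
            joined.append({**match, **row})
    return joined


# Hypothetical data: key "a" is heavily skewed in the first table.
first = [{"k": "a", "x": i} for i in range(6)] + [{"k": "b", "x": 99}]
second = [{"k": "a", "y": 1}, {"k": "b", "y": 2}, {"k": "c", "y": 3}]
n = salt_factor(estimated_bytes=10_000, node_capacity_bytes=4_000)
result = salted_join(first, second, key="k", hot_value="a", n=n)
```

Because every salt value appears exactly once among the expanded copies of a second-table row, each salted first-table row still meets each matching second-table row exactly once, so the result equals that of an unsalted inner join while the skewed key is spread over n buckets.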

    Performing database joins in distributed data processing systems

    Publication number: US12204542B2

    Publication date: 2025-01-21

    Application number: US18316723

    Application date: 2023-05-12

    Abstract: A computer-implemented method for efficiently performing a database join in a distributed data processing system comprising multiple computational nodes, the method comprising: determining a first set of one or more columns of a first database table and a second set of one or more columns of a second database table on which the join is to be performed; estimating a size of the rows of the first table which have a particular combination of values in the first set of columns; computing a salt factor n based on the estimated size of rows and further based on a processing capacity of a computational node of the distributed data processing system; assigning one of n different salt values to each row of the first table having the particular combination of values in the first set of columns; for each row of the second table having the particular combination of values in the second set of columns, expanding the row into n rows and assigning to each expanded row a different one of the n salt values; and performing a join operation on the modified first and second tables, wherein rows of the first and second tables that have the same combination of values in the first and second sets of columns and the same salt value are joined on the same computational node.

    PERFORMING DATABASE JOINS IN DISTRIBUTED DATA PROCESSING SYSTEMS

    Publication number: US20220197907A1

    Publication date: 2022-06-23

    Application number: US17557883

    Application date: 2021-12-21

    Abstract: A computer-implemented method for efficiently performing a database join in a distributed data processing system comprising multiple computational nodes, the method comprising: determining a first set of one or more columns of a first database table and a second set of one or more columns of a second database table on which the join is to be performed; estimating a size of the rows of the first table which have a particular combination of values in the first set of columns; computing a salt factor n based on the estimated size of rows and further based on a processing capacity of a computational node of the distributed data processing system; assigning one of n different salt values to each row of the first table having the particular combination of values in the first set of columns; for each row of the second table having the particular combination of values in the second set of columns, expanding the row into n rows and assigning to each expanded row a different one of the n salt values; and performing a join operation on the modified first and second tables, wherein rows of the first and second tables that have the same combination of values in the first and second sets of columns and the same salt value are joined on the same computational node.

    PERFORMING DATABASE JOINS IN DISTRIBUTED DATA PROCESSING SYSTEMS

    Publication number: US20230359623A1

    Publication date: 2023-11-09

    Application number: US18316723

    Application date: 2023-05-12

    CPC classification number: G06F16/24544 G06F16/2282 G06F16/2456 G06F16/24532

    Abstract: A computer-implemented method for efficiently performing a database join in a distributed data processing system comprising multiple computational nodes, the method comprising: determining a first set of one or more columns of a first database table and a second set of one or more columns of a second database table on which the join is to be performed; estimating a size of the rows of the first table which have a particular combination of values in the first set of columns; computing a salt factor n based on the estimated size of rows and further based on a processing capacity of a computational node of the distributed data processing system; assigning one of n different salt values to each row of the first table having the particular combination of values in the first set of columns; for each row of the second table having the particular combination of values in the second set of columns, expanding the row into n rows and assigning to each expanded row a different one of the n salt values; and performing a join operation on the modified first and second tables, wherein rows of the first and second tables that have the same combination of values in the first and second sets of columns and the same salt value are joined on the same computational node.
