-
Publication No.: US11687532B2
Publication Date: 2023-06-27
Application No.: US17557883
Filing Date: 2021-12-21
Applicant: Palantir Technologies Inc.
Inventors: Nicolas Prettejohn, Katherine Ketsdever
IPC: G06F15/16, G06F16/2453, G06F16/22, G06F16/2455
CPC classification number: G06F16/24544, G06F16/2282, G06F16/2456, G06F16/24532
Abstract: A computer-implemented method for efficiently performing a database join in a distributed data processing system comprising multiple computational nodes, the method comprising: determining a first set of one or more columns of a first database table and a second set of one or more columns of a second database table on which the join is to be performed; estimating a size of the rows of the first table which have a particular combination of values in the first set of columns; computing a salt factor n based on the estimated size of rows and further based on a processing capacity of a computational node of the distributed data processing system; assigning one of n different salt values to each row of the first table having the particular combination of values in the first set of columns; expanding each row of the second table having the particular combination of values in the second set of columns into n rows, and assigning to each expanded row a different one of the n salt values; and performing a join operation on the modified first and second tables, wherein rows of the first and second tables that have the same combination of values in the first and second sets of columns and the same salt value are joined on the same computational node.
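The abstract derives the salt factor n from an estimated size of the rows sharing one join-key combination and from a node's processing capacity, but it does not give a formula. The sketch below assumes the simplest reading: charge each row a fixed, hypothetical `row_size_bytes`, sum per key combination, and pick the smallest n for which each of the n salted partitions fits within one node's capacity, i.e. n = max(1, ceil(size / capacity)). The function names and the fixed-size estimate are illustrative assumptions, not the patented method.

```python
import math
from collections import defaultdict


def estimate_group_sizes(rows, key_columns, row_size_bytes):
    """Estimate total bytes of rows sharing each join-key combination.

    Assumption (hypothetical): every row costs a fixed `row_size_bytes`;
    a real engine might use column statistics or a sample instead.
    """
    sizes = defaultdict(int)
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        sizes[key] += row_size_bytes
    return dict(sizes)


def salt_factor(estimated_size_bytes, node_capacity_bytes):
    """Smallest n such that size / n fits within one node's capacity (at least 1)."""
    if node_capacity_bytes <= 0:
        raise ValueError("node capacity must be positive")
    return max(1, math.ceil(estimated_size_bytes / node_capacity_bytes))
```

For example, a key group estimated at 1000 bytes on nodes that can each handle 300 bytes gets n = 4, while a small group gets n = 1 and is joined without any replication overhead.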
-
Publication No.: US12204542B2
Publication Date: 2025-01-21
Application No.: US18316723
Filing Date: 2023-05-12
Applicant: Palantir Technologies Inc.
Inventors: Nicolas Prettejohn, Katherine Ketsdever
IPC: G06F15/16, G06F16/22, G06F16/2453, G06F16/2455
Abstract: A computer-implemented method for efficiently performing a database join in a distributed data processing system comprising multiple computational nodes, the method comprising: determining a first set of one or more columns of a first database table and a second set of one or more columns of a second database table on which the join is to be performed; estimating a size of the rows of the first table which have a particular combination of values in the first set of columns; computing a salt factor n based on the estimated size of rows and further based on a processing capacity of a computational node of the distributed data processing system; assigning one of n different salt values to each row of the first table having the particular combination of values in the first set of columns; expanding each row of the second table having the particular combination of values in the second set of columns into n rows, and assigning to each expanded row a different one of the n salt values; and performing a join operation on the modified first and second tables, wherein rows of the first and second tables that have the same combination of values in the first and second sets of columns and the same salt value are joined on the same computational node.
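The two table-modification steps in this abstract can be sketched on in-memory rows: rows of the first table with the skewed key combination each receive one of n salt values, and matching rows of the second table are replicated n times, one copy per salt value. The round-robin salt assignment, the `_salt` column name, and the function names are hypothetical choices for illustration; random assignment would spread the rows just as well.

```python
def salt_first_table(rows, key_columns, hot_key, n):
    """Give each first-table row with `hot_key` one of n salt values (round-robin).

    Rows with any other key combination get salt 0 (unsalted). The `_salt`
    column name is a hypothetical choice.
    """
    salted = []
    counter = 0
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key == hot_key:
            salted.append({**row, "_salt": counter % n})
            counter += 1
        else:
            salted.append({**row, "_salt": 0})
    return salted


def expand_second_table(rows, key_columns, hot_key, n):
    """Replicate each second-table row with `hot_key` into n rows, one per salt value."""
    expanded = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key == hot_key:
            expanded.extend({**row, "_salt": s} for s in range(n))
        else:
            expanded.append({**row, "_salt": 0})
    return expanded
```

Only the second table grows, and only for the skewed key, so the replication cost is n copies of the (typically small) matching side rather than of the whole table.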
-
Publication No.: US20220197907A1
Publication Date: 2022-06-23
Application No.: US17557883
Filing Date: 2021-12-21
Applicant: Palantir Technologies Inc.
Inventors: Nicolas Prettejohn, Katherine Ketsdever
IPC: G06F16/2453, G06F16/2455, G06F16/22
Abstract: A computer-implemented method for efficiently performing a database join in a distributed data processing system comprising multiple computational nodes, the method comprising: determining a first set of one or more columns of a first database table and a second set of one or more columns of a second database table on which the join is to be performed; estimating a size of the rows of the first table which have a particular combination of values in the first set of columns; computing a salt factor n based on the estimated size of rows and further based on a processing capacity of a computational node of the distributed data processing system; assigning one of n different salt values to each row of the first table having the particular combination of values in the first set of columns; expanding each row of the second table having the particular combination of values in the second set of columns into n rows, and assigning to each expanded row a different one of the n salt values; and performing a join operation on the modified first and second tables, wherein rows of the first and second tables that have the same combination of values in the first and second sets of columns and the same salt value are joined on the same computational node.
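A key property of the salted join described in this abstract is that it returns the same result as an ordinary equi-join, because every first-table row meets every matching second-table row at exactly one salt value. The single-process sketch below checks that property. It assumes per-key salt factors are already known (passed in as a hypothetical `salts` dict) and uses simple in-memory hash joins; it models the logic only, not distribution across nodes.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Plain in-memory equi-join; left_key/right_key extract each row's join key."""
    index = defaultdict(list)
    for r in right:
        index[right_key(r)].append(r)
    return [(l, r) for l in left for r in index.get(left_key(l), [])]


def salted_join(left, right, left_key, right_key, salts):
    """Equi-join on (key, salt).

    `salts` (hypothetical) maps a skewed key to its salt factor n; other keys
    use n = 1. Left rows get one salt each (round-robin per key); right rows
    are replicated across all n salts.
    """
    counters = defaultdict(int)
    salted_left = []
    for row in left:
        k = left_key(row)
        n = salts.get(k, 1)
        salted_left.append((k, counters[k] % n, row))
        counters[k] += 1

    index = defaultdict(list)
    for row in right:
        k = right_key(row)
        for s in range(salts.get(k, 1)):
            index[(k, s)].append(row)

    return [(l, r) for k, s, l in salted_left for r in index.get((k, s), [])]
```

Usage: with six "hot" rows on the left, two "hot" rows on the right, and `salts={"hot": 3}`, both functions produce the same 12 hot-key pairs plus any cold-key matches; the salted version merely splits the hot-key work into three (key, salt) groups.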
-
Publication No.: US20230359623A1
Publication Date: 2023-11-09
Application No.: US18316723
Filing Date: 2023-05-12
Applicant: Palantir Technologies Inc.
Inventors: Nicolas Prettejohn, Katherine Ketsdever
IPC: G06F16/2453, G06F16/22, G06F16/2455
CPC classification number: G06F16/24544, G06F16/2282, G06F16/2456, G06F16/24532
Abstract: A computer-implemented method for efficiently performing a database join in a distributed data processing system comprising multiple computational nodes, the method comprising: determining a first set of one or more columns of a first database table and a second set of one or more columns of a second database table on which the join is to be performed; estimating a size of the rows of the first table which have a particular combination of values in the first set of columns; computing a salt factor n based on the estimated size of rows and further based on a processing capacity of a computational node of the distributed data processing system; assigning one of n different salt values to each row of the first table having the particular combination of values in the first set of columns; expanding each row of the second table having the particular combination of values in the second set of columns into n rows, and assigning to each expanded row a different one of the n salt values; and performing a join operation on the modified first and second tables, wherein rows of the first and second tables that have the same combination of values in the first and second sets of columns and the same salt value are joined on the same computational node.
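The final clause of this abstract requires that rows with the same key combination and the same salt value be joined on the same node. One common way to get that guarantee is to partition by a hash of (key, salt), sketched below. The use of `zlib.crc32` (chosen because Python's built-in `hash()` of strings is randomized per process), the `"key|salt"` encoding, and the function names are assumptions for illustration; the abstract does not specify a partitioning function.

```python
import zlib
from collections import defaultdict


def node_for(key, salt, num_nodes):
    """Deterministically map a (join key, salt) pair to a node index."""
    return zlib.crc32(f"{key}|{salt}".encode("utf-8")) % num_nodes


def nodes_used(key, n, num_nodes):
    """Set of nodes that receive rows for `key` when it is salted with factor n."""
    return {node_for(key, s, num_nodes) for s in range(n)}


def place(left_salted, right_expanded, num_nodes):
    """Route (key, salt, payload) tuples from both tables to their nodes.

    Returns {node: (left_rows, right_rows)}. Because routing depends only on
    (key, salt), matching rows from both tables always share a node.
    """
    nodes = defaultdict(lambda: ([], []))
    for k, s, p in left_salted:
        nodes[node_for(k, s, num_nodes)][0].append((k, s, p))
    for k, s, p in right_expanded:
        nodes[node_for(k, s, num_nodes)][1].append((k, s, p))
    return dict(nodes)
```

With n = 1 a skewed key lands on a single node; with larger n its rows can spread over several nodes, while each node still holds every second-table replica it needs for its share of the first table.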
-
Publication No.: US20230333888A1
Publication Date: 2023-10-19
Application No.: US17826972
Filing Date: 2022-05-27
Applicant: Palantir Technologies Inc.
Inventors: Adam Borochoff, John Mathews, Joseph Rafidi, James Thompson, Kamran Khan, Morten Telling, Parvathy Menon, Patrick Szmucer, Robert Kruszewski, Rahij Ramsharan, Katherine Ketsdever
CPC classification number: G06F9/4881, G06F11/3495
Abstract: Computing systems, methods, and non-transitory storage media are provided for retrieving information regarding an operation to be performed by a platform, performing a preliminary validation of the operation, generating details regarding the preliminary validation, transmitting at least a subset of the details of the preliminary validation to the platform, and populating the generated details on an interface. If the preliminary validation fails, the platform refrains from performing the operation. Furthermore, the logic describing the operation can be executed on different platforms and is not bound or limited to one platform.
-