Performing database joins in distributed data processing systems

Invention Grant

US11687532B2 Performing database joins in distributed data processing systems (in force)

Patent Title: Performing database joins in distributed data processing systems
Application No.: US17557883

Application Date: 2021-12-21
Publication No.: US11687532B2

Publication Date: 2023-06-27
Inventors: Nicolas Prettejohn, Katherine Ketsdever
Applicant: Palantir Technologies Inc.
Applicant Address: Denver, CO, US
Assignee: Palantir Technologies Inc.
Current Assignee: Palantir Technologies Inc.
Current Assignee Address: Denver, CO, US
Agency: Knobbe, Martens, Olson & Bear, LLP
Main IPC: G06F15/16
IPC: G06F15/16; G06F16/2453; G06F16/22; G06F16/2455

Abstract:

A computer-implemented method for efficiently performing a database join in a distributed data processing system comprising multiple computational nodes, the method comprising: determining a first set of one or more columns of a first database table and a second set of one or more columns of a second database table on which the join is to be performed; estimating a size of the rows of the first table that have a particular combination of values in the first set of columns; computing a salt factor n based on the estimated size of those rows and further based on a processing capacity of a computational node of the distributed data processing system; assigning one of n different salt values to each row of the first table having the particular combination of values in the first set of columns; for each row of the second table having the particular combination of values in the second set of columns, expanding the row into n rows and assigning to each expanded row a different one of the n salt values; and performing a join operation on the modified first and second tables, wherein rows of the first and second tables that have the same combination of values in the first and second sets of columns and the same salt value are joined on the same computational node.
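The abstract describes a salted join for a skewed join key. The Python sketch below illustrates that idea on in-memory lists. Several details are assumptions that the abstract does not specify: the salt factor is taken as ceil(estimated size / per-node capacity), first-table salts are assigned round-robin, rows outside the hot key get salt 0 in both tables, and the "cluster" is simulated by hashing (join key, salt) to one of `num_nodes` partitions. The function names (`salt_factor`, `salt_left`, `expand_right`, `distributed_join`) and the `_salt` helper column are hypothetical.

```python
import math
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def salt_factor(estimated_bytes: int, node_capacity_bytes: int) -> int:
    # ASSUMPTION: pick n so each salted slice of the hot key fits on one node.
    return max(1, math.ceil(estimated_bytes / node_capacity_bytes))


def key_of(row: Row, cols: List[str]) -> Tuple[Any, ...]:
    # The combination of values in the join columns.
    return tuple(row[c] for c in cols)


def salt_left(rows: List[Row], cols: List[str], hot_key: Tuple[Any, ...], n: int) -> List[Row]:
    # First table: each hot-key row gets ONE of n salts (round-robin is an assumption).
    out: List[Row] = []
    i = 0
    for row in rows:
        if key_of(row, cols) == hot_key:
            out.append({**row, "_salt": i % n})
            i += 1
        else:
            out.append({**row, "_salt": 0})  # ASSUMPTION: other keys are not salted
    return out


def expand_right(rows: List[Row], cols: List[str], hot_key: Tuple[Any, ...], n: int) -> List[Row]:
    # Second table: each hot-key row is copied n times, one copy per salt value.
    out: List[Row] = []
    for row in rows:
        if key_of(row, cols) == hot_key:
            out.extend({**row, "_salt": s} for s in range(n))
        else:
            out.append({**row, "_salt": 0})
    return out


def distributed_join(left: List[Row], right: List[Row], lcols: List[str],
                     rcols: List[str], num_nodes: int) -> List[Row]:
    # Simulated shuffle: route rows by hash of (join key, salt) so rows sharing
    # both land in the same partition ("node"), then hash-join each partition.
    build = [defaultdict(list) for _ in range(num_nodes)]
    probe: List[List[Tuple[Any, Row]]] = [[] for _ in range(num_nodes)]
    for row in left:
        k = (key_of(row, lcols), row["_salt"])
        build[hash(k) % num_nodes][k].append(row)
    for row in right:
        k = (key_of(row, rcols), row["_salt"])
        probe[hash(k) % num_nodes].append((k, row))
    result: List[Row] = []
    for node in range(num_nodes):
        for k, r in probe[node]:
            for l in build[node].get(k, []):
                merged = {**l, **r}
                merged.pop("_salt")  # drop the helper column from the output
                result.append(merged)
    return result


if __name__ == "__main__":
    orders = [{"cust": "A", "amt": i} for i in range(6)] + [{"cust": "B", "amt": 99}]
    customers = [{"cust": "A", "name": "Acme"}, {"cust": "B", "name": "Beta"}]
    n = salt_factor(estimated_bytes=6_000, node_capacity_bytes=2_000)  # 3
    left = salt_left(orders, ["cust"], ("A",), n)
    right = expand_right(customers, ["cust"], ("A",), n)
    joined = distributed_join(left, right, ["cust"], ["cust"], num_nodes=4)
    print(len(joined))  # 7
```

Because every first-table row for the hot key carries exactly one salt and every matching second-table row exists once per salt, each original pair still meets exactly once, so the output matches an unsalted join. The hot key's first-table rows are spread over up to n partitions, at the cost of replicating the matching second-table rows n times.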

Published/granted literature

US20220197907A1 PERFORMING DATABASE JOINS IN DISTRIBUTED DATA PROCESSING SYSTEMS, publication date: 2022-06-23

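IPC classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models G06N)
G06F15/00	Digital computers in general (details G06F1/00 to G06F13/00); data processing equipment in general
G06F15/16	. Combinations of two or more digital computers, each having at least an arithmetic unit, a program unit and a register, e.g. for simultaneous processing of several programs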