-
Publication No.: US20190114289A1
Publication Date: 2019-04-18
Application No.: US16208435
Filing Date: 2018-12-03
Applicant: Palantir Technologies, Inc.
Inventor: Hao Dang, Gustav Brodman, Yi Xue, Stacey Milspaw, Yifei Huang, Yanran Lu
IPC: G06F16/182, G06F9/455
Abstract: Techniques for automatically scheduling builds of derived datasets in a distributed database system that supports pipelined data transformations are described herein. In an embodiment, a data processing method comprises, in association with a distributed database system that implements one or more data transformation pipelines, each of the data transformation pipelines comprising at least a first dataset, a first transformation, a second derived dataset and dataset dependency and timing metadata, detecting an arrival of a new raw dataset or new derived dataset; in response to the detecting, obtaining from the dataset dependency and timing metadata a dataset subset comprising those datasets that depend on at least the new raw dataset or new derived dataset; for each member dataset in the dataset subset, determining if the member dataset has a dependency on any other dataset that is not yet arrived, and in response to determining that the member dataset does not have a dependency on any other dataset that is not yet arrived: initiating a build of a portion of the data transformation pipeline comprising the member dataset and all other datasets on which the member dataset is dependent, without waiting for arrival of other datasets.
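The scheduling steps in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the class name `BuildScheduler`, the method names, and the in-memory dictionary standing in for the "dataset dependency and timing metadata" are all hypothetical, and timing metadata is left out entirely. The sketch only shows the control flow: on arrival of a dataset, look up its direct dependents, and start a build for each dependent whose dependencies have all arrived.

```python
from collections import defaultdict


class BuildScheduler:
    """Hypothetical sketch of arrival-triggered builds of derived datasets."""

    def __init__(self, dependencies):
        # dependencies: derived dataset name -> names of datasets it is built from.
        # Assumption: this dict stands in for the dependency metadata in the abstract.
        self.dependencies = {d: set(p) for d, p in dependencies.items()}
        # Reverse index: dataset -> datasets that depend on it directly.
        self.dependents = defaultdict(set)
        for derived, parents in self.dependencies.items():
            for parent in parents:
                self.dependents[parent].add(derived)
        self.arrived = set()
        self.builds = []  # (member dataset, pipeline portion) for each build started

    def upstream(self, dataset):
        """Return every dataset that `dataset` depends on, directly or transitively."""
        seen = set()
        stack = list(self.dependencies.get(dataset, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.dependencies.get(current, ()))
        return seen

    def on_arrival(self, dataset):
        """Record an arrival and start builds for dependents that are now unblocked."""
        self.arrived.add(dataset)
        started = []
        for member in sorted(self.dependents.get(dataset, ())):
            # Skip members that still wait on a dataset that has not arrived.
            if self.dependencies[member] <= self.arrived:
                portion = sorted(self.upstream(member) | {member})
                self.builds.append((member, portion))
                started.append(member)
        return started


if __name__ == "__main__":
    scheduler = BuildScheduler({
        "joined": {"raw_a", "raw_b"},
        "summary": {"raw_a"},
        "report": {"joined"},
    })
    print(scheduler.on_arrival("raw_a"))   # "summary" can build; "joined" still waits on raw_b
    print(scheduler.on_arrival("raw_b"))   # "joined" is now unblocked
    print(scheduler.on_arrival("joined"))  # a derived dataset arriving unblocks "report"
```

In this sketch the caller reports a derived dataset as arrived once its build finishes, which is how a build cascades down the pipeline without waiting for unrelated datasets.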
-
Publication No.: US11314698B2
Publication Date: 2022-04-26
Application No.: US16208435
Filing Date: 2018-12-03
Applicant: Palantir Technologies, Inc.
Inventor: Hao Dang, Gustav Brodman, Yi Xue, Stacey Milspaw, Yifei Huang, Yanran Lu
IPC: G06F16/182, G06F16/2455, G06F16/25, G06F16/23, G06F9/455
Abstract: Techniques for automatically scheduling builds of derived datasets in a distributed database system that supports pipelined data transformations are described herein. In an embodiment, a data processing method comprises, in association with a distributed database system that implements one or more data transformation pipelines, each of the data transformation pipelines comprising at least a first dataset, a first transformation, a second derived dataset and dataset dependency and timing metadata, detecting an arrival of a new raw dataset or new derived dataset; in response to the detecting, obtaining from the dataset dependency and timing metadata a dataset subset comprising those datasets that depend on at least the new raw dataset or new derived dataset; for each member dataset in the dataset subset, determining if the member dataset has a dependency on any other dataset that is not yet arrived, and in response to determining that the member dataset does not have a dependency on any other dataset that is not yet arrived: initiating a build of a portion of the data transformation pipeline comprising the member dataset and all other datasets on which the member dataset is dependent, without waiting for arrival of other datasets.
-
Publication No.: USD928807S1
Publication Date: 2021-08-24
Application No.: US29735942
Filing Date: 2020-05-26
Applicant: Palantir Technologies Inc.
Designer: Adhish Ramkumar, Jingwei Luo, Kushal Nigam, Jiawei Marvin Sum, Yanran Lu, Yi Xue
-
Publication No.: US10176217B1
Publication Date: 2019-01-08
Application No.: US15698574
Filing Date: 2017-09-07
Applicant: Palantir Technologies, Inc.
Inventor: Hao Dang, Gustav Brodman, Yi Xue, Stacey Milspaw, Yifei Huang, Yanran Lu
Abstract: Techniques for automatically scheduling builds of derived datasets in a distributed database system that supports pipelined data transformations are described herein. In an embodiment, a data processing method comprises, in association with a distributed database system that implements one or more data transformation pipelines, each of the data transformation pipelines comprising at least a first dataset, a first transformation, a second derived dataset and dataset dependency and timing metadata, detecting an arrival of a new raw dataset or new derived dataset; in response to the detecting, obtaining from the dataset dependency and timing metadata a dataset subset comprising those datasets that depend on at least the new raw dataset or new derived dataset; for each member dataset in the dataset subset, determining if the member dataset has a dependency on any other dataset that is not yet arrived, and in response to determining that the member dataset does not have a dependency on any other dataset that is not yet arrived: initiating a build of a portion of the data transformation pipeline comprising the member dataset and all other datasets on which the member dataset is dependent, without waiting for arrival of other datasets.
-
Publication No.: USD885413S1
Publication Date: 2020-05-26
Application No.: US29642989
Filing Date: 2018-04-03
Applicant: Palantir Technologies Inc.
Designer: Adhish Ramkumar, Jingwei Luo, Kushal Nigam, Jiawei Marvin Sum, Yanran Lu, Yi Xue