-
Publication No.: US10318398B2
Publication Date: 2019-06-11
Application No.: US15498258
Application Date: 2017-04-26
Applicant: Palantir Technologies Inc.
Inventor: Jesse Rickard, Peter Maag, Jared Newman, Giulio Mecocci, Harish Subbanarasimhia, Adrian Marius Dumitran, Andrzej Skrodzki, Jonah Scheinerman, Gregory Slonim, Alexandru Viorel Antihi
Abstract: A method and system for data pipeline monitoring receives an event data object and a current status data object from one or more subsystems of a pipeline. The system analyzes the event data object and the current status data object to determine a first and second validation value. The system, in response to determining that either the first or second validation value is not valid, sends a notification.
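As a rough illustration of the flow in this abstract, the Python sketch below checks an event data object and a current status data object and sends a notification when either check fails. The class names, fields, thresholds, and the `notify` callback are all hypothetical; the abstract does not say how the two validation values are computed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EventData:
    """Hypothetical event data object reported by a pipeline subsystem."""
    subsystem: str
    error_count: int


@dataclass(frozen=True)
class CurrentStatus:
    """Hypothetical current status data object for the same subsystem."""
    subsystem: str
    rows_processed: int


def monitor(event: EventData, status: CurrentStatus,
            notify: Callable[[str], None],
            max_errors: int = 0, min_rows: int = 1) -> bool:
    """Return True when both validation values are valid; otherwise notify."""
    # First validation value: derived from the event data object (assumed rule).
    first_valid = event.error_count <= max_errors
    # Second validation value: derived from the current status object (assumed rule).
    second_valid = status.rows_processed >= min_rows
    if not (first_valid and second_valid):
        notify(f"{event.subsystem}: event_valid={first_valid}, "
               f"status_valid={second_valid}")
        return False
    return True
```

Passing `notify` as a callback keeps the sketch independent of any particular alerting channel.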
-
Publication No.: US10261763B2
Publication Date: 2019-04-16
Application No.: US15839680
Application Date: 2017-12-12
Applicant: Palantir Technologies Inc.
Inventor: Robert Fink, Matthew Cheah, Mingyu Kim, Lynn Cuthriell, Divyanshu Arora, Justin Uang, Jared Newman, Jakob Juelich, Kevin Chen, Mark Elliot, Michael Nazario
Abstract: Data transformation in a distributed system of applications and data repositories is described. The subsystems for the overall framework are distributed, thereby allowing for customization to require only isolated changes to one or more subsystems. In one embodiment, a source code repository is used to receive and store source code. A build subsystem can retrieve source code from the source code repository and build it, using one or more criteria. By building the source code, the build subsystem can generate an artifact, which is executable code, such as a JAR or SQL file. Likewise, by building the source code, the build subsystem can generate one or more job specifications for executing the executable code. In one embodiment, the artifact and job specification may be used to launch an application server in a cluster. The application server can then receive data transformation instructions and execute the data transformation instructions.
-
Publication No.: US10133782B2
Publication Date: 2018-11-20
Application No.: US15225437
Application Date: 2016-08-01
Applicant: Palantir Technologies Inc.
Inventor: Huw Pryce, James Neale, Robert Fink, Jared Newman, Graham Dennis, Viktor Nordling, Artur Jonkisz, Daniel Fox, Felix de Souza, Harkirat Singh, Mark Elliot
IPC: G06F17/30
Abstract: Computer-implemented techniques for data extraction are described. The techniques include a method and system for retrieving an extraction job specification, wherein the extraction job specification comprises a source repository identifier that identifies a source repository comprising a plurality of data records; a data recipient identifier that identifies a data recipient; and a schedule that indicates a timing of when to retrieve the plurality of data records. The method and system further include retrieving the plurality of data records from the source repository based on the schedule, creating an extraction transaction from the plurality of data records, wherein the extraction transaction comprises a subset of the plurality of data records and metadata, and sending the extraction transaction to the data recipient.
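The extraction job specification and extraction transaction described in this abstract could be modeled roughly as follows. This is a minimal Python sketch under stated assumptions: the field names, the fixed-interval schedule, and the `batch_size` rule for choosing the subset are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class ExtractionJobSpec:
    """Hypothetical shape of an extraction job specification."""
    source_repository_id: str   # identifies the source repository
    data_recipient_id: str      # identifies the data recipient
    schedule_seconds: int       # assumed: schedule expressed as a fixed interval


@dataclass(frozen=True)
class ExtractionTransaction:
    """A subset of the retrieved records plus descriptive metadata."""
    records: List[Dict[str, Any]]
    metadata: Dict[str, Any]


def create_transaction(spec: ExtractionJobSpec,
                       records: List[Dict[str, Any]],
                       batch_size: int) -> ExtractionTransaction:
    """Bundle the first `batch_size` records into one transaction."""
    subset = records[:batch_size]
    metadata = {
        "source": spec.source_repository_id,
        "recipient": spec.data_recipient_id,
        "record_count": len(subset),
    }
    return ExtractionTransaction(records=subset, metadata=metadata)
```

Scheduling and the actual transfer to the data recipient are left out; the sketch covers only the step that turns retrieved records into a transaction.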
-
Publication No.: US12093279B2
Publication Date: 2024-09-17
Application No.: US18465089
Application Date: 2023-09-11
Applicant: Palantir Technologies Inc.
Inventor: Matthew Maclean, Adam Borochoff, Jared Newman, Joseph Rafidi
CPC classification number: G06F16/26, G06F16/212, G06F16/221, G06F16/2282, G06F16/258, G06F16/27
Abstract: A method comprises creating metadata identifying columns of tables and column operations of one or more data transforms of the columns in a data pipeline and including links to code segments in human-readable form corresponding to the one or more data transforms; executing a build job that effects the one or more data transforms on one or more datasets to generate one or more derived datasets; causing, after the executing, a presentation of a graphical user interface (GUI) including a graphical representation of the one or more data transforms based on the metadata, wherein the method is performed by one or more processors.
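One way to picture the metadata described in this abstract is a record per column operation that names the input columns, the output column, and a link to the code segment. The Python sketch below is only an assumption about such a structure: `ColumnTransform`, its fields, and `lineage_edges` are hypothetical names, and the edge list stands in for whatever a GUI would draw.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ColumnTransform:
    """Hypothetical metadata record for one column operation."""
    output_column: str               # written as "table.column"
    input_columns: Tuple[str, ...]   # columns the operation reads
    operation: str                   # short label for the column operation
    code_link: str                   # link to the human-readable code segment


def lineage_edges(transforms: List[ColumnTransform]) -> List[Tuple[str, str, str]]:
    """Flatten the metadata into (input, output, operation) edges for display."""
    return [
        (source, transform.output_column, transform.operation)
        for transform in transforms
        for source in transform.input_columns
    ]
```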
-
Publication No.: US11755614B2
Publication Date: 2023-09-12
Application No.: US17727578
Application Date: 2022-04-22
Applicant: Palantir Technologies Inc.
Inventor: Matthew Maclean, Adam Borochoff, Jared Newman, Joseph Rafidi
CPC classification number: G06F16/26, G06F16/212, G06F16/221, G06F16/2282, G06F16/258, G06F16/27
Abstract: Techniques for propagation of deletion operations among a plurality of related datasets are described herein. In an embodiment, a data processing method comprises, using a distributed database system that is programmed to manage a plurality of different raw datasets and a plurality of derived datasets that have been derived from the raw datasets based on a plurality of derivation relationships that link the raw datasets to the derived datasets: from a first dataset that is stored in the distributed database system, determining a subset of records that are candidates for propagated deletion of specified data values; determining one or more particular raw datasets that contain the subset of records; deleting the specified data values from the particular raw datasets; based on the plurality of derivation relationships and the particular raw datasets, identifying one or more particular derived datasets that have been derived from the particular raw datasets; generating and executing a build of the one or more particular derived datasets to result in creating and storing the one or more particular derived datasets without the specified data values that were deleted from the particular raw datasets; repeating the generating and executing for all derived datasets that have derivation relationships to the particular raw datasets; wherein the method is performed using one or more processors.
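The step of identifying derived datasets from the derivation relationships can be sketched as a graph traversal. The Python example below is a simplified illustration, not the patented method: it assumes the relationships are available as a plain dictionary from each dataset to its directly derived datasets, and `downstream_datasets` is a hypothetical helper name.

```python
from collections import deque
from typing import Dict, Iterable, List


def downstream_datasets(derivations: Dict[str, List[str]],
                        raw_datasets: Iterable[str]) -> List[str]:
    """Return every dataset derived, directly or transitively, from the raw ones.

    `derivations` maps a dataset name to the datasets derived directly from it.
    Results are listed in breadth-first discovery order, each dataset once.
    """
    seen = set()
    found: List[str] = []
    queue = deque(raw_datasets)
    while queue:
        current = queue.popleft()
        for child in derivations.get(current, []):
            if child not in seen:
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found
```

Each dataset returned would then need a rebuild so that it no longer contains the deleted values. The sketch does not guarantee a valid rebuild order when a derived dataset has several parents; a real system would need to order builds by dependency.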
-
Publication No.: US11573776B1
Publication Date: 2023-02-07
Application No.: US17091912
Application Date: 2020-11-06
Applicant: Palantir Technologies Inc.
Inventor: Robert Fink, Matthew Cheah, Mingyu Kim, Lynn Cuthriell, Divyanshu Arora, Justin Uang, Jared Newman, Jakob Juelich, Kevin Chen, Mark Elliot, Michael Nazario
Abstract: Data transformation in a distributed system of applications and data repositories is described. The subsystems for the overall framework are distributed, thereby allowing for customization to require only isolated changes to one or more subsystems. In one embodiment, a source code repository is used to receive and store source code. A build subsystem can retrieve source code from the source code repository and build it, using one or more criteria. By building the source code, the build subsystem can generate an artifact, which is executable code, such as a JAR or SQL file. Likewise, by building the source code, the build subsystem can generate one or more job specifications for executing the executable code. In one embodiment, the artifact and job specification may be used to launch an application server in a cluster. The application server can then receive data transformation instructions and execute the data transformation instructions.
-
Publication No.: US20200349152A1
Publication Date: 2020-11-05
Application No.: US16933688
Application Date: 2020-07-20
Applicant: Palantir Technologies Inc.
Inventor: Huw Pryce, James Neale, Robert Fink, Jared Newman, Graham Dennis, Viktor Nordling, Artur Jonkisz, Daniel Fox, Felix de Souza, Harkirat Singh, Mark Elliot
IPC: G06F16/2455, G06F16/25, G06F16/2458
Abstract: Computer-implemented techniques for data extraction are described. The techniques include a method and system for retrieving an extraction job specification, wherein the extraction job specification comprises a source repository identifier that identifies a source repository comprising a plurality of data records; a data recipient identifier that identifies a data recipient; and a schedule that indicates a timing of when to retrieve the plurality of data records. The method and system further include retrieving the plurality of data records from the source repository based on the schedule, creating an extraction transaction from the plurality of data records, wherein the extraction transaction comprises a subset of the plurality of data records and metadata, and sending the extraction transaction to the data recipient.
-
Publication No.: US10776360B2
Publication Date: 2020-09-15
Application No.: US16147687
Application Date: 2018-09-29
Applicant: Palantir Technologies Inc.
Inventor: Huw Pryce, James Neale, Robert Fink, Jared Newman, Graham Dennis, Viktor Nordling, Artur Jonkisz, Daniel Fox, Felix de Souza, Harkirat Singh, Mark Elliot
IPC: G06F16/00, G06F16/2455, G06F16/25, G06F16/2458
Abstract: Computer-implemented techniques for data extraction are described. The techniques include a method and system for retrieving an extraction job specification, wherein the extraction job specification has a source repository identifier that identifies a source repository including a plurality of data records; a data recipient identifier that identifies a data recipient; and a schedule that indicates a timing of when to retrieve the plurality of data records. The method and system further include retrieving the plurality of data records from the source repository based on the schedule, creating an extraction transaction from the plurality of data records, wherein the extraction transaction includes a subset of the plurality of data records and metadata, and sending the extraction transaction to the data recipient.
-
Publication No.: US20200012593A1
Publication Date: 2020-01-09
Application No.: US16572404
Application Date: 2019-09-16
Applicant: Palantir Technologies, Inc.
Inventor: Peter Maag, Jacob Albertson, Jared Newman, Matthew Lynch, Maciej Albin, Viktor Nordling
Abstract: Discussed herein are embodiments of methods and systems which allow engineers or administrators to create modular plugins which represent the logic for various fault detection tests that can be performed on data pipelines and shared among different software deployments. In some cases, the modular plugins each define a particular test to be executed against data received from the pipeline in addition to one or more configuration points. The configuration points represent configurable arguments, such as variables and/or functions, referenced by the instructions which implement the tests and that can be set according to the specific operation environment of the monitored pipeline.
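A modular plugin with configuration points, as this abstract describes it, might look like the Python sketch below. All names here (`FaultTestPlugin`, `configure`, `min_row_count`, the `min_rows` setting) are hypothetical and chosen for illustration; the abstract does not prescribe an interface.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class FaultTestPlugin:
    """Hypothetical plugin: one fault detection test plus its configuration points."""
    name: str
    test: Callable[[List[dict], Dict[str, Any]], bool]
    config_points: Dict[str, Any] = field(default_factory=dict)

    def configure(self, **overrides: Any) -> "FaultTestPlugin":
        """Return a copy with configuration points set for one deployment."""
        unknown = set(overrides) - set(self.config_points)
        if unknown:
            raise KeyError(f"unknown configuration points: {sorted(unknown)}")
        return FaultTestPlugin(self.name, self.test,
                               {**self.config_points, **overrides})

    def run(self, data: List[dict]) -> bool:
        """Execute the test against data received from the pipeline."""
        return self.test(data, self.config_points)


def min_row_count(data: List[dict], config: Dict[str, Any]) -> bool:
    """Example test: pass when the batch has at least `min_rows` rows."""
    return len(data) >= config["min_rows"]
```

Because `configure` returns a new plugin instead of mutating the original, the same plugin definition can be shared among deployments, each with its own settings.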
-
Publication No.: US10417120B2
Publication Date: 2019-09-17
Application No.: US15671423
Application Date: 2017-08-08
Applicant: Palantir Technologies, Inc.
Inventor: Peter Maag, Jacob Albertson, Jared Newman, Matthew Lynch, Maciej Albin, Viktor Nordling
Abstract: Discussed herein are embodiments of methods and systems which allow engineers or administrators to create modular plugins which represent the logic for various fault detection tests that can be performed on data pipelines and shared among different software deployments. In some cases, the modular plugins each define a particular test to be executed against data received from the pipeline in addition to one or more configuration points. The configuration points represent configurable arguments, such as variables and/or functions, referenced by the instructions which implement the tests and that can be set according to the specific operation environment of the monitored pipeline.