Techniques for data extraction
    Granted invention patent

    Publication (announcement) No.: US10133782B2

    Publication (announcement) date: 2018-11-20

    Application No.: US15225437

    Application date: 2016-08-01

    Abstract: Computer-implemented techniques for data extraction are described. The techniques include a method and system for retrieving an extraction job specification, wherein the extraction job specification comprises a source repository identifier that identifies a source repository comprising a plurality of data records; a data recipient identifier that identifies a data recipient; and a schedule that indicates a timing of when to retrieve the plurality of data records. The method and system further include retrieving the plurality of data records from the source repository based on the schedule, creating an extraction transaction from the plurality of data records, wherein the extraction transaction comprises a subset of the plurality of data records and metadata, and sending the extraction transaction to the data recipient.
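    The abstract names the parts of an extraction job (source repository identifier, data recipient identifier, schedule) but not how the schedule or the record subset is represented. The following is a minimal Python sketch under stated assumptions, not the patented implementation: the schedule is modelled as a fixed interval, the subset as "records updated since the last run", and every name (`ExtractionJobSpec`, `ExtractionTransaction`, `run_extraction`, the `updated` field) is hypothetical.

    ```python
    from dataclasses import dataclass
    from datetime import datetime, timedelta
    from typing import Callable, Dict, List, Optional

    Record = Dict[str, object]


    @dataclass
    class ExtractionJobSpec:
        """Hypothetical job specification mirroring the three fields in the abstract."""
        source_repository_id: str
        data_recipient_id: str
        interval: timedelta                  # assumption: schedule = run at most once per interval
        last_run: Optional[datetime] = None  # assumption: bookkeeping not named in the abstract

        def is_due(self, now: datetime) -> bool:
            return self.last_run is None or now - self.last_run >= self.interval


    @dataclass
    class ExtractionTransaction:
        records: List[Record]
        metadata: Dict[str, object]


    def run_extraction(
        spec: ExtractionJobSpec,
        repositories: Dict[str, List[Record]],
        recipients: Dict[str, Callable[[ExtractionTransaction], None]],
        now: datetime,
    ) -> Optional[ExtractionTransaction]:
        """Retrieve records on schedule, wrap a subset in a transaction, and send it."""
        if not spec.is_due(now):
            return None
        records = repositories[spec.source_repository_id]
        # Assumption: the subset is the records updated since the previous run.
        since = spec.last_run or datetime.min
        subset = [r for r in records if r["updated"] > since]
        txn = ExtractionTransaction(
            records=subset,
            metadata={
                "source": spec.source_repository_id,
                "extracted_at": now.isoformat(),
                "record_count": len(subset),
            },
        )
        recipients[spec.data_recipient_id](txn)  # "send" = call the recipient's handler
        spec.last_run = now
        return txn
    ```

    Modelling the recipient as a callable keeps the sketch transport-agnostic; a real system might instead push the transaction to a queue, file drop, or API endpoint.
    
    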

    Generation and graphical display of data transform provenance metadata

    Publication (announcement) No.: US11755614B2

    Publication (announcement) date: 2023-09-12

    Application No.: US17727578

    Application date: 2022-04-22

    Abstract: Techniques for propagation of deletion operations among a plurality of related datasets are described herein. In an embodiment, a data processing method comprises, using a distributed database system that is programmed to manage a plurality of different raw datasets and a plurality of derived datasets that have been derived from the raw datasets based on a plurality of derivation relationships that link the raw datasets to the derived datasets: from a first dataset that is stored in the distributed database system, determining a subset of records that are candidates for propagated deletion of specified data values; determining one or more particular raw datasets that contain the subset of records; deleting the specified data values from the particular raw datasets; based on the plurality of derivation relationships and the particular raw datasets, identifying one or more particular derived datasets that have been derived from the particular raw datasets; generating and executing a build of the one or more particular derived datasets to result in creating and storing the one or more particular derived datasets without the specified data values that were deleted from the particular raw datasets; repeating the generating and executing for all derived datasets that have derivation relationships to the particular raw datasets; wherein the method is performed using one or more processors.
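    The deletion-propagation steps in this abstract (find candidate records, find the raw datasets that hold them, delete the values, then rebuild every derived dataset reachable through the derivation relationships) can be sketched in a few lines of Python. This is an illustration under assumptions, not the claimed distributed system: datasets are in-memory lists of dicts sharing an `id` key, "deleting a value" means setting it to `None`, and `Derivation` and `propagate_deletion` are hypothetical names. Rebuilds follow a topological order so each parent is rebuilt before its children.

    ```python
    from dataclasses import dataclass
    from graphlib import TopologicalSorter  # standard library, Python 3.9+
    from typing import Callable, Dict, Iterable, List, Set

    Record = Dict[str, object]


    @dataclass
    class Derivation:
        """Hypothetical derivation relationship: parent dataset names plus a build step."""
        parents: List[str]
        build: Callable[..., List[Record]]  # called with the parent datasets, in order


    def propagate_deletion(
        raw: Dict[str, List[Record]],
        derived: Dict[str, List[Record]],
        derivations: Dict[str, Derivation],
        first_dataset: str,
        is_candidate: Callable[[Record], bool],
        fields: Iterable[str],
    ) -> Set[str]:
        """Delete `fields` from candidate records in raw data, then rebuild descendants."""
        fields = list(fields)
        # 1. Candidate records, identified via an assumed shared "id" key.
        source = raw[first_dataset] if first_dataset in raw else derived[first_dataset]
        ids = {r["id"] for r in source if is_candidate(r)}
        # 2. Raw datasets that contain any candidate record.
        affected = {n for n, rows in raw.items() if any(r["id"] in ids for r in rows)}
        # 3. Delete the specified values (assumption: modelled as setting them to None).
        for name in affected:
            for r in raw[name]:
                if r["id"] in ids:
                    for f in fields:
                        if f in r:
                            r[f] = None
        # 4. Every derived dataset reachable through derivation relationships.
        downstream: Set[str] = set()
        frontier = set(affected)
        while frontier:
            frontier = {
                d for d, spec in derivations.items()
                if d not in downstream and frontier.intersection(spec.parents)
            }
            downstream |= frontier
        # 5. Rebuild in dependency order; raw parents appear in the order but are skipped.
        graph = {d: set(derivations[d].parents) for d in downstream}
        for name in TopologicalSorter(graph).static_order():
            if name in downstream:
                inputs = [raw[p] if p in raw else derived[p] for p in derivations[name].parents]
                derived[name] = derivations[name].build(*inputs)
        return downstream
    ```
    
    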

    TECHNIQUES FOR DATA EXTRACTION
    Invention patent application

    Publication (announcement) No.: US20200349152A1

    Publication (announcement) date: 2020-11-05

    Application No.: US16933688

    Application date: 2020-07-20

    Abstract: Computer-implemented techniques for data extraction are described. The techniques include a method and system for retrieving an extraction job specification, wherein the extraction job specification comprises a source repository identifier that identifies a source repository comprising a plurality of data records; a data recipient identifier that identifies a data recipient; and a schedule that indicates a timing of when to retrieve the plurality of data records. The method and system further include retrieving the plurality of data records from the source repository based on the schedule, creating an extraction transaction from the plurality of data records, wherein the extraction transaction comprises a subset of the plurality of data records and metadata, and sending the extraction transaction to the data recipient.

    PLUGGABLE FAULT DETECTION TESTS FOR DATA PIPELINES

    Publication (announcement) No.: US20200012593A1

    Publication (announcement) date: 2020-01-09

    Application No.: US16572404

    Application date: 2019-09-16

    Abstract: Discussed herein are embodiments of methods and systems which allow engineers or administrators to create modular plugins which represent the logic for various fault detection tests that can be performed on data pipelines and shared among different software deployments. In some cases, the modular plugins each define a particular test to be executed against data received from the pipeline in addition to one or more configuration points. The configuration points represent configurable arguments, such as variables and/or functions, referenced by the instructions which implement the tests and that can be set according to the specific operation environment of the monitored pipeline.
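    The idea of a modular fault-detection plugin with "configuration points" can be illustrated with a small Python sketch. It is an assumption-laden illustration, not the patented design: a plugin is modelled as test logic plus a dict of named configuration points with defaults, a deployment supplies overrides at run time, and unknown overrides are rejected. `FaultDetectionPlugin`, `row_check`, `min_rows`, and `is_valid` are hypothetical names; `is_valid` shows that a configuration point can be a function as well as a variable.

    ```python
    from dataclasses import dataclass, field
    from typing import Any, Callable, Dict, List

    Row = Dict[str, Any]


    @dataclass
    class FaultDetectionPlugin:
        """Hypothetical plugin: test logic plus named configuration points with defaults."""
        name: str
        test: Callable[[List[Row], Dict[str, Any]], List[str]]  # returns failure messages
        config_points: Dict[str, Any] = field(default_factory=dict)

        def configure(self, **overrides: Any) -> Dict[str, Any]:
            # Only declared configuration points may be set by a deployment.
            unknown = set(overrides) - set(self.config_points)
            if unknown:
                raise KeyError(f"unknown configuration points: {sorted(unknown)}")
            return {**self.config_points, **overrides}

        def run(self, data: List[Row], **overrides: Any) -> List[str]:
            return self.test(data, self.configure(**overrides))


    def _row_check(data: List[Row], cfg: Dict[str, Any]) -> List[str]:
        failures = []
        if len(data) < cfg["min_rows"]:  # variable configuration point
            failures.append(f"expected at least {cfg['min_rows']} rows, got {len(data)}")
        bad = [row for row in data if not cfg["is_valid"](row)]  # function configuration point
        if bad:
            failures.append(f"{len(bad)} row(s) failed validation")
        return failures


    row_check = FaultDetectionPlugin(
        name="row_check",
        test=_row_check,
        config_points={"min_rows": 1, "is_valid": lambda row: True},
    )
    ```

    Because the test logic and its defaults travel together in one object, two deployments can share `row_check` while passing different overrides, for example a higher `min_rows` for a high-volume pipeline.
    
    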

    Pluggable fault detection tests for data pipelines

    Publication (announcement) No.: US10417120B2

    Publication (announcement) date: 2019-09-17

    Application No.: US15671423

    Application date: 2017-08-08

    Abstract: Discussed herein are embodiments of methods and systems which allow engineers or administrators to create modular plugins which represent the logic for various fault detection tests that can be performed on data pipelines and shared among different software deployments. In some cases, the modular plugins each define a particular test to be executed against data received from the pipeline in addition to one or more configuration points. The configuration points represent configurable arguments, such as variables and/or functions, referenced by the instructions which implement the tests and that can be set according to the specific operation environment of the monitored pipeline.
