-
Publication number: US11797558B2
Publication date: 2023-10-24
Application number: US17491985
Filing date: 2021-10-01
Applicant: Amazon Technologies, Inc.
Inventor: Mehul A. Shah, George Steven McPherson, Prajakta Datta Damle, Gopinath Duddi, Anurag Windlass Gupta, Benjamin Albert Sowell, Bohou Li
CPC classification number: G06F16/254, G06F16/282
Abstract: Data transformation workflows may be generated to transform data objects. A source data schema for a data object and a target data format or target data schema for a data object may be identified. A comparison of the source data schema and the target data format or schema may be made to determine what transformations can be performed to transform the data object into the target data format or schema. Code to execute the transformation operations may then be generated. The code may be stored for subsequent modification or execution.
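Illustrative note (not part of the patent record): the schema comparison described in this abstract can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions. The function name `plan_transformations`, the operation names `cast`, `drop`, and `add_null`, and the `{field: type}` schema representation are all hypothetical and are not taken from the patent.

```python
def plan_transformations(source_schema, target_schema):
    """Compare two {field: type} schemas and list the operations needed.

    Hypothetical illustration only; the operation names are assumptions.
    """
    ops = []
    # Fields present in the source: drop them or cast them as needed.
    for field, src_type in source_schema.items():
        if field not in target_schema:
            ops.append(("drop", field))
        elif target_schema[field] != src_type:
            ops.append(("cast", field, src_type, target_schema[field]))
    # Fields only the target expects: fill with nulls of the target type.
    for field, tgt_type in target_schema.items():
        if field not in source_schema:
            ops.append(("add_null", field, tgt_type))
    return ops


source = {"id": "string", "price": "string", "legacy_flag": "bool"}
target = {"id": "string", "price": "double", "currency": "string"}
plan = plan_transformations(source, target)
```

A code generator could then walk `plan` and emit one statement per operation, which corresponds to the "code to execute the transformation operations" step in the abstract.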
-
Publication number: US11481408B2
Publication date: 2022-10-25
Application number: US15385787
Filing date: 2016-12-20
Applicant: Amazon Technologies, Inc.
Inventor: George Steven McPherson, Mehul A. Shah, Prajakta Datta Damle, Gopinath Duddi, Anurag Windlass Gupta
Abstract: Extract, Transform, Load (ETL) processing may be initiated by detected events. A trigger event may be associated with an ETL process that applies one or more transformations to a source data object. The trigger event may be detected for the ETL process and evaluated with respect to one or more execution conditions for the ETL process. If the execution conditions for the ETL process are satisfied, then the ETL process may be executed. At least some of the source data object may be obtained, the one or more transformations of the ETL process may be applied, and one or more transformed data objects may be stored.
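Illustrative note (not part of the patent record): the event-gated flow in this abstract might look roughly like the sketch below. The function `process_event`, the event dictionary layout, and the example conditions are hypothetical and chosen only for illustration.

```python
def process_event(event, conditions, transformations, source_rows):
    """Run the ETL process only if every execution condition holds.

    Hypothetical illustration only; returns None when the process is skipped.
    """
    if not all(check(event) for check in conditions):
        return None
    rows = list(source_rows)
    # Apply each transformation of the ETL process in order.
    for transform in transformations:
        rows = [transform(row) for row in rows]
    return rows


conditions = [
    lambda e: e["type"] == "object_created",  # assumed event type
    lambda e: e["size"] > 100,                # assumed size threshold
]
transformations = [str.strip, str.upper]

ran = process_event({"type": "object_created", "size": 120},
                    conditions, transformations, [" a ", "b "])
skipped = process_event({"type": "object_created", "size": 50},
                        conditions, transformations, [" a ", "b "])
```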
-
Publication number: US09703594B1
Publication date: 2017-07-11
Application number: US14635254
Filing date: 2015-03-02
Applicant: Amazon Technologies, Inc.
Inventor: Ankit Kamboj, Xing Wu, George Steven McPherson, Jian Fang, Dag Stockstad, Abhishek Rajnikant Sinha
CPC classification number: G06F9/4806, G06F9/485, G06F9/4881, G06F9/4887, G06F11/30
Abstract: A system adapted to process long-running processes is disclosed. A request to upload data is received at a server. The server divides the data into multiple parts and launches a separate process to upload each of the divided parts. The server records for each process the processing time or duration that the particular process used to upload its corresponding data item. The server maintains an average processing duration that is calculated from the processing durations of the completed processes. The server identifies that one process is continuing to run and compares a processing duration for the particular process to a threshold derived from the average processing duration. If the processing duration for the particular process exceeds the threshold, the server initiates a new process to upload the same data item. When one of either the new process or the still running process has completed processing, the server terminates the other process.
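Illustrative note (not part of the patent record): the straggler check in this abstract compares a running process's duration with a threshold derived from the average of the completed ones. The sketch below assumes the threshold is a simple multiple of that average. The function name `should_launch_backup` and the default `factor=2.0` are hypothetical, since the abstract does not say how the threshold is derived.

```python
from statistics import mean


def should_launch_backup(elapsed, completed_durations, factor=2.0):
    """Return True if a still-running upload looks like a straggler.

    Hypothetical illustration only; `factor` is an assumed multiplier.
    """
    if not completed_durations:
        # No average is available yet, so there is nothing to compare with.
        return False
    threshold = factor * mean(completed_durations)
    return elapsed > threshold


completed = [10.0, 12.0, 8.0]  # seconds taken by finished part uploads
slow = should_launch_backup(25.0, completed)
normal = should_launch_backup(15.0, completed)
```

When the check returns True, the server described in the abstract would start a second process for the same part and terminate whichever process is still running once the other finishes.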
-
Publication number: US11704331B2
Publication date: 2023-07-18
Application number: US16926537
Filing date: 2020-07-10
Applicant: Amazon Technologies, Inc.
Inventor: Andrew Edward Caldwell, Anurag Windlass Gupta, Mehul A. Shah, Prajakta Datta Damle, George Steven McPherson
IPC: G06F16/00, G06F16/25, G06F16/28, G06F16/951, G06F16/23
CPC classification number: G06F16/254, G06F16/2358, G06F16/283, G06F16/951
Abstract: Dynamic generation of data catalogs may be implemented for accessing data sets in different storage locations. Data sets may be accessed in order to extract portions of data. Structure recognition techniques may be applied to the extracted data in order to determine structural information for the data sets. The structural information may then be stored as part of a data catalog for the data sets. Requests to access the data catalog from different clients may be received and the requested structural data supplied so that the clients may access different data sets utilizing the supplied structural data. Data catalogs may be updated as changes to data sets are made.
-
Publication number: US20230169086A1
Publication date: 2023-06-01
Application number: US18048645
Filing date: 2022-10-21
Applicant: Amazon Technologies, Inc.
Inventor: George Steven McPherson, Mehul A. Shah, Prajakta Datta Damle, Gopinath Duddi, Anurag Windlass Gupta
CPC classification number: G06F16/254, G06F9/542
Abstract: Extract, Transform, Load (ETL) processing may be initiated by detected events. A trigger event may be associated with an ETL process that applies one or more transformations to a source data object. The trigger event may be detected for the ETL process and evaluated with respect to one or more execution conditions for the ETL process. If the execution conditions for the ETL process are satisfied, then the ETL process may be executed. At least some of the source data object may be obtained, the one or more transformations of the ETL process may be applied, and one or more transformed data objects may be stored.
-
Publication number: US11277494B1
Publication date: 2022-03-15
Application number: US15385784
Filing date: 2016-12-20
Applicant: Amazon Technologies, Inc.
Inventor: George Steven McPherson, Mehul A. Shah, Supratik Chakraborty, Prajakta Datta Damle, Gopinath Duddi, Anurag Windlass Gupta
Abstract: Code may be dynamically routed to computing resources for execution. Code may be received for execution on behalf of a client. Execution criteria for the code may be determined and computing resources that satisfy the execution criteria may be identified. The identified computing resources may then be procured for executing the code and then the code may be routed to the procured computing resources for execution. Permissions or authorization to execute the code may be shared to ensure that computing resources executing the code have the same permissions or authorization when executing the code.
-
Publication number: US11138220B2
Publication date: 2021-10-05
Application number: US15385764
Filing date: 2016-12-20
Applicant: Amazon Technologies, Inc.
Inventor: Mehul A. Shah, George Steven McPherson, Prajakta Datta Damle, Gopinath Duddi, Anurag Windlass Gupta, Benjamin Albert Sowell, Bohou Li
Abstract: Data transformation workflows may be generated to transform data objects. A source data schema for a data object and a target data format or target data schema for a data object may be identified. A comparison of the source data schema and the target data format or schema may be made to determine what transformations can be performed to transform the data object into the target data format or schema. Code to execute the transformation operations may then be generated. The code may be stored for subsequent modification or execution.
-
Publication number: US10963479B1
Publication date: 2021-03-30
Application number: US15385777
Filing date: 2016-12-20
Applicant: Amazon Technologies, Inc.
Inventor: Mehul A. Shah, George Steven McPherson, Supratik Chakraborty, Anurag Windlass Gupta, Benjamin Albert Sowell
Abstract: Version controlled Extract, Transform, Load (ETL) code may be hosted for developing or executing the ETL job in an ETL system. A version of ETL code may be obtained from version controlled code store and maintained in a data store. Development or execution clients may submit access requests for the version of ETL code which may be serviced from the version stored in the data store. Updates to the version of the ETL code may be eventually committed to the version controlled code store. The latest version of ETL code may also be obtained from the version controlled code store when providing the ETL code in response to a request to retrieve the ETL code.
-
Publication number: US20200159742A1
Publication date: 2020-05-21
Application number: US16752022
Filing date: 2020-01-24
Applicant: Amazon Technologies, Inc.
Inventor: George Steven McPherson, Mehul A. Shah, Prajakta Datta Damle, Gopinath Duddi, Anurag Windlass Gupta
IPC: G06F16/25, G06F16/2455, G06F16/23
Abstract: History for data objects may be maintained to detect data events. An indication of an Extract, Transform, Load (ETL) process applied to one or more source data objects to generate one or more transformed data objects may be received. History for the source data objects may be updated to include the transformed data objects and the ETL process that generated the transformed data objects. An evaluation of the update may be performed to determine whether an event associated with the data lineage is triggered. If the event is triggered, a notification of the event may be sent to one or more subscribers for the event.
-
Publication number: US10338958B1
Publication date: 2019-07-02
Application number: US14165521
Filing date: 2014-01-27
Applicant: Amazon Technologies, Inc.
Inventor: Ankit Kamboj, Peter Sirota, George Steven McPherson, Vageesh Kumar, Sumit Kumar
Abstract: An indication of an input data stream comprising data records, stored at a stream management service, that are to be batched for a computation at a batch-oriented data processing service is received. A set of data records of the input data stream are identified, based on respective sequence numbers associated with the records, for a particular iteration of the computation. Metadata associated with the particular iteration, comprising identification information associated with the set of records on which the computation is performed during the particular iteration, is saved in a repository.
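Illustrative note (not part of the patent record): selecting one iteration's records by sequence number and saving identifying metadata could be sketched as follows. The record layout (a `seq` key), the function `next_batch`, and the metadata fields are hypothetical and not taken from the patent.

```python
def next_batch(records, last_sequence, batch_size):
    """Pick the next records after `last_sequence` for one batch iteration.

    Hypothetical illustration only; returns the batch and its metadata.
    """
    pending = sorted(
        (r for r in records if r["seq"] > last_sequence),
        key=lambda r: r["seq"],
    )
    batch = pending[:batch_size]
    if not batch:
        return [], None
    # Metadata identifying exactly which records this iteration covered.
    metadata = {
        "first_seq": batch[0]["seq"],
        "last_seq": batch[-1]["seq"],
        "count": len(batch),
    }
    return batch, metadata


stream = [{"seq": s, "value": s * 10} for s in (5, 1, 3, 2, 4)]
batch, metadata = next_batch(stream, last_sequence=2, batch_size=2)
```

Persisting `metadata` in a repository would let a later iteration resume from `last_seq`, or let a failed iteration be replayed over the same records.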