-
Publication No.: US20220171781A1
Publication date: 2022-06-02
Application No.: US17673049
Filing date: 2022-02-16
Applicant: Google LLC
Inventor: Robert C. Pike, Sean Quinlan, Sean M. Dorward, Jeffrey Dean, Sanjay Ghemawat
IPC: G06F16/2455, G06F16/28, G06F16/2458, G06F11/14, G06F16/18
Abstract: Systems and methods for analyzing input data records are provided in which a master process initiates a plurality of concurrent first processes each of which comprises, for each data record in at least a subset of a plurality of input data records, creating a parsed representation of the data record and independently applying a procedural language query to the parsed representation to extract one or more values. A respective emit operator is applied to at least one of the extracted one or more values thereby adding corresponding information to a respective intermediate data structure. The respective emit operator implements one of a predefined set of statistical information processing functions. The master process also initiates a plurality of second processes each of which aggregates information from a corresponding subset of intermediate data structures to produce aggregated data that is, in turn, combined to produce output data.
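The two-phase flow described in this abstract can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the names `SumTable`, `first_process`, `query`, and `aggregate` are hypothetical, JSON is assumed as the record format, a sum is assumed as the one statistical function behind the emit operator, and the "concurrent" first processes are run sequentially for simplicity.

```python
import json


class SumTable:
    """Hypothetical intermediate data structure: emit() adds a value to a running sum."""

    def __init__(self):
        self.total = 0

    def emit(self, value):
        self.total += value

    def merge(self, other):
        self.total += other.total


def query(parsed, tables):
    # Hypothetical per-record query: extract values and emit them.
    tables["hits"].emit(1)
    tables["bytes"].emit(parsed["size"])


def first_process(records, query_fn):
    # First phase: parse each record and apply the query to it independently.
    tables = {"hits": SumTable(), "bytes": SumTable()}
    for raw in records:
        parsed = json.loads(raw)  # parsed representation of the record
        query_fn(parsed, tables)
    return tables


def aggregate(all_tables):
    # Second phase: combine the intermediate structures into output data.
    merged = {}
    for tables in all_tables:
        for name, table in tables.items():
            merged.setdefault(name, SumTable()).merge(table)
    return {name: table.total for name, table in merged.items()}


# Each inner list stands in for the subset of records given to one first process.
shards = [['{"size": 10}', '{"size": 5}'], ['{"size": 7}']]
result = aggregate(first_process(shard, query) for shard in shards)
```

Because each record is queried independently and the sum is associative and commutative, the shards can be processed in any order, or in parallel, without changing `result`.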
-
Publication No.: US10885012B2
Publication date: 2021-01-05
Application No.: US16417126
Filing date: 2019-05-20
Applicant: Google LLC
Inventor: Jeffrey Dean, Sanjay Ghemawat
IPC: G06F16/22, G06F16/23, G06F16/2453, G06F9/48, G06F9/54
Abstract: A method performs large-scale data processing in a distributed and parallel processing environment. The method defines application-independent map and reduce operations, each invoking one or more library functions that automatically handle data partitioning, parallelization of computations, and fault tolerance. A user specifies a map operation, which calls one or more of the application-independent map operators to perform data read and write operations. A user also specifies a reduce operation, which calls one or more of the application-independent reduce operators to perform data read and write operations. The method executes application-independent map worker processes. Each map worker process executes the user-specified map operation to read designated portions of input files and store intermediate data values in intermediate data structures. The method also executes application-independent reduce worker processes. Each reduce worker process executes the user-specified reduce operation to read intermediate data values from the intermediate data structures and produce final output data.
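The division of labor in this abstract, between user-specified map and reduce operations and an application-independent framework, can be illustrated with a small single-process sketch. It is not the patented system: `run_mapreduce`, `word_count_map`, and `word_count_reduce` are hypothetical names, the "workers" run sequentially in one process, and fault tolerance is omitted entirely.

```python
from collections import defaultdict


def run_mapreduce(splits, map_fn, reduce_fn, num_reducers=2):
    """Hypothetical application-independent driver (sequential, no fault tolerance)."""
    # Map phase: each split stands in for the input portion read by one map worker.
    # Intermediate values are partitioned by key so each reducer owns a disjoint key set.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for key, value in map_fn(split):
            partitions[hash(key) % num_reducers][key].append(value)

    # Reduce phase: each partition stands in for one reduce worker's intermediate data.
    output = {}
    for partition in partitions:
        for key, values in partition.items():
            output[key] = reduce_fn(key, values)
    return output


def word_count_map(text):
    # User-specified map operation: emit (word, 1) for every word in the split.
    for word in text.split():
        yield word.lower(), 1


def word_count_reduce(key, values):
    # User-specified reduce operation: sum the counts collected for one word.
    return sum(values)


counts = run_mapreduce(["the quick fox", "the lazy dog"],
                       word_count_map, word_count_reduce)
```

The user code never touches partitioning or scheduling; swapping in different `map_fn` and `reduce_fn` arguments reuses the same driver unchanged.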
-
Publication No.: US20190018843A1
Publication date: 2019-01-17
Application No.: US16116833
Filing date: 2018-08-29
Applicant: Google LLC
Inventor: Franz Josef Och, Jeffrey Dean, Thorsten Brants, Alexander Mark Franz, Jay Ponte, Peng Xu, Sha-Mayn Teh, Jeffrey Chin, Ignacio E. Thayer, Anton Carver, Daniel Rosart, John S. Hawkins, Karel Driesen
IPC: G06F17/28
Abstract: Systems, methods, and apparatus for accessing distributed models in automated machine processing, including using large language models in machine translation, speech recognition and other applications.
-
Publication No.: US10089304B2
Publication date: 2018-10-02
Application No.: US15480722
Filing date: 2017-04-06
Applicant: Google LLC
Inventor: Franz Josef Och, Jeffrey Dean, Thorsten Brants, Alexander Mark Franz, Jay Ponte, Peng Xu, Sha-Mayn Teh, Jeffrey Chin, Ignacio E. Thayer, Anton Carver, Daniel Rosart, John S. Hawkins, Karel Driesen
IPC: G06F17/28
Abstract: Systems, methods, and apparatus for accessing distributed models in automated machine processing, including using large language models in machine translation, speech recognition and other applications.
-
Publication No.: US20180052890A1
Publication date: 2018-02-22
Application No.: US15799939
Filing date: 2017-10-31
Applicant: Google LLC
Inventor: Robert C. Pike, Sean Quinlan, Sean M. Dorward, Jeffrey Dean, Sanjay Ghemawat
CPC classification number: G06F16/24561, G06F11/1482, G06F16/2471, G06F16/285, Y10S707/99933, Y10S707/99937
Abstract: Systems and methods for analyzing input data records are provided in which a master process initiates a plurality of concurrent first processes each of which comprises, for each data record in at least a subset of a plurality of input data records, creating a parsed representation of the data record and independently applying a procedural language query to the parsed representation to extract one or more values. A respective emit operator is applied to at least one of the extracted one or more values thereby adding corresponding information to a respective intermediate data structure. The respective emit operator implements one of a predefined set of statistical information processing functions. The master process also initiates a plurality of second processes each of which aggregates information from a corresponding subset of intermediate data structures to produce aggregated data that is, in turn, combined to produce output data.
-