Abstract:
With a continuous source of data relating to transactions, the data may be segmented and processed in a data flow arrangement, optionally in parallel, and the data may be processed without storing the data in an intermediate database. Data from multiple sources may be processed in parallel. The segmentation also may define points at which aggregate outputs may be provided, and where checkpoints may be established.
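The segmentation idea above can be illustrated with a minimal sketch. It assumes hypothetical records of the form (timestamp in seconds, amount) that arrive in timestamp order, and a fixed segment length; none of these names or formats come from the abstract. Each segment boundary is the point at which an aggregate is emitted, and records are consumed in one streaming pass without being stored in an intermediate database.

```python
from itertools import groupby
from typing import Iterable, Iterator, Tuple

# Hypothetical record shape: (timestamp_seconds, amount).
Record = Tuple[int, float]


def segment_totals(
    records: Iterable[Record], segment_seconds: int
) -> Iterator[Tuple[int, float]]:
    """Yield (segment_index, total_amount) as each segment closes.

    Assumes records arrive ordered by timestamp, so every segment is a
    contiguous run and can be aggregated in a single streaming pass.
    """
    by_segment = groupby(records, key=lambda rec: rec[0] // segment_seconds)
    for segment_index, group in by_segment:
        # The end of the group is the segment boundary: an aggregate output
        # can be produced here, and a checkpoint could be taken here too.
        yield segment_index, sum(amount for _, amount in group)


if __name__ == "__main__":
    stream = [(0, 5.0), (30, 2.5), (61, 1.0), (125, 4.0), (170, 6.0)]
    for index, total in segment_totals(stream, segment_seconds=60):
        print(index, total)
```

Because the function is a generator, it works on an unbounded source as well as on a finite list.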
Abstract:
A system provides an environment for parallel programming by providing a plurality of modular, parallelizable operators stored in a computer-readable memory. Each operator defines (1) operation programming for performing an operation; (2) one or more communication ports, each of which is either an input port that supplies the operation programming with a data stream of records or an output port that receives a data stream of records from the operation programming; and (3) for each of the operator's input ports, if any, an indication of the partitioning method to be applied to the data stream supplied to that port. An interface enables users to define a data flow graph by giving instructions to select a specific operator for inclusion in the graph, to select a specific data object capable of supplying or receiving a data stream of one or more records for inclusion in the graph, or to associate a data link with a specific communication port of an operator in the graph. Each data link defines a path for communicating a data stream of one or more records between its associated communication port and either a specific data object or a specific communication port of another operator in the graph. Execution of a data flow graph equivalent to the one defined by the users is automatically parallelized by running a separate instance of each operator, including its associated operation programming, on each of multiple processors, with each instance having an input or output port corresponding to each input or output port of the operator, and by automatically partitioning the data stream supplied to the corresponding inputs of an operator's instances according to the partitioning method indicated for that input.
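As a rough illustration of an operator that carries its own partitioning indication, the sketch below models a single-input operator and runs one instance per partition. Everything in it is an assumption made for illustration: the `Operator` class, the `partition_key` field, the key-modulo and round-robin partitioning methods, and the use of threads in place of separate processors are not taken from the abstract.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, int]  # hypothetical record shape


@dataclass
class Operator:
    """Operation programming plus a partitioning indication for its input."""
    operation: Callable[[List[Record]], List[Record]]
    partition_key: Optional[str] = None  # None selects round-robin


def partition(records: List[Record], op: Operator, ways: int) -> List[List[Record]]:
    """Split one data stream into `ways` streams using the operator's method."""
    parts: List[List[Record]] = [[] for _ in range(ways)]
    for position, rec in enumerate(records):
        if op.partition_key is not None:
            slot = rec[op.partition_key] % ways  # same key, same instance
        else:
            slot = position % ways
        parts[slot].append(rec)
    return parts


def run_parallel(records: List[Record], op: Operator, ways: int) -> List[Record]:
    """Run one instance of the operator per partition and merge the outputs."""
    parts = partition(records, op, ways)
    with ThreadPoolExecutor(max_workers=ways) as pool:
        outputs = list(pool.map(op.operation, parts))
    return [rec for out in outputs for rec in out]


def total_by_account(records: List[Record]) -> List[Record]:
    """Example operation: sum `amount` per `account`."""
    totals: Dict[int, int] = {}
    for rec in records:
        totals[rec["account"]] = totals.get(rec["account"], 0) + rec["amount"]
    return [{"account": a, "amount": t} for a, t in sorted(totals.items())]


if __name__ == "__main__":
    data = [
        {"account": 1, "amount": 10}, {"account": 2, "amount": 20},
        {"account": 3, "amount": 30}, {"account": 1, "amount": 5},
        {"account": 2, "amount": 7},
    ]
    op = Operator(total_by_account, partition_key="account")
    print(run_parallel(data, op, ways=2))
```

Partitioning on the grouping key is what makes the per-instance totals correct globally: every record for a given account reaches the same instance, so no cross-instance merge is needed.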
Abstract:
Checkpointing of operations on data may be provided by partitioning the data into temporal segments. Operations may be performed on the temporal segments and checkpoints may be established by storing a persistent indication of the segment being processed. The entire processing state need not be saved. If a failure occurs, processing can be restarted using the saved indication of the segment to be processed. Such data partitioning and checkpointing may be applied to relational databases, databases with dataflow operation and/or parallelism and other database types with or without parallel operation.
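A minimal sketch of segment-based checkpointing follows. It assumes the data has already been partitioned into an ordered list of segments and that the checkpoint is a small JSON file; the file format and the function names are hypothetical. Only the index of the next segment is persisted, not the processing state.

```python
import json
import os
import tempfile
from typing import Callable, List, Sequence


def load_checkpoint(path: str) -> int:
    """Return the index of the next segment to process (0 if none saved)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_segment"]


def save_checkpoint(path: str, next_segment: int) -> None:
    """Persist the segment indication by writing a temp file, then renaming."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_segment": next_segment}, f)
    os.replace(tmp_path, path)  # replaces the old checkpoint in one step


def process_segments(
    segments: Sequence[List[int]],
    handle: Callable[[int, List[int]], None],
    checkpoint_path: str,
) -> None:
    """Process segments in order, resuming from the saved indication."""
    start = load_checkpoint(checkpoint_path)
    for index in range(start, len(segments)):
        handle(index, segments[index])
        save_checkpoint(checkpoint_path, index + 1)
```

If a failure occurs after `handle` returns but before the checkpoint is saved, that segment is processed again on restart, so in this sketch `handle` should be safe to repeat for the same segment.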
Abstract:
A computer system splits a data space to partition data between processors or processes. The data space may be split, using a decision tree, into sub-regions that need not be orthogonal to the axes defined by the data space's parameters. The decision tree can have neural networks in each of its non-terminal nodes that are trained on, and are used to partition, training data. Each terminal, or leaf, node can have a hidden layer neural network trained on the training data that reaches the terminal node. The training of the non-terminal nodes' neural networks can be performed on one processor and the training of the leaf nodes' neural networks can be run on separate processors. Different target values can be used for the training of the networks of different non-terminal nodes. The non-terminal node networks may be hidden layer neural networks. Each non-terminal node may automatically send a desired ratio of the training records it receives to each of its child nodes, so that the leaf node networks each receive approximately the same number of training records. The system may automatically configure the tree to have a number of leaf nodes equal to the number of separate processors available to train leaf node networks. After the non-terminal and leaf node networks have been trained, the records of a large database can be passed through the tree for classification or for estimation of certain parameter values.
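The balanced, non-axis-aligned splitting described above can be sketched as follows. This is only an illustration under loud assumptions: a fixed linear projection stands in for each non-terminal node's trained neural network, the split threshold is taken as a quantile of the node's scores to hit the desired ratio, and the names `split_by_ratio` and `build_leaves` are hypothetical. No network training is shown.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def split_by_ratio(
    points: Sequence[Point], weights: Sequence[float], left_fraction: float
) -> Tuple[List[Point], List[Point]]:
    """Split points by an oblique score so the left child gets `left_fraction`.

    The score is a weighted sum of coordinates, so the boundary need not be
    orthogonal to any axis. (Stand-in for a trained node network.)
    """
    def score(p: Point) -> float:
        return sum(w * x for w, x in zip(weights, p))

    ranked = sorted(points, key=score)
    cut = round(len(ranked) * left_fraction)
    return ranked[:cut], ranked[cut:]


def build_leaves(
    points: Sequence[Point],
    weights_by_depth: Sequence[Sequence[float]],
    leaves: int,
    depth: int = 0,
) -> List[List[Point]]:
    """Recursively split until there is one partition per leaf (processor)."""
    if leaves == 1:
        return [list(points)]
    left_leaves = leaves // 2
    weights = weights_by_depth[depth % len(weights_by_depth)]
    # Send each child a share proportional to the leaves beneath it, so all
    # leaves end up with roughly the same number of records.
    left, right = split_by_ratio(points, weights, left_leaves / leaves)
    return (
        build_leaves(left, weights_by_depth, left_leaves, depth + 1)
        + build_leaves(right, weights_by_depth, leaves - left_leaves, depth + 1)
    )


if __name__ == "__main__":
    data = [(float(i), float((i * 7) % 12)) for i in range(12)]
    parts = build_leaves(data, [(1.0, 1.0), (1.0, -1.0)], leaves=3)
    print([len(p) for p in parts])
```

Setting `leaves` to the number of available processors gives one partition per processor, mirroring the configuration step the abstract mentions.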
Abstract:
A performance monitor represents execution of a data flow graph by changing performance information along different parts of a representation of that graph. If the graph is executed in parallel, the monitor can show parallel operator instances, associated datalinks, and performance information relevant to each. The individual parallel processes executing the graph send performance messages to the performance monitor, and the performance monitor can instruct such processes to vary the information they send. The monitor can provide 2D or 3D views in which the user can change focus, zoom, and viewpoint. In 3D views, parallel instances of the same operator are grouped in a 2D array. The data rate of a datalink can be represented by both the density and velocity of line segments along the line that represents it. The line can be colored as a function of the datalink's source or destination, its data rate, or the integral thereof. Alternatively, a histogram can be displayed along each datalink's line, displaying information about the rate of, total of, or value of a field in, the data sent, at successive intervals. The user can click on objects to obtain additional information, such as bar charts of statistics, detailed performance listings, or invocation of a debugger. The user can selectively collapse representations of graph objects into composite representations, highlight objects that are out of records or that have flow blockages, label operators, turn off the display of objects, and record and play back the performance information.
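To make the data-rate quantity concrete, here is a small sketch of a monitor that turns performance messages into a per-datalink rate and running total. The message fields (`datalink`, `timestamp`, `records_sent` as a cumulative count) and the `Monitor` class are assumptions made for illustration; the abstract does not specify a message format, and no display logic is shown.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PerfMessage:
    """Hypothetical performance message sent by a process executing the graph."""
    datalink: str
    timestamp: float   # seconds
    records_sent: int  # cumulative count on this datalink


class Monitor:
    """Tracks the latest data rate and total for each datalink."""

    def __init__(self) -> None:
        self._last: Dict[str, PerfMessage] = {}
        self.rates: Dict[str, float] = {}   # records per second
        self.totals: Dict[str, int] = {}    # integral of the rate

    def receive(self, msg: PerfMessage) -> None:
        prev = self._last.get(msg.datalink)
        if prev is not None and msg.timestamp > prev.timestamp:
            sent = msg.records_sent - prev.records_sent
            self.rates[msg.datalink] = sent / (msg.timestamp - prev.timestamp)
        self.totals[msg.datalink] = msg.records_sent
        self._last[msg.datalink] = msg


if __name__ == "__main__":
    monitor = Monitor()
    monitor.receive(PerfMessage("sort->join", 0.0, 0))
    monitor.receive(PerfMessage("sort->join", 2.0, 100))
    print(monitor.rates["sort->join"], monitor.totals["sort->join"])
```

A display layer could then map `rates` to segment density or velocity along a datalink's line, and `totals` to its color.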