Abstract:
A method, a computer program product and a system identify partition locations within an extended markup language (XML) document without parsing so as to process portions of said document in parallel. The XML document includes sections required to remain continuous. The document is scanned for continuous sections without parsing, and boundaries of the initial partitions are adjusted to reside outside the continuous sections to determine resulting partitions for the document. The resulting partitions may be processed in parallel to provide the document information for storage.
Abstract:
The present invention relates to a method, computer program product and system for de-identifying data, wherein a de-identification protocol is selectively mapped to a business rule at runtime via an ETL tool.
Abstract:
A method, computer program product, and system for enabling parallel processing of an XML document without pre-parsing, utilizing metadata associated with the XML document and created at the same time as the XML document. The metadata is used to generate partitions of the XML document at the time of parallel processing, without requiring system-intensive pre-parsing.
Abstract:
The present invention relates to a method, computer program product and system for masking sensitive data and, more particularly, to dynamically de-identifying sensitive data from a data source for a target application, including enabling a user to selectively alter an initial de-identification protocol for the sensitive data elements via an interface.
Abstract:
The present invention relates to a method, computer program product and system for de-identifying data, wherein a de-identification protocol is selectively mapped to a business rule at runtime via an ETL tool.
Abstract:
A method, a computer program product and a system identify partition locations within an extended markup language (XML) document without parsing so as to process portions of said document in parallel. The XML document includes sections required to remain continuous. The document is scanned for continuous sections without parsing, and boundaries of the initial partitions are adjusted to reside outside the continuous sections to determine resulting partitions for the document. The resulting partitions may be processed in parallel to provide the document information for storage.
Abstract:
A method, computer program product, and system for enabling parallel processing of an XML document without pre-parsing, utilizing metadata associated with the XML document and created at the same time as the XML document. The metadata is used to generate partitions of the XML document at the time of parallel processing, without requiring system-intensive pre-parsing.
Abstract:
Methods and arrangements for extracting tuples from a streaming XML document. A query twig is applied to the XML document stream, tuples are extracted from the XML document stream based on the query twig, and a quantity of extracted tuples is limited via foregoing extraction of duplicate tuples extraction of tuples that do not satisfy query twig criteria.
Abstract:
An information retrieval system and method are provided for minimizing the number of blocks searched in a cell before recording a new record in the table and determining which block can be assigned if a table has space available to store a new record in the case an additional block should be associated with a cell. Dimensions for a table are identified, and at least one block in the table is associated with a dimension value for each dimension, where each block comprises contiguous storage pages. The block can be further associated with a cell; this associated cell has a unique combination of dimension values comprising an dimension value for each of the dimensions. A unique associated bit list for each dimension value for each dimension has a unique corresponding list entry for each block associated with that dimension value, and a unique associated bit list for each cell has a unique corresponding list entry for each block associated with that cell.
Abstract:
A computer implemented method, apparatus, and computer usable program code for generating an execution plan graph from a data flow. A metadata representation of the data flow is generated in response to receiving the data flow. A set of code units is generated from the metadata representation. Each code unit in the set of code units is executable on multiple different types of runtime engines. The set of code units is processed to produce the execution plan graph.