Abstract:
Methods and apparatus are provided for processing updates to an XML document. Updates are converted into one or more complement queries that can be performed on the XML document. The complement queries provided by the present invention allow (i) virtual views of XML data to be updated; (ii) updates and queries to be composed; and (iii) the XML document to be updated using an XML query engine. The XML document can be recursively processed to determine, for each node, whether the node is affected by the update, and to implement the update at the affected nodes.
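The recursive, node-by-node processing described above can be sketched as follows. This is only an illustration under stated assumptions, not the claimed method: the function name `updated_copy`, the representation of an update as a list of tag names plus a replacement text, and the sample document are all hypothetical. The sketch builds an updated copy and leaves the source document untouched, which loosely mirrors answering a query against the updated document rather than modifying it in place.

```python
import copy
import xml.etree.ElementTree as ET


def updated_copy(node, steps, new_text):
    """Return a copy of `node` in which every element reached by the tag
    path `steps` (which starts at `node` itself) has its text replaced."""
    if not steps or node.tag != steps[0]:
        # Node is not affected by the update: keep an unchanged copy.
        return copy.deepcopy(node)
    clone = ET.Element(node.tag, dict(node.attrib))
    clone.text, clone.tail = node.text, node.tail
    if len(steps) == 1:
        # Node is a target of the update: apply it here.
        clone.text = new_text
        clone.extend([copy.deepcopy(child) for child in node])
    else:
        # Node lies on the path: recurse into its children.
        clone.extend([updated_copy(child, steps[1:], new_text) for child in node])
    return clone


doc = ET.fromstring(
    "<db><part><price>10</price></part>"
    "<part><price>12</price><name>bolt</name></part></db>"
)
view = updated_copy(doc, ["db", "part", "price"], "0")
print(ET.tostring(view, encoding="unicode"))
```

Because the original tree is never mutated, the same function can be applied to a virtual view or chained with further queries over `view`.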
Abstract:
Methods and apparatus are provided for identifying constraint violation repairs in data that is comprised of a plurality of records, where each record has a plurality of cells. A database is processed, based on a plurality of constraints that data in the database must satisfy. At least one constraint violation to be resolved is identified based on a cost of repair and the corresponding records to be resolved, and equivalent cells in the data that give rise to the identified constraint violation are identified. A value for each of the equivalent cells can optionally be determined, and the determined value can be assigned to each of the equivalent cells. The at least one constraint violation selected for resolution may be, for example, the constraint violation with the lowest cost. The cost of repairing a constraint is based on a distance metric between the attribute values.
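A minimal sketch of cost-based repair selection follows, assuming a single functional dependency (`zip` determines `city`) as the constraint and a string-similarity distance from Python's `difflib` as the metric. The function names, the sample records, and the choice of metric are assumptions made for illustration; the abstract does not specify them.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def distance(a, b):
    """Distance in [0, 1] between two attribute values (assumed metric)."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def repair_cost(values, target):
    """Total cost of changing every value in `values` to `target`."""
    return sum(distance(v, target) for v in values)


def cheapest_repair(records, lhs, rhs):
    """Resolve the lowest-cost violation of the dependency lhs -> rhs."""
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[rec[lhs]].append(idx)
    candidates = []
    for key, idxs in groups.items():
        values = [records[i][rhs] for i in idxs]
        if len(set(values)) > 1:  # these cells must be equal but are not
            best = min(sorted(set(values)), key=lambda t: repair_cost(values, t))
            candidates.append((repair_cost(values, best), key, idxs, best))
    if not candidates:
        return None
    cost, key, idxs, best = min(candidates, key=lambda c: c[0])
    for i in idxs:  # assign the chosen value to every equivalent cell
        records[i][rhs] = best
    return key, best, cost


records = [
    {"zip": "07974", "city": "Murray Hill"},
    {"zip": "07974", "city": "Muray Hill"},
    {"zip": "07974", "city": "Murray Hill"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "Boston"},
]
result = cheapest_repair(records, "zip", "city")
print(result)
```

Only the cheapest violation is repaired per call, so the function would be called repeatedly until it returns `None`.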
Abstract:
A method for providing controlled access to an XML document includes defining at least one access control policy for a user of the XML document, deriving a security view of the XML document for the user based upon said access control policy and schema level processing of the XML document, and translating a user query based on the security view of the XML document to an equivalent query based on the XML document. An apparatus for same includes means for defining an access control policy for a user of the XML document and means for deriving a security view of the XML document for the user based on said access control policy and schema level processing of the XML document. Also included are means for translating a user query based on the security view of the XML document to an equivalent query based on the XML document.
Abstract:
A framework is provided for integrating data from multiple relational sources into an XML document that both conforms to a given DTD and satisfies predefined XML constraints. The framework is based on a specification language, designated Attribute Integration Grammar (AIG), that extends a DTD by (1) associating element types with semantic attributes, (2) computing these attributes via parameterized SQL queries over multiple data sources, and (3) incorporating XML keys and inclusion constraints. The AIG uniquely operates on semantic attributes and their dependency relations to control context-dependent, DTD-directed construction of XML documents, and checks XML constraints in parallel with document generation.
Abstract:
Methods and apparatus are provided for propagating functional dependencies with conditions. Propagation covers are computed using an SPC view of a dataset, wherein the SPC view comprises selection, projection and Cartesian product operations. Selection operations are processed to extract equivalence classes. Cartesian product operations are processed to obtain a renamed set of the plurality of conditional functional dependencies that have attributes appearing in the SPC view. Domain constraints from the equivalence classes are applied to the renamed set to remove attributes not in the SPC view. Projection operations are processed using a reduction by resolution procedure to identify inferences that can be propagated to the SPC view from the conditional functional dependencies having attributes that do not appear in the SPC view. Domain constraints of the equivalence classes are converted to conditional functional dependencies, and a minimal cover of the SPC view is determined.
Abstract:
Methods and apparatus are provided for discovering minimal conditional functional dependencies (CFDs). CFDs extend functional dependencies by supporting patterns of semantically related constants, and can be used as rules for cleaning relational data. A disclosed CFDMiner algorithm, based on techniques for mining closed itemsets, discovers constant minimal CFDs. A disclosed CTANE algorithm discovers general minimal CFDs based on the levelwise approach. A disclosed FastCFD algorithm discovers general minimal CFDs based on a depth-first search strategy, with an optimization technique based on closed-itemset mining to reduce the search space.
Abstract:
Methods and apparatus are provided for detecting data inconsistencies. Methods are disclosed for determining whether a set of conditional functional dependencies is consistent; determining a minimal cover of a set of conditional functional dependencies; and detecting a violation of one or more conditional functional dependencies in a set of conditional functional dependencies. The conditional functional dependencies comprise one or more constraints that data in a database must satisfy, including at least one pattern with data values.
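Violation detection for a single conditional functional dependency can be sketched as below. The representation is an assumption made for illustration: the left-hand side is a dictionary mapping attributes to either a constant or the wildcard `"_"`, and the right-hand side is one attribute with a constant or wildcard pattern. The attribute names (`CC`, `AC`, `city`) and the sample rows are hypothetical.

```python
from collections import defaultdict

WILDCARD = "_"


def matches(value, pattern):
    """A value matches a pattern cell if the cell is a wildcard or equal."""
    return pattern == WILDCARD or value == pattern


def cfd_violations(rows, lhs_pattern, rhs_attr, rhs_pattern):
    """Return sorted indices of rows involved in a violation of one CFD."""
    bad = set()
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        if not all(matches(row[a], p) for a, p in lhs_pattern.items()):
            continue  # the pattern does not apply to this row
        if rhs_pattern != WILDCARD:
            # Constant right-hand side: a single row can violate the CFD.
            if row[rhs_attr] != rhs_pattern:
                bad.add(i)
        else:
            # Variable right-hand side: rows that agree on the left-hand
            # side must also agree on the right-hand side.
            groups[tuple(row[a] for a in lhs_pattern)].append(i)
    for idxs in groups.values():
        if len({rows[i][rhs_attr] for i in idxs}) > 1:
            bad.update(idxs)
    return sorted(bad)


rows = [
    {"CC": "44", "AC": "131", "city": "Edinburgh"},
    {"CC": "44", "AC": "131", "city": "London"},
    {"CC": "01", "AC": "908", "city": "Murray Hill"},
    {"CC": "01", "AC": "908", "city": "Murray Hill"},
]
constant_hits = cfd_violations(rows, {"CC": "44", "AC": "131"}, "city", "Edinburgh")
variable_hits = cfd_violations(rows, {"CC": "_", "AC": "_"}, "city", "_")
print(constant_hits, variable_hits)
```

The first call checks a pattern made entirely of data values; the second degenerates to an ordinary functional dependency check because every pattern cell is a wildcard.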
Abstract:
The invention provides a system and method for translating XPATH queries into SQL queries with a simple least fixpoint (LFP) operator, which is already supported by most commercial RDBMS. The method comprises the steps of (a) rewriting an input query into a regular query, which is capable of capturing both DTD recursion and XPATH queries in a uniform framework; and (b) translating the regular query to an SQL query with LFP. The invention further provides optimization techniques for reducing the use of the LFP operator. As a result, the invention is capable of answering a large class of XPATH queries by means of only low-end RDBMS features already available in most RDBMS.
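To illustrate how a least fixpoint can answer a descendant-axis query over a recursive schema, the sketch below evaluates `/dept//course` with a recursive common table expression in SQLite. This is an assumption-laden illustration, not the translation described above: the single `node(id, parent, tag)` table, the sample data, and the use of `WITH RECURSIVE` as the fixpoint construct are all choices made here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node(id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT);
INSERT INTO node VALUES
  (1, NULL, 'dept'),
  (2, 1, 'course'),
  (3, 2, 'prereq'),
  (4, 3, 'course'),   -- course nested under course: recursive schema
  (5, 4, 'title'),
  (6, 1, 'title');
""")

# reach = least fixpoint of "the dept root, plus children of reached nodes".
QUERY = """
WITH RECURSIVE reach(id) AS (
    SELECT id FROM node WHERE parent IS NULL AND tag = 'dept'
    UNION
    SELECT n.id FROM node AS n JOIN reach AS r ON n.parent = r.id
)
SELECT n.id
FROM node AS n JOIN reach AS r ON n.id = r.id
WHERE n.tag = 'course'
ORDER BY n.id
"""
course_ids = [row[0] for row in conn.execute(QUERY)]
print(course_ids)
```

A single fixpoint computation covers `course` elements at any nesting depth, which is why the depth of the recursion does not need to be known when the SQL is generated.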
Abstract:
Methods and apparatus are provided for evaluating XPath filters on fragmented and distributed XML documents. According to one aspect of the invention, a method is disclosed for evaluating a query over a tree having a plurality of fragments distributed over a plurality of sites. The method comprises the steps of identifying the plurality of sites storing at least one of the plurality of fragments of the tree; providing the query to the plurality of identified sites, wherein each of the identified sites partially evaluates the query against one or more fragments of the tree stored by the respective site; obtaining partial results from the plurality of identified sites; and composing the partial results to compute a result to the query. The query may be, for example, a boolean XPath query. The method can be performed, for example, by a coordinating site that stores a root fragment of the tree.
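The steps above can be sketched for one very simple boolean query, `//tag` ("does any node with this tag exist?"). Everything in the sketch is hypothetical: fragments are nested `(label, children)` tuples, a missing subtree is marked by a virtual node `("@ref", fragment_id)`, and sites are plain dictionaries visited in a loop rather than remote machines. Each site returns, per fragment, a partial result consisting of a local truth value and the set of fragment references it could not resolve; the coordinator composes these starting from the root fragment.

```python
def partial_eval(fragment, tag):
    """Evaluate '//tag' on one fragment; return (found, unresolved_refs)."""
    found, refs = False, set()
    stack = [fragment]
    while stack:
        label, children = stack.pop()
        if label == "@ref":
            refs.add(children)  # for a virtual node, `children` is a fragment id
            continue
        if label == tag:
            found = True
        stack.extend(children)
    return found, refs


def site_eval(site_fragments, tag):
    """Work done locally at one site, once, for all fragments it stores."""
    return {fid: partial_eval(frag, tag) for fid, frag in site_fragments.items()}


def compose(partials, root_id):
    """Coordinator step: resolve references starting at the root fragment."""
    def solve(fid):
        found, refs = partials[fid]
        return found or any(solve(ref) for ref in refs)
    return solve(root_id)


def evaluate(sites, tag, root_id="F0"):
    partials = {}
    for fragments in sites.values():  # each site is visited exactly once
        partials.update(site_eval(fragments, tag))
    return compose(partials, root_id)


sites = {
    "S0": {"F0": ("db", [("dept", [("@ref", "F1")]), ("@ref", "F2")])},
    "S1": {"F1": ("course", [("title", [])])},
    "S2": {"F2": ("dept", [("staff", [])])},
}
print(evaluate(sites, "title"), evaluate(sites, "exam"))
```

Only the small partial results travel to the coordinator in this sketch; the fragments themselves stay at their sites.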