Abstract:
During application of data quality rules to a data set obtained from a data source, data is retrieved from the data source along with a common set of rules configured to format the retrieved data in accordance with one or more predefined data quality rules of the common set. At least one predefined data quality rule is adjusted utilizing at least one editable widget to form a modified set of data quality rules adapted for use with a specified application. The modified set of data quality rules is applied to the retrieved data.
Abstract:
According to one embodiment of the present invention, a system controls cleansing of data within a database system, and comprises a computer system including at least one processor. The system receives a data set from the database system, and one or more features of the data set are selected for determining values for one or more characteristics of the selected features. The determined values are applied to a data quality estimation model to determine data quality estimates for the data set. Problematic data within the data set are identified based on the data quality estimates, where the cleansing is adjusted to accommodate the identified problematic data. Embodiments of the present invention further include a method and computer program product for controlling cleansing of data within a database system in substantially the same manner described above.
Abstract:
Methods, computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.
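The mining and clustering steps described above can be illustrated with a minimal sketch. It assumes that sub-patterns are fixed-length character substrings, that the distance is one minus the `difflib` similarity ratio, and that groups are formed greedily, so the number of groups falls out of the threshold D rather than being fixed at K in advance. The function names are hypothetical and none of these choices comes from the abstract itself.

```python
from collections import Counter
from difflib import SequenceMatcher


def frequent_subpatterns(values, n, length=3):
    """Return the n most frequent character substrings of a fixed length."""
    counts = Counter(
        value[i:i + length]
        for value in values
        for i in range(len(value) - length + 1)
    )
    return [pattern for pattern, _ in counts.most_common(n)]


def distance(a, b):
    """Dissimilarity in [0, 1]; 0 means the two strings are identical."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def cluster(patterns, d):
    """Greedy grouping: a pattern joins the first group in which it lies
    within distance d of every existing member, else it starts a new group."""
    groups = []
    for pattern in patterns:
        for group in groups:
            if all(distance(pattern, member) <= d for member in group):
                group.append(pattern)
                break
        else:
            groups.append([pattern])
    return groups
```

Calling `cluster(frequent_subpatterns(values, n), d)` chains the two steps. A method that must produce exactly K groups would need a different clustering strategy, for example one that merges the closest groups until K remain.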
Abstract:
Embodiments of the invention provide data management solutions that go beyond the traditional warehousing system to support advanced analytics. Furthermore, embodiments of the invention relate to systems and methods for extracting data from an existing data warehouse, storing the extracted data in a reusable (intermediate) form using data-parallel and compute-parallel techniques in the cloud, processing queries over the data with or without compute-parallel techniques, and supporting queries in high-level query languages.
Abstract:
In a method for preventing information leakage in a workflow environment, a computer system receives a request to access documents in a repository. In one aspect, the computer system identifies articles in the documents against the access credentials of the requestor. Further, the computer system extracts protected information from rows and columns in the articles based on label access controls. In another aspect, the computer system generates protected values in the protected information extracted from the rows and generates protected patterns in the protected information extracted from the columns. The computer system redacts the generated protected values and the generated protected patterns based on the access credentials of the requestor.
Abstract:
Computer software is disclosed for discovering and representing a reporting model of an existing reporting environment. For each report in a plurality of reports, the software searches metadata of the report for descriptive information and dependencies on other reports. The software depicts, in a graphical representation, each report and relationships between the reports.
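As a rough illustration of the discovery and depiction steps, the sketch below assumes each report's metadata has already been parsed into a dictionary with a `name` key and an optional `depends_on` list. Those keys and both function names are hypothetical. It emits Graphviz DOT text as one possible graphical representation.

```python
def build_report_graph(reports):
    """Map each report name to the sorted names of the reports it depends on,
    keeping only dependencies that are themselves known reports."""
    known = {report["name"] for report in reports}
    return {
        report["name"]: sorted(
            dep for dep in report.get("depends_on", []) if dep in known
        )
        for report in reports
    }


def to_dot(graph):
    """Render the dependency map as Graphviz DOT text, one node per report
    and one edge per dependency."""
    lines = ["digraph reports {"]
    for name in sorted(graph):
        lines.append(f'  "{name}";')
        for dep in graph[name]:
            lines.append(f'  "{name}" -> "{dep}";')
    lines.append("}")
    return "\n".join(lines)
```

Dependencies on reports outside the scanned set are dropped here for simplicity. An actual tool might instead show them as unresolved nodes.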
Abstract:
A system and method for ratifying policies are provided. A method for ratifying a policy in a policy-based decision system comprises: determining if a new policy interacts with an existing policy in the policy-based decision system; and ratifying the new policy to exist in the policy-based decision system.
Abstract:
Methods and arrangements for extracting tuples from a streaming XML document. A query twig is applied to the XML document stream, tuples are extracted from the XML document stream based on the query twig, and the quantity of extracted tuples is limited by forgoing extraction of duplicate tuples and of tuples that do not satisfy query twig criteria.
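A minimal sketch of the idea, assuming the query twig is reduced to a parent tag with a list of required child tags (a strong simplification of a general twig pattern), might look as follows. The function name and parameters are hypothetical. It uses the standard-library `xml.etree.ElementTree.iterparse` to read the document incrementally.

```python
import io
import xml.etree.ElementTree as ET


def stream_tuples(source, parent_tag, child_tags):
    """Yield one tuple of child texts per matching parent element, skipping
    parents that lack a required child and tuples that were already emitted."""
    seen = set()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != parent_tag:
            continue
        values = tuple(elem.findtext(tag) for tag in child_tags)
        elem.clear()  # drop the element's children and text once read
        if None in values or values in seen:
            continue
        seen.add(values)
        yield values
```

For example, `list(stream_tuples(io.BytesIO(data), "book", ["title", "year"]))` returns each distinct (title, year) pair once. The `seen` set grows with the number of distinct tuples, so a long-running stream would need a bounded alternative.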
Abstract:
Embodiments of the invention disclose a method, a system and a computer program product for discovering automated insights in XML data by generating a query result in response to querying data using a query, wherein the data is in a markup language format, and identifying a pattern associated with the query result, wherein the data in the markup language format is used for pattern identification.