Abstract:
One embodiment of the invention provides a method for natural language processing (NLP). The method comprises extracting knowledge outside of the text content of an NLP instance by extracting a set of subgraphs from a knowledge graph associated with the text content. The set of subgraphs comprises the knowledge. The method further comprises encoding the knowledge with the text content into a fixed-size graph representation by filtering and encoding the set of subgraphs. The method further comprises applying a text embedding algorithm to the text content to generate a fixed-size text representation, and classifying the text content based on the fixed-size graph representation and the fixed-size text representation.
Abstract:
Embodiments of the present invention are directed to a computer-implemented method for generating a framework for analyzing adverse drug reactions. A non-limiting example of the computer-implemented method includes receiving, to a processor, a plurality of drug chemical structures. The non-limiting example also includes receiving, to the processor, a plurality of known drug-adverse drug reaction associations. The non-limiting example also includes constructing, by the processor, a deep learning framework for each of a plurality of adverse drug reactions based at least in part upon the plurality of drug chemical structures and the plurality of known drug-adverse drug reaction associations.
Abstract:
Generate, from a logical formula, a directed acyclic graph (DAG) having a plurality of nodes and a plurality of edges. Assign an initial embedding to each node and edge, to one of a plurality of layers. Compute a plurality of initial node states by using feed-forward networks, and construct cross-dependent embeddings between conjecture node embeddings and premise node embeddings. Topologically sort the DAG with the initial embeddings and node states. Beginning from a lowest rank, compute layer-by-layer embedding updates for each of the plurality of layers until a root is reached. Assign the embedding update for the root node as a final embedding for the DAG. Provide the final embedding for the DAG as input to a machine learning system, and carry out automatic theorem proving with that system.
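The bottom-up pass described in this abstract can be illustrated with a minimal sketch. It is not the claimed method: the learned feed-forward update is replaced by a plain average of a node's initial embedding and its children's final embeddings, nodes are processed one at a time in topological order rather than in explicit layers, and the names `embed_dag`, `children`, and `initial` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def embed_dag(children, initial):
    """Propagate embeddings from the leaves of a DAG up to its root.

    children: dict mapping each node to the list of its child nodes.
    initial:  dict mapping each node to its initial embedding (list of floats).
    Returns (final_embeddings, root).
    """
    # TopologicalSorter expects node -> predecessors. Passing the children
    # as predecessors yields every child before its parent, so the single
    # root comes last.
    order = list(TopologicalSorter(children).static_order())
    final = {}
    for node in order:
        vec = list(initial[node])
        kids = children.get(node, [])
        for child in kids:
            vec = [a + b for a, b in zip(vec, final[child])]
        # Assumption: averaging stands in for the learned update function.
        final[node] = [v / (1 + len(kids)) for v in vec]
    return final, order[-1]


# Toy formula (a AND b) -> a, with the sub-term "a" shared between parents.
children = {"implies": ["and", "a"], "and": ["a", "b"], "a": [], "b": []}
initial = {
    "implies": [1.0, 1.0],
    "and": [0.0, 0.0],
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
}
final, root = embed_dag(children, initial)
dag_embedding = final[root]  # embedding of the root serves as the DAG embedding
```

Because shared sub-terms such as `a` appear once in the DAG, their embedding is computed once and reused by every parent.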
Abstract:
Various embodiments virtualize data across heterogeneous formats. In one embodiment, a plurality of heterogeneous data sources is received as input. A local schema graph including a set of attribute nodes and a set of type nodes is generated for each of the plurality of heterogeneous data sources. A global schema graph is generated based on each local schema graph that has been generated. The global schema graph comprises each of the local schema graphs and at least one edge between at least one of two or more attribute nodes and two or more type nodes from different local schema graphs. The edge indicates a relationship between the data sources represented by the different local schema graphs comprising the two or more attribute nodes based on a computed similarity between at least one value associated with each of the two or more attribute nodes.
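As a rough illustration of linking attribute nodes across local schema graphs by value similarity, the sketch below compares the value sets of attributes from different sources. The abstract does not name a similarity measure, so Jaccard similarity, the `threshold` parameter, and the function and source names are all assumptions made for this example.

```python
def jaccard(a, b):
    """Jaccard similarity of two value collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_attributes(local_schemas, threshold=0.5):
    """Return cross-source edges between attributes with similar values.

    local_schemas: dict mapping source name -> {attribute name: list of values}.
    Each edge is ((source, attribute), (source, attribute), similarity).
    """
    edges = []
    sources = sorted(local_schemas)
    for i, s in enumerate(sources):
        for t in sources[i + 1:]:  # only compare attributes of different sources
            for attr_s, vals_s in local_schemas[s].items():
                for attr_t, vals_t in local_schemas[t].items():
                    sim = jaccard(vals_s, vals_t)
                    if sim >= threshold:
                        edges.append(((s, attr_s), (t, attr_t), sim))
    return edges


# Hypothetical sources: a CSV file and a JSON document with overlapping values.
local_schemas = {
    "crm.csv": {"email": ["a@x.com", "b@x.com"], "city": ["Oslo"]},
    "orders.json": {"contact": ["a@x.com", "b@x.com", "c@x.com"], "total": [10, 20]},
}
edges = link_attributes(local_schemas)
```

Here `email` and `contact` share two of three distinct values, so they are the only pair that clears the threshold and receives an edge.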
Abstract:
Various embodiments of the invention relate to optimizing storage of schema-less data. At least one of a schema-less dataset including a plurality of resources and one or more query workloads associated with the plurality of resources is received. Each resource is associated with at least a plurality of properties. At least one set of co-occurring properties from the plurality of properties is identified. A graph including a plurality of nodes is generated. Each of the nodes represents a unique property in the set of co-occurring properties. The graph further includes an edge connecting each node representing a pair of co-occurring properties. A schema is generated based on the graph that assigns a column identifier from a table to each unique property represented by one of the nodes in the graph.
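One plausible reading of the graph-to-schema step is a graph coloring problem: properties that co-occur on the same resource must land in different columns, while properties that never co-occur may share one. The abstract does not state how column identifiers are chosen, so the greedy coloring below is an assumption, and `build_cooccurrence_graph` and `assign_columns` are hypothetical names.

```python
from itertools import combinations


def build_cooccurrence_graph(resources):
    """Build an undirected graph with one node per property and an edge
    between every pair of properties that appear on the same resource."""
    graph = {}
    for props in resources.values():
        unique = sorted(set(props))
        for p in unique:
            graph.setdefault(p, set())
        for p, q in combinations(unique, 2):
            graph[p].add(q)
            graph[q].add(p)
    return graph


def assign_columns(graph):
    """Greedy coloring: give each property the lowest column index not
    already used by a co-occurring property."""
    columns = {}
    # Visit the most-connected properties first; break ties by name so the
    # result is deterministic.
    for prop in sorted(graph, key=lambda p: (-len(graph[p]), p)):
        used = {columns[n] for n in graph[prop] if n in columns}
        col = 0
        while col in used:
            col += 1
        columns[prop] = col
    return columns


# Hypothetical schema-less dataset: resource -> properties it carries.
resources = {
    "alice": ["name", "age"],
    "acme": ["name", "industry"],
}
columns = assign_columns(build_cooccurrence_graph(resources))
```

In this toy dataset `age` and `industry` never occur on the same resource, so they can share a column, and the table needs two property columns rather than three.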
Abstract:
A system for storing graph data as a multi-dimensional cluster has a database with a graph dataset containing data and relationships between data pairs, and a schema list of storage methods that use a table with columns and rows associated with data or relationships. An analyzer module collects statistics of the graph dataset, and a dimension identification module identifies a plurality of dimensions that each represent a column in the table. A schema creation and loading module creates a modified storage method having a plurality of distinct table blocks and a plurality of table block indexes, one index for each table block, and arranges the data and relationships in the given graph dataset in accordance with the modified storage method to create the multi-dimensional cluster.
Abstract:
A system for identifying a schema for storing graph data includes a database containing a graph dataset of data and relationships between data pairs and a list of storage methods that are each a distinct structural arrangement of the data and relationships from the graph dataset. An analyzer module collects statistics for the graph dataset, and a data classification module uses the collected statistics to calculate metrics describing the data and relationships in the graph dataset, uses the calculated metrics to group the data and relationships into a plurality of graph dataset subsets, and associates each graph dataset subset with one of the plurality of storage methods. The resulting group of storage methods associated with the plurality of graph dataset subsets includes a unique storage method for each graph dataset subset. The data and relationships in each graph dataset subset are arranged in accordance with the associated storage methods.
Abstract:
Keyword searching is used to explore and search large Resource Description Framework (RDF) datasets having unknown or constantly changing structures. A succinct and effective summarization is built from the underlying RDF data. Given a keyword query, the summarization lends significant pruning power to exploratory keyword searches and leads to much better efficiency compared to previous work. The summarization returns exact results and can be updated incrementally and efficiently.