Abstract:
A vectorization process is employed in which chemical identifier strings are converted into respective vectors. These vectors may then be searched to identify molecules that are identical or similar to each other. The dimensions of the vector space can be defined by sequences of symbols that make up the chemical identifier strings. The International Chemical Identifier (InChI) string defined by the International Union of Pure and Applied Chemistry (IUPAC) is particularly well suited for these methods.
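As an illustration of the idea, the sketch below maps identifier strings to sparse vectors whose dimensions are symbol sequences and compares them by cosine similarity. The choice of character trigrams, the cosine measure, and the function names `vectorize` and `cosine` are assumptions made for the example; the abstract does not specify them.

```python
from collections import Counter
from math import sqrt


def vectorize(identifier: str, n: int = 3) -> Counter:
    """Count every run of n consecutive symbols; each distinct run is one dimension."""
    return Counter(identifier[i:i + n] for i in range(len(identifier) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Standard InChI strings for three small molecules, used as sample input.
ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
methanol = "InChI=1S/CH4O/c1-2/h2H,1H3"
benzene = "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"

vectors = {name: vectorize(s) for name, s in
           [("ethanol", ethanol), ("methanol", methanol), ("benzene", benzene)]}

# Rank the other molecules by similarity to ethanol.
ranking = sorted((name for name in vectors if name != "ethanol"),
                 key=lambda name: cosine(vectors["ethanol"], vectors[name]),
                 reverse=True)
```

Identical strings yield a similarity of 1, so exact-match lookup is a special case of the similarity search.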
Abstract:
A method, system, and article for improving the performance of a Boolean combination of at least two filters applied to a data stream. Stream processing is applied to an expression having two or more logical operators. As the data stream is processed, the efficiency of the operators in the expression is evaluated. A sort algorithm is dynamically invoked to ensure that a more efficient operator is processed before a less efficient operator.
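A minimal sketch of this kind of dynamic reordering, for a conjunction (AND) of filters, might look as follows. The efficiency measure used here (the observed rejection rate of each filter), the re-sort interval, and the class name `AdaptiveAnd` are assumptions for illustration; the abstract does not say how efficiency is measured.

```python
from typing import Any, Callable, List


class AdaptiveAnd:
    """AND of filters that periodically re-sorts so the most selective filter runs first."""

    def __init__(self, filters: List[Callable[[Any], bool]], resort_every: int = 100):
        self.filters = list(filters)
        self.seen = {f: 0 for f in self.filters}      # items each filter has evaluated
        self.rejected = {f: 0 for f in self.filters}  # items each filter has rejected
        self.resort_every = resort_every
        self.processed = 0

    def _rejection_rate(self, f: Callable[[Any], bool]) -> float:
        return self.rejected[f] / self.seen[f] if self.seen[f] else 0.0

    def accept(self, item: Any) -> bool:
        self.processed += 1
        if self.processed % self.resort_every == 0:
            # Assumed efficiency metric: a higher rejection rate means earlier evaluation.
            self.filters.sort(key=self._rejection_rate, reverse=True)
        for f in self.filters:
            self.seen[f] += 1
            if not f(item):
                self.rejected[f] += 1
                return False  # short-circuit: later filters are skipped
        return True


def is_even(x: int) -> bool:
    return x % 2 == 0


def is_multiple_of_100(x: int) -> bool:
    return x % 100 == 0


combined = AdaptiveAnd([is_even, is_multiple_of_100], resort_every=50)
passed = [x for x in range(1000) if combined.accept(x)]
```

After the first re-sort, `is_multiple_of_100` moves to the front because it rejects far more items, so most items are discarded after a single test. The rates are estimated only from the items each filter actually sees, which biases them; a production design would likely also account for per-filter cost.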
Abstract:
Methods, systems and computer program products for simplifying complex data stream problems involving feature extraction from noisy data. Exemplary embodiments include a method for processing a data stream, including applying multiple operators to the data stream, wherein an operation by each of the multiple operators includes retrieving the next chunk for each of a set of input parameters, performing digital processing operations on each respective next chunk, producing sets of output parameters, and adding data to one or more internal data stores, each internal data store acting as a data stream source.
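The operator pattern described above can be sketched as a small pipeline in which every operator reads chunks from stores and writes its output to another store that downstream operators treat as a source. The classes `Store` and `Operator`, the `run` scheduler, and the smoothing and threshold steps are hypothetical names and choices for this example only.

```python
from collections import deque
from statistics import fmean
from typing import Any, Callable, Iterable, List


class Store:
    """Internal data store that can also be read as a data stream source."""

    def __init__(self, chunks: Iterable[Any] = ()):
        self._chunks = deque(chunks)

    def add(self, chunk: Any) -> None:
        self._chunks.append(chunk)

    def has_chunk(self) -> bool:
        return bool(self._chunks)

    def next_chunk(self) -> Any:
        return self._chunks.popleft()

    def drain(self) -> List[Any]:
        items = list(self._chunks)
        self._chunks.clear()
        return items


class Operator:
    """Takes the next chunk from each input, processes it, and stores the output."""

    def __init__(self, inputs: List[Store], fn: Callable[..., Any], output: Store):
        self.inputs, self.fn, self.output = inputs, fn, output

    def step(self) -> bool:
        if not all(source.has_chunk() for source in self.inputs):
            return False  # wait until every input has a chunk ready
        chunks = [source.next_chunk() for source in self.inputs]
        self.output.add(self.fn(*chunks))
        return True


def run(operators: List[Operator]) -> None:
    """Step every operator repeatedly until none of them can make progress."""
    while any([op.step() for op in operators]):
        pass


# Noisy readings arrive in chunks; smooth each chunk, then flag chunks whose mean is high.
raw = Store([[1.0, 1.2, 0.8], [5.0, 5.5, 4.5], [0.9, 1.1, 1.0]])
smoothed, peaks = Store(), Store()
run([Operator([raw], fmean, smoothed),
     Operator([smoothed], lambda mean: mean > 3.0, peaks)])
result = peaks.drain()
```

Because each store is both a sink and a source, a new feature extractor can be added by attaching another `Operator` to an existing store without changing the upstream operators.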
Abstract:
Multiple near-neighbor objects are searched for similarities based on multiple criteria. A query is received for the object closest to an object provided by a user, and the user assigns weights to distance functions among the multiple objects at the time of the query. Each distance function represents a different criterion. The weighted average of the distance functions is calculated, and the object closest to the query object is determined based on that weighted average.
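A brute-force sketch of such a query is shown below. The function `nearest`, the two sample distance functions, and the dictionary-based object representation are assumptions for the example; an efficient implementation would use an index rather than a linear scan.

```python
from typing import Callable, Dict, Sequence

Obj = Dict[str, float]
Distance = Callable[[Obj, Obj], float]


def nearest(query: Obj, candidates: Sequence[Obj],
            distances: Sequence[Distance], weights: Sequence[float]) -> Obj:
    """Return the candidate with the smallest weighted-average distance to the query."""
    if not distances or len(distances) != len(weights):
        raise ValueError("need exactly one weight per distance function")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    def combined(obj: Obj) -> float:
        return sum(w * d(query, obj) for w, d in zip(weights, distances)) / total

    return min(candidates, key=combined)


def price_distance(a: Obj, b: Obj) -> float:
    return abs(a["price"] - b["price"])


def size_distance(a: Obj, b: Obj) -> float:
    return abs(a["size"] - b["size"])


item_a = {"price": 10.0, "size": 5.0}
item_b = {"price": 12.0, "size": 1.0}
query = {"price": 10.0, "size": 1.0}
criteria = [price_distance, size_distance]

# The weights are supplied at query time, so the same data answers different questions.
price_focused = nearest(query, [item_a, item_b], criteria, weights=[3.0, 1.0])
size_focused = nearest(query, [item_a, item_b], criteria, weights=[1.0, 3.0])
```

With the price-heavy weights the combined distances are 1.0 for `item_a` and 1.5 for `item_b`; with the size-heavy weights they are 3.0 and 0.5, so the answer changes from `item_a` to `item_b`.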
Abstract:
A method of data loading for large information warehouses includes performing checkpointing concurrently with data loading into an information warehouse, the checkpointing ensuring consistency among multiple tables; and recovering from a failure in the data loading using the checkpointing. A method is also disclosed for performing versioning concurrently with data loading into an information warehouse. The versioning method enables undo and redo operations of the data loading between a later version and a previous version. Recovery from a data load failure does not restart the load from the beginning; it resumes from the latest checkpoint at the information warehouse level. The checkpoint process is characterized by a state transition diagram having multiple states, and transitions among the states are tracked in a system state table.
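The recovery behavior can be illustrated with a small in-memory sketch. Everything named here is hypothetical: the `Warehouse` class, the two table names, and the state names (`IDLE`, `LOADING`, `CHECKPOINTED`, `FAILED`, `RECOVERED`, `DONE`) are invented for the example, and the checkpoint is taken sequentially after each batch rather than concurrently with loading as in the described method.

```python
import copy
from typing import Dict, List, Optional


class Warehouse:
    """Toy multi-table warehouse with batch-level checkpoints and a state log."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[str]] = {"orders": [], "customers": []}
        # A checkpoint records how many batches are complete plus a consistent snapshot.
        self.checkpoint = {"batch": 0, "tables": copy.deepcopy(self.tables)}
        self.state_log = ["IDLE"]  # stand-in for a system state table

    def _transition(self, state: str) -> None:
        self.state_log.append(state)

    def load(self, batches: List[Dict[str, str]], fail_at: Optional[int] = None) -> bool:
        """Load batches, resuming from the latest checkpoint; fail_at simulates a crash."""
        self._transition("LOADING")
        try:
            for i in range(self.checkpoint["batch"], len(batches)):
                if i == fail_at:
                    # Write only one table, then crash, leaving the tables inconsistent.
                    self.tables["orders"].append(batches[i]["orders"])
                    raise RuntimeError("simulated load failure")
                for name, row in batches[i].items():
                    self.tables[name].append(row)
                self.checkpoint = {"batch": i + 1, "tables": copy.deepcopy(self.tables)}
                self._transition("CHECKPOINTED")
        except RuntimeError:
            self._transition("FAILED")
            # Roll every table back to the last consistent snapshot.
            self.tables = copy.deepcopy(self.checkpoint["tables"])
            self._transition("RECOVERED")
            return False
        self._transition("DONE")
        return True


batches = [{"orders": f"o{i}", "customers": f"c{i}"} for i in range(1, 4)]
warehouse = Warehouse()
first_attempt = warehouse.load(batches, fail_at=1)  # crashes while loading batch 2
second_attempt = warehouse.load(batches)            # resumes from the checkpoint
```

The second call starts at the batch recorded in the checkpoint, so the first batch is not reloaded, and the state log shows the path taken through the states.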
Abstract:
A computer-implemented method for managing price information. Embodiments include receiving a mapping of interconnected components, identifying as a first subset components subject to a first fixed price agreement not subject to a second fixed price agreement that overlaps the first fixed price agreement, identifying as a second subset the components subject to the second fixed price agreement not subject to the first fixed price agreement, and identifying as a third subset the components subject to both the first fixed price agreement and the second fixed price agreement. The method also includes receiving a price change for a price associated with a component in one of the subsets of components, and distributing an offset of the price change to components in the other subsets of components.
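The three subsets correspond to plain set operations, as the sketch below shows. The offset rule is an assumption made only so the example is concrete: the price change is cancelled by spreading an equal and opposite amount across the components in the other subsets, keeping the total unchanged. The abstract does not state how the offset is computed, and the component names and function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def partition(first: Set[str], second: Set[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split components into: first agreement only, second agreement only, both."""
    return first - second, second - first, first & second


def distribute_offset(prices: Dict[str, float], subsets: List[Set[str]],
                      changed: str, new_price: float) -> Dict[str, float]:
    """Apply a price change and spread the opposite amount over the other subsets."""
    delta = new_price - prices[changed]
    home = next(s for s in subsets if changed in s)
    others = [c for s in subsets if s is not home for c in s]
    updated = dict(prices)
    updated[changed] = new_price
    if others:
        share = delta / len(others)  # assumed rule: equal split
        for component in others:
            updated[component] -= share
    return updated


agreement_1 = {"cpu", "ram", "disk"}
agreement_2 = {"disk", "gpu"}
only_1, only_2, both = partition(agreement_1, agreement_2)

prices = {"cpu": 100.0, "ram": 50.0, "disk": 80.0, "gpu": 200.0}
updated = distribute_offset(prices, [only_1, only_2, both], "disk", 110.0)
```

Here `disk` rises by 30, and the three components outside its subset each drop by 10, so the sum of all prices stays at 430.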
Abstract:
A method for analyzing predefined subject matter in a patent database, for use with a set of target patents, each target patent being related to the predefined subject matter. The method comprises: creating a feature space based on frequently occurring terms found in the set of target patents; creating a partition taxonomy based on a clustered configuration of the feature space; editing the partition taxonomy using domain expertise to produce an edited partition taxonomy; creating a classification taxonomy based on structured features present in the edited partition taxonomy; creating a contingency table by comparing the edited partition taxonomy and the classification taxonomy to provide entries in the contingency table; and identifying all significant relationships in the contingency table to help determine the presence of any white space.
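The final two steps can be sketched as follows, assuming each taxonomy is available as a mapping from patent identifier to category. The observed-to-expected ratio is used here as a simple stand-in for a significance test, and empty cells are treated as white-space candidates; both choices, the function names, and the sample data are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Cell = Tuple[str, str]


def contingency(partition: Dict[str, str], classification: Dict[str, str]) -> Counter:
    """Count patents for each (partition category, classification category) pair."""
    return Counter((partition[p], classification[p])
                   for p in partition if p in classification)


def over_represented(table: Counter, min_ratio: float = 1.5) -> Dict[Cell, float]:
    """Cells whose count exceeds the count expected under independence by min_ratio."""
    total = sum(table.values())
    rows, cols = Counter(), Counter()
    for (r, c), n in table.items():
        rows[r] += n
        cols[c] += n
    ratios = {cell: n / (rows[cell[0]] * cols[cell[1]] / total)
              for cell, n in table.items()}
    return {cell: ratio for cell, ratio in ratios.items() if ratio >= min_ratio}


def empty_cells(table: Counter) -> Set[Cell]:
    """Category pairs with no patents at all: candidate white space."""
    rows = {r for r, _ in table}
    cols = {c for _, c in table}
    return {(r, c) for r in rows for c in cols if (r, c) not in table}


# Illustrative data: cluster labels from the edited partition, class codes per patent.
partition_labels = {"p1": "batteries", "p2": "batteries", "p3": "sensors", "p4": "sensors"}
class_codes = {"p1": "H01M", "p2": "H01M", "p3": "G01N", "p4": "H01M"}

table = contingency(partition_labels, class_codes)
strong = over_represented(table)
gaps = empty_cells(table)
```

In this tiny sample, the only cell above the threshold is `("sensors", "G01N")` with a ratio of 2.0, and the only empty cell is `("batteries", "G01N")`. A chi-squared or similar test would be a more rigorous way to decide which relationships are significant.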