Abstract:
A computer program product for an indexer-agnostic index building system includes a computer readable storage medium to store a computer readable program, wherein the computer readable program, when executed on a computer, causes the computer to perform operations for creating a semantically aggregated index. The operations include: extracting documents from a data source, wherein each document includes a data object; distributing the documents to a plurality of processing nodes within the system; for each node: indexing the data objects for each document into fields using semantic rules; and grouping indexed data objects for related fields by: classifying the documents into logical groups based on the semantic rules; and creating a searchable index shard for related logical groups.
Abstract:
Methods and systems for utilizing a database are disclosed. The methods and systems determine a key representative of a storage location of first RDF data in a NoSQL database. In addition, the methods and systems read the first RDF data in the NoSQL database using the key. The methods and systems also write second RDF data derived from the first RDF data into a second database stored in memory. The methods and systems may also modify the second RDF data, and write third RDF data derived from the modified second RDF data into the NoSQL database.
Abstract:
A search system can maintain a search index of metadata and text for objects in a repository, repositories or distributed across a network. The search index can be divided into partitions with a partition assigned a first capacity utilization threshold and a second capacity utilization threshold. If the capacity utilization of the partition is below the first threshold, the system can add, update and delete information in the partition. If the capacity utilization of the partition is above the first threshold, the system can update and delete information in the partition, but cannot add information for new objects to the partition. If the capacity utilization of the partition is above the second threshold, the system can enter a rebalancing mode in which it seeks to rebalance capacity utilization between partitions. The behavior of the system can change depending upon the size of a partition relative to its configurable thresholds.
Abstract:
A system for securing application information in a shared, system-wide search service. Each application can register a security filtering module that is to be used at search time to filter data associated with that application. When a user performs a search, initial, unfiltered search results are obtained based the contents of the shared search index. The unfiltered search results are organized by application, and previously registered filter modules are called to perform user specific, per-application filtering on the initial results. The filter modules cause data to which the user issuing the search request does not have access to be removed from the search results, on a per application basis. Those of the initial search results that are determined in this way to not be accessible to the user issuing the search request are removed, resulting in a set of filtered search results that are presented to the user. The filtered search results thus contain indications only of data that is accessible to the user. In this way, the system-wide search service filters search results to remove indications of data which match the search criteria provided by the user, but to which the user does not have access, based on a conveniently extensible, per-application search result filtering process.
Abstract:
A method for efficiently storing index data on an electronic device may include storing index data in data pages within an indexing data structure on the electronic device. The method may also include providing at least one directory for the indexing data structure. The method may also include dynamically modifying how many directory levels are provided for the indexing data structure in response to changes to the data pages within the indexing data structure.
Abstract:
During a storage technique, multiple messages (such as emails) associated with a user of a communication application are received. Then, the multiple messages are stored in a message table associated with the user and the multiple messages are indexed in an index associated with the user. This index may be divided into multiple divisions if a total number of messages stored in the message table exceeds a threshold value, where each division corresponds to messages received during a different time interval.
Abstract:
Provided are techniques for partitioning a physical index into one or more physical partitions; assigning each of the one or more physical partitions to a node in a cluster of nodes; for each received document, assigning an assigned-doc-ID comprising an integer document identifier; and, in response to assigning the assigned-doc-ID to a document, determining a cut-off of assignment of new documents to a current virtual-index-epoch comprising a first set of physical partitions and placing the new documents into a new virtual-index-epoch comprising a second set of physical partitions by inserting each new document to a specific one of the physical partitions in the second set using one or more functions that direct the placement based on one of the assigned-doc-id, a field value derived from a set of fields obtained from the document, and a combination of the assigned-doc-id and the field value.
Abstract:
A system and method for maintaining version information. An identifier (“ID”) that identifies a collection of associated files is obtained. An index is generated that specifies the contents of the collection of associated files. The ID may be saved along with the index in a target version file to convey version information about the collection of associated files. Subsequently, the index may be extracted from the target version file to compare with a corresponding index extracted from a reference version file. The result of the comparison may be used to determine whether the contents of the collection of associated files match a reference.
Abstract:
Provided is a method that includes a method for updating index data. The method includes receiving index data, including an index value indicative of user activity on a network site and an index time corresponding to a time used for calculating the index value, receiving an update index time corresponding to a time used for updating the index data, determining an updated index value using an exponential decay of the index value from the index time to the update index time, wherein the updated index value comprises a decayed value of the index value corresponding to the update time, and storing updated index data including the updated index value and the update index time.
Abstract:
Indexing and retrieving real time content in a social networking system is disclosed. A user-term index includes user-term partitions, each user-term partition comprising temporal databases. As a post is received from a user, a user identifier, a post identifier, and a post is extracted. An object store communicatively coupled to a temporal database for recently received content is queried to determine whether terms in the post has already been stored. A term identifier is stored in the user-term index with the user and post identifiers. A forward index stores the post by post identifier. Responsive to a search query, the user-term index is searched by the user's connections and the terms. A real time search engine compiles the results of the user-term index query and retrieves the stored posts from the forward index. The search results may then be ranked and cached before presentation to the searching user.