Abstract:
A system includes storage of data of a hierarchy, where each node of the hierarchy is represented by a row, and each row includes a level of its respective node, a pointer to a lower bound entry of an order index structure associated with the hierarchy, and a pointer to an upper bound entry of the order index structure associated with the hierarchy, reception of a pointer l, and determination of an entry e of the order index structure to which the received pointer l points.
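The row layout described above can be illustrated with a minimal sketch. It assumes the order index is a flat Python list and that a "pointer" is simply a position in that list; the names `Entry`, `Row`, `build`, `find`, and `is_descendant` are hypothetical and do not come from the abstract. The descendant test is an added illustration of why lower and upper bounds are useful, not something the abstract itself claims.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    row_id: str      # row (node) this entry belongs to
    is_lower: bool   # True for a lower bound entry, False for an upper bound entry

@dataclass
class Row:
    name: str
    level: int
    lower: int = -1  # pointer to the lower bound entry (assumed: a list position)
    upper: int = -1  # pointer to the upper bound entry (assumed: a list position)

def build(tree, root):
    """Create one row per node and fill the order index in depth-first order."""
    rows, order_index = {}, []

    def visit(name, level):
        row = Row(name, level)
        rows[name] = row
        row.lower = len(order_index)
        order_index.append(Entry(name, True))
        for child in tree.get(name, []):
            visit(child, level + 1)
        row.upper = len(order_index)
        order_index.append(Entry(name, False))

    visit(root, 0)
    return rows, order_index

def find(order_index, l):
    """Determine the entry e to which the received pointer l points."""
    return order_index[l]

def is_descendant(rows, a, b):
    """True if node a lies strictly inside the bounds of node b."""
    return rows[b].lower < rows[a].lower and rows[a].upper < rows[b].upper

tree = {"A": ["B", "C"], "B": ["D"]}
rows, order_index = build(tree, "A")
e = find(order_index, rows["D"].upper)
```

Here `e` is the upper bound entry of node `D`, and `is_descendant(rows, "D", "A")` holds because both of `D`'s bounds fall between `A`'s bounds.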
Abstract:
Techniques are described herein for performing database operations against location and access transparent metadata units called fat pointers organized into globally distributed data structures. The fat pointers are created by extracting values corresponding to a particular key and pairing each value with a reference to the local location and server that has the native format record containing the value. The fat pointers may be transferred to any server in the cluster, even if the server is different from the server that has the native format record. In general, most operations are performed against fat pointers rather than the native format records. This allows the cluster to perform work against arbitrary types of data efficiently and in a constant amount of time despite the variable sizes and structures of records.
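As a rough illustration of the fat pointer idea, the sketch below pairs each extracted value with a reference to the server and local position of its record. It assumes records are Python dictionaries held in per-server lists; `FatPointer`, `make_fat_pointers`, `dereference`, and the sample data are hypothetical names chosen for the example.

```python
from typing import Any, NamedTuple

class FatPointer(NamedTuple):
    value: Any       # value extracted for the chosen key
    server_id: str   # server that holds the native format record
    location: int    # local position of the record on that server

def make_fat_pointers(server_id, records, key):
    """Extract the value for `key` from each record and pair it with its location."""
    return [
        FatPointer(record[key], server_id, position)
        for position, record in enumerate(records)
        if key in record
    ]

# Records differ in size and structure; the fat pointers all have the same shape.
servers = {
    "s1": [{"name": "carol", "age": 41}, {"name": "alice", "age": 30, "notes": "x" * 1000}],
    "s2": [{"name": "bob", "age": 25}],
}

pointers = []
for server_id, records in servers.items():
    pointers.extend(make_fat_pointers(server_id, records, "age"))

# The sort runs on the small, uniform fat pointers, not on the records themselves.
pointers.sort(key=lambda p: p.value)

def dereference(pointer):
    """Fetch the native format record only when it is actually needed."""
    return servers[pointer.server_id][pointer.location]
```

Sorting touches only the fixed-shape pointers, so the cost per comparison does not depend on how large or irregular the underlying records are.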
Abstract:
Mechanisms are provided for performing a matrix operation. A processor of a data processing system is configured to perform cluster-based matrix reordering of an input matrix. An input matrix, which comprises nodes associated with elements of the matrix, is received. The nodes are clustered into clusters based on numbers of connections with other nodes within and between the clusters, and the clusters are ordered by minimizing a total length of cross cluster connections between nodes of the clusters, to thereby generate a reordered matrix. A lookup table is generated identifying new locations of nodes of the input matrix, in the reordered matrix. A matrix operation is then performed based on the reordered matrix and the lookup table.
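The lookup table step can be sketched as follows. This example assumes the clustering and the cluster ordering have already been computed by some other means and are passed in as `clusters`; it does not implement the connection-based clustering or the cross-cluster length minimization described above. The function names are hypothetical.

```python
def reorder(matrix, clusters):
    """Permute rows and columns so that nodes of the same cluster are adjacent.

    `clusters` is a list of lists of original node indices, already in the
    desired cluster order (assumed to be computed elsewhere).
    """
    order = [node for cluster in clusters for node in cluster]
    lookup = {old: new for new, old in enumerate(order)}  # old index -> new index
    reordered = [[matrix[r][c] for c in order] for r in order]
    return reordered, lookup

def matvec(reordered, lookup, x):
    """Multiply using the reordered matrix, returning the result in original order."""
    n = len(x)
    x_perm = [0] * n
    for old, new in lookup.items():
        x_perm[new] = x[old]
    y_perm = [sum(a * b for a, b in zip(row, x_perm)) for row in reordered]
    return [y_perm[lookup[old]] for old in range(n)]

# Nodes 0 and 2 are connected, as are nodes 1 and 3.
matrix = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
reordered, lookup = reorder(matrix, [[0, 2], [1, 3]])
y = matvec(reordered, lookup, [1, 2, 3, 4])
```

After reordering, the non-zero elements sit in two dense diagonal blocks, and the lookup table lets the caller keep working with the original node numbering.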
Abstract:
A multi-user search system with methodology for instant indexing. In one embodiment, for example, a system for instant indexing includes a token store storing sets of tokens for current versions of documents. The system further includes a tokenizer server configured to tokenize new versions of the documents and to generate sets of tokens for the new versions of the documents, an instant indexer configured to determine tokens to use to index the documents based on identified differences between the sets of tokens for the new versions of the documents and the sets of tokens for the current versions of the documents, and to generate index mutations including the tokens to use to index the documents, an index mutation journal configured to store the generated index mutations in association with timestamps, and an index mutation server configured to provide, to index servers, from the index mutation journal, generated index mutations for the index servers that are associated with timestamps that are newer than specified timestamps.
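A minimal sketch of the token-difference step and the timestamped journal might look like the following. It assumes token sets are Python sets, that an index mutation is a dictionary of tokens to add and remove, and that timestamps are plain integers; `index_mutation` and `MutationJournal` are hypothetical names.

```python
def index_mutation(doc_id, current_tokens, new_tokens):
    """Derive a mutation from the difference between two token sets."""
    return {
        "doc_id": doc_id,
        "add": sorted(new_tokens - current_tokens),      # tokens only in the new version
        "remove": sorted(current_tokens - new_tokens),   # tokens only in the current version
    }

class MutationJournal:
    """Stores mutations with timestamps and serves those newer than a given one."""

    def __init__(self):
        self._entries = []  # list of (timestamp, mutation)

    def append(self, timestamp, mutation):
        self._entries.append((timestamp, mutation))

    def newer_than(self, timestamp):
        return [mutation for ts, mutation in self._entries if ts > timestamp]

token_store = {"doc1": {"quick", "brown", "fox"}}
new_version = {"quick", "red", "fox"}

journal = MutationJournal()
first = index_mutation("doc1", token_store["doc1"], new_version)
journal.append(10, first)
token_store["doc1"] = new_version

second = index_mutation("doc1", token_store["doc1"], {"quick", "red", "fox", "jumps"})
journal.append(20, second)

# An index server that has already applied everything up to timestamp 10.
pending = journal.newer_than(10)
```

Only the changed tokens travel to the index servers, which is what keeps the update small when a document is edited slightly.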
Abstract:
Electronic mail message processing includes: obtaining a set of keywords associated with an electronic mail message; updating, based at least in part on the set of keywords, a set of inverted index records stored in a level 1 cache; determining whether size of the set of inverted index records stored in the level 1 cache exceeds a first preset threshold value; in the event that the first preset threshold value is exceeded, transferring the set of inverted index records in the level 1 cache to a level 2 cache; determining whether size of a level 2 cache file exceeds a second preset threshold value; in the event that the second preset threshold value is exceeded, transferring, according to a path file, inverted index records in the level 2 cache file to a level 3 cache storing a set of inverted index files.
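The threshold-driven transfers between cache levels can be sketched as below. The example keeps all three levels in memory, measures size as the number of stored postings, and omits the path file; these are simplifying assumptions, and `TieredInvertedIndex` is a hypothetical name.

```python
from collections import defaultdict

class TieredInvertedIndex:
    def __init__(self, l1_limit, l2_limit):
        self.l1_limit = l1_limit     # first preset threshold value
        self.l2_limit = l2_limit     # second preset threshold value
        self.l1 = defaultdict(set)   # keyword -> message ids (level 1 cache)
        self.l2 = defaultdict(set)   # level 2 cache
        self.l3 = defaultdict(set)   # level 3 cache

    @staticmethod
    def _size(level):
        """Size of a level, assumed here to be its number of postings."""
        return sum(len(ids) for ids in level.values())

    @staticmethod
    def _move(src, dst):
        """Merge all inverted index records of src into dst, then empty src."""
        for keyword, ids in src.items():
            dst[keyword] |= ids
        src.clear()

    def add_message(self, message_id, keywords):
        for keyword in keywords:
            self.l1[keyword].add(message_id)
        if self._size(self.l1) > self.l1_limit:
            self._move(self.l1, self.l2)
            if self._size(self.l2) > self.l2_limit:
                self._move(self.l2, self.l3)

index = TieredInvertedIndex(l1_limit=2, l2_limit=4)
index.add_message(1, ["a", "b"])        # level 1 holds 2 postings, no transfer
index.add_message(2, ["a", "c"])        # level 1 exceeds 2, moves to level 2
index.add_message(3, ["a", "b", "c"])   # level 2 then exceeds 4, moves to level 3
```

Small, frequent updates stay in the first level, and records migrate downward only in batches once a threshold is crossed.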
Abstract:
A method and system are provided for identifying type-ahead candidates. A method includes determining a context of past non-threaded emails of a user. The method further includes generating a context index associating the past non-threaded emails of the determined context with repeatable values within the past non-threaded emails. The method further includes receiving characters in a current email and determining a context of the current email. The method further includes determining matches between the current email and the past non-threaded emails in the context index. The method further includes identifying the corresponding repeatable values and matching the identified corresponding repeatable values with the received characters. The method further includes presenting the candidate words to the user for inclusion in the current email.
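The context index lookup can be sketched as follows. The example assumes each past email already carries a context label and a list of repeatable values, and that matching against the received characters is a case-insensitive prefix match; how context is actually determined is not modeled. `build_context_index` and `candidates` are hypothetical names.

```python
from collections import defaultdict

def build_context_index(past_emails):
    """Associate each context with the repeatable values seen in its past emails."""
    index = defaultdict(set)
    for email in past_emails:
        index[email["context"]].update(email["values"])
    return index

def candidates(index, context, typed):
    """Return values from the matching context that start with the typed characters."""
    prefix = typed.lower()
    return sorted(
        value for value in index.get(context, ())
        if value.lower().startswith(prefix)
    )

past_emails = [
    {"context": "travel", "values": ["Geneva", "Berlin"]},
    {"context": "travel", "values": ["Genoa"]},
    {"context": "billing", "values": ["General ledger"]},
]
index = build_context_index(past_emails)
suggestions = candidates(index, "travel", "Gen")
```

Because the lookup is restricted to the current email's context, "General ledger" is not offered while the user is writing a travel email.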
Abstract:
In one embodiment, one or more computing devices receive, from a client device of a first user, a query from the first user. The computing devices search a social graph to identify one or more nodes of the social graph that are relevant to the query. The computing devices obtain a static rank for each identified node. The static rank is based at least in part on a number of edges of a particular edge type that are connected to the node in the graph or attributes of edges connected to the node in the graph. The computing devices send, to the client device of the first user for display, a search-results page responsive to the received query. The search-results page includes references to one or more nodes having a static rank greater than a threshold rank.
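One simple reading of the static rank is a count of edges of a given type attached to each node, as sketched below. The example assumes edges are `(source, target, type)` tuples and that relevance is a case-insensitive substring match on the node name; edge attributes are not modeled. `static_ranks` and `search` are hypothetical names.

```python
from collections import Counter

def static_ranks(edges, edge_type):
    """Count, for every node, the edges of the given type connected to it."""
    ranks = Counter()
    for source, target, kind in edges:
        if kind == edge_type:
            ranks[source] += 1
            ranks[target] += 1
    return ranks

def search(query, nodes, edges, edge_type, threshold):
    """Return relevant nodes whose static rank is greater than the threshold."""
    ranks = static_ranks(edges, edge_type)
    relevant = [node for node in nodes if query.lower() in node.lower()]
    kept = [node for node in relevant if ranks[node] > threshold]
    return sorted(kept, key=lambda node: -ranks[node])

nodes = ["Cafe Roma", "Cafe Luna", "Bookshop"]
edges = [
    ("alice", "Cafe Roma", "like"),
    ("bob", "Cafe Roma", "like"),
    ("carol", "Cafe Luna", "like"),
    ("alice", "Cafe Luna", "check-in"),
]
results = search("cafe", nodes, edges, edge_type="like", threshold=1)
```

The rank depends only on the graph and not on the query, so it can be computed ahead of time and looked up when a query arrives.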
Abstract:
Technology is disclosed for a multi-tiered querying system to target queries to systems storing data relevant to the query. A multi-tiered targeted query system comprises at least three tiers: a web tier, an aggregator tier, and a shard tier. Servers at the web tier can be configured to service user data requests and pass them to servers at the aggregator tier. Servers at the aggregator tier can be configured to determine which selected shard servers have the requested information; formulate queries for the selected shard servers; send the queries to the selected shard servers; and aggregate results from the selected shard servers. Servers at the shard tier can be configured to store data, receive queries on that data, and return results for received queries.
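The division of work among the three tiers can be sketched as below. The example runs everything in one process and assumes data is placed on shards by user id modulo the number of shards; that placement rule, along with the names `ShardServer`, `Aggregator`, and `web_tier`, is hypothetical.

```python
class ShardServer:
    """Shard tier: stores rows and answers queries on them."""

    def __init__(self, rows):
        self.rows = rows

    def query(self, user_ids):
        return [row for row in self.rows if row["user"] in user_ids]

class Aggregator:
    """Aggregator tier: targets only the shards that hold the requested data."""

    def __init__(self, shards):
        self.shards = shards

    def shard_for(self, user_id):
        return user_id % len(self.shards)  # assumed placement rule

    def handle(self, user_ids):
        # Determine the selected shards and formulate one query per shard.
        by_shard = {}
        for user_id in user_ids:
            by_shard.setdefault(self.shard_for(user_id), set()).add(user_id)
        # Send the queries and aggregate the results.
        results = []
        for shard_id in sorted(by_shard):
            results.extend(self.shards[shard_id].query(by_shard[shard_id]))
        return results

def web_tier(aggregator, user_ids):
    """Web tier: services the user request by passing it to the aggregator."""
    return aggregator.handle(user_ids)

shards = [
    ShardServer([{"user": 0, "post": "a"}, {"user": 2, "post": "b"}]),
    ShardServer([{"user": 1, "post": "c"}, {"user": 3, "post": "d"}]),
]
aggregator = Aggregator(shards)
results = web_tier(aggregator, [2, 3])
```

A request for a single user reaches only one shard, which is the point of targeting queries rather than broadcasting them.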
Abstract:
A multi-user search system with methodology for personal searching. In one embodiment, for example, a system for personal searching includes a plurality of index servers storing a plurality of index shards. Each index shard of the plurality of index shards indexes a plurality of documents. Each document of the plurality of documents belongs to one of a plurality of document namespaces assigned to the index shard. The system further includes a front-end server computer for receiving a search query from an authenticated user; an access control server for determining an authorized document namespace the authenticated user is authorized to access; and a query processor for answering the search query and restricting, based on an identifier of the authorized document namespace, an answer to the search query to identifying only documents satisfying the search query and belonging to the authorized document namespace.
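The namespace restriction can be sketched as a filter applied while answering the query. The example assumes one in-memory index shard that maps a token to `(namespace, document)` pairs and an access control table that maps each user to a single authorized namespace; all names and data are hypothetical.

```python
# One index shard: token -> list of (document namespace, document id).
INDEX_SHARD = {
    "budget": [("ns_alice", "doc1"), ("ns_bob", "doc7")],
    "roadmap": [("ns_bob", "doc8")],
}

# Access control data: user -> authorized document namespace (assumed one per user).
ACCESS = {"alice": "ns_alice", "bob": "ns_bob"}

def authorized_namespace(user):
    """Access control server: look up the namespace the user may access."""
    return ACCESS[user]

def answer(user, token):
    """Query processor: keep only matching documents in the authorized namespace."""
    namespace = authorized_namespace(user)
    return [
        doc_id
        for doc_namespace, doc_id in INDEX_SHARD.get(token, [])
        if doc_namespace == namespace
    ]

alice_results = answer("alice", "budget")
```

Both users' documents live in the same shard, yet each query only ever identifies documents from the caller's own namespace.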