Abstract:
The present teaching relates to searching encrypted data. In one example, a search request is received for encrypted documents. An encrypted query is generated based on the search request. The encrypted query is sent to a server that stores a first encrypted index and a second encrypted index. The first encrypted index maps encrypted keywords to full blocks each of which has a same size and is fully filled with encrypted document identities (IDs). The second encrypted index maps encrypted keywords to partial blocks each of which has the same size and is partially filled with encrypted document IDs. Based on the encrypted query, one or more encrypted document IDs are determined by searching against both the first encrypted index and the second encrypted index. A search result is generated based on the one or more encrypted document IDs. The search result is provided in response to the search request.
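The two-index layout described above can be illustrated with a minimal sketch. It is not the claimed scheme: the block size, the use of HMAC-SHA256 as the keyword token, and the names `token`, `build_indexes`, and `search` are all assumptions made for illustration, and document IDs are left unencrypted so that only the full-block and partial-block layout is shown.

```python
import hashlib
import hmac

BLOCK_SIZE = 4   # assumed fixed block size
PAD = None       # assumed padding marker for unused slots in a partial block


def token(key: bytes, keyword: str) -> str:
    """Derive a deterministic keyword token (stands in for an encrypted keyword)."""
    return hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).hexdigest()


def build_indexes(key: bytes, postings: dict):
    """Split each posting list into full blocks and at most one padded partial block."""
    full_index, partial_index = {}, {}
    for keyword, doc_ids in postings.items():
        t = token(key, keyword)
        cut = len(doc_ids) - len(doc_ids) % BLOCK_SIZE
        blocks = [doc_ids[i:i + BLOCK_SIZE] for i in range(0, cut, BLOCK_SIZE)]
        if blocks:
            full_index[t] = blocks
        rest = doc_ids[cut:]
        if rest:
            # Pad so that the partial block has the same size as a full block.
            partial_index[t] = rest + [PAD] * (BLOCK_SIZE - len(rest))
    return full_index, partial_index


def search(full_index: dict, partial_index: dict, query_token: str) -> list:
    """Look the token up in both indexes and merge the document IDs found."""
    hits = [d for block in full_index.get(query_token, []) for d in block]
    hits += [d for d in partial_index.get(query_token, []) if d is not PAD]
    return hits
```

In a real deployment the document IDs inside each block would themselves be encrypted, which this sketch omits.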
Abstract:
A method and apparatus for indexing routes using similarity hashing. In an embodiment, a processor identifies a route wherein the route includes one or more links. The processor identifies a route attribute wherein the route attribute describes the route. The processor hashes the one or more links to determine a minimum link with a minimum hash value. The processor assigns the route attribute to the minimum link.
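The minimum-link step above can be sketched as follows. The hash function (truncated SHA-256), the link identifiers, and the names `link_hash`, `minimum_link`, and `index_route` are assumptions chosen for illustration; the abstract does not specify them.

```python
import hashlib


def link_hash(link_id: str) -> int:
    """Deterministic 64-bit hash of a link identifier (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256(link_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minimum_link(links):
    """Return the link whose hash value is smallest."""
    return min(links, key=link_hash)


def index_route(index: dict, links, attribute):
    """Assign the route attribute to the route's minimum link and return that link."""
    key = minimum_link(links)
    index.setdefault(key, []).append(attribute)
    return key
```

Because the minimum is taken over the set of links, two routes made of the same links are indexed under the same link regardless of link order, and routes that share most of their links are likely to share a minimum link as well.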
Abstract:
The encoding apparatus registers, in a dynamic dictionary, strings in input text data that are not contained in a static dictionary. The encoding apparatus adds, to first hashed data obtained by individually N-dimensionally hashing words contained as registered items in the static dictionary, hashed data obtained by individually hashing strings registered in the dynamic dictionary. The encoding apparatus determines, by using the first hashed data, whether each input string has been registered in the static dictionary and whether the string has been registered in the dynamic dictionary. In accordance with the result of the determination, the encoding apparatus performs encoding based on a content registered in the static dictionary or the dynamic dictionary.
Abstract:
Embodiments are disclosed for using an improved locality sensitive hashing (LSH) operation for the K-means clustering algorithm. In some embodiments, parameters of an LSH function are optimized with respect to a new cost model. In other embodiments, an LSH operation is applied with optimized parameters to a K-means clustering algorithm.
Abstract:
Indexing and retrieving real time content in a social networking system is disclosed. A user-term index includes user-term partitions, each user-term partition comprising temporal databases. As a post is received from a user, a user identifier, a post identifier, and a post are extracted. An object store communicatively coupled to a temporal database for recently received content is queried to determine whether terms in the post have already been stored. A term identifier is stored in the user-term index with the user and post identifiers. A forward index stores the post by post identifier. Responsive to a search query, the user-term index is searched by the user's connections and the terms. A real time search engine compiles the results of the user-term index query and retrieves the stored posts from the forward index. The search results may then be ranked and cached before presentation to the searching user.
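A rough sketch of how a user-term index and a forward index could fit together is given below. It is a simplification under stated assumptions: the class name `RealTimeIndex`, whitespace tokenization, and the match-any-term query semantics are hypothetical, and the temporal partitions, ranking, and caching mentioned above are left out.

```python
from collections import defaultdict


class RealTimeIndex:
    def __init__(self):
        self.term_ids = {}                  # object store: term -> term identifier
        self.user_term = defaultdict(set)   # (user id, term id) -> post identifiers
        self.forward = {}                   # forward index: post identifier -> post text

    def add_post(self, user_id, post_id, text):
        """Store the post and index each of its terms under the posting user."""
        self.forward[post_id] = text
        for term in set(text.lower().split()):
            # Reuse the term identifier if the term has already been stored.
            term_id = self.term_ids.setdefault(term, len(self.term_ids))
            self.user_term[(user_id, term_id)].add(post_id)

    def search(self, connections, terms):
        """Return posts by the searcher's connections that contain any query term."""
        hits = set()
        for term in terms:
            term_id = self.term_ids.get(term.lower())
            if term_id is None:
                continue
            for user_id in connections:
                hits |= self.user_term.get((user_id, term_id), set())
        return [self.forward[post_id] for post_id in sorted(hits)]
```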
Abstract:
According to an aspect, storing and querying conceptual indices (CIs) includes creating a conceptual inverted index (CII) from the CIs. The CII includes CII entries, each of which corresponds to a concept in a concept graph. Creating the CII includes populating each entry with pointers to documents selected from the CIs having likelihoods of being related to the concept that are greater than a threshold value, and the corresponding likelihoods. An aspect also includes receiving a query that includes a concept in the concept graph, and generating query results from a search that include at least a subset of the pointers to documents. Each of the CIs is associated with a corresponding document and includes a CI entry for each concept in the concept graph, and each of the CI entries specifies a value indicating a likelihood that the document is related to the concept in the concept graph.
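As an illustration of the construction described above, the sketch below builds a conceptual inverted index from per-document conceptual indices. The dictionary layout, the function names `build_cii` and `query`, and the ordering of results by likelihood are assumptions, not details taken from the abstract.

```python
def build_cii(cis: dict, threshold: float) -> dict:
    """Build a concept -> [(document, likelihood)] index from per-document CIs.

    `cis` maps each document pointer to its CI, a dict of concept -> likelihood.
    Only likelihoods strictly greater than `threshold` are kept.
    """
    cii = {}
    for doc, ci in cis.items():
        for concept, likelihood in ci.items():
            entry = cii.setdefault(concept, [])  # one CII entry per concept
            if likelihood > threshold:
                entry.append((doc, likelihood))
    return cii


def query(cii: dict, concept: str) -> list:
    """Return the stored (document, likelihood) pairs, most likely first (assumed order)."""
    return sorted(cii.get(concept, []), key=lambda pair: pair[1], reverse=True)
```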
Abstract:
In one embodiment, a node in a computer network joins a global ring associated with a distributed hash table (DHT), and maintains a DHT routing table and DHT database for the global ring. In addition, the node may determine a particular service class for which the node is configured, and may join a particular service-based sub-ring according to the particular service class, where all nodes of the particular service-based sub-ring are within the global ring. As such, a service-based DHT routing table and service-based DHT database may be maintained for the particular service-based sub-ring, such that DHT operations identified by the particular service class are routed to the particular service-based sub-ring (e.g., by a portal node).
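The relationship between the global ring and a service-based sub-ring can be sketched with a simple consistent-hashing ring. Everything named here (`Ring`, `ServiceDHT`, `ring_pos`, the successor rule for key ownership, and the fallback to the global ring when no service class is given) is an assumption for illustration rather than the disclosed routing protocol.

```python
import bisect
import hashlib


def ring_pos(name: str) -> int:
    """Map a node name or key to a 64-bit position on the ring (assumed hash)."""
    return int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest()[:8], "big")


class Ring:
    def __init__(self):
        self._positions = []   # sorted ring positions
        self._nodes = {}       # position -> node name

    def join(self, node: str):
        pos = ring_pos(node)
        if pos not in self._nodes:
            bisect.insort(self._positions, pos)
            self._nodes[pos] = node

    def members(self) -> set:
        return set(self._nodes.values())

    def owner(self, key: str) -> str:
        """Return the first node at or after the key's position, wrapping around."""
        i = bisect.bisect_left(self._positions, ring_pos(key)) % len(self._positions)
        return self._nodes[self._positions[i]]


class ServiceDHT:
    def __init__(self):
        self.global_ring = Ring()
        self.sub_rings = {}    # service class -> Ring

    def join(self, node: str, service_class=None):
        """Every node joins the global ring; configured nodes also join a sub-ring."""
        self.global_ring.join(node)
        if service_class is not None:
            self.sub_rings.setdefault(service_class, Ring()).join(node)

    def route(self, key: str, service_class=None) -> str:
        """Route an operation to the sub-ring for its service class, if one exists."""
        ring = self.sub_rings.get(service_class, self.global_ring)
        return ring.owner(key)
```

A sub-ring built this way is always a subset of the global ring, which matches the constraint stated above. The sketch assumes at least one node has joined before `route` is called.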
Abstract:
A system and method for accessing a hash table are provided. A hash table includes buckets where each bucket includes multiple chains. When a single instruction multiple data (SIMD) processor receives a group of threads configured to execute a key look-up instruction that accesses an element in the hash table, the threads executing on the SIMD processor identify a bucket that stores a key in the key look-up instruction. Once identified, the threads in the group traverse the multiple chains in the bucket, such that the elements at a chain level in the multiple chains are traversed in parallel. The traversal continues until a key look-up succeeds or fails.
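The level-by-level traversal described above can be simulated sequentially. The sketch below is only a model of the access pattern: the bucket and chain counts, the class name `MultiChainHashTable`, and the shortest-chain insertion policy are assumptions, and the inner loop stands in for SIMD lanes that would each inspect one chain at the same time.

```python
NUM_BUCKETS = 8          # assumed table size
CHAINS_PER_BUCKET = 4    # assumed number of chains (one per SIMD lane)


class MultiChainHashTable:
    def __init__(self):
        self.buckets = [
            [[] for _ in range(CHAINS_PER_BUCKET)] for _ in range(NUM_BUCKETS)
        ]

    def _bucket(self, key):
        return self.buckets[hash(key) % NUM_BUCKETS]

    def insert(self, key, value):
        """Append to the shortest chain so the chains in a bucket stay balanced."""
        chains = self._bucket(key)
        min(chains, key=len).append((key, value))

    def lookup(self, key):
        """Visit the chains one level at a time; return None if the key is absent."""
        chains = self._bucket(key)
        depth = max(len(chain) for chain in chains)
        for level in range(depth):
            # On a SIMD processor, each lane would check its own chain at this level.
            for chain in chains:
                if level < len(chain) and chain[level][0] == key:
                    return chain[level][1]
        return None
```

Splitting a bucket into several chains shortens the longest chain, so the number of levels visited is roughly the bucket's element count divided by the number of chains.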
Abstract:
A method and system for detecting whether an outgoing communication contains confidential information or other target information is provided. The detection system is provided with a collection of documents that contain confidential information, referred to as “confidential documents.” When the detection system is provided with an outgoing communication, it compares the content of the outgoing communication to the content of the confidential documents. If the outgoing communication contains confidential information, then the detection system may prevent the outgoing communication from being sent outside the organization. The detection system detects confidential information based on the similarity between the content of an outgoing communication and the content of confidential documents that are known to contain confidential information.
Abstract:
A version file for maintaining version information is described herein. The version file comprises an identifier to identify a target collection of associated files and a target index specifying binary level contents of the target collection of associated files to compare with a reference index specifying contents of a reference collection of associated files. The version file further comprises a checksum generated based on the identifier and the target index.
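One way such a version file could be put together is sketched below. The representation of the index as a mapping from file name to a SHA-256 digest of the file bytes, the JSON serialization, and the function names are all assumptions; the abstract does not say how the index or checksum is computed.

```python
import hashlib
import json


def file_digest(data: bytes) -> str:
    """Summarize a file's binary contents (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def _checksum(identifier: str, target_index: dict) -> str:
    payload = json.dumps(
        {"identifier": identifier, "target_index": target_index}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_version_file(identifier: str, target_index: dict) -> dict:
    """Bundle the identifier, the target index, and a checksum over both."""
    return {
        "identifier": identifier,
        "target_index": target_index,
        "checksum": _checksum(identifier, target_index),
    }


def verify(version_file: dict) -> bool:
    """Recompute the checksum and compare it with the stored one."""
    expected = _checksum(version_file["identifier"], version_file["target_index"])
    return expected == version_file["checksum"]


def diff_indexes(target_index: dict, reference_index: dict) -> list:
    """List file names whose contents differ between the two collections."""
    names = set(target_index) | set(reference_index)
    return sorted(n for n in names if target_index.get(n) != reference_index.get(n))
```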