Abstract:
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are extracted from the document collection. Documents are then indexed according to the phrases they contain, using phrase posting lists. The phrase posting lists are stored in a cluster of index servers. The phrase posting lists can be tiered into groups and sharded into partitions. Phrases in a query are identified based on possible phrasifications. A query schedule is created from these phrases and then optimized to reduce query processing and communication costs. Execution of the query schedule is managed to further reduce or eliminate query processing operations at various index servers.
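To make phrase posting lists and sharding concrete, here is a minimal single-process sketch. It is not the system described above. It assumes a fixed, already-known phrase set instead of phrase extraction, matches phrases as lowercase word n-grams, and assigns phrases to shards with CRC32 hashing (`zlib.crc32`). A conjunctive query intersects posting lists shortest-first as a simple stand-in for query optimization. All function names (`extract_phrases`, `build_sharded_index`, `lookup`, `query`) are hypothetical. Tiering, phrasification and cross-server scheduling are not modeled.

```python
import zlib
from collections import defaultdict


def extract_phrases(text, known_phrases, max_len=3):
    """Return the known phrases (word n-grams up to max_len) that occur in text."""
    words = text.lower().split()
    found = set()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = " ".join(words[i:i + n])
            if gram in known_phrases:
                found.add(gram)
    return found


def shard_for(phrase, num_shards):
    """Deterministic phrase-to-shard assignment (assumed CRC32 hashing)."""
    return zlib.crc32(phrase.encode("utf-8")) % num_shards


def build_sharded_index(docs, known_phrases, num_shards):
    """Each shard maps phrase -> posting list of doc ids in ascending order."""
    shards = [defaultdict(list) for _ in range(num_shards)]
    for doc_id in sorted(docs):  # sorted ids keep posting lists ordered
        for phrase in extract_phrases(docs[doc_id], known_phrases):
            shards[shard_for(phrase, num_shards)][phrase].append(doc_id)
    return shards


def lookup(shards, phrase):
    """Fetch one phrase's posting list from the shard that owns it."""
    return shards[shard_for(phrase, len(shards))].get(phrase, [])


def query(shards, phrases):
    """Conjunctive query: intersect posting lists, shortest list first."""
    postings = sorted((lookup(shards, p) for p in phrases), key=len)
    if not postings:
        return []
    result = set(postings[0])
    for plist in postings[1:]:
        if not result:
            break  # skip remaining intersections once nothing can match
        result &= set(plist)
    return sorted(result)
```

For example, indexing `{1: "the quick brown fox", 2: "quick brown dogs run", 3: "a lazy fox"}` with known phrases `{"quick brown", "fox", "lazy"}` and then querying `["quick brown", "fox"]` returns only document 1. Starting from the shortest posting list keeps intermediate result sets small. This mirrors, in a very simplified way, the goal of reducing query processing work.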
Abstract:
The disclosure includes a system and method for providing a customized stream of content to a user. The system includes: an item sourcer for gathering one or more content items from one or more content sources; a behavior indicator module and scorer for determining one or more behavior scores for the one or more content items; a content indicator module and scorer for determining one or more content scores for the one or more content items; a score combiner for aggregating the one or more behavior scores and the one or more content scores to generate one or more item scores for the one or more content items; a content diversifier for determining one or more diverse items from the one or more content items; and a stream generator for generating a customized stream of content for the user from the one or more diverse items.
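The following is a hypothetical sketch of how the listed components could fit together. The abstract does not specify any signals, formulas or weights, so each of these is an assumption. The behavior score is a click count normalized by the maximum in the batch. The content score is the fraction of an item's keywords that match the user's interests. The combiner is a weighted sum. The diversifier greedily keeps at most `max_per_topic` items per topic. The stream generator returns the top `length` diverse items. The `Item` fields and every function name are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    source: str
    topic: str
    clicks: int          # hypothetical behavior signal
    keywords: frozenset  # hypothetical content features


def behavior_score(item, max_clicks):
    """Behavior scorer: clicks normalized to [0, 1] (assumed signal)."""
    return item.clicks / max_clicks if max_clicks else 0.0


def content_score(item, user_interests):
    """Content scorer: share of the item's keywords the user is interested in."""
    if not item.keywords:
        return 0.0
    return len(item.keywords & user_interests) / len(item.keywords)


def combine(behavior, content, w_behavior=0.5):
    """Score combiner: weighted sum of the two scores (assumed weighting)."""
    return w_behavior * behavior + (1 - w_behavior) * content


def diversify(scored, max_per_topic=1):
    """Content diversifier: walk items best-first, capping items per topic."""
    chosen, per_topic = [], {}
    for score, item in scored:
        if per_topic.get(item.topic, 0) < max_per_topic:
            chosen.append((score, item))
            per_topic[item.topic] = per_topic.get(item.topic, 0) + 1
    return chosen


def generate_stream(sources, user_interests, length=3, max_per_topic=1):
    """Stream generator: source, score, combine, diversify, then truncate."""
    # Item sourcer: gather items from every content source.
    items = [item for source_items in sources.values() for item in source_items]
    max_clicks = max((i.clicks for i in items), default=0)
    scored = sorted(
        ((combine(behavior_score(i, max_clicks), content_score(i, user_interests)), i)
         for i in items),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [item.item_id for _, item in diversify(scored, max_per_topic)[:length]]
```

Capping items per topic is only one simple diversification strategy. The abstract leaves the diversifier's method open.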
Abstract:
A method allocates object replicas in a distributed storage system. The method identifies a plurality of objects in the distributed storage system. Each object has an associated storage policy that specifies a target number of object replicas stored at distinct instances of the distributed storage system. The method identifies an object of the plurality of objects whose number of object replicas exceeds the target number of object replicas specified by the storage policy associated with the object. The method selects a first replica of the object for removal based on last access times for replicas of the object, and transmits a request to a first instance of the distributed storage system that stores the first replica. The request instructs the first instance to remove the first replica of the object.
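Here is a minimal sketch of the replica-removal step. The abstract says only that the replica to remove is chosen "based on last access times". This sketch assumes that the least recently accessed replicas are removed first and that all excess replicas are scheduled at once. The `Replica` and `StoredObject` classes, `plan_removals`, `send_removals`, the request payload, and the `transport` callback (a stand-in for whatever RPC mechanism the storage instances use) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    instance: str       # storage instance holding this replica
    last_access: float  # last access time, e.g. seconds since epoch (assumed)


@dataclass
class StoredObject:
    object_id: str
    target_replicas: int  # target count from the object's storage policy
    replicas: list = field(default_factory=list)


def plan_removals(objects):
    """Return (object_id, instance) pairs for replicas above each target.

    Assumption: the least recently accessed replicas are removed first.
    """
    requests = []
    for obj in objects:
        excess = len(obj.replicas) - obj.target_replicas
        if excess <= 0:
            continue  # object is at or below its target; nothing to remove
        by_last_access = sorted(obj.replicas, key=lambda r: r.last_access)
        for replica in by_last_access[:excess]:
            requests.append((obj.object_id, replica.instance))
    return requests


def send_removals(requests, transport):
    """Send one removal request per replica via a caller-supplied transport."""
    for object_id, instance in requests:
        transport(instance, {"op": "remove_replica", "object_id": object_id})
```

Passing `transport` in as a function keeps the selection logic independent of any particular messaging layer. It also makes the sketch easy to test with a stub that only records the requests.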