Abstract:
In embodiments of the disclosed technology, indexes, such as inverted indexes, are updated only as necessary to guarantee answer precision within predefined thresholds which are determined with little cost in comparison to the updates of the indexes themselves. With the present technology, a batch of daily updates can be processed in a matter of minutes, rather than a few hours for rebuilding an index, and a query may be answered with assurances that the results are accurate or within a threshold of accuracy.
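The idea of deferring index updates until a staleness threshold is reached can be illustrated with a minimal sketch. It is not the disclosed method: the class name LazyInvertedIndex, the parameter max_pending_per_term, and the per-term counting policy are all assumptions chosen for illustration. Each query returns both the indexed results and a bound on how many matching documents may still be missing.

```python
from collections import defaultdict


class LazyInvertedIndex:
    """Hypothetical inverted index that merges updates only when a
    per-term staleness threshold would otherwise be exceeded."""

    def __init__(self, max_pending_per_term=2):
        self.postings = defaultdict(set)  # term -> doc ids already merged
        self.pending = defaultdict(set)   # term -> doc ids not yet merged
        self.max_pending_per_term = max_pending_per_term

    def add(self, doc_id, text):
        for term in set(text.lower().split()):
            self.pending[term].add(doc_id)
            # Pay the update cost only when the threshold is exceeded.
            if len(self.pending[term]) > self.max_pending_per_term:
                self.flush(term)

    def flush(self, term):
        self.postings[term] |= self.pending.pop(term, set())

    def query(self, term):
        """Return (merged doc ids, upper bound on missing doc ids)."""
        term = term.lower()
        merged = set(self.postings.get(term, set()))
        missing_bound = len(self.pending.get(term, ()))
        return merged, missing_bound
```

In this sketch the second value returned by query never exceeds max_pending_per_term, which plays the role of the predefined accuracy threshold.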
Abstract:
A computer program product for an indexer-agnostic index building system includes a computer readable storage medium to store a computer readable program, wherein the computer readable program, when executed on a computer, causes the computer to perform operations for creating a semantically aggregated index. The operations include: extracting documents from a data source, wherein each document includes a data object; distributing the documents to a plurality of processing nodes within the system; for each node: indexing the data objects for each document into fields using semantic rules; and grouping indexed data objects for related fields by: classifying the documents into logical groups based on the semantic rules; and creating a searchable index shard for related logical groups.
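The indexing and grouping operations can be sketched roughly as follows. The rule table SEMANTIC_RULES, the field names, and the function build_shards are hypothetical, and the distribution of documents across processing nodes is omitted; the sketch shows only how semantic rules might map raw fields to canonical fields and place them in one shard per logical group.

```python
from collections import defaultdict

# Hypothetical semantic rules: raw field name -> (canonical field, logical group)
SEMANTIC_RULES = {
    "title": ("title", "content"),
    "body": ("text", "content"),
    "author": ("author", "people"),
    "editor": ("author", "people"),
}


def build_shards(documents):
    """Return {group: {(field, token): {doc ids}}}, one shard per logical group."""
    shards = defaultdict(lambda: defaultdict(set))
    for doc_id, doc in documents.items():
        for raw_field, value in doc.items():
            rule = SEMANTIC_RULES.get(raw_field)
            if rule is None:
                continue  # fields without a rule are not indexed in this sketch
            field, group = rule
            for token in str(value).lower().split():
                shards[group][(field, token)].add(doc_id)
    return shards
```

Because "author" and "editor" map to the same canonical field, a search of the "people" shard finds a person regardless of which raw field named them.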
Abstract:
Tools and techniques for indexing and querying data stores using concatenated terms are provided. These tools may receive input queries that include at least two query terms. The query terms are correlated respectively with fields contained within records within a data store, with these fields being populated with respective field values. The query terms are arranged according to an indexing priority, under which the fields are ranked within an index table that is associated with the data store. The tools then concatenate the query terms as arranged according to the indexing priority. In turn, the tools search the index table for any entries that are responsive to the concatenated query terms.
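A minimal sketch of concatenated-term lookup follows. The field names in FIELD_PRIORITY, the separator, and the choice to index every combination of two or more fields are assumptions made for illustration, not details taken from the abstract. The key point shown is that query terms are reordered by the indexing priority before concatenation, so the order in which a user supplies them does not matter.

```python
from itertools import combinations

# Hypothetical indexing priority: fields ranked from highest to lowest.
FIELD_PRIORITY = ["country", "city", "name"]
SEP = "\x1f"  # unit separator, assumed not to occur in field values


def make_key(terms):
    """Arrange field/value terms by indexing priority and concatenate them."""
    ordered = sorted(terms.items(), key=lambda kv: FIELD_PRIORITY.index(kv[0]))
    return SEP.join(f"{field}={value}" for field, value in ordered)


def build_index(records):
    """Index table: concatenated key -> set of record ids."""
    index = {}
    for rec_id, rec in records.items():
        fields = [f for f in FIELD_PRIORITY if f in rec]
        for size in range(2, len(fields) + 1):
            for combo in combinations(fields, size):
                key = make_key({f: rec[f] for f in combo})
                index.setdefault(key, set()).add(rec_id)
    return index


def search(index, query):
    """Look up a query of at least two field/value terms."""
    return index.get(make_key(query), set())
```

A query such as {"name": "Louvre", "country": "FR"} and the same query with its terms swapped produce the same key, so both hit the same index entry.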
Abstract:
A system and method are disclosed for generating numerical index terms for numbers encountered in documents indexed by a search engine. The numerical index terms include information about the indexed number (e.g., fieldname, characteristic, sign) and each digit, or a subset of the digits, of the number (e.g., position, value). Also disclosed are a system and method for processing number-range search queries having one or more number ranges and generating expressions (e.g., a Boolean expression tree) of numerical index terms based on a boundary number associated with the number range. An expression is used to control the search of a document index so as to identify documents that contain numbers that satisfy the expression.
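One way to picture digit-level index terms and a range expression built from a boundary number is the sketch below. The term format ("field:len:3", "field:d0:2"), the use of digit count as a stand-in for the characteristic, the restriction to integers, and the max_len cap are all assumptions for illustration. The expression is a nested tuple tree of AND/OR nodes over index terms, and matches evaluates it against the terms of one indexed number.

```python
def index_terms(field, n):
    """Terms for integer n: sign, digit count, and one term per digit.
    Position 0 is the most significant digit."""
    digits = str(abs(n))
    terms = {f"{field}:sign:{'-' if n < 0 else '+'}", f"{field}:len:{len(digits)}"}
    terms.update(f"{field}:d{pos}:{d}" for pos, d in enumerate(digits))
    return terms


def greater_than_expr(field, boundary, max_len=6):
    """Expression tree matching non-negative integers > boundary (boundary >= 0).
    max_len is an assumed cap on the number of digits in indexed numbers."""
    digits = str(boundary)
    # Any number with more digits than the boundary is larger.
    longer = ("OR", *[f"{field}:len:{k}" for k in range(len(digits) + 1, max_len + 1)])
    same_len = []
    for pos, d in enumerate(digits):
        bigger = [f"{field}:d{pos}:{v}" for v in range(int(d) + 1, 10)]
        if not bigger:
            continue  # digit is 9, nothing larger at this position
        # Same leading digits as the boundary, then a larger digit here.
        prefix = [f"{field}:d{i}:{digits[i]}" for i in range(pos)]
        same_len.append(("AND", f"{field}:len:{len(digits)}", *prefix, ("OR", *bigger)))
    return ("AND", f"{field}:sign:+", ("OR", longer, *same_len))


def matches(expr, terms):
    """Evaluate an expression tree against the index terms of one number."""
    if isinstance(expr, str):
        return expr in terms
    op, *args = expr
    results = [matches(a, terms) for a in args]
    return all(results) if op == "AND" else any(results)
```

In a real search engine each term would resolve to a posting list and the tree would drive list intersections and unions; here the tree is simply evaluated per number to show that it encodes the range correctly.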
Abstract:
A customized, specialty-oriented database and index of a subject matter area, and methods for constructing and using such a database, are provided. Selection and indexing of articles is done by experts in the topic with which the database is concerned. As a result, articles are indexed in a manner that allows facile, rapid retrieval of highly relevant articles with few or no false positives, and with much reduced database maintenance cost through frugal limitation of the number of documents in the database, the number of terms in a Master Index, and the number of codes assigned to each document. A thesaurus allows indexing and search in accordance with terminology familiar to different anticipated groups of users (e.g., doctors, patients, nurses, technicians, and the like). Key article collections and rapid access to documents therein are also provided.
Abstract:
A written document in electronic format (hereinafter referred to as a “work,” which includes stories, novels, education texts, biographies, compilations, collections, anthologies, tracts, and any other traditional format for relatively extensive texts) provides access to reference, bibliography and/or definition material through an electronic software capability associated with the work. Depending upon reader access information or characteristics (e.g., age, grade, proficiency, position within the work, or any other identifiable reader characteristic or access limitation), any request for reference material, definitions, explanations, translations, or other material provided in the associated software capability is automatically limited by system acknowledgement of the reader access information or characteristics. As the reader's access information or characteristics change, the quality and/or quantity and/or format of requested information with respect to a work changes.
Abstract:
A method and system for generating an index for retrieval of data provided by at least one document are disclosed. The method and system comprise selecting data within the at least one document, assigning a category to the selected data, and assigning a timestamp to the selected data. The method and system further include storing the selected data, the category, the timestamp and a location indication of the selected data as an entry of the index. The present invention therefore provides an effective and universally adaptive tool for contextual structuring and retrieval of data distributed over a plurality of electronic documents.
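The index entry described here has four parts: the selected data, a category, a timestamp, and a location indication. A minimal sketch of such an entry follows; the names IndexEntry, add_entry and find, and the "document:start-end" location format, are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class IndexEntry:
    data: str            # the selected text
    category: str        # category assigned to the selection
    timestamp: datetime  # when the selection was indexed
    location: str        # assumed format: "<document>:<start>-<end>"


def add_entry(index, document, text, start, end, category, when=None):
    """Select text[start:end] from a document and store it as an index entry."""
    entry = IndexEntry(
        data=text[start:end],
        category=category,
        timestamp=when or datetime.now(timezone.utc),
        location=f"{document}:{start}-{end}",
    )
    index.append(entry)
    return entry


def find(index, category):
    """Retrieve entries of one category, oldest first."""
    return sorted((e for e in index if e.category == category),
                  key=lambda e: e.timestamp)
```

Because each entry carries its own location, entries drawn from many documents can share one index and still point back to their sources.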
Abstract:
A trustworthy inverted index system processes records to identify features for indexing, generates posting lists corresponding to features in a dictionary, maintains in a storage cache a tail of at least one of the posting lists to minimize random I/Os to the index, determines a desired number of the posting lists based on a desired level of insertion performance, a query performance, or a size of the storage cache, and reads a posting list corresponding to a search feature in a query to identify records that comprise the search feature. The system maps the features in the dictionary to the desired number of posting lists. The system uses a jump pointer to point from one entry to the next in the posting lists based on increasing values of entries in the posting lists.
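The mapping of many dictionary features onto a chosen number of posting lists can be sketched as below. This is an illustration under stated assumptions, not the disclosed system: the class MergedPostingIndex and the parameter num_lists are hypothetical, features are assigned to lists with a CRC32 hash, and the storage cache for list tails and the jump pointers are omitted. Lists are append-only, and a search reads one merged list and keeps only the entries for the requested feature.

```python
import zlib


class MergedPostingIndex:
    """Hypothetical index that merges features into a fixed number of
    append-only posting lists."""

    def __init__(self, num_lists=4):
        # Fewer lists means fewer list tails to keep cached on insert,
        # at the price of longer lists to scan at query time.
        self.lists = [[] for _ in range(num_lists)]

    def _slot(self, feature):
        # Deterministic mapping of a feature to one posting list.
        return zlib.crc32(feature.encode("utf-8")) % len(self.lists)

    def insert(self, record_id, text):
        for feature in sorted(set(text.lower().split())):
            # Each entry keeps the feature so merged lists can be filtered.
            self.lists[self._slot(feature)].append((record_id, feature))

    def search(self, feature):
        feature = feature.lower()
        merged = self.lists[self._slot(feature)]
        return [rid for rid, f in merged if f == feature]
```

Choosing num_lists is the trade-off the abstract describes: it is set from the desired insertion performance, query performance, or cache size.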
Abstract:
A system for securing application information in a shared, system-wide search service. Each application can register a security filtering module that is to be used at search time to filter data associated with that application. When a user performs a search, initial, unfiltered search results are obtained based on the contents of the shared search index. The unfiltered search results are organized by application, and previously registered filter modules are called to perform user-specific, per-application filtering on the initial results. The filter modules cause data to which the user issuing the search request does not have access to be removed from the search results, on a per-application basis. Those of the initial search results that are determined in this way to not be accessible to the user issuing the search request are removed, resulting in a set of filtered search results that are presented to the user. The filtered search results thus contain indications only of data that is accessible to the user. In this way, the system-wide search service filters search results to remove indications of data which match the search criteria provided by the user, but to which the user does not have access, based on a conveniently extensible, per-application search result filtering process.
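The search-time flow described above (unfiltered hits, grouping by application, then per-application filtering) can be sketched as follows. The class SearchService, its method names, and the filter signature filter_fn(user, items) are hypothetical. The handling of applications that never registered a filter is also an assumption: the sketch passes their results through unfiltered, which a real deployment might not want.

```python
from collections import defaultdict


class SearchService:
    """Hypothetical shared search service with per-application filters."""

    def __init__(self):
        self.index = []    # shared index of (app, item, text) tuples
        self.filters = {}  # app -> callable(user, items) -> accessible items

    def register_filter(self, app, filter_fn):
        self.filters[app] = filter_fn

    def add(self, app, item, text):
        self.index.append((app, item, text))

    def search(self, user, term):
        # 1. Unfiltered results from the shared index, organized by app.
        hits_by_app = defaultdict(list)
        for app, item, text in self.index:
            if term.lower() in text.lower():
                hits_by_app[app].append(item)
        # 2. Call each app's registered filter for this user.
        results = []
        for app, items in hits_by_app.items():
            filter_fn = self.filters.get(app)
            # Assumption: apps without a registered filter are left unfiltered.
            allowed = filter_fn(user, items) if filter_fn else items
            results.extend((app, item) for item in allowed)
        return results
```

Keeping access logic inside each application's filter means the search service itself needs no knowledge of any application's permission model.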