Abstract:
In an information retrieval system, a query issued by the user is analyzed by a query engine into query elements. After the query has been evaluated against the document collections, a resulting hit list is presented to the user, e.g., as a table. The presented hit list displays not only an overall rank of a document but also a contribution of each query element to the rank of the document. The user can reorder the hit list by prioritizing the contribution of individual query elements to override the overall rank and by assigning additional weight(s) to those contributions.
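The reordering described above can be illustrated with a minimal sketch. The abstract does not say how contributions combine into the overall rank, so this example assumes a plain sum. The names `Hit`, `reorder`, `priority`, and `weights` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    contributions: dict  # query element -> contribution to the document's rank

    @property
    def overall(self) -> float:
        # Assumption: the overall score is the plain sum of the contributions.
        return sum(self.contributions.values())


def reorder(hits, priority=(), weights=None):
    """Sort by the prioritized elements first, then by the weighted total."""
    weights = weights or {}

    def weighted(hit, element):
        # Elements without a user-assigned weight keep a weight of 1.0.
        return hit.contributions.get(element, 0.0) * weights.get(element, 1.0)

    def key(hit):
        prioritized = tuple(weighted(hit, e) for e in priority)
        total = sum(weighted(hit, e) for e in hit.contributions)
        return prioritized + (total,)

    return sorted(hits, key=key, reverse=True)


hits = [
    Hit("doc1", {"retrieval": 6.0, "ranking": 1.0}),
    Hit("doc2", {"retrieval": 2.0, "ranking": 4.0}),
    Hit("doc3", {"retrieval": 3.0, "ranking": 2.0}),
]

default_order = [h.doc_id for h in reorder(hits)]
ranking_first = [h.doc_id for h in reorder(hits, priority=["ranking"])]
```

With no priority or weights, the hits are ordered by overall score alone. Passing `priority=["ranking"]` overrides that order with the contribution of the single element "ranking", and the overall score only breaks ties.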
Abstract:
Descriptive canonical forms of entity types are created by scanning one or more documents in a database of a computer system to identify one or more proper names that appear in the documents as raw names. Each raw name contains zero or more proper names, zero or more medial substrings, zero or more leading substrings, and zero or more trailing substrings. The raw names are "cleaned" and "split" until the "cleaning and splitting conditions" are no longer met, yielding a list of clean and split candidate names. Anchor names, which unambiguously represent an entity type, are selected from the list. Each anchor name has one or more entity-type attribute values. Variant names, which are clean and split candidate names that share one or more attribute values with the anchor name, are combined with the anchor name to create an equivalence group of names that refer to the same entity. A canonical form is generated for the group from a subset of the anchor name attributes. A canonical form is created in this manner for all of the clean and split candidate names on the list.
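The clean, split, anchor, and group steps can be sketched for person names as follows. The abstract does not give the actual cleaning and splitting conditions, attribute definitions, or anchor criteria, so everything here is an assumption for illustration. That includes the `LEADING` and `TRAILING` lists, splitting on " and ", the "first"/"last" attributes, and the "Last, First" canonical form.

```python
# Hypothetical leading/trailing substrings; the abstract does not list any.
LEADING = ("Mr. ", "Mrs. ", "Dr. ", "President ")
TRAILING = (", Jr.", " Inc.", " Corp.")


def clean(name):
    """Strip known leading and trailing substrings until none applies."""
    changed = True
    while changed:
        changed = False
        for prefix in LEADING:
            if name.startswith(prefix):
                name, changed = name[len(prefix):], True
        for suffix in TRAILING:
            if name.endswith(suffix):
                name, changed = name[:-len(suffix)], True
    return name.strip()


def split(name):
    """Assumed splitting rule: a medial ' and ' separates two names."""
    return [part.strip() for part in name.split(" and ")]


def candidates(raw_names):
    """Return the de-duplicated list of clean and split candidate names."""
    result = []
    for raw in raw_names:
        for part in split(raw):
            cleaned = clean(part)
            if cleaned and cleaned not in result:
                result.append(cleaned)
    return result


def attributes(name):
    """Assumed attributes of a person name: first and last token."""
    tokens = name.split()
    attrs = {"last": tokens[-1]}
    if len(tokens) > 1:
        attrs["first"] = tokens[0]
    return attrs


def canonical_groups(raw_names):
    names = candidates(raw_names)
    # Assumption: a name carrying both attributes is unambiguous enough to anchor.
    anchors = [n for n in names if len(attributes(n)) == 2]
    groups = {}
    for anchor in anchors:
        a = attributes(anchor)
        # Variants share at least one attribute value with the anchor.
        variants = [n for n in names if n != anchor
                    and set(attributes(n).items()) & set(a.items())]
        groups[f"{a['last']}, {a['first']}"] = [anchor] + variants
    return groups


raw = ["President Bill Clinton", "Mr. Clinton",
       "Dr. Alan Greenspan and Mr. Clinton", "Greenspan"]
groups = canonical_groups(raw)
```

Here "Clinton" joins the group anchored by "Bill Clinton" because the two share the assumed "last" attribute value. A realistic system would need further rules, for example to keep two different anchors with the same surname apart.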
Abstract:
A method is described for a computerized search for words in an electronic database that holds a large number of documents in memory. A Boolean retrieval method first determines in which of the documents a first word meets a Boolean condition. A probabilistic retrieval method then determines, among the documents that fulfill the Boolean condition, those in which the relevance of the appearance of a second word exceeds a specified value. The two retrieval methods use different indexes. The disadvantages normally associated with using two indexes are avoided because the indexes share a common element that both retrieval methods can process.
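The two-stage search can be sketched as follows. The abstract does not say what the common element or the relevance measure is. This example assumes the shared element is the document identifier and uses relative term frequency as a simple stand-in for a probabilistic relevance score. The names `build_indexes` and `search` are hypothetical.

```python
from collections import Counter, defaultdict


def build_indexes(docs):
    """Build two indexes that are both keyed by the same document ids."""
    boolean_index = defaultdict(set)  # word -> ids of documents containing it
    weight_index = {}                 # doc id -> {word: relative frequency}
    for doc_id, text in docs.items():
        words = text.lower().split()
        counts = Counter(words)
        for word in counts:
            boolean_index[word].add(doc_id)
        weight_index[doc_id] = {w: c / len(words) for w, c in counts.items()}
    return boolean_index, weight_index


def search(boolean_index, weight_index, first_word, second_word, threshold):
    # Stage 1 (Boolean): documents in which the first word occurs.
    matching = boolean_index.get(first_word, set())
    # Stage 2 (relevance stand-in): keep documents in which the second
    # word's relative frequency exceeds the threshold.
    return sorted(d for d in matching
                  if weight_index[d].get(second_word, 0.0) > threshold)


docs = {
    1: "retrieval index index boolean",
    2: "retrieval of documents with an index",
    3: "index index index",
}
boolean_index, weight_index = build_indexes(docs)
strict = search(boolean_index, weight_index, "retrieval", "index", 0.25)
loose = search(boolean_index, weight_index, "retrieval", "index", 0.1)
```

Document 3 has the highest frequency of "index" but never reaches the second stage, because it fails the Boolean condition on "retrieval". The ids produced by the first index are used directly as keys into the second, which is the role the sketch assigns to the common element.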