Information retrieval system and method for displaying and ordering
information based on query element contribution
    1.
    Granted invention patent
    Status: Expired

    Publication No.: US5826260A

    Publication date: 1998-10-20

    Application No.: US570149

    Filing date: 1995-12-11

    IPC classification: G06F17/30

    Abstract: In an information retrieval system, a query issued by the user is analyzed by a query engine into query elements. After the query has been evaluated against the document collections, a resulting hit list is presented to the user, e.g., as a table. The presented hit list displays not only an overall rank of a document but also a contribution of each query element to the rank of the document. The user can reorder the hit list by prioritizing the contribution of individual query elements to override the overall rank and by assigning additional weight(s) to those contributions.
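The reordering described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Hit` class, the `reorder` function, the scores, and the choice of a weighted sum as the overall rank are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    # Hypothetical per-element scores: query element -> contribution to the rank.
    contributions: dict

    def overall(self, weights=None):
        """Overall rank, assumed here to be the weighted sum of contributions."""
        weights = weights or {}
        return sum(weights.get(e, 1.0) * c for e, c in self.contributions.items())


def reorder(hits, priority=(), weights=None):
    """Sort by the prioritized elements first, then by overall rank (descending)."""
    w = weights or {}

    def key(hit):
        lead = tuple(w.get(e, 1.0) * hit.contributions.get(e, 0.0) for e in priority)
        return lead + (hit.overall(w),)

    return sorted(hits, key=key, reverse=True)


hits = [
    Hit("doc_a", {"retrieval": 60.0, "ranking": 10.0}),
    Hit("doc_b", {"retrieval": 20.0, "ranking": 40.0}),
    Hit("doc_c", {"retrieval": 30.0, "ranking": 20.0}),
]

# Default order by overall rank, then two user-driven reorderings.
default_order = [h.doc_id for h in reorder(hits)]
by_ranking = [h.doc_id for h in reorder(hits, priority=["ranking"])]
reweighted = [h.doc_id for h in reorder(hits, weights={"ranking": 2.0})]
```

Prioritizing an element overrides the overall rank because it becomes the leading sort key, while a weight only rescales that element's share of the sum.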

    Using canonical forms to develop a dictionary of names in a text
    2.
    Granted invention patent
    Status: Expired

    Publication No.: US5832480A

    Publication date: 1998-11-03

    Application No.: US678929

    Filing date: 1996-07-12

    IPC classification: G06F17/30

    Abstract: Descriptive canonical forms of entity types are created by scanning one or more documents in a database of a computer system to identify one or more proper names that appear in the documents as raw names. Each of the raw names has zero or more proper names, zero or more medial substrings, zero or more leading substrings, and zero or more trailing substrings. The raw names of one or more documents are "cleaned" and "split" until certain "cleaning and splitting conditions" are no longer met, yielding a list of clean and split candidate names. Anchor names that unambiguously represent an entity type are selected from the list. The anchor names have one or more entity-type attribute values. Variant names, i.e., clean and split candidate names that share one or more attribute values with the anchor name, are combined with the anchor name to create an equivalence group of names that refer to the same entity. A canonical form is generated for the group from a subset of the anchor name attributes. A canonical form is created in this manner for all of the clean and split candidate names on the list.
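The clean, split, anchor, and group steps in the abstract can be sketched roughly as follows. This is a heavily simplified illustration, not the patented method: the `LEADING` and `TRAILING` word lists, the rule that a name with two or more tokens counts as an anchor, the use of the last token as the shared attribute value, and the use of the anchor string itself as the canonical form are all assumptions.

```python
import re

# Assumed examples of leading and trailing substrings to strip.
LEADING = {"mr", "mrs", "ms", "dr", "president"}
TRAILING = {"jr", "sr", "inc", "corp"}


def clean_and_split(raw):
    """Split a raw name on ' and ', then strip leading and trailing substrings."""
    names = []
    for part in re.split(r"\s+and\s+", raw):
        tokens = [t.strip(".,") for t in part.split()]
        while tokens and tokens[0].lower() in LEADING:
            tokens.pop(0)
        while tokens and tokens[-1].lower() in TRAILING:
            tokens.pop()
        if tokens:
            names.append(" ".join(tokens))
    return names


def build_groups(raw_names):
    """Map each anchor (used as the canonical form) to its equivalence group."""
    candidates = []
    for raw in raw_names:
        for name in clean_and_split(raw):
            if name not in candidates:
                candidates.append(name)
    # Assumption: a candidate with two or more tokens is treated as unambiguous.
    anchors = [c for c in candidates if len(c.split()) >= 2]
    groups = {a: [a] for a in anchors}
    for cand in candidates:
        if cand in groups:
            continue
        last = cand.split()[-1].lower()
        matches = [a for a in anchors if a.split()[-1].lower() == last]
        if len(matches) == 1:  # attach a variant only when the match is unique
            groups[matches[0]].append(cand)
    return groups


raw_names = ["President Bill Clinton", "Mr. Clinton", "Bill Clinton and Al Gore", "Gore"]
groups = build_groups(raw_names)
```

A variant whose last token matches more than one anchor is left ungrouped in this sketch, which keeps ambiguous short names from being merged into the wrong entity.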

    Hybrid search
    3.
    Granted invention patent
    Status: Expired

    Publication No.: US5809496A

    Publication date: 1998-09-15

    Application No.: US802510

    Filing date: 1997-02-20

    IPC classification: G06F17/30

    Abstract: A method is described for a computerized search for words in an electronic database with a large number of documents stored in memory. A Boolean retrieval method is first used to determine in which of the documents a first word meets a Boolean condition. A probabilistic retrieval method is then used to determine in which of the documents fulfilling the Boolean condition the relevance of the occurrence of a second word exceeds a specified value. The two retrieval methods use different indexes. The disadvantages that normally accompany the use of two indexes are avoided because the indexes share a common element that both retrieval methods can process.
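The two-stage search in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the sample `docs`, the function name `hybrid_search`, and the use of the document id as the element common to both indexes are hypothetical, and normalized term frequency stands in for a real probabilistic relevance weight.

```python
from collections import Counter, defaultdict

# Hypothetical sample collection: document id -> text.
docs = {
    1: "boolean retrieval uses an inverted index",
    2: "probabilistic retrieval ranks documents by relevance and relevance feedback",
    3: "an index of cooking recipes",
}

boolean_index = defaultdict(set)   # term -> set of document ids
weight_index = defaultdict(dict)   # term -> {document id: relevance weight}

for doc_id, text in docs.items():
    tokens = text.split()
    for term, tf in Counter(tokens).items():
        boolean_index[term].add(doc_id)
        # Stand-in weight (assumption): term frequency normalized by length.
        weight_index[term][doc_id] = tf / len(tokens)


def hybrid_search(first, second, threshold):
    """Boolean filter on `first`, then keep docs where `second` scores above threshold."""
    candidates = boolean_index.get(first, set())
    scores = weight_index.get(second, {})
    # Both indexes are keyed by the same document id, the assumed common element.
    return sorted(d for d in candidates if scores.get(d, 0.0) > threshold)


result = hybrid_search("retrieval", "relevance", 0.1)
```

Because both indexes refer to documents by the same id, the Boolean result set can be passed straight to the weighted lookup without any translation step.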
