System and method for finding the most likely answer to a natural language question
    1.
    Granted invention patent
    Legal status: In force

    Publication No.: US08340955B2

    Publication date: 2012-12-25

    Application No.: US12110481

    Filing date: 2008-04-28

    IPC classification: G06F17/27

    CPC classification: G06F17/30654 G06F17/2785

    Abstract: Automated question answering is disclosed that relates to the selection of an answer to a question from a pool of potential answers which are manually or automatically extracted from a large collection of textual documents. A feature extraction component, a feature combination component, an answer selection component, and an answer presentation component, among others, are included. The input to the system is a set of one or more natural language questions and a collection of textual documents. The output is a (possibly ranked) set of factual answers to the questions, these answers being extracted from the document collection.
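The pipeline named in this abstract (feature extraction, feature combination, answer selection) can be pictured with a minimal Python sketch. Everything specific here is an assumption made for illustration: the two features, the linear weighting, and the names `extract_features`, `combine`, and `select_answers` do not come from the patent, which does not state its features or combination rule in the abstract.

```python
def extract_features(question, answer, passage):
    """Hypothetical features: word overlap between question and passage,
    and whether the candidate answer starts with a capital letter."""
    q_words = set(question.lower().split())
    p_words = passage.lower().split()
    overlap = sum(1 for w in p_words if w in q_words)
    return {
        "overlap": overlap / max(len(p_words), 1),
        "is_capitalized": 1.0 if answer[:1].isupper() else 0.0,
    }


def combine(features, weights):
    """Assumed combination rule: a weighted sum of feature values."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def select_answers(question, candidates, weights):
    """candidates: list of (answer, source passage) pairs.
    Returns (score, answer) pairs, best first."""
    scored = [
        (combine(extract_features(question, answer, passage), weights), answer)
        for answer, passage in candidates
    ]
    return sorted(scored, reverse=True)


candidates = [
    ("Shakespeare", "Shakespeare wrote Hamlet"),
    ("london", "the play opened in london"),
]
ranked = select_answers("who wrote hamlet", candidates,
                        {"overlap": 1.0, "is_capitalized": 0.5})
```

The ranked list plays the role of the "(possibly ranked) set of factual answers"; a presentation component would format it for the user.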

    System and method for hierarchically grouping and ranking a set of objects in a query context based on one or more relationships
    2.
    Granted invention patent
    Legal status: Expired

    Publication No.: US5875446A

    Publication date: 1999-02-23

    Application No.: US804599

    Filing date: 1997-02-24

    IPC classification: G06F17/30

    Abstract: Topically relevant objects in an object database are first identified using any generally known methods to obtain a set of topically relevant objects (topically relevant set). Parents, and in alternative embodiments other ancestors, of one or more of the topically relevant objects are identified according to directional structural relationships that the parents have with respect to the topically relevant objects. These objects form a set of structurally relevant objects (structurally relevant set). In some embodiments, the user query identifies one or more of these structural relationships. The topically relevant objects are then organized under one or more of their respective parents to form a hierarchy level of both (topically relevant and structurally relevant) sets of objects. In some preferred embodiments, the process can iterate to create more than one hierarchy level.
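As a rough illustration of one hierarchy level, the Python sketch below groups topically relevant objects under their parents. The function name `group_under_parents`, the `parent_of` mapping, and the ranking rule (groups ordered by their best child score) are assumptions for this example; the abstract does not specify how groups are ranked.

```python
from collections import defaultdict


def group_under_parents(relevant, parent_of):
    """relevant: list of (object_id, relevance score) pairs.
    parent_of: dict mapping a child id to a list of parent ids.
    Returns a list of (parent, children) pairs forming one hierarchy level."""
    groups = defaultdict(list)
    for obj, score in relevant:
        for parent in parent_of.get(obj, []):
            groups[parent].append((obj, score))

    ranked = []
    for parent, children in groups.items():
        # Order children within a group by their own relevance score.
        children.sort(key=lambda child: child[1], reverse=True)
        ranked.append((parent, children))
    # Assumed rule: order groups by the score of their best child.
    ranked.sort(key=lambda group: group[1][0][1], reverse=True)
    return ranked


relevant = [("page_a", 0.9), ("page_b", 0.4), ("page_c", 0.7)]
parent_of = {"page_a": ["site_1"], "page_b": ["site_1"], "page_c": ["site_2"]}
level = group_under_parents(relevant, parent_of)
```

Feeding the parents back in as the next set of objects, with their own `parent_of` mapping, would give a second hierarchy level, which is one way to read the iteration mentioned in the abstract.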

    System, method and program product for answering questions using a search engine
    3.
    Granted invention patent
    Legal status: In force

    Publication No.: US06665666B1

    Publication date: 2003-12-16

    Application No.: US09495645

    Filing date: 2000-02-01

    IPC classification: G06F17/30

    Abstract: The present invention is a system, method, and program product that comprises a computer with a collection of documents to be searched. The documents contain free form (natural language) text. We define a set of labels called QA-Tokens, which function as abstractions of phrases or question-types. We define a pattern file, which consists of a number of pattern records, each of which has a question template, an associated question word pattern, and an associated set of QA-Tokens. We describe a query-analysis process which receives a query as input and matches it to one or more of the question templates, where a priority algorithm determines which match is used if there is more than one. The query-analysis process then replaces the associated question word pattern in the matching query with the associated set of QA-Tokens, and possibly some other words. This results in a processed query having some combination of original query tokens, new tokens from the pattern file, and QA-Tokens, possibly with weights. We describe a pattern-matching process that identifies patterns of text in the document collection and augments the location with corresponding QA-Tokens. We define a text index data structure which is an inverted list of the locations of all of the words in the document collection, together with the locations of all of the augmented QA-Tokens. A search process then matches the processed query against a window of a user-selected number of sentences that is slid across the document texts. A hit-list of top-scoring windows is returned to the user.
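The two steps described above, query analysis with QA-Tokens and a sliding sentence window, might look roughly like the following Python sketch. The pattern records, the token names `PERSON$`, `DATE$` and `TIME$`, the "first matching record wins" priority rule, and the scoring by count of distinct matching terms are all hypothetical stand-ins; the abstract does not give the pattern file contents, the priority algorithm, or the scoring formula.

```python
import re

# Hypothetical pattern records:
# (question template, question-word pattern, associated QA-Tokens).
PATTERNS = [
    (re.compile(r"^who\b", re.I), re.compile(r"^who\b", re.I), ["PERSON$"]),
    (re.compile(r"^when\b", re.I), re.compile(r"^when\b", re.I), ["DATE$", "TIME$"]),
]


def analyze_query(query):
    """Replace the question-word pattern with the record's QA-Tokens.
    Assumed priority rule: the first matching record is used."""
    for template, word_pattern, tokens in PATTERNS:
        if template.search(query):
            rest = word_pattern.sub("", query, count=1)
            return tokens + rest.lower().split()
    return query.lower().split()


def best_windows(query_terms, sentences, window=2, top=1):
    """sentences: list of token lists, already augmented with QA-Tokens.
    Slides a window of `window` sentences over the text and returns the
    top-scoring (score, start index) pairs."""
    terms = set(query_terms)
    hits = []
    for start in range(max(len(sentences) - window + 1, 1)):
        tokens = {t for s in sentences[start:start + window] for t in s}
        hits.append((len(terms & tokens), start))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return hits[:top]


sentences = [
    ["the", "weather", "was", "cold"],
    ["PERSON$", "bell", "invented", "the", "telephone"],
    ["it", "was", "patented", "in", "DATE$", "1876"],
]
processed = analyze_query("Who invented the telephone")
hit_list = best_windows(processed, sentences, window=1)
```

A production system would look the terms up in the inverted index rather than scanning every window, and would apply per-token weights; this sketch scans directly to keep the idea visible.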

    System and method for determining confidence levels for the results of a categorization system
    4.
    Granted invention patent
    Legal status: Expired

    Publication No.: US6003027A

    Publication date: 1999-12-14

    Application No.: US976349

    Filing date: 1997-11-21

    IPC classification: G06F17/30

    Abstract: After a categorization process has been run, the scores of the top two ranking categories, along with the size or number of features in the object being categorized, are passed to a confidence assignment process. This determines a value for the confidence in the top category based on the evidence afforded by the input parameters. The magnitude of this confidence value will determine whether the system can accept the automatic categorization results, or whether human involvement is required. This invention also describes the process of determining the optimal value of an internal scaling parameter in the confidence assignment process. The construction of a threshold table based on this parameter is also described. The threshold table matches confidence values against error levels. For a given error rate the previously assigned confidence determines whether the categorization results can be accepted without need for human intervention. This invention maximizes the number of objects that can be automatically processed, for a given error rate.
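The abstract names the inputs to the confidence assignment (top two scores and object size) and the role of the threshold table, but not the formula or the table values. The Python sketch below therefore uses an invented formula, an invented `scale` parameter value, and an invented `THRESHOLDS` table purely to show how the pieces could fit together; none of these are the patent's actual method.

```python
def confidence(top_score, second_score, size, scale=50.0):
    """Assumed formula: the relative margin between the top two scores,
    damped for small objects by a scaling parameter."""
    if top_score <= 0:
        return 0.0
    margin = (top_score - second_score) / top_score
    return margin * size / (size + scale)


# Hypothetical threshold table: (error rate, minimum confidence needed
# to stay at or below that error rate).
THRESHOLDS = [(0.01, 0.60), (0.05, 0.40), (0.10, 0.25)]


def accept_automatically(conf, max_error):
    """True if the result can be accepted without human review
    at the requested maximum error rate."""
    allowed = [min_conf for error, min_conf in THRESHOLDS if error <= max_error]
    return bool(allowed) and conf >= min(allowed)


conf = confidence(top_score=10.0, second_score=2.0, size=150)
auto = accept_automatically(conf, max_error=0.05)
```

Lowering `max_error` raises the confidence required, so fewer objects pass without review; that trade-off is what the threshold table encodes.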

    SYSTEM AND METHOD FOR FINDING THE MOST LIKELY ANSWER TO A NATURAL LANGUAGE QUESTION
    5.
    Invention patent application
    Legal status: In force

    Publication No.: US20080201132A1

    Publication date: 2008-08-21

    Application No.: US12110481

    Filing date: 2008-04-28

    IPC classification: G06F17/27

    CPC classification: G06F17/30654 G06F17/2785

    Abstract: Automated question answering is disclosed that relates to the selection of an answer to a question from a pool of potential answers which are manually or automatically extracted from a large collection of textual documents. A feature extraction component, a feature combination component, an answer selection component, and an answer presentation component, among others, are included. The input to the system is a set of one or more natural language questions and a collection of textual documents. The output is a (possibly ranked) set of factual answers to the questions, these answers being extracted from the document collection.

    System and method for categorizing objects in combined categories
    6.
    Granted invention patent
    Legal status: Expired

    Publication No.: US5943670A

    Publication date: 1999-08-24

    Application No.: US976246

    Filing date: 1997-11-21

    IPC classification: G06F17/30

    Abstract: The present invention is a system and method for determining whether the best category for an object under investigation is a mixture of preexisting categories, and how the mixture is constituted. This invention is useful both for suggesting the need for new categories and, for a fixed set of categories, for determining whether a document should be assigned to multiple categories. The objects of the categorization system are typically, but need not be, documents. Categorization may be by subject matter, language, or other criteria. The invention causes extra information to be stored in a category index, so that the determination of mixed categories using the methods presented here is performed extremely efficiently.

    Information retrieval system and method for displaying and ordering information based on query element contribution
    7.
    Granted invention patent
    Legal status: Expired

    Publication No.: US5826260A

    Publication date: 1998-10-20

    Application No.: US570149

    Filing date: 1995-12-11

    IPC classification: G06F17/30

    Abstract: In an information retrieval system, a query issued by the user is analyzed by a query engine into query elements. After the query has been evaluated against the document collections, a resulting hit list is presented to the user, e.g., as a table. The presented hit list displays not only an overall rank of a document but also a contribution of each query element to the rank of the document. The user can reorder the hit list by prioritizing the contribution of individual query elements to override the overall rank and by assigning additional weight(s) to those contributions.
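Reordering a hit list by query element contribution can be sketched in Python as below. The hit record layout (`doc`, `rank_score`, `contrib`), the function name `reorder`, and the tie-breaking by overall rank score are assumptions for illustration, not details taken from the patent.

```python
def reorder(hit_list, priority, weights=None):
    """hit_list: list of dicts with keys "doc", "rank_score" and "contrib"
    (a dict mapping each query element to its contribution).
    priority: query elements in the order the user wants them to dominate.
    weights: optional extra weight per query element (default 1.0)."""
    weights = weights or {}

    def sort_key(hit):
        by_element = tuple(
            -weights.get(element, 1.0) * hit["contrib"].get(element, 0.0)
            for element in priority
        )
        # Assumed tie-break: fall back to the overall rank score.
        return by_element + (-hit["rank_score"],)

    return sorted(hit_list, key=sort_key)


hits = [
    {"doc": "d1", "rank_score": 0.9, "contrib": {"jaguar": 0.8, "car": 0.1}},
    {"doc": "d2", "rank_score": 0.7, "contrib": {"jaguar": 0.2, "car": 0.5}},
]
by_car = [hit["doc"] for hit in reorder(hits, ["car"])]
by_rank = [hit["doc"] for hit in reorder(hits, [])]
```

With an empty priority list the overall rank decides the order; naming `car` as the priority element moves `d2` ahead of `d1` even though its overall rank is lower.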

    System and Method for Finding the Most Likely Answer to a Natural Language Question
    8.
    Invention patent application
    Legal status: In force

    Publication No.: US20120189988A1

    Publication date: 2012-07-26

    Application No.: US13438959

    Filing date: 2012-04-04

    IPC classification: G09B19/00

    CPC classification: G06F17/30654 G06F17/2785

    Abstract: Automated question answering is disclosed that relates to the selection of an answer to a question from a pool of potential answers which are manually or automatically extracted from a large collection of textual documents. A feature extraction component, a feature combination component, an answer selection component, and an answer presentation component, among others, are included. The input to the system is a set of one or more natural language questions and a collection of textual documents. The output is a (possibly ranked) set of factual answers to the questions, these answers being extracted from the document collection.

    Identifying duplicate documents from search results without comparing document content
    9.
    Granted invention patent
    Legal status: Expired

    Publication No.: US5913208A

    Publication date: 1999-06-15

    Application No.: US677059

    Filing date: 1996-07-09

    IPC classification: G06F17/30

    Abstract: A computer system has a document collection of one or more documents and one or more indexes that each include an inverted file with one or more terms. Each of the terms is associated with one or more document identifiers. The index further includes a document catalog that associates each of the document identifiers with one or more attributes, either intrinsic or non-intrinsic. A search engine process produces a hit list having one or more hit list entries. Each hit list entry, with one or more hit list attributes, is associated with one of the documents that is determined by the search engine to be relevant to the query. A formatter processor selects one or more of the hit list attributes, identified by a hit list attribute selector, and then compares the selected attributes of two or more entries on the hit list to determine whether or not documents associated with these entries are duplicate instances of one another. The determination can be made without examining the content of the document associated with the entries.
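The attribute comparison described here can be illustrated with a short Python sketch. The attribute names (`title`, `size`, `date`), the `doc_id` key, and the rule that entries are duplicates when all selected attributes are equal are assumptions for this example; the abstract leaves the choice of attributes to the hit list attribute selector.

```python
def find_duplicates(hit_list, selected=("title", "size", "date")):
    """hit_list: list of dicts holding a "doc_id" plus hit list attributes.
    selected: the attributes chosen by the (hypothetical) attribute selector.
    Returns (first seen doc_id, duplicate doc_id) pairs. Document content
    is never read; only the selected attributes are compared."""
    seen = {}
    duplicates = []
    for entry in hit_list:
        key = tuple(entry.get(attribute) for attribute in selected)
        if key in seen:
            duplicates.append((seen[key], entry["doc_id"]))
        else:
            seen[key] = entry["doc_id"]
    return duplicates


hit_list = [
    {"doc_id": "a1", "title": "Annual report", "size": 2048, "date": "1996-01-05"},
    {"doc_id": "b7", "title": "Annual report", "size": 2048, "date": "1996-01-05"},
    {"doc_id": "c3", "title": "Annual report", "size": 4096, "date": "1996-03-01"},
]
pairs = find_duplicates(hit_list)
```

Because the comparison uses only catalog attributes already present on the hit list, it runs in a single pass over the results.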