Generating Search Result Summaries
    81.
    Type: patent application
    Status: in force

    Publication number: US20090198667A1

    Publication date: 2009-08-06

    Application number: US12023678

    Filing date: 2008-01-31

    CPC classification number: G06F17/30867 G06F17/30719

    Abstract: Embodiments are configured to provide a summary of information associated with one or more search results. In an embodiment, a system includes a summary generator that can be configured to provide a summary of information including one or more snippets associated with a search term or search terms. The system includes a ranking component that can be used to rank snippets and the ranked snippets can be used when generating a summary that includes one or more ranked snippets. In one embodiment, the system can be configured to include one or more filters that can be used to filter snippets and the filtered snippets can be used when generating a summary. Other embodiments are available.
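
    As an illustration of the filter-then-rank flow this abstract describes, here is a minimal sketch. The `Snippet` type, the term-count score, and the minimum-length filter are assumptions made for the example; the abstract does not specify which ranking signals or filters are used.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    text: str
    source: str


def score_snippet(snippet, terms):
    """Hypothetical ranking signal: number of query-term occurrences."""
    words = snippet.text.lower().split()
    return sum(words.count(term.lower()) for term in terms)


def generate_summary(snippets, terms, max_snippets=2, min_words=3):
    """Filter snippets, rank the survivors, and return the top texts."""
    # Filter step (assumed filters): drop very short snippets and
    # snippets that contain none of the search terms.
    kept = [
        s for s in snippets
        if len(s.text.split()) >= min_words and score_snippet(s, terms) > 0
    ]
    # Ranking step: highest score first; sorted() is stable, so ties
    # keep their input order.
    ranked = sorted(kept, key=lambda s: score_snippet(s, terms), reverse=True)
    return [s.text for s in ranked[:max_snippets]]
```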

    Extraction of information from documents
    82.
    Type: patent application
    Status: in force

    Publication number: US20060277173A1

    Publication date: 2006-12-07

    Application number: US11192687

    Filing date: 2005-07-29

    CPC classification number: G06F17/211 Y10S707/99935

    Abstract: An information extraction model is trained on format features identified within labeled training documents. Information from a document is extracted by assigning labels to units based on format features of the units within the document. A begin label and end label are identified and the information is extracted between the begin label and the end label. The extracted information can be used in various document processing tasks such as ranking.
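
    The begin/end labeling idea can be sketched as follows. A hand-written font-size rule stands in for the trained model, and the `Unit` type, the `min_size` threshold, and the choice to include both boundary units in the result are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    text: str
    font_size: int  # one example of a format feature


def assign_labels(units, min_size=18):
    """Return one label set per unit: BEGIN, END, both, or empty.

    A simple rule replaces the trained model here: the first run of
    large-font units is treated as the span to extract.
    """
    labels = [set() for _ in units]
    for i, unit in enumerate(units):
        if unit.font_size >= min_size:
            labels[i].add("BEGIN")
            j = i
            while j + 1 < len(units) and units[j + 1].font_size >= min_size:
                j += 1
            labels[j].add("END")
            break
    return labels


def extract(units):
    """Join the text of the units from the BEGIN label through the END label."""
    labels = assign_labels(units)
    begin = next((i for i, l in enumerate(labels) if "BEGIN" in l), None)
    end = next((i for i, l in enumerate(labels) if "END" in l), None)
    if begin is None or end is None:
        return ""
    return " ".join(u.text for u in units[begin:end + 1])
```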

    Adaptive Web crawling using a statistical model
    83.
    Type: patent application
    Status: lapsed

    Publication number: US20050165778A1

    Publication date: 2005-07-28

    Application number: US11022054

    Filing date: 2004-12-22

    CPC classification number: G06F17/30864 Y10S707/99931 Y10S707/99933

    Abstract: A computer-based system and method of retrieving information pertaining to documents on a computer network is disclosed. The method includes selecting a set of documents to be accessed during a Web crawl by utilizing a statistical model to determine which previously retrieved documents are most likely to have changed since last accessed. The statistical model continuously improves its accuracy by training internal probability distributions to reflect the actual experience with change rate patterns of the documents accessed. The decision whether to access the document is based on the probability of change compared against a desired synchronization level, random selections, maximum limits on the amount of time since the document was last accessed, and other criteria. Once the decision to access is made, the document is checked for changes and this information is used to train the statistical model.
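
    The access decision described in this abstract might look like the sketch below. The Laplace-smoothed change estimate, the `DocStats` type, and the reading of the synchronization level as a probability threshold are assumptions made for the example, not details taken from the patent.

```python
import random
from dataclasses import dataclass


@dataclass
class DocStats:
    checks: int = 0               # times the document was re-fetched
    changes: int = 0              # times a re-fetch found a change
    days_since_access: float = 0.0

    def change_probability(self):
        # Laplace-smoothed estimate (assumed stand-in for the model).
        return (self.changes + 1) / (self.checks + 2)

    def record(self, changed):
        # Training step: fold the observed outcome back into the counts.
        self.checks += 1
        self.changes += int(changed)
        self.days_since_access = 0.0


def should_access(stats, sync_level=0.5, max_days=30.0,
                  random_rate=0.05, rng=random):
    """Decide whether to re-fetch a previously retrieved document."""
    if stats.days_since_access >= max_days:
        return True   # maximum time limit since last access
    if stats.change_probability() >= sync_level:
        return True   # likely changed
    return rng.random() < random_rate  # occasional random selection
```

    The random selection keeps rarely changing documents from being starved of observations, so their estimates can still be corrected over time.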

    Scoping queries in a search engine
    84.
    Type: patent application

    Publication number: US20050080779A1

    Publication date: 2005-04-14

    Application number: US10968716

    Filing date: 2004-10-19

    Abstract: Systems and methods for scoping a search. When a content index for electronic data is built, one or more scope restrictions are included in the content index. The scope restriction may be, for example, a root folder identifier, a mailbox identifier, or a URL. Because the scope restriction is included in the content index, random access of the property store to determine the scope is avoided. Rather, the scope restriction is implicitly added to a search that uses the content index. By including a scope restriction in the search query, the search results identified from the content index are limited to results that match the scope restriction. Advantageously, the effect of including the scope restriction in the search is ignored if the search results are relatively small or when including the scope restriction provides little benefit.
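
    To illustrate why indexing the scope avoids a separate property-store lookup, here is a small sketch of an inverted index that stores the scope as one more posting key. The `ContentIndex` class and its key layout are assumptions made for the example, and the optimization of skipping the restriction for small result sets is not modeled.

```python
from collections import defaultdict


class ContentIndex:
    """Toy inverted index with scope restrictions stored as postings."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text, scope):
        for word in text.lower().split():
            self.postings[("term", word)].add(doc_id)
        # The scope (e.g. a mailbox or root-folder identifier) is indexed
        # alongside the terms, so no per-document lookup is needed later.
        self.postings[("scope", scope)].add(doc_id)

    def search(self, term, scope=None):
        hits = set(self.postings.get(("term", term.lower()), set()))
        if scope is not None:
            # Scoping is a plain intersection of two posting lists.
            hits &= self.postings.get(("scope", scope), set())
        return sorted(hits)
```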

    Method of web crawling utilizing address mapping
    85.
    Type: granted patent
    Status: lapsed

    Publication number: US6145003A

    Publication date: 2000-11-07

    Application number: US992329

    Filing date: 1997-12-17

    Abstract: A computer-based system and method of retrieving information pertaining to Web documents on a computer network is disclosed. The method includes maintaining an address map that associates primary addresses with secondary addresses. A primary address includes a network retrieval protocol and a network address. The secondary address may include a different retrieval protocol or a different network address from the primary document address. A Web crawler retrieves a Web document using the primary document address, and determines whether the address map contains a secondary document address prefix corresponding to the primary document address prefix. If a secondary document address prefix exists, the Web crawler creates a secondary address, retrieves additional information pertaining to the Web document, and combines the additional information with the data retrieved from the Web document. The combined data may be stored in an index, and subsequently used to perform a document search.
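
    The prefix-based address mapping can be sketched as below. The example addresses, the longest-prefix tie-break, and the injected `fetch` callable are assumptions made for the example; a real crawler would perform network or file retrieval there.

```python
def secondary_address(primary, address_map):
    """Map a primary address to its secondary address, or return None.

    address_map associates primary address prefixes with secondary
    address prefixes. When several prefixes match, the longest one is
    used (an assumption, not stated in the abstract).
    """
    matches = [prefix for prefix in address_map if primary.startswith(prefix)]
    if not matches:
        return None
    prefix = max(matches, key=len)
    return address_map[prefix] + primary[len(prefix):]


def crawl(primary, address_map, fetch):
    """Retrieve a document and merge in data from its secondary address."""
    record = {"address": primary, "content": fetch(primary)}
    secondary = secondary_address(primary, address_map)
    if secondary is not None:
        # Additional information retrieved via the secondary address.
        record["extra"] = fetch(secondary)
    return record
```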

    Discovering expertise using document metadata in part to rank authors
    86.
    Type: granted patent
    Status: in force

    Publication number: US09589072B2

    Publication date: 2017-03-07

    Application number: US13150710

    Filing date: 2011-06-01

    CPC classification number: G06F17/30979

    Abstract: Expertise mining features are provided based in part on the use of an expertise mining algorithm and expertise mining queries. A method of an embodiment operates to provide an expanded feedback query based in part on search results using an expertise mining query and a number of author-ranking heuristics used to rank authors and/or co-authors (e.g., primary authors, secondary authors, etc.) as part of an expertise mining operation. A search system of an embodiment includes an author ranker component to rank authors based in part on an expertise mining query and author-ranking heuristics, and a query expander component to provide expanded queries as part of identifying relevant search results. Other embodiments are also disclosed.

    Detection of junk in search result ranking
    88.
    Type: granted patent
    Status: in force

    Publication number: US08738635B2

    Publication date: 2014-05-27

    Application number: US12791756

    Filing date: 2010-06-01

    CPC classification number: G06F17/30 G06F17/00 G06F17/30657 G06F17/30864

    Abstract: Embodiments are directed to ranking search results using a junk profile. For a given corpus of documents, one or more junk profiles may be created and maintained. The junk profile provides reference metrics to represent known junk documents. For example, a junk profile may comprise a dictionary of document data that is automatically inserted into documents created using a particular system or template. A junk profile may also comprise one or more representations (e.g., histograms) of a distribution of a particular junk variable for known junk documents. The junk profile provides a usable representation of known junk documents, and the present systems and methods employ the junk profile to predict the likelihood that documents in the corpus are junk. In embodiments, junk scores are calculated and used to rank such documents higher or lower in response to a search query.
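
    One way to picture a junk profile is the sketch below, which combines a dictionary of template-inserted terms with a histogram of a single junk variable. The choice of variable (the fraction of a document's words found in the dictionary), the bin count, and the scoring rule are assumptions made for the example.

```python
def junk_fraction(text, dictionary):
    """Junk variable: share of the document's words found in the dictionary."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(word in dictionary for word in words) / len(words)


def bin_index(value, bins):
    """Map a value in [0, 1] to a histogram bin; 1.0 falls in the last bin."""
    return min(int(value * bins), bins - 1)


def build_histogram(junk_docs, dictionary, bins=4):
    """Profile step: distribution of the junk variable over known junk."""
    counts = [0] * bins
    for doc in junk_docs:
        counts[bin_index(junk_fraction(doc, dictionary), bins)] += 1
    return counts


def junk_score(text, dictionary, histogram):
    """Share of known junk documents that fall in the same bin as this text."""
    total = sum(histogram)
    if total == 0:
        return 0.0
    index = bin_index(junk_fraction(text, dictionary), len(histogram))
    return histogram[index] / total
```

    A ranker could then demote documents with a high junk score, which is the kind of adjustment the abstract refers to.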

    Custom ranking model schema
    89.
    Type: granted patent
    Status: in force

    Publication number: US08527507B2

    Publication date: 2013-09-03

    Application number: US12630981

    Filing date: 2009-12-04

    CPC classification number: G06F17/30675

    Abstract: A customizable ranking model of a search engine using custom ranking model configuration and parameters of a pre-defined human-readable format. The architecture can employ a markup language schema to represent the custom ranking model. In one implementation, the schema developed utilizes XML (extensible markup language) for representing the custom ranking model. Weights for dynamic and static relevance ingredients can be altered per ranking model and new relevance ingredients can be added. Additionally, features are provided for improving relevance such as adding terms to a thesaurus for synonym expansion, for example, the ability to deal with single terms either as compounds, and/or using custom word breaking rules.
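
    As a rough illustration of a ranking model expressed in XML, the sketch below loads per-feature weights and applies them as a weighted sum. The element and attribute names (`rankingModel`, `feature`, `kind`, `weight`) and the linear scoring are hypothetical; the abstract does not give the actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical model definition; not the schema from the patent.
MODEL_XML = """
<rankingModel name="custom">
  <feature name="title_match" kind="dynamic" weight="2.0"/>
  <feature name="click_distance" kind="static" weight="0.5"/>
</rankingModel>
"""


def load_weights(xml_text):
    """Read feature weights from the model definition."""
    root = ET.fromstring(xml_text.strip())
    return {f.get("name"): float(f.get("weight")) for f in root.iter("feature")}


def score(features, weights):
    """Weighted sum of feature values; unknown features contribute 0."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items())
```

    Changing a weight or adding a feature then only requires editing the XML, which is the customization the abstract describes.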

    Default Query Rules
    90.
    Type: patent application
    Status: under examination (published)

    Publication number: US20130110816A1

    Publication date: 2013-05-02

    Application number: US13287999

    Filing date: 2011-11-02

    CPC classification number: G06F16/90335 G06F16/9038

    Abstract: Systems and methods for reformulating an initial search query and presenting query results in a logical and user-friendly manner. Enterprise queries are detected and automatically reformulated such that a user need not have any knowledge of how to reformulate a particular query. Query results are formatted and presented such that standard browsing behavior of the user is not substantially altered. The user is made aware of how the query has been reformulated, and how to get more results of that type without changing their gaze patterns.
