1. Extracting query dimensions from search results

    Publication number: US09785704B2

    Publication date: 2017-10-10

    Application number: US13343621

    Filing date: 2012-01-04

    Abstract: Techniques are described for automatically mining query dimensions from web pages resulting from execution of a search query. Lists of items such as words, terms, or phrases are extracted from the web pages based on the recognition of free text, metadata tag, or repeated region patterns within the web page text. Extracted item lists are weighted according to document matching and/or inverse document frequency, and item lists are clustered based on shared or similar items within the lists to generate query dimensions. The generated query dimensions, and the items within each query dimension, are ranked according to quality, and high-quality query dimensions are provided for display alongside top search results.
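
As a rough illustration of the clustering step in this abstract, the following Python sketch groups item lists that share items and orders the items in each resulting dimension by how many lists contain them. The Jaccard threshold, the greedy single-pass strategy, and the names `cluster_lists` and `rank_dimension` are assumptions made for illustration; the abstract does not specify them, and the weighting by document matching and inverse document frequency is omitted here.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two collections of items."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_lists(item_lists, threshold=0.3):
    """Greedy single-pass clustering (an assumed strategy): a list joins the
    first cluster whose pooled items overlap it by at least `threshold`."""
    clusters = []
    for items in item_lists:
        for cluster in clusters:
            pooled = {x for lst in cluster for x in lst}
            if jaccard(items, pooled) >= threshold:
                cluster.append(items)
                break
        else:
            clusters.append([items])
    return clusters


def rank_dimension(cluster):
    """Order items by the number of lists containing them, ties alphabetically."""
    counts = Counter(x for lst in cluster for x in set(lst))
    return [item for item, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


# Hypothetical item lists extracted from result pages for one query.
lists = [
    ["red", "blue", "green"],
    ["blue", "green", "black"],
    ["small", "medium", "large"],
    ["medium", "large", "x-large"],
]
dimensions = [rank_dimension(c) for c in cluster_lists(lists)]
```

With these sample lists, the color lists and the size lists end up in two separate dimensions because no item is shared between the two groups.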

    2. Scoring relevance of a document based on image text
    Type: granted invention patent
    Status: in force

    Publication number: US08645370B2

    Publication date: 2014-02-04

    Application number: US12972259

    Filing date: 2010-12-17

    CPC classification number: G06F17/30864 G06F17/30265

    Abstract: A method and system for determining relevance of a document having text and images to a text string is provided. A scoring system identifies image text associated with an image of the document. The scoring system calculates an image score indicating relevance of the image text to the text string. The image score may be used in many applications, such as searching, summary generation, document classification, image search, and image classification.
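
One very simple way to picture an image score is as the fraction of query terms that appear in the text associated with an image. The sketch below takes that approach in Python; the overlap measure, the tokenizer, and the function name `image_score` are assumptions for illustration only, since the abstract does not say how the score is computed.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def image_score(image_text, text_string):
    """Fraction of the text string's distinct terms found in the image text
    (an assumed, deliberately simple relevance measure)."""
    query_terms = set(tokenize(text_string))
    if not query_terms:
        return 0.0
    return len(query_terms & set(tokenize(image_text))) / len(query_terms)


# Hypothetical image text, e.g. a caption or alternate text.
full_match = image_score("Golden Gate Bridge at sunset", "golden gate bridge")
no_match = image_score("City skyline at night", "golden gate")
```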

    3. Interactive framework for name disambiguation
    Type: granted invention patent
    Status: in force

    Publication number: US08538898B2

    Publication date: 2013-09-17

    Application number: US13118404

    Filing date: 2011-05-28

    CPC classification number: G06N99/005 G06F17/30616

    Abstract: A “Name Disambiguator” provides various techniques for implementing an interactive framework for resolving or disambiguating entity names (associated with objects such as publications) for entity searches where two or more same or similar names may refer to different entities. More specifically, the Name Disambiguator uses a combination of user input and automatic models to address the disambiguation problem. In various embodiments, the Name Disambiguator uses a two-part process, including: 1) a global SVM trained from large sets of documents or objects in a simulated interactive mode, and 2) further personalization of local SVM models (associated with individual names or groups of names such as, for example, a group of coauthors) derived from the global SVM model. The result of this process is that large sets of documents or objects are rapidly and accurately condensed or clustered into ordered sets that are organized by entity names.

    4. Using anchor text with hyperlink structures for web searches
    Type: granted invention patent
    Status: in force

    Publication number: US08380722B2

    Publication date: 2013-02-19

    Application number: US12748903

    Filing date: 2010-03-29

    CPC classification number: G06F17/30887

    Abstract: This document describes tools for adjusting anchor text weight to provide more relevant search engine results. Specifically, these tools take advantage of a site-relationship model to consider relationships not only between an anchor text source site and a destination page but also between multiple anchor text source sites to improve web searches. Consideration of these relationships aids in determining a new anchor text weight, which in turn results in more relevant search results.

    5. Finite-state model for processing web queries
    Type: granted invention patent
    Status: lapsed

    Publication number: US08024319B2

    Publication date: 2011-09-20

    Application number: US11698011

    Filing date: 2007-01-25

    CPC classification number: G06F17/30864

    Abstract: A method of creating an index of web queries is discussed. The method includes receiving a first query representative of one or more symbolic characters and assigning the first query to a first data structure. A first text string representative of the first query is created and assigned to a second data structure. The first and second data structures are stored on a tangible computer readable medium.

    6. Assessing mobile readiness of a page using a trained scorer
    Type: granted invention patent
    Status: in force

    Publication number: US07974957B2

    Publication date: 2011-07-05

    Application number: US11697134

    Filing date: 2007-04-05

    CPC classification number: G06F17/30864

    Abstract: A method and system for ranking pages of a search result based on the mobile readiness of the pages is provided. A mobile-readiness system receives an indication of pages that are to be ranked. The mobile-readiness system evaluates the mobile readiness of each of the pages. Mobile readiness indicates suitability of the page for a mobile device. The mobile-readiness system then ranks the pages based on the generated mobile readiness and some other criterion such as a relevance score or an importance score. The mobile-readiness system may train a classifier to classify pages based on their mobile readiness.
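
The ranking step described here combines a mobile-readiness score with another criterion. A minimal Python sketch of one possible combination follows; the linear blend, the weight `alpha`, and the field names `mobile_readiness` and `relevance` are assumptions for illustration and are not taken from the patent.

```python
def rank_pages(pages, alpha=0.5):
    """Sort pages by a weighted blend of mobile readiness and relevance.

    `alpha` is a hypothetical weight: 1.0 ranks by readiness only,
    0.0 ranks by relevance only.
    """
    def combined(page):
        return alpha * page["mobile_readiness"] + (1 - alpha) * page["relevance"]

    return sorted(pages, key=combined, reverse=True)


# Hypothetical scores in [0, 1], e.g. produced by a trained classifier.
pages = [
    {"id": "a", "mobile_readiness": 0.9, "relevance": 0.6},
    {"id": "b", "mobile_readiness": 0.2, "relevance": 0.9},
    {"id": "c", "mobile_readiness": 0.5, "relevance": 0.5},
]
blended_order = [p["id"] for p in rank_pages(pages)]
relevance_only_order = [p["id"] for p in rank_pages(pages, alpha=0.0)]
```

With equal weights, page `a` moves ahead of the more relevant page `b` because it is far better suited to a mobile device.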

    7. Hierarchical conditional random fields for web extraction
    Type: granted invention patent
    Status: lapsed

    Publication number: US07720830B2

    Publication date: 2010-05-18

    Application number: US11461400

    Filing date: 2006-07-31

    CPC classification number: G06F17/3089 G06F17/30994

    Abstract: A method and system for labeling object information of an information page is provided. A labeling system identifies an object record of an information page based on the labeling of object elements within an object record and labels object elements based on the identification of an object record that contains the object elements. To identify the records and label the elements, the labeling system generates a hierarchical representation of blocks of an information page. The labeling system identifies records and elements within the records by propagating probability-related information of record labels and element labels through the hierarchy of the blocks. The labeling system generates a feature vector for each block to represent the block and calculates a probability of a label for a block being correct based on a score derived from the feature vectors associated with related blocks. The labeling system searches for the labeling of records and elements that has the highest probability of being correct.

    8. Determining relevance of documents to a query based on identifier distance
    Type: granted invention patent
    Status: in force

    Publication number: US07630964B2

    Publication date: 2009-12-08

    Application number: US11273624

    Filing date: 2005-11-14

    CPC classification number: G06F17/30864 G06F17/3069 Y10S707/99933

    Abstract: A method and system for determining relevance of a document to a query based on identifier match distance is provided. The relevance system analyzes a training set of queries and documents to determine the relationship between identifier match distance and relevance of a document to a query. The identifier match distance indicates the distance from the end of an identifier of a document to an identifier term that matches a query term. The relevance system generates a prior relevance probability that a document with a certain identifier match distance is relevant to a query. The relevance system uses the prior relevance probabilities to determine relevance of documents to queries based on identifier match distance.
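
To make the idea of identifier match distance concrete, the Python sketch below treats a URL as the document identifier, measures distance as the number of terms between the end of the identifier and the nearest matching term, and estimates a prior relevance probability per distance from labeled examples. Counting distance in terms, the tokenizer, and the names `match_distance` and `estimate_priors` are assumptions for illustration; the abstract does not fix these details.

```python
import re
from collections import defaultdict


def identifier_terms(text):
    """Split an identifier (or query) into lowercase alphanumeric terms."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def match_distance(identifier, query):
    """Number of terms between the end of the identifier and the closest
    term that matches a query term; None when nothing matches."""
    query_terms = set(identifier_terms(query))
    for distance, term in enumerate(reversed(identifier_terms(identifier))):
        if term in query_terms:
            return distance
    return None


def estimate_priors(training):
    """training: iterable of (identifier, query, is_relevant) triples.
    Returns the observed fraction of relevant documents per distance."""
    counts = defaultdict(lambda: [0, 0])  # distance -> [relevant, total]
    for identifier, query, relevant in training:
        d = match_distance(identifier, query)
        counts[d][0] += int(relevant)
        counts[d][1] += 1
    return {d: rel / total for d, (rel, total) in counts.items()}


# Hypothetical training data with made-up URLs and labels.
training = [
    ("http://example.com/cars/toyota", "toyota", True),
    ("http://example.com/toyota/reviews/2009", "toyota", True),
    ("http://example.com/toyota/reviews/2009", "toyota", False),
]
priors = estimate_priors(training)
```

At query time, a document's prior would be looked up from `priors` using its own match distance and combined with other relevance signals.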

    9. Information classification paradigm
    Type: granted invention patent
    Status: in force

    Publication number: US07529748B2

    Publication date: 2009-05-05

    Application number: US11276818

    Filing date: 2006-03-15

    CPC classification number: G06F17/30707 Y10S707/99933 Y10S707/99937

    Abstract: A mechanism is described for classifying source documents into one of two categories: likely to contain desired information or unlikely to contain it. Generally, some form of rules-based classification is used in conjunction with deeper analysis, using advanced techniques, on difficult cases. The rules-based classification is generally good for eliminating cases from further consideration and for identifying documents of interest based on generally discernible relationships between data or based on the presence or absence of data. The deeper analysis is used to uncover more complex relationships between data that may identify documents of interest. Portions of the process may use the entire document while other portions of the process may use only a portion of the document.
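
The two-stage arrangement in this abstract can be sketched in Python as a cheap rule stage that decides the easy cases and hands the rest to a deeper stage. Everything specific below is hypothetical: the "price" rules, the digit-counting stand-in for the advanced analysis, and the function names are assumptions chosen only to make the control flow visible.

```python
def rule_stage(document):
    """Return True or False when a rule decides, or None for a hard case.
    The rules are hypothetical and look at the entire document."""
    text = document.lower()
    if "price" not in text:
        return False          # absence of data eliminates the document
    if "price:" in text and "$" in text:
        return True           # an easily discernible pattern accepts it
    return None               # undecided: defer to the deeper stage


def deep_stage(document):
    """Stand-in for the deeper analysis (a trained model in practice).
    It inspects only the first 200 characters of the document."""
    head = document[:200]
    return sum(ch.isdigit() for ch in head) >= 3


def classify(document):
    """True if the document is likely to contain the desired information."""
    decision = rule_stage(document)
    return deep_stage(document) if decision is None else decision


labels = [
    classify("About our company"),        # eliminated by a rule
    classify("Price: $19.99"),            # accepted by a rule
    classify("The price is 1999 cents"),  # hard case, deeper stage accepts
    classify("The price is right"),       # hard case, deeper stage rejects
]
```

The rule stage reads the whole document while the deeper stage reads only its beginning, mirroring the abstract's remark that different portions of the process may use different portions of the document.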

    10. Retrieval of Structured Documents
    Type: invention patent application
    Status: in force

    Publication number: US20090012956A1

    Publication date: 2009-01-08

    Application number: US12211793

    Filing date: 2008-09-16

    Abstract: This disclosure relates to querying a database containing a plurality of structured documents for a search term. Structured documents that do not include the search term are filtered out during an initial search. Matched structured documents, which are those structured documents that do contain the search term, are evaluated by ranking their individual elements based on how well each element matches the search term. The ranking of the individual elements is indicated to the user, who can then access the individual elements.
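
The filter-then-rank flow in this abstract can be illustrated with a short Python sketch over small XML documents. Using XML, scoring an element by the number of occurrences of the term in its own text, and the function name `search` are assumptions for illustration; the disclosure does not prescribe a document format or a scoring function here.

```python
import xml.etree.ElementTree as ET


def search(documents, term):
    """documents: mapping of document id to an XML string.
    Returns (hits, doc_id, element_tag) tuples, best matches first."""
    term = term.lower()
    results = []
    for doc_id, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        # Initial search: drop documents whose text never mentions the term.
        if term not in "".join(root.itertext()).lower():
            continue
        # Rank individual elements by occurrences in their own text.
        for element in root.iter():
            hits = (element.text or "").lower().count(term)
            if hits:
                results.append((hits, doc_id, element.tag))
    results.sort(key=lambda r: (-r[0], r[1], r[2]))
    return results


# Hypothetical structured documents.
documents = {
    "a": "<book><title>Search engines</title>"
         "<chapter>Search ranking and search indexes</chapter></book>",
    "b": "<book><title>Cooking</title></book>",
}
ranked = search(documents, "search")
```

Document `b` is removed by the initial filter, and the `chapter` element of document `a` ranks above its `title` because it mentions the term twice.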
