Method and system for schema matching of web databases
    92.
    Patent application
    Method and system for schema matching of web databases (status: in force)

    Publication No.: US20050256850A1

    Publication date: 2005-11-17

    Application No.: US10846396

    Filing date: 2004-05-14

    Abstract: A method and system for identifying schemas of web databases is provided. A schema matching system generates a mapping between an interface schema and a result schema of a web database, which is used to represent the underlying database schema. The schema matching system also generates a mapping of the interface attributes and the result attributes of the web database to global attributes of a global schema whose semantics are known. Using these mappings, a search engine service can formulate queries using the global attributes, map those queries to the corresponding interface attributes, submit the query, and retrieve the values from the result attributes that correspond to the desired global attributes.
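The query flow in this abstract can be illustrated with a minimal sketch. It assumes the two mappings already exist; the attribute names, the dictionaries, and both helper functions are hypothetical and are not taken from the patent.

```python
# Hypothetical mappings of the kind the abstract describes (illustrative names only).
GLOBAL_TO_INTERFACE = {"title": "book_name", "author": "writer"}
RESULT_TO_GLOBAL = {"Name": "title", "By": "author", "Cost": "price"}


def to_interface_query(global_query):
    """Rewrite a query on global attributes into the site's interface attributes."""
    return {GLOBAL_TO_INTERFACE[attr]: value
            for attr, value in global_query.items()
            if attr in GLOBAL_TO_INTERFACE}


def extract_global_values(result_row, wanted):
    """Pick the result-attribute values that correspond to the wanted global attributes."""
    out = {}
    for result_attr, value in result_row.items():
        global_attr = RESULT_TO_GLOBAL.get(result_attr)
        if global_attr in wanted:
            out[global_attr] = value
    return out


interface_query = to_interface_query({"title": "Dune"})
# Stand-in for a row returned by the web database after submitting the query.
row = {"Name": "Dune", "By": "Frank Herbert", "Cost": "9.99"}
values = extract_global_values(row, {"author", "price"})
```

Here the search service only ever speaks in global attributes; the per-site mappings absorb the differences between the query form and the result page.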

    Method and system for identifying image relatedness using link and page layout analysis
    93.
    Patent application
    Method and system for identifying image relatedness using link and page layout analysis (status: lapsed)

    Publication No.: US20050246623A1

    Publication date: 2005-11-03

    Application No.: US10834483

    Filing date: 2004-04-29

    CPC classification number: G06F17/30864 Y10S707/99931

    Abstract: A method and system for determining relatedness of images of pages based on link and page layout analysis. A link analysis system determines relatedness between images by first identifying blocks within web pages, and then analyzing the importance of the blocks to web pages, web pages to blocks, and images to blocks. Based on this analysis, the link analysis system determines the degree to which each image is related to each other image. The link analysis system may also use the relatedness of images to generate a ranking of the images. The link analysis system may also generate a vector representation of the images based on their relatedness and apply a clustering algorithm to the vector representations to identify clusters of related images.
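One plausible way to combine the three importance relations named in this abstract is to chain them as matrix products. The sketch below is an assumption made for illustration, not the patent's stated formula: the matrix names, the toy values, and the product order are all hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


# Toy data: 2 blocks, 2 pages, 2 images (all values are illustrative).
block_to_page = [[0.0, 1.0], [1.0, 0.0]]   # block 0 links to page 1, block 1 to page 0
page_to_block = [[1.0, 0.0], [0.0, 1.0]]   # page 0 contains block 0, page 1 contains block 1
block_to_image = [[1.0, 0.0], [0.0, 1.0]]  # image i sits in block i

# Block-to-block graph: follow a block's links to pages, then to the blocks on those pages.
block_graph = matmul(block_to_page, page_to_block)

# Image-to-image relatedness: project the block graph onto the images the blocks contain.
image_graph = matmul(matmul(transpose(block_to_image), block_graph), block_to_image)
```

In this toy case the two images come out related to each other because each sits in a block that links to the page holding the other. The resulting rows could then serve as the vector representations that the abstract feeds into ranking or clustering.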

    Webpage entity extraction through joint understanding of page structures and sentences
    94.
    Granted patent
    Webpage entity extraction through joint understanding of page structures and sentences (status: in force)

    Publication No.: US09092424B2

    Publication date: 2015-07-28

    Application No.: US12569912

    Filing date: 2009-09-30

    CPC classification number: G06F17/278

    Abstract: Described is a technology for understanding entities of a webpage, e.g., to label the entities on the webpage. An iterative and bidirectional framework processes a webpage, including a text understanding component (e.g., extended Semi-CRF model) that provides text segmentation features to a structure understanding component (e.g., extended HCRF model). The structure understanding component uses the text segmentation features and visual layout features of the webpage to identify a structure (e.g., labeled block). The text understanding component in turn uses the labeled block to further understand the text. The process continues iteratively until a similarity criterion is met, at which time the entities may be labeled. Also described is the use of multiple mentions of a set of text in the webpage to help in labeling an entity.

    Method and system for calculating importance of a block within a display page
    95.
    Granted patent
    Method and system for calculating importance of a block within a display page (status: lapsed)

    Publication No.: US08095478B2

    Publication date: 2012-01-10

    Application No.: US12101109

    Filing date: 2008-04-10

    CPC classification number: G06F17/30867 Y10S707/99933 Y10S707/99935

    Abstract: A method and system for identifying the importance of information areas of a display page. An importance system identifies information areas or blocks of a web page. A block of a web page represents an area of the web page that appears to relate to a similar topic. The importance system provides the characteristics or features of a block to an importance function that generates an indication of the importance of that block to its web page. The importance system “learns” the importance function by generating a model based on the features of blocks and the user-specified importance of those blocks. To learn the importance function, the importance system asks users to provide an indication of the importance of blocks of web pages in a collection of web pages.

    AUTOMATED SOCIAL NETWORKING GRAPH MINING AND VISUALIZATION
    96.
    Patent application
    AUTOMATED SOCIAL NETWORKING GRAPH MINING AND VISUALIZATION (status: in force)

    Publication No.: US20110283205A1

    Publication date: 2011-11-17

    Application No.: US12780522

    Filing date: 2010-05-14

    CPC classification number: G06F17/30867

    Abstract: The automated social networking graph mining and visualization technique described herein mines social connections and allows creation of a social networking graph from general (not necessarily social-application specific) Web pages. The technique uses the distances between a person's/entity's name and related people's/entities' names on one or more Web pages to determine connections between people/entities and the strengths of the connections. In one embodiment, the technique lays out these connections, and then clusters them, in a 2-D layout of a social networking graph that represents the Web connection strengths among the related people's or entities' names, by using a force-directed model.
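The distance-based connection strength described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: names are single whitespace-separated tokens, distance is counted in tokens, and each co-occurrence contributes 1/distance. The function name and the weighting are hypothetical, and the force-directed layout step is not shown.

```python
from collections import defaultdict
from itertools import combinations


def connection_strengths(pages, names):
    """Sum 1/distance over every co-occurrence of two names on the same page.

    Distance is the token offset between the two names; the 1/distance
    weighting is an assumption made for this sketch.
    """
    strengths = defaultdict(float)
    for text in pages:
        tokens = text.split()
        positions = {n: [i for i, t in enumerate(tokens) if t == n] for n in names}
        for a, b in combinations(sorted(names), 2):
            for i in positions[a]:
                for j in positions[b]:
                    strengths[(a, b)] += 1.0 / abs(i - j)
    return dict(strengths)


pages = ["Alice met Bob", "Alice and her colleague Carol"]
strengths = connection_strengths(pages, {"Alice", "Bob", "Carol"})
```

Names that appear closer together on a page receive a stronger connection, and pairs that never share a page receive no edge at all.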

    Data-Centric Search Engine Architecture
    97.
    Patent application
    Data-Centric Search Engine Architecture (status: under examination, published)

    Publication No.: US20110137886A1

    Publication date: 2011-06-09

    Application No.: US12632821

    Filing date: 2009-12-08

    CPC classification number: G06F16/951

    Abstract: Described is a data-centric web search engine technology/architecture, in which document metadata, including offline-extracted metadata, is used as part of a search indexing and ranking pipeline. A web data management component receives crawled documents and extracts document metadata from the documents. An indexing component uses the document metadata to build an index for the documents. A serving component uses the index and the document metadata to serve content, e.g., search results. Also described is the use of query metadata extracted from queries of a query log for use in the pipeline.

    SCORING RELEVANCE OF A DOCUMENT BASED ON IMAGE TEXT
    98.
    Patent application
    SCORING RELEVANCE OF A DOCUMENT BASED ON IMAGE TEXT (status: in force)

    Publication No.: US20110087660A1

    Publication date: 2011-04-14

    Application No.: US12972259

    Filing date: 2010-12-17

    CPC classification number: G06F17/30864 G06F17/30265

    Abstract: A method and system for determining relevance of a document having text and images to a text string is provided. A scoring system identifies image text associated with an image of the document. The scoring system calculates an image score indicating relevance of the image text to the text string. The image score may be used in many applications, such as searching, summary generation, document classification, image search, and image classification.
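As a rough illustration of an image score, the sketch below measures how many of the text string's terms appear in the image text. The term-overlap metric and the function name are assumptions chosen for simplicity; the abstract does not specify how the score is calculated.

```python
def image_score(image_text, query):
    """Fraction of distinct query terms that appear in the image text (assumed metric)."""
    query_terms = set(query.lower().split())
    if not query_terms:
        return 0.0
    text_terms = set(image_text.lower().split())
    return len(query_terms & text_terms) / len(query_terms)


# Image text might come from a caption or alternate text; this string is illustrative.
score = image_score("Golden Gate Bridge at sunset", "golden gate photos")
```

A document-level relevance score could then combine this value with a score computed from the document's body text.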

    Content object indexing using domain knowledge
    99.
    Granted patent
    Content object indexing using domain knowledge (status: in force)

    Publication No.: US07698294B2

    Publication date: 2010-04-13

    Application No.: US11275509

    Filing date: 2006-01-11

    CPC classification number: G06F17/30613

    Abstract: A content object indexing process including creating a content object knowledge index, calculating a description vector of a target content object, and indexing the target content object by searching for the description vector in the content object knowledge database. It may be difficult to search for an exact content object, such as a music file or academic researcher, because a conventional search index may not include related hierarchical information. A content object indexing process may add hierarchical information taken from a content object knowledge index and incorporate the hierarchical information into the index entry for a specific content object. An application of such a content object indexing process may be a world wide web search engine.

    Semi-structured data storage schema selection
    100.
    Granted patent
    Semi-structured data storage schema selection (status: lapsed)

    Publication No.: US07668847B2

    Publication date: 2010-02-23

    Application No.: US11267709

    Filing date: 2005-11-04

    Abstract: In one aspect, this disclosure relates to a method and associated apparatus that allows a user to obtain a semi-structured data input and a workload input. An improved semi-structured data storage schema is selected for a relational schema in response to the semi-structured data input and the workload input. The semi-structured data is segmented based on the selected improved semi-structured data storage schema. In one aspect, the semi-structured data is XML data.
