Employing Topic Models for Semantic Class Mining
    81. Patent application (status: in force)

    Publication number: US20120030206A1

    Publication date: 2012-02-02

    Application number: US12846064

    Filing date: 2010-07-29

    CPC classification number: G06F17/30864 G06F17/30707

    Abstract: A topic modeling architecture is used to discover high-quality semantic classes from a large collection of raw semantic classes (RASCs) for use in generating responses to queries. A specific semantic class is identified from a collection of RASCs, and a preprocessing operation is conducted to remove one or more items with a semantic class frequency less than a predetermined threshold. A topic model is then applied to the specific semantic class for each of the items that remain in the specific semantic class after the preprocessing operation. A postprocessing operation is then conducted on the items of the specific semantic class to merge and sort the results of the topic model and generate final semantic classes for use by a search engine to respond to a query.
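
    The preprocessing step alone can be illustrated with a rough Python sketch. It assumes each RASC is a set of item strings and takes an item's semantic class frequency to be the number of RASCs that contain both that item and the query item. The function name `preprocess_rascs`, the sample data, and the threshold are hypothetical. The topic-model and postprocessing stages are not shown, and this is not the patented implementation.

```python
from collections import Counter


def preprocess_rascs(rascs, query_item, min_freq):
    """Keep RASCs that contain query_item, minus rarely co-occurring items.

    Hypothetical frequency measure: the number of selected RASCs that
    contain the item.
    """
    # Select the raw semantic classes that mention the query item.
    selected = [set(r) for r in rascs if query_item in r]
    # Count how many selected RASCs contain each item.
    freq = Counter(item for r in selected for item in r)
    # Drop items whose frequency is below the threshold.
    kept = {item for item, count in freq.items() if count >= min_freq}
    # Return the filtered classes, skipping any that become empty.
    return [r & kept for r in selected if r & kept]


rascs = [
    {"apple", "banana", "orange"},
    {"apple", "banana", "grape"},
    {"apple", "microsoft", "google"},
    {"banana", "orange", "pear"},
]
print(preprocess_rascs(rascs, "apple", min_freq=2))
```

    With `min_freq=2`, items such as `orange` and `microsoft`, which co-occur with `apple` in only one RASC, are dropped before any topic modeling would run.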

    Scalable model-based product matching
    82. Granted patent (status: in force)

    Publication number: US07979459B2

    Publication date: 2011-07-12

    Application number: US11763539

    Filing date: 2007-06-15

    CPC classification number: G06Q30/02 G06Q10/101 Y10S707/944

    Abstract: Aspects of the subject matter described herein relate to matching product information to products. In aspects, a product matching component receives product information. The product matching component normalizes the product information and obtains keywords from the product information. By querying a database of recognized products, the keywords are used to obtain a list of products that potentially match the product information. A confidence level is assigned to each of the potential matches in the list. A match may be returned for the highest matched product or for a selectable number of products whose confidence level(s) exceed a selectable threshold.
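
    The matching flow in this abstract (normalize, extract keywords, retrieve candidates, assign confidence, apply a threshold) can be sketched as follows. The Jaccard keyword overlap used as the confidence level, the in-memory list standing in for the product database, and the names `match_products`, `threshold`, and `top_n` are all assumptions for illustration, not the claimed method.

```python
import re


def keywords(text):
    """Normalize text (lowercase, punctuation to spaces) and split into keywords."""
    return set(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def match_products(offer, catalog, threshold=0.5, top_n=3):
    """Return up to top_n (product, confidence) pairs whose confidence exceeds threshold.

    Confidence is the Jaccard overlap of keyword sets, an assumed stand-in
    for a real scoring model; catalog stands in for a product database.
    """
    offer_kw = keywords(offer)
    scored = []
    for product in catalog:
        product_kw = keywords(product)
        shared = offer_kw & product_kw
        if not shared:  # not a candidate: no keyword in common
            continue
        confidence = len(shared) / len(offer_kw | product_kw)
        scored.append((product, confidence))
    # Highest-confidence candidates first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [pair for pair in scored if pair[1] > threshold][:top_n]


catalog = ["Canon EOS 5D Mark II", "Canon EOS 7D", "Nikon D90"]
print(match_products("CANON eos-5D mark ii (body only)", catalog))
```

    The offer normalizes to the first catalog entry's five keywords plus `body` and `only`, giving a confidence of 5/7; `Canon EOS 7D` scores 0.25 and falls below the threshold, and `Nikon D90` shares no keyword at all.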

    EXPERIMENTAL WEB SEARCH SYSTEM
    83. Patent application (status: pending, published)

    Publication number: US20110078131A1

    Publication date: 2011-03-31

    Application number: US12569978

    Filing date: 2009-09-30

    CPC classification number: G06F16/951

    Abstract: Described is the running of search-related experiments on a full (or partial) offline snapshot copy of the search engine documents of an actual production system. A snapshot experimentation subsystem runs experimental code related to web searches on the offline data, including to run experimental index building code to build an experimental index (e.g., to test a new document feature), and/or to run experimental search-related code, such as to rank search results according to experimental ranking code, to implement an experimental search strategy, and/or to generate experimental captions.

    Automatic detection of online commercial intention
    84. Granted patent (status: lapsed)

    Publication number: US07831685B2

    Publication date: 2010-11-09

    Application number: US11300748

    Filing date: 2005-12-14

    CPC classification number: G06Q30/02

    Abstract: Features extracted from network browser pages and/or network search queries are leveraged to facilitate in detecting a user's browsing and/or searching intent. Machine learning classifiers constructed from these features automatically detect a user's online commercial intention (OCI). A user's intention can be commercial or non-commercial, with commercial intentions being informational or transactional. In one instance, an OCI ranking mechanism is employed with a search engine to facilitate in providing search results that are ranked according to a user's intention. This also provides a means to match purchasing advertisements with potential customers who are more than likely ready to make a purchase (transactional stage). Additionally, informational advertisements can be matched to users who are researching a potential purchase (informational stage).
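
    The abstract does not name a classifier. As one hypothetical stand-in, the sketch below trains a multinomial naive Bayes model on a few hand-labeled toy queries to separate commercial from non-commercial intent. The class `NaiveBayes`, the training queries, and the labels are invented for illustration; a second model of the same kind could be trained to split commercial queries into informational and transactional.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes over query words with add-one smoothing."""

    def fit(self, queries, labels):
        self.labels = sorted(set(labels))
        self.prior = Counter(labels)
        self.word_counts = {label: Counter() for label in self.labels}
        for query, label in zip(queries, labels):
            self.word_counts[label].update(query.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, query):
        total = sum(self.prior.values())
        best_label, best_score = None, -math.inf
        for label in self.labels:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each query word.
            score = math.log(self.prior[label] / total)
            for word in query.lower().split():
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


queries = [
    "buy cheap laptop", "laptop price comparison", "buy running shoes online",
    "history of the roman empire", "how do volcanoes form", "python list tutorial",
]
labels = ["commercial"] * 3 + ["non-commercial"] * 3
model = NaiveBayes().fit(queries, labels)
print(model.predict("buy laptop"), model.predict("roman history tutorial"))
# prints: commercial non-commercial
```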

    PSEUDO-ANCHOR TEXT EXTRACTION
    85. Patent application (status: in force)

    Publication number: US20100145956A1

    Publication date: 2010-06-10

    Application number: US12697056

    Filing date: 2010-01-29

    CPC classification number: G06F17/30616 G06F17/30864 Y10S707/99932

    Abstract: A search method uses pseudo-anchor text associated with search objects to improve search performance. The pseudo-anchor text may be extracted in combination with an identifier of the search objects (such as a pseudo-URL) from a digital corpus such as a collection of documents. Pseudo-anchor texts for each object are preferably extracted from candidate anchor blocks using a machine learning based approach. The pseudo-anchor texts are made available for searching and used to help rank the objects in a search result to improve search performance. The method may be used in vertical search of objects such as published articles, products and images that lack explicit URLs and anchor text information.

    Determining relevance of a document to a query based on spans of query terms
    86. Granted patent (status: in force)

    Publication number: US07480652B2

    Publication date: 2009-01-20

    Application number: US11259621

    Filing date: 2005-10-26

    Abstract: A relevance system determines the relevance of a query term to a document based on spans within the document that contain the query term. The relevance system aggregates the relevance of the query terms into an overall relevance for the document. For each query term, the relevance system calculates a span relevance for each span that contains that query term. The relevance system then aggregates the span relevances for a query term into a query term relevance for that document. The relevance system may aggregate the query term relevances into a document relevance.
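
    The abstract leaves the span definition and scoring functions open. The sketch below assumes a span is a run of query-term occurrences whose successive positions are at most `max_gap` tokens apart, and scores a span as the square of its distinct query terms divided by its width, so spans packing several terms close together score higher. It then sums span relevances per query term and term relevances per document, mirroring the two aggregation steps described. All names and formulas are hypothetical.

```python
def find_spans(tokens, query_terms, max_gap=3):
    """Group query-term hits whose successive positions are <= max_gap apart."""
    spans, current = [], []
    for pos, token in enumerate(tokens):
        if token not in query_terms:
            continue
        if current and pos - current[-1][0] > max_gap:
            spans.append(current)
            current = []
        current.append((pos, token))
    if current:
        spans.append(current)
    return spans


def span_relevance(span):
    """Hypothetical score: (distinct query terms)^2 / span width in tokens."""
    width = span[-1][0] - span[0][0] + 1
    distinct = len({term for _, term in span})
    return distinct ** 2 / width


def document_relevance(text, query):
    tokens = text.lower().split()
    query_terms = set(query.lower().split())
    spans = find_spans(tokens, query_terms)
    # Per query term: sum the relevance of every span containing that term.
    term_relevance = {
        term: sum(span_relevance(s) for s in spans if any(t == term for _, t in s))
        for term in query_terms
    }
    # Per document: sum the query-term relevances.
    return sum(term_relevance.values()), term_relevance


doc_a = "solar panel efficiency improves with better solar cells"
doc_b = "solar energy is popular while fuel efficiency matters too"
print(document_relevance(doc_a, "solar efficiency")[0])
print(document_relevance(doc_b, "solar efficiency")[0])
```

    In `doc_a`, "solar" and "efficiency" share one span of width 3 (relevance 4/3) and a second lone "solar" forms its own span (relevance 1), so the document scores 11/3; in `doc_b` the two terms are too far apart to share a span and the document scores 2.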

    SCALABLE MODEL-BASED PRODUCT MATCHING
    87. Patent application (status: in force)

    Publication number: US20080313165A1

    Publication date: 2008-12-18

    Application number: US11763539

    Filing date: 2007-06-15

    CPC classification number: G06Q30/02 G06Q10/101 Y10S707/944

    Abstract: Aspects of the subject matter described herein relate to matching product information to products. In aspects, a product matching component receives product information. The product matching component normalizes the product information and obtains keywords from the product information. By querying a database of recognized products, the keywords are used to obtain a list of products that potentially match the product information. A confidence level is assigned to each of the potential matches in the list. A match may be returned for the highest matched product or for a selectable number of products whose confidence level(s) exceed a selectable threshold.

    Retrieval of structured documents
    88. Granted patent (status: in force)

    Publication number: US07428538B2

    Publication date: 2008-09-23

    Application number: US11277344

    Filing date: 2006-03-23

    Abstract: This disclosure relates to performing a query for a search term of a database containing a plurality of structured documents. Those structured documents that do not include the search term are ferreted or filtered out during an initial search. Matched structured documents which are those structured documents that do contain the search term are evaluated by ranking the individual elements based on how well each individual element matches the search term, and indicating to the user the ranking of the individual elements wherein the individual elements can be accessed by the user.
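
    The two-stage process (filter out whole documents, then rank the elements of the remaining ones) can be sketched as below. It assumes each document is a mapping from element name to text and measures how well an element matches as the term's frequency divided by the element's word count; the name `search_structured` and that scoring rule are hypothetical.

```python
def search_structured(documents, term):
    """Filter out documents lacking term, then rank elements of the rest.

    documents maps doc id -> {element name: text}. An element's score is
    the term's frequency divided by its word count (an assumed measure).
    """
    term = term.lower()
    ranked = []
    for doc_id, elements in documents.items():
        tokenized = {name: text.lower().split() for name, text in elements.items()}
        # Initial search: discard documents where no element contains the term.
        if not any(term in words for words in tokenized.values()):
            continue
        for name, words in tokenized.items():
            if term in words:
                ranked.append((doc_id, name, words.count(term) / len(words)))
    # Best-matching elements first.
    ranked.sort(key=lambda hit: hit[2], reverse=True)
    return ranked


docs = {
    "d1": {"title": "XML retrieval", "body": "ranking parts of structured documents"},
    "d2": {"title": "Cooking pasta", "body": "boil water and add salt"},
    "d3": {"title": "structured retrieval of XML elements",
           "body": "XML elements and XML attributes"},
}
for hit in search_structured(docs, "XML"):
    print(hit)
```

    Searching the sample for `"XML"` drops `d2` outright and returns the `d1` title first (score 0.5), then the `d3` body (0.4) and the `d3` title (0.2).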

    Method and system for identifying image relatedness using link and page layout analysis
    89. Granted patent (status: lapsed)

    Publication number: US07293007B2

    Publication date: 2007-11-06

    Application number: US10834483

    Filing date: 2004-04-29

    CPC classification number: G06F17/30864 Y10S707/99931

    Abstract: A method and system for determining relatedness of images of pages based on link and page layout analysis. A link analysis system determines relatedness between images by first identifying blocks within web pages, and then analyzing the importance of the blocks to web pages, web pages to blocks, and images to blocks. Based on this analysis, the link analysis system determines the degree to which each image is related to each other image. The link analysis system may also use the relatedness of images to generate a ranking of the images. The link analysis system may also generate a vector representation of the images based on their relatedness and apply a clustering algorithm to the vector representations to identify clusters of related images.
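
    The abstract names three relationships (blocks to pages, pages to blocks, images to blocks) without giving formulas. One plausible formulation, assumed here and not taken from the patent, stores them as matrices: `X` (blocks by pages) for links from blocks to pages, `Z` (pages by blocks) for each block's importance within its page, and `Y` (blocks by images) for the images each block contains. The product `X·Z` then connects blocks through the pages they link to, and `Y^T·(X·Z)·Y` projects that block graph onto images. The helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def image_relatedness(X, Z, Y):
    """Image-to-image relatedness under an assumed block-level model.

    X: blocks x pages, X[b][p] = weight of the link from block b to page p
    Z: pages x blocks, Z[p][b] = importance of block b within page p
    Y: blocks x images, Y[b][i] = weight of image i within block b
    """
    block_graph = matmul(X, Z)  # block -> linked page -> that page's blocks
    return matmul(matmul(transpose(Y), block_graph), Y)


# Toy data: page P0 holds blocks B0 and B1, page P1 holds block B2.
X = [[0, 1], [0, 0], [1, 0]]      # B0 links to P1, B2 links to P0
Z = [[0.5, 0.5, 0], [0, 0, 1.0]]  # block importance within each page
Y = [[1, 0], [0, 0], [0, 1]]      # image I0 in B0, image I1 in B2
print(image_relatedness(X, Z, Y))  # [[0.0, 1.0], [0.5, 0.0]]
```

    Block `B0` (holding image `I0`) links to page `P1`, whose only block `B2` holds image `I1`, so `I0` relates to `I1` with weight 1; the reverse link from `B2` to page `P0` lands on `B0` with importance 0.5. The resulting rows could then serve as the vector representations to which a clustering algorithm is applied.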

    Retrieval of structured documents
    90. Granted patent (status: in force)

    Publication number: US07111000B2

    Publication date: 2006-09-19

    Application number: US10337138

    Filing date: 2003-01-06

    Abstract: This disclosure relates to performing a query for a search term of a database containing a plurality of structured documents. Those structured documents that do not include the search term are ferreted or filtered out during an initial search. Matched structured documents which are those structured documents that do contain the search term are evaluated by ranking the individual elements based on how well each individual element matches the search term, and indicating to the user the ranking of the individual elements wherein the individual elements can be accessed by the user.
