System and method of automatic discovery of terms in a document that are relevant to a given target topic
    1.
    Granted patent
    Status: In force

    Publication No.: US06651058B1

    Publication date: 2003-11-18

    Application No.: US09439758

    Filing date: 1999-11-15

    Abstract: A computer program product is provided as an automatic mining system to discover terms that are relevant to a given target topic from a large database of unstructured information, such as the World Wide Web. The operation of the automatic mining system is performed in three stages: the first stage is carried out by a new terms discoverer for discovering the terms in a document; the second stage is carried out by a candidate terms discoverer for discovering potentially relevant terms; and the third stage is carried out by a relevant terms discoverer for refining or testing the discovered relevance to filter false relevance. The new terms discoverer includes a system for the automatic mining of patterns and relations, a system for the automatic mining of new relationships, and a system for selecting new terms from relations. In one embodiment, the system for the automatic mining of patterns and relations identifies a set of related terms on the WWW with a high degree of confidence, using a duality concept, and includes a terms database and two identifiers: a relation identifier and a pattern identifier. The system for the automatic mining of new relationships includes a database, a knowledge module, and a statistics module. The knowledge module includes a stemming unit, a synonym check unit, and a domain knowledge check unit. The candidate terms discoverer includes a metadata extractor, a document vector module, an association module, a filtering module, and a database. The relevant terms discoverer includes a stop word filter and a system for the automatic construction of a generalization-specialization hierarchy of terms, comprised of a terms database, an augmentation module, a generalization detection module, and a hierarchy database.


    System and method for the automatic mining of acronym-expansion pairs patterns and formation rules
    2.
    Granted patent
    Status: In force

    Publication No.: US06385629B1

    Publication date: 2002-05-07

    Application No.: US09440625

    Filing date: 1999-11-15

    CPC classification number: G06F17/30539 Y10S707/99936

    Abstract: A computer program product is provided as an automatic mining system to identify a set of related information on the World Wide Web using the duality concept. The mining system iteratively refines mutually dependent approximations to their identifications. Specifically, the mining system iteratively refines (i) pairs of phrases related in a specific way; (ii) the patterns of their occurrences in web pages; and (iii) the formation rules. In one embodiment, the automatic mining system identifies (acronym, expansion) pairs in terms of the patterns of their occurrences in the web pages and their formation rules. The automatic mining system includes a formation rule identifier that derives the formation rules, an acronym-expansion pair identifier that derives the (acronym, expansion) pairs, and a pattern identifier that derives the patterns. A database stores the (acronym, expansion) pairs, patterns, and formation rules. Initially, the database begins with small seed sets of (acronym, expansion) pairs, patterns, and formation rules that are continuously and iteratively broadened by the automatic mining system.
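
    The interplay of an occurrence pattern and a formation rule can be illustrated with a minimal sketch. The parenthesized-acronym pattern, the first-letter rule, and the names `PAREN_ACRONYM`, `first_letter_rule`, and `find_pairs` are assumptions chosen for illustration; the abstract does not specify which patterns or rules the patented system uses.

```python
import re

# Hypothetical occurrence pattern: an upper-case acronym in parentheses,
# e.g. "World Wide Web (WWW)".
PAREN_ACRONYM = re.compile(r"\(([A-Z]{2,})\)")


def first_letter_rule(acronym, words):
    """Hypothetical formation rule: one word per letter, matching initials."""
    return len(acronym) == len(words) and all(
        word[:1].upper() == letter for letter, word in zip(acronym, words)
    )


def find_pairs(text):
    """Return (acronym, expansion) pairs that fit the pattern and the rule."""
    pairs = []
    for match in PAREN_ACRONYM.finditer(text):
        acronym = match.group(1)
        # Candidate expansion: the len(acronym) words just before the parenthesis.
        preceding = re.findall(r"[A-Za-z]+", text[:match.start()])
        words = preceding[-len(acronym):]
        if first_letter_rule(acronym, words):
            pairs.append((acronym, " ".join(words)))
    return pairs


pairs = find_pairs(
    "the World Wide Web (WWW) and the Least General Generalization (LGG) model"
)
```

    In an iterative miner, pairs accepted this way could in turn suggest further patterns and rules; this sketch covers only a single pass.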


    System and method for the automatic mining of new relationships
    3.
    Granted patent
    Status: Lapsed

    Publication No.: US06539376B1

    Publication date: 2003-03-25

    Application No.: US09440626

    Filing date: 1999-11-15

    Abstract: An automatic mining system identifies, with a high degree of confidence, a set of relevant terms from a large text database of unstructured information, such as the World Wide Web. The automatic mining system includes a software program that enables the discovery of new relationships by association mining and refinement of co-occurrences, using automatic and iterative recognition of new binary relations through phrases that embody related pairs, by applying lexicographic and statistical techniques to classify the relations, and further by applying a minimal amount of domain knowledge of the relevance of the terms and relations. The automatic mining system includes a knowledge module and a statistics module. The knowledge module is comprised of a stemming unit, a synonym check unit, and a domain knowledge check unit. The stemming unit determines if the relation being analyzed shares a common root with a previously mined relation. The synonym check unit identifies the synonyms of the relation, and the domain knowledge check unit considers extrinsic factors for indications that would further clarify the relationship being mined. The statistics module optimizes the confidence level in the relationship.


    System and method for automatically and iteratively mining related terms in a document through relations and patterns of occurrences
    4.
    Granted patent
    Status: Lapsed

    Publication No.: US06505197B1

    Publication date: 2003-01-07

    Application No.: US09439379

    Filing date: 1999-11-15

    Abstract: A computer program product is provided as an automatic mining system to identify a set of related terms on the World Wide Web that define a relationship, using the duality concept. Specifically, the mining system iteratively refines pairs of terms that are related in a specific way, and the patterns of their occurrences in web pages. The automatic mining system runs in an iterative fashion for continuously and incrementally refining the relations and their corresponding patterns. In one embodiment, the automatic mining system identifies relations in terms of the patterns of their occurrences in the web pages. The automatic mining system includes a relation identifier that derives new relations, and a pattern identifier that derives new patterns. The newly derived relations and patterns are stored in a database, which begins initially with small seed sets of relations and patterns that are continuously and iteratively broadened by the automatic mining system.
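
    The relation/pattern duality described above can be sketched as a small fixed-point loop. This is an illustrative assumption, not the patented implementation: a "pattern" here is simply the text found between the two terms of a known pair, the function name `mine` and the toy sentences are hypothetical, and a real system would also need confidence filtering to reject spurious patterns.

```python
import re


def mine(sentences, seed_relations, rounds=5):
    """Alternate between deriving patterns from relations and relations from patterns."""
    relations = set(seed_relations)
    patterns = set()
    for _ in range(rounds):
        # Relations -> patterns: record the text between the terms of a known pair.
        for sentence in sentences:
            for first, second in relations:
                found = re.search(
                    re.escape(first) + r"(.+?)" + re.escape(second), sentence
                )
                if found:
                    patterns.add(found.group(1))
        # Patterns -> relations: any two words joined by a known pattern form a pair.
        derived = set()
        for sentence in sentences:
            for pattern in patterns:
                found = re.search(r"(\w+)" + re.escape(pattern) + r"(\w+)", sentence)
                if found:
                    derived.add((found.group(1), found.group(2)))
        if derived <= relations:
            break  # nothing new was derived, so the sets have stabilized
        relations |= derived
    return relations, patterns


sentences = [
    "Paris is the capital of France.",
    "Tokyo is the capital of Japan.",
    "Tokyo, capital city of Japan.",
    "Lima, capital city of Peru.",
]
relations, patterns = mine(sentences, {("Paris", "France")})
```

    Starting from the single seed pair, the first round learns one pattern and one new pair; the new pair then exposes a second pattern, which yields a third pair in the next round.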


    System and method for the automatic recognition of relevant terms by mining link annotations
    5.
    Granted patent
    Status: Lapsed

    Publication No.: US06651059B1

    Publication date: 2003-11-18

    Application No.: US09440602

    Filing date: 1999-11-15

    Abstract: A computer program product is provided as an automatic mining system to identify a set of relevant terms from a large text database of unstructured information, such as the World Wide Web (WWW), with a high degree of confidence, by association mining and refinement of co-occurrences using hypertext link metadata. The automatic mining system includes a software package comprised of a metadata extractor, a document vector module, an association module, and a filtering module. The automatic mining system further includes a database for storing the mined sets of relevant terms. The automatic mining system scans the downloaded hypertext links, rather than the entire body of the documents, for related information. As a result, the crawler is not required to perform a relatively lengthy download of the document content, and thus the automatic mining system minimizes the download and processing time.


    Method and system for classifying semi-structured documents
    6.
    Granted patent
    Status: Lapsed

    Publication No.: US06606620B1

    Publication date: 2003-08-12

    Application No.: US09624616

    Filing date: 2000-07-24

    CPC classification number: G06F17/3061 Y10S707/99933 Y10S707/99945

    Abstract: A classifier for semi-structured documents and associated method dynamically and accurately classify documents with an implicit or explicit schema by taking advantage of the term-frequency and term distribution information inherent in the document. The system uses a structured vector model that allows like terms to be grouped together and dissimilar terms to be segregated based on their frequency and distribution within the sub-vectors of the structured vector, thus achieving context sensitivity. The final decision for assigning the class of a document is based on a mathematical comparison of the similarity of the terms in the structured vector to those of the various class models. The classifier of the present invention is capable of both learning and testing. In the learning phase, the classifier builds class models from the composite information gleaned from numerous training documents. Specifically, it develops a structured vector model for each training document. Then, within a given class of documents, it adds and then normalizes the occurrences of terms.
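
    A rough sketch of the structured vector idea follows, under stated assumptions: each document is a mapping from a section name to its text, each section gets its own term-frequency sub-vector, and similarity is the sum of per-section cosine similarities. The abstract only says a "mathematical comparison" is used, so cosine similarity, the unit-length normalization, and the names `structured_vector`, `train`, and `classify` are illustrative choices rather than the patented method.

```python
import math
from collections import Counter, defaultdict


def structured_vector(doc):
    """Build one term-frequency sub-vector per section of the document."""
    return {section: Counter(text.lower().split()) for section, text in doc.items()}


def train(labeled_docs):
    """Add up sub-vectors per class, then normalize each sub-vector to unit length."""
    sums = defaultdict(lambda: defaultdict(Counter))
    for label, doc in labeled_docs:
        for section, counts in structured_vector(doc).items():
            sums[label][section].update(counts)
    models = {}
    for label, sections in sums.items():
        models[label] = {}
        for section, counts in sections.items():
            norm = math.sqrt(sum(c * c for c in counts.values()))
            models[label][section] = {t: c / norm for t, c in counts.items()}
    return models


def classify(doc, models):
    """Pick the class whose model is most similar, section by section."""
    vector = structured_vector(doc)

    def score(model):
        total = 0.0
        for section, counts in vector.items():
            norm = math.sqrt(sum(c * c for c in counts.values()))
            if norm == 0 or section not in model:
                continue
            dot = sum(c * model[section].get(t, 0.0) for t, c in counts.items())
            total += dot / norm  # cosine similarity for this sub-vector
        return total

    return max(models, key=lambda label: score(models[label]))


models = train([
    ("resume", {"title": "resume of engineer", "body": "experience education skills"}),
    ("invoice", {"title": "invoice number", "body": "amount due total payment"}),
])
label = classify({"title": "resume", "body": "skills and experience"}, models)
```

    Because terms are only compared within the same section, a word in a title never matches the same word in a body, which is one simple way to obtain the context sensitivity the abstract mentions.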


    System and method for the automatic construction of generalization-specialization hierarchy of terms from a database of terms and associated meanings
    7.
    Granted patent
    Status: Lapsed

    Publication No.: US06519602B2

    Publication date: 2003-02-11

    Application No.: US09440203

    Filing date: 1999-11-15

    Abstract: A computer program product is provided as an automatic mining system to build a generalization hierarchy of terms from a database of terms and associated meanings, using, for example, the Least General Generalization (LGG) model. The automatic mining system is comprised of a terms database, an augmentation module, a generalization detection module, and a hierarchy database. The terms database stores the terms and their meanings, and the hierarchy database stores the generalization hierarchy, which is defined by a set of edges and nodes. The augmentation module updates the terms using the LGG model. The generalization detection module maps the generalizations derived by the augmentation module, updates the edges, and derives a generalization hierarchy. In operation, the automatic mining system begins with no predefined taxonomy of the concept terms, and the LGG model derives from the terms a generalization hierarchy modeled as a directed acyclic graph.


    Recommendations based on branding
    8.
    Granted patent
    Status: In force

    Publication No.: US09443209B2

    Publication date: 2016-09-13

    Application No.: US12707618

    Filing date: 2010-02-17

    CPC classification number: G06Q30/0631 G06Q10/04 G06Q30/02

    Abstract: A method and a system for providing recommendations based on branding are disclosed. For example, a brand preference corresponding to a first brand and a first category may be identified based on user activity. A recommendation is provided to the user based on the brand preference. The recommendation may be provided based on a predetermined brand relationship comprising the first brand associated with the first category, a second brand associated with a second category, and a recommendation score between the first and second categories and brands. The recommendation may be provided by accessing a relationships database to determine at least one brand relationship of the brand relationships corresponding to the brand preference.
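
    The brand relationship lookup described above might be sketched as follows. The record layout mirrors the fields named in the abstract (first brand and category, second brand and category, recommendation score), but the in-memory list standing in for the relationships database, the brand names, the scores, and the names `BrandRelationship` and `recommend` are all hypothetical.

```python
from typing import NamedTuple


class BrandRelationship(NamedTuple):
    first_brand: str
    first_category: str
    second_brand: str
    second_category: str
    score: float  # recommendation score between the two brand/category pairs


def recommend(preference, relationships, top_n=3):
    """Return the best-scoring (brand, category) pairs related to a brand preference."""
    matches = [
        r for r in relationships
        if (r.first_brand, r.first_category) == preference
    ]
    matches.sort(key=lambda r: r.score, reverse=True)
    return [(r.second_brand, r.second_category) for r in matches[:top_n]]


# Hypothetical stand-in for the relationships database.
relationships = [
    BrandRelationship("BrandA", "shoes", "BrandB", "watches", 0.4),
    BrandRelationship("BrandA", "shoes", "BrandC", "bags", 0.9),
    BrandRelationship("BrandD", "phones", "BrandE", "cases", 0.7),
]
suggestions = recommend(("BrandA", "shoes"), relationships)
```

    How the brand preference itself is inferred from user activity is outside this sketch; it is passed in as a ready-made (brand, category) pair.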


    GARMENT SIZE MAPPING
    9.
    Patent application
    Status: Under examination (published)

    Publication No.: US20160092956A1

    Publication date: 2016-03-31

    Application No.: US14503309

    Filing date: 2014-09-30

    CPC classification number: G06Q30/0621 G06Q30/0643

    Abstract: Techniques are described for mapping size information associated with a client to target brands, garments, sizes, shapes, and styles for which there is no standardized correlation. The size information associated with a client may be generated by modeling client garments, accessing computer-aided drawing (CAD) files associated with client garments, or by analyzing a history of garment purchases associated with the client. Information for target garments may be generated in a similar fashion. A system may then create a standardized scale with a set of sizes for a target, and map a client base size to that standardized size scale. Similar matching and mapping may also be done with shape and style considerations. A recommendation based on the mapping may then be communicated to the client.


    MULTI-USER SEARCH OR RECOMMENDATION
    10.
    Patent application
    Status: Under examination (published)

    Publication No.: US20160063012A1

    Publication date: 2016-03-03

    Application No.: US14473934

    Filing date: 2014-08-29

    CPC classification number: G06F16/9537 G06F16/29 G06F16/33 G06F16/338

    Abstract: Disclosed are a system comprising a computer-readable storage medium storing at least one program, and a computer-implemented method for generating search results. An application interface module receives a first search request linked to first location data of a first user and a second search request linked to second location data of a second user. A search engine determines whether the first and second search requests satisfy a collaboration criterion based at least on the first and second location data. In accordance with a determination that the collaboration criterion is satisfied, the search engine generates a search result based on the first and second search requests. The application interface module provides graphical data for display of the search results within a user interface rendered on a user device.
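
    One possible reading of a location-based collaboration criterion is sketched below: two requests are treated as collaborative when their users are within a small distance of each other. The great-circle (haversine) distance, the 0.1 km threshold, the request dictionary layout, and the names `haversine_km`, `satisfies_collaboration`, and `combined_query` are assumptions for illustration; the abstract does not define the criterion or how the requests are combined.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def satisfies_collaboration(loc1, loc2, max_km=0.1):
    """Hypothetical criterion: the two users are within max_km of each other."""
    return haversine_km(loc1, loc2) <= max_km


def combined_query(req1, req2):
    """Merge two search requests when the criterion holds; otherwise return None."""
    if satisfies_collaboration(req1["location"], req2["location"]):
        return f'{req1["query"]} {req2["query"]}'
    return None


first = {"query": "italian restaurant", "location": (37.7749, -122.4194)}
second = {"query": "vegetarian", "location": (37.7750, -122.4194)}
far_away = {"query": "vegetarian", "location": (40.7128, -74.0060)}
merged = combined_query(first, second)
```

    Concatenating the two query strings is only a placeholder for whatever joint search the engine performs once collaboration is detected.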

