Entity-specific tuned searching
    22.
    Granted invention patent
    Entity-specific tuned searching (status: in force)

    Publication number: US07739270B2

    Publication date: 2010-06-15

    Application number: US11005989

    Filing date: 2004-12-07

    CPC classification number: G06F17/30867

    Abstract: The present invention leverages relevance data to provide enhanced search query results based on relevancy to a specific entity via an entity-specific tunable search. This allows an entity to retrieve information that is of more value to that entity, in a faster and more efficient manner. The entity itself can be an individual user, a grouping of users, and/or an enterprise and the like. In one instance of the present invention, entity-specific relevance information is determined from the similarity of the entity to another entity or group of entities. Interest levels and/or satisfaction levels of similar entities can also be utilized along with similarity information to help derive the relevance information.
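
    The similarity-based derivation described in this abstract can be illustrated with a minimal sketch. It is not the patented method: it assumes entities are represented as sparse interest profiles, that similarity is cosine similarity, and that relevance is a similarity-weighted average of other entities' satisfaction scores. All names (`cosine`, `entity_relevance`, `rerank`) and the data layout are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse profile vectors (dicts)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def entity_relevance(target, others, satisfaction, doc):
    """Similarity-weighted average of other entities' satisfaction with doc.

    Assumption: satisfaction[name][doc] is a score in [0, 1] recorded for
    entities that previously accessed the document.
    """
    num = den = 0.0
    for name, profile in others.items():
        if doc not in satisfaction.get(name, {}):
            continue  # this entity has no recorded opinion on the document
        sim = cosine(target, profile)
        num += sim * satisfaction[name][doc]
        den += sim
    return num / den if den else 0.0


def rerank(results, target, others, satisfaction):
    """Order search results by entity-specific relevance, highest first."""
    return sorted(
        results,
        key=lambda d: entity_relevance(target, others, satisfaction, d),
        reverse=True,
    )
```

    In this sketch, a document liked only by dissimilar entities contributes nothing to the score, so results favored by similar entities rise to the top.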

    Associating information with an electronic document
    23.
    Granted invention patent
    Associating information with an electronic document (status: in force)

    Publication number: US07734631B2

    Publication date: 2010-06-08

    Application number: US11227937

    Filing date: 2005-09-15

    CPC classification number: G06F17/30899

    Abstract: A system for associating information comprises an association module that uses anchoring information to associate a first piece of information with a second piece of information, wherein the second piece of information is not part of the first piece of information. The system further includes a rendering module that presents the second piece of information for use. Methods for using such a system are also described.

    Systems and methods for client-based web crawling
    24.
    Granted invention patent
    Systems and methods for client-based web crawling (status: lapsed)

    Publication number: US07685296B2

    Publication date: 2010-03-23

    Application number: US10670681

    Filing date: 2003-09-25

    CPC classification number: G06F17/30864

    Abstract: The present invention provides systems and methods for obtaining information from a networked system utilizing a distributed web crawler. The distributed nature of clients of a server is leveraged to provide fast and accurate web crawling data. Information gathered by a server's web crawler is compared to data retrieved by clients of the server to update the crawler's data. In one instance of the present invention, data comparison is achieved by utilizing information disseminated via a search engine results page. In another instance of the present invention, data validation is accomplished by client dictionaries, emanating from a server, that summarize web crawler data. The present invention also facilitates data analysis by providing a means to resist spoofing of a web crawler to increase data accuracy.

    DOCUMENT RANKING UTILIZING PARAMETER VARYING DATA
    25.
    Patent application
    DOCUMENT RANKING UTILIZING PARAMETER VARYING DATA (status: under examination, published)

    Publication number: US20080104049A1

    Publication date: 2008-05-01

    Application number: US11552642

    Filing date: 2006-10-25

    CPC classification number: G06F16/951

    Abstract: The relevancy of search results is improved by exploiting changes in data related to information access. Parameter varying aspects of parameter varying data associated with document access are leveraged to provide enhanced ranking of documents. As an aspect of the parameter varies, a rank can be accomplished, producing multiple ranks for a given set of parameter varying data. Parameters such as time, user preferences, popularity, and/or user demographics and the like can be utilized as parameter varying data. Thus, in general, single or multiple varying aspects of the parameters can be employed to produce a set of ranks comprising one or more rankings of documents. This technique can be employed with static rankers, dynamic rankers, and/or ranker training data and the like to produce higher relevancy search results, increasing user satisfaction.

    Method and apparatus for unsupervised training of natural language processing units
    26.
    Granted invention patent
    Method and apparatus for unsupervised training of natural language processing units (status: in force)

    Publication number: US07233892B2

    Publication date: 2007-06-19

    Application number: US11204213

    Filing date: 2005-08-15

    CPC classification number: G06F17/274

    Abstract: A method of training a natural language processing unit applies a candidate learning set to at least one component of the natural language unit. The natural language unit is then used to generate a meaning set from a first corpus. A second meaning set is generated from a second corpus using a second natural language unit and the two meaning sets are compared to each other to form a score for the candidate learning set. This score is used to determine whether to modify the natural language unit based on the candidate learning set.

    User intent discovery
    27.
    Granted invention patent
    User intent discovery (status: in force)

    Publication number: US07158966B2

    Publication date: 2007-01-02

    Application number: US10796378

    Filing date: 2004-03-09

    Abstract: A system that facilitates determining a user's intent given a user search query comprises a search engine that is employed to search over a collection of objects within a data store to retrieve a user search result set. The objects within the result set are associated with queries that were previously utilized to locate such objects. A level of relatedness between the previous queries and the user search query is determined, and previous queries that are associated with a result set that is novel and related to the user search result set are returned to the user.

    Utilizing information redundancy to improve text searches
    28.

    Publication number: US07152057B2

    Publication date: 2006-12-19

    Application number: US11336360

    Filing date: 2006-01-20

    CPC classification number: G06F17/3069 Y10S707/99932 Y10S707/99935

    Abstract: Architecture for improving text searches using information redundancy. A search component is coupled with an analysis component to rerank documents returned in a search according to redundancy values. Each returned document is used to develop a corresponding word probability distribution that is further used to rerank the returned documents according to the associated redundancy values. In another aspect thereof, the query component is coupled with a projection component to project answer redundancy from one document search to another. This includes obtaining the benefit of considerable answer redundancy from a second data source by projecting the success of the search of the second data source against a first data source.
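
    A rough illustration of redundancy-based reranking follows. The abstract does not define how a redundancy value is computed, so this sketch assumes one: each document's unigram distribution is scored by its expected word probability under a distribution pooled over all returned documents. The function names and the scoring formula are hypothetical.

```python
from collections import Counter


def word_distribution(text):
    """Unigram probability distribution over whitespace-separated words."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    return {w: c / total for w, c in counts.items()} if total else {}


def rerank_by_redundancy(docs):
    """Rerank documents so that those echoing the result set come first.

    Assumed redundancy value: sum over words of p_doc(w) * p_pooled(w),
    where p_pooled is estimated from all returned documents together.
    """
    pooled = word_distribution(" ".join(docs))
    dists = [word_distribution(d) for d in docs]

    def redundancy(dist):
        return sum(p * pooled.get(w, 0.0) for w, p in dist.items())

    order = sorted(
        range(len(docs)), key=lambda i: redundancy(dists[i]), reverse=True
    )
    return [docs[i] for i in order]
```

    Under this assumed score, a document whose wording is repeated by other returned documents ranks above an outlier that shares no vocabulary with the rest.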

    Automated error checking system and method
    29.
    Granted invention patent
    Automated error checking system and method (status: in force)

    Publication number: US07113950B2

    Publication date: 2006-09-26

    Application number: US10183214

    Filing date: 2002-06-27

    Abstract: The present invention relates to a system and methodology to facilitate automated error correction of user input data via an analysis of the input data in accordance with an automatically generated and filtered database of processed structural groupings or formulations selected and filtered from past user activities. The filtered database provides a relevant foundation of potential phrases, topics, symbols, speech and/or colloquial structures of interest to users, which are automatically determined from previous user activity and employed to facilitate automated error checking in accordance with the user's current input, command and/or request for information.

    Spell checker with arbitrary length string-to-string transformations to improve noisy channel spelling correction
    30.
    Granted invention patent
    Spell checker with arbitrary length string-to-string transformations to improve noisy channel spelling correction (status: lapsed)

    Publication number: US07047493B1

    Publication date: 2006-05-16

    Application number: US09539357

    Filing date: 2000-03-31

    CPC classification number: G06F17/273 G10L15/183

    Abstract: A spell checker based on the noisy channel model has a source model and an error model. The source model determines how likely a word w in a dictionary is to have been generated. The error model determines how likely the word w was to have been incorrectly entered as the string s (e.g., mistyped or incorrectly interpreted by a speech recognition system) according to the probabilities of string-to-string edits. The string-to-string edits allow conversion of one arbitrary length character sequence to another arbitrary length character sequence.
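
    The noisy channel decision rule in this abstract, choosing the dictionary word w that maximizes P(w) * P(s | w), can be sketched as follows. This is a toy illustration, not the patented implementation: the probability tables `SOURCE` and `EDITS`, the constants `MATCH` and `MAX_LEN`, and the choice to score P(s | w) by the single best partition into substring edits are all assumptions made for the example.

```python
from functools import lru_cache

# Hypothetical toy parameters; a real system would estimate them from data.
SOURCE = {"physical": 0.6, "fiscal": 0.4}            # source model P(w)
EDITS = {("ph", "f"): 0.1, ("y", "i"): 0.05}         # P(typed part | word part)
MATCH = 0.9      # probability that a single character is copied unchanged
MAX_LEN = 3      # longest substring considered on either side of an edit


def edit_prob(alpha, beta):
    """Probability that word substring alpha was entered as beta."""
    if (alpha, beta) in EDITS:
        return EDITS[(alpha, beta)]
    if len(alpha) == 1 and alpha == beta:
        return MATCH
    return 0.0


def error_prob(word, typed):
    """Error model P(typed | word) under the best partition into edits."""

    @lru_cache(maxsize=None)
    def best(i, j):
        if i == len(word) and j == len(typed):
            return 1.0
        top = 0.0
        for a in range(MAX_LEN + 1):
            for b in range(MAX_LEN + 1):
                if a == 0 and b == 0:
                    continue  # an edit must consume at least one character
                if i + a > len(word) or j + b > len(typed):
                    continue
                p = edit_prob(word[i:i + a], typed[j:j + b])
                if p:
                    top = max(top, p * best(i + a, j + b))
        return top

    return best(0, 0)


def correct(typed):
    """Return the dictionary word maximizing P(w) * P(typed | w)."""
    return max(SOURCE, key=lambda w: SOURCE[w] * error_prob(w, typed))
```

    Because `EDITS` maps substrings of different lengths onto each other (here "ph" to "f"), the sketch can explain "fisical" as a misspelling of "physical" with two edits, which a single-character edit model would need three operations to cover.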
