LINGUISTICALLY-ADAPTED STRUCTURAL QUERY ANNOTATION

    Publication No.: US20130080152A1

    Publication date: 2013-03-28

    Application No.: US13245147

    Filing date: 2011-09-26

    Abstract: A system and method for natural language processing of queries are provided. A lexicon includes text elements that are recognized as being a proper noun when capitalized. A natural language query includes a sequence of text elements including words. The query is processed. The processing includes a preprocessing step, in which part of speech features are assigned to the text elements in the query. This includes identifying, from a lexicon, a text element in the query which starts with a lowercase letter and assigning recapitalization information to the text element in the query, based on the lexicon. This information includes a part of speech feature of the capitalized form of the text element. Then parts of speech for the text elements in the query are disambiguated, which includes applying rules for recapitalizing text elements based on the recapitalization information.
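
The two stages in this abstract (attaching recapitalization information during preprocessing, then applying recapitalization rules during disambiguation) can be sketched as follows. This is a minimal illustration, not the patented method: the lexicon contents, the `PROPN` tag, and the single determiner rule are all assumptions made up for the example.

```python
# Hypothetical lexicon: lowercase form -> part of speech of its capitalized form.
PROPER_NOUN_LEXICON = {"paris": "PROPN", "apple": "PROPN", "bush": "PROPN"}


def preprocess(query):
    """Attach recapitalization info to lowercase tokens found in the lexicon."""
    tokens = []
    for word in query.split():
        recap = None
        if word[:1].islower() and word in PROPER_NOUN_LEXICON:
            recap = PROPER_NOUN_LEXICON[word]
        tokens.append({"text": word, "recap_pos": recap})
    return tokens


def disambiguate(tokens):
    """Toy rule (an assumption): recapitalize unless a determiner precedes."""
    determiners = {"a", "an", "the"}
    out = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1]["text"].lower() if i > 0 else ""
        if tok["recap_pos"] and prev not in determiners:
            out.append(tok["text"].capitalize())
        else:
            out.append(tok["text"])
    return out
```

For example, `disambiguate(preprocess("hotels in paris"))` recapitalizes the last token, while "bush" in "trim the bush" stays lowercase because a determiner precedes it.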

    Mining transliterations for out-of-vocabulary query terms
    72.
    Granted patent (status: in force)

    Publication No.: US08332205B2

    Publication date: 2012-12-11

    Application No.: US12350981

    Filing date: 2009-01-09

    CPC classification number: G06F17/30669 G06F17/2223 G06F17/30666

    Abstract: An approach is described for using a query expressed in a source language to retrieve information expressed in a target language. The approach uses a translation dictionary to convert terms in the query from the source language to appropriate terms in the target language. The approach determines viable transliterations for out-of-vocabulary (OOV) query terms by retrieving a body of information based on an in-vocabulary component of the query, and then mining the body of information to identify the viable transliterations for the OOV query terms. The approach then adds the viable transliterations to the translation dictionary. The retrieval, mining, and adding operations can be repeated one or more times.
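
The retrieve, mine, and add loop can be sketched roughly as below. The function name, the whitespace tokenization, and the 0.8 threshold are assumptions for illustration. In particular, plain string similarity from `difflib` stands in for a real transliteration model, which would normally compare terms across different scripts.

```python
from difflib import SequenceMatcher


def mine_transliterations(query_terms, dictionary, corpus, threshold=0.8, rounds=2):
    """Return a copy of `dictionary` extended with mined OOV transliterations."""
    dictionary = dict(dictionary)
    for _ in range(rounds):
        oov = [t for t in query_terms if t not in dictionary]
        if not oov:
            break
        # Retrieve target-language documents using the in-vocabulary terms only.
        translated = {dictionary[t] for t in query_terms if t in dictionary}
        retrieved = [doc for doc in corpus if translated & set(doc.split())]
        # Mine the retrieved documents for the word most similar to each OOV term.
        for term in oov:
            best, best_score = None, threshold
            for doc in retrieved:
                for word in doc.split():
                    score = SequenceMatcher(None, term, word).ratio()
                    if score >= best_score:
                        best, best_score = word, score
            if best is not None:
                dictionary[term] = best  # add the viable transliteration
    return dictionary
```

Because newly added entries enlarge the in-vocabulary part of the query, a later round can retrieve documents that the first round missed.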

    Processor for fast contextual matching
    73.
    Granted patent (status: in force)

    Publication No.: US08135717B2

    Publication date: 2012-03-13

    Application No.: US12414581

    Filing date: 2009-03-30

    CPC classification number: G06F17/30666 G06F17/30622 Y10S707/99933

    Abstract: Words having selected characteristics in a corpus of documents are found using a data processor arranged to execute queries. Memory stores an index structure in which entries in the index structure map words and marks for words having the selected characteristics to locations within documents in the corpus. Entries in the index structure represent words and other entries represent marks with the location information of a marked word. The entries for the marks can be tokens coalesced with prefixes of respective marked words or adjacent words. A query processor forms a modified query by adding a mark for a word to the query. The processor executes the modified query.
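
A small sketch of an index that stores both words and marks may help. Here the "selected characteristic" is assumed to be capitalization, the mark token format `<cap>` plus a two-character prefix is invented for the example, and the modified query is modeled as an intersection of the word's postings with the mark's postings.

```python
from collections import defaultdict

PREFIX_LEN = 2  # assumed prefix length for coalesced mark tokens


def mark_token(word):
    """Mark coalesced with a prefix of the marked word, e.g. '<cap>ap'."""
    return "<cap>" + word[:PREFIX_LEN].lower()


def build_index(docs):
    """Map words and marks to (doc_id, position) locations."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for pos, word in enumerate(text.split()):
            index[word.lower()].add((doc_id, pos))
            if word[:1].isupper():  # the selected characteristic
                index[mark_token(word)].add((doc_id, pos))
    return index


def search_marked(index, word):
    """Modified query: the word plus its mark at the same location."""
    word_hits = index.get(word.lower(), set())
    mark_hits = index.get(mark_token(word), set())
    return sorted(word_hits & mark_hits)
```

Coalescing the mark with a word prefix keeps each mark's posting list short, so the intersection touches far fewer entries than a single global "capitalized" list would.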

    System and method for biasing search results based on topic familiarity
    74.
    Granted patent (status: in force)

    Publication No.: US08095487B2

    Publication date: 2012-01-10

    Application No.: US11378871

    Filing date: 2006-03-16

    CPC classification number: G06F17/30864 G06F17/30666

    Abstract: A familiarity level classifier comprises a stopwords engine for conducting a stopwords analysis of stopwords, e.g., introductory level stopwords and advanced level stopwords, in a document, e.g., a website; and a familiarity level classifier module for generating a document familiarity level based on the stopwords analysis. The classifier may be in an indexing module, a search engine, a user computer, or elsewhere in a computer network. The classifier may also include a reading level engine for conducting a reading level analysis of the document, and wherein the familiarity level classifier module is configured to generate the familiarity level also based on the reading level analysis. The classifier may also include a document features engine for conducting a feature analysis of the document, and wherein the familiarity level classifier module is configured to generate the document familiarity level also based on the feature analysis.
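
As a rough illustration of the stopwords analysis, the sketch below counts introductory-level and advanced-level stopwords and labels the document by whichever dominates. Both word lists and the majority rule are assumptions. The abstract does not say which stopwords belong to which level or how the counts are combined, and the optional reading-level and feature analyses are left out.

```python
# Hypothetical stopword lists; the source does not enumerate them.
INTRO_STOPWORDS = {"you", "your", "what", "how", "so"}
ADVANCED_STOPWORDS = {"thus", "hence", "whereby", "furthermore", "therein"}


def familiarity_level(text):
    """Classify a document by which stopword class occurs more often."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    intro = sum(w in INTRO_STOPWORDS for w in words)
    advanced = sum(w in ADVANCED_STOPWORDS for w in words)
    if intro == advanced:
        return "unknown"
    return "introductory" if intro > advanced else "advanced"
```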

    TRACKING SIGNIFICANT TOPICS OF DISCOURSE IN FORUMS
    75.
    Patent application (status: in force)

    Publication No.: US20100169327A1

    Publication date: 2010-07-01

    Application No.: US12347473

    Filing date: 2008-12-31

    Abstract: Users in public forums often mention certain topics in the course of their discussions. Members' comments in messages to other members are analyzed to obtain terms that co-occur with topics. Frequencies of co-occurrence of a term with topics are normalized based on frequency of the term in a random sample of messages. The terms are ranked by their normalized frequency of co-occurrence with a topic in messages. The top terms are selected based on their rank. Analysis of demographic information associated with members that mentioned top terms associated with a topic is displayed in graphical format that highlights the relationship between the age, gender, and usage of the top terms over time. The demographic information presented includes average age of members that mentioned a top term or their gender information within a selected time interval.
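
The ranking step can be sketched as follows: count how often each term co-occurs with the topic, divide by the term's rate in a random background sample, and keep the top-ranked terms. The function name, whitespace tokenization, and add-one smoothing are assumptions. The abstract specifies only that co-occurrence frequency is normalized by frequency in a random sample.

```python
from collections import Counter


def top_terms(topic, messages, background_sample, k=3):
    """Rank terms by co-occurrence with `topic`, normalized by background rate."""
    co_counts = Counter()
    for msg in messages:
        words = set(msg.lower().split())
        if topic in words:
            co_counts.update(words - {topic})
    background = Counter(
        w for msg in background_sample for w in set(msg.lower().split())
    )
    n_bg = len(background_sample)
    scores = {}
    for term, count in co_counts.items():
        # Add-one smoothing (an assumption) avoids division by zero.
        bg_rate = (background[term] + 1) / (n_bg + 1)
        scores[term] = count / bg_rate
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Normalizing this way pushes down words such as "the" that appear in nearly every message, so topic-specific terms rise to the top.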

    Cap-sensitive text search for documents
    76.
    Granted patent (status: in force)

    Publication No.: US07730062B2

    Publication date: 2010-06-01

    Application No.: US11755424

    Filing date: 2007-05-30

    Applicant: Bryn Dole

    Inventor: Bryn Dole

    CPC classification number: G06F17/30666

    Abstract: Enabling text searching that accommodates search criteria corresponding to a capitalization characteristic. One or more search terms are received, and a determination is made as to a capitalization characteristic of at least one search term. One or more documents are identified from a collection of documents. The identification is based at least in part on the determination of the capitalization characteristic of the search term, so that the search result satisfies the criteria of the capitalization characteristic.
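
One simple reading of this abstract is a "smart case" search: if the search term contains a capital letter, match it exactly, otherwise ignore case. That policy is an assumption chosen for illustration. The abstract leaves open how the capitalization characteristic is determined and applied.

```python
def cap_sensitive_search(term, documents):
    """Return documents containing `term`, honoring its capitalization."""
    # Capitalization characteristic: does the term contain any uppercase letter?
    case_sensitive = any(ch.isupper() for ch in term)
    hits = []
    for doc in documents:
        words = doc.split()
        if case_sensitive:
            match = term in words
        else:
            match = term in [w.lower() for w in words]
        if match:
            hits.append(doc)
    return hits
```

A sentence-initial capital still matches a capitalized query in this sketch, which a production system would likely need to handle separately.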

    LEMMATIZING, STEMMING, AND QUERY EXPANSION METHOD AND SYSTEM
    77.
    Patent application (status: in force)

    Publication No.: US20100082333A1

    Publication date: 2010-04-01

    Application No.: US12476238

    Filing date: 2009-06-01

    CPC classification number: G06F17/2755 G06F17/2735 G06F17/30666 G06F17/30672

    Abstract: A method of stemming text and a system therefor are described. The method comprises removing stop words from a document based on at least one stop word entry in an array of stop words and flagging as nouns words determined to be attached to definite articles and preceded by a noun array entry in an array of stop words preceding at least one noun; adding flagged nouns to a noun dictionary; flagging as verbs words determined to be preceded by a verb array entry in an array of stop words preceding at least one verb; adding flagged verbs to a verb dictionary; searching the document for nouns and verbs based on the flagged nouns and the flagged verbs; removing remaining stop words subsequent to searching the document; applying light stemming on the flagged nouns; applying a root-based stemming on the flagged verbs; and storing the stemmed document.

    Method and system for responding to requests relating to complex data maintained in a structured form
    78.
    Granted patent (status: in force)

    Publication No.: US07676519B2

    Publication date: 2010-03-09

    Application No.: US11508023

    Filing date: 2006-08-22

    Abstract: A method and apparatus for processing user entered input and providing a response in a system for autonomously processing requests includes rules. For each rule, whether the input is recognized is determined. If it is, a response is sent to the user. To determine recognized input, the method attempts to match the rule to a pattern. If a match is not found, the input is not recognized. If a match is found, the input is recognized and the response is sent. Alternatively, the input is conditionally recognized and a statement validator is executed which queries structured data to determine if a logic statement evaluates to true. Depending on how the statement evaluates: i) the input is recognized and the response is sent, ii) the structured data is queried again for the next statement validator, or iii) the input is not recognized and the method continues to the next rule.
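
The control flow described here (try each rule's pattern, then run its statement validators against structured data before responding) can be sketched as below. The inventory data, the rule contents, and the names `RULES`, `in_stock`, and `respond` are hypothetical and serve only to make the flow concrete.

```python
import re

INVENTORY = {"widget": 3, "gadget": 0}  # hypothetical structured data


def in_stock(match):
    """Statement validator: query the structured data for the matched item."""
    return INVENTORY.get(match.group("item"), 0) > 0


ITEM_PATTERN = re.compile(r"do you have (?:a |an )?(?P<item>\w+)")

RULES = [
    # Conditionally recognized: the response is sent only if every validator is true.
    {"pattern": ITEM_PATTERN, "validators": [in_stock],
     "response": "Yes, it is in stock."},
    # Unconditional fallback for the same pattern.
    {"pattern": ITEM_PATTERN, "validators": [],
     "response": "Sorry, that item is unavailable."},
]


def respond(user_input):
    """Return the response of the first rule that recognizes the input."""
    text = user_input.lower()
    for rule in RULES:
        match = rule["pattern"].search(text)
        if match is None:
            continue  # no pattern match: input not recognized by this rule
        if all(validator(match) for validator in rule["validators"]):
            return rule["response"]
        # A validator evaluated to false: continue to the next rule.
    return None
```

Calling `respond("Do you have a gadget?")` falls through the first rule, because its validator finds zero stock, and is answered by the second.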

    Processor for fast phrase searching
    79.
    Granted patent (status: in force)

    Publication No.: US07512596B2

    Publication date: 2009-03-31

    Application No.: US11391889

    Filing date: 2006-03-29

    CPC classification number: G06F17/30666 G06F17/30622 Y10S707/99933

    Abstract: Phrases in a corpus of documents including stopwords are found using a data processor arranged to execute phrase queries. Memory stores an index structure which maps entries in the index structure to documents in the corpus. Entries in the index structure represent words and other entries represent stopwords found in the corpus coalesced with prefixes of respective adjacent words adjacent to the stopwords. The prefixes comprise one or more leading characters of the respective adjacent words. A query processor forms a modified query by substituting a stopword with a search token representing the stopword coalesced with a prefix of the next word in the query. The processor executes the modified query. Also, index structures including coalesced stopwords are created and maintained.
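
A minimal sketch of stopword coalescing: each stopword is indexed as a token fused with a prefix of the following word, and phrase queries are rewritten the same way before lookup. The stopword list, the `+` separator, and the two-character prefix are assumptions. In this simplified version, a stopword that ends the query phrase is not coalesced and therefore will not match.

```python
STOPWORDS = {"the", "of", "a", "to"}  # hypothetical stopword list
PREFIX_LEN = 2                        # assumed prefix length


def tokenize(text):
    """Replace each stopword with 'stopword+prefix-of-next-word'."""
    words = text.lower().split()
    tokens = []
    for i, w in enumerate(words):
        if w in STOPWORDS and i + 1 < len(words):
            tokens.append(w + "+" + words[i + 1][:PREFIX_LEN])
        else:
            tokens.append(w)
    return tokens


def build_index(docs):
    """Map each token to the set of (doc_id, position) pairs where it occurs."""
    index = {}
    for doc_id, text in enumerate(docs):
        for pos, tok in enumerate(tokenize(text)):
            index.setdefault(tok, set()).add((doc_id, pos))
    return index


def phrase_search(index, phrase):
    """Return ids of documents containing the phrase as consecutive tokens."""
    tokens = tokenize(phrase)
    if not tokens:
        return []
    docs = set()
    for doc_id, pos in index.get(tokens[0], set()):
        if all((doc_id, pos + k) in index.get(tok, set())
               for k, tok in enumerate(tokens)):
            docs.add(doc_id)
    return sorted(docs)
```

The benefit is that a query such as "the cat" looks up the short posting list for `the+ca` rather than the very long list for `the`.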

    System and Method for Improved Name Matching Using Regularized Name Forms
    80.
    Patent application (status: lapsed)

    Publication No.: US20080215562A1

    Publication date: 2008-09-04

    Application No.: US11681333

    Filing date: 2007-03-02

    CPC classification number: G06F17/30666 G06F17/30669 Y10S707/99933

    Abstract: A system and method for improved name matching using regularized name forms is presented. A regularization rule engine uses culture-specific regularization rules to iteratively convert candidate names and query names to a canonical form, which are regularized candidate names and regularized query names, respectively. The regularization rules are context-sensitive or context-free rules that pertain to a name's originating culture. Subsequently, a name search engine compares the regularized query name with the regularized candidate names and identifies the regularized candidate names that meet a particular regularization matching threshold. In turn, the name search engine selects the candidate names that correspond to the identified regularized candidate names and provides the selected candidate names to a user.
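
To make the regularize-then-compare flow concrete, the sketch below applies rewrite rules repeatedly until the name stops changing, then compares regularized forms against a similarity threshold. The rule sets, the culture keys, the 0.85 threshold, and the use of `difflib` as the comparison measure are all assumptions. The abstract does not list actual rules or a scoring function.

```python
import re
from difflib import SequenceMatcher

# Hypothetical rule sets: (regex pattern, replacement) pairs per culture.
RULES = {
    "generic": [(r"ph", "f"), (r"ck", "k"), (r"(.)\1", r"\1")],
    "arabic": [(r"^(al|el)[- ]", ""), (r"ou", "u"), (r"(.)\1", r"\1")],
}


def regularize(name, culture="generic"):
    """Iteratively apply the culture's rules until a fixed point is reached."""
    form = name.lower()
    while True:
        new = form
        for pattern, repl in RULES[culture]:
            new = re.sub(pattern, repl, new)
        if new == form:
            return form
        form = new  # every rule only shortens the name, so this terminates


def match_names(query, candidates, culture="generic", threshold=0.85):
    """Return the original candidates whose regularized form matches the query's."""
    q = regularize(query, culture)
    return [
        c for c in candidates
        if SequenceMatcher(None, q, regularize(c, culture)).ratio() >= threshold
    ]
```

Note that the function returns the original candidate spellings, not their regularized forms, mirroring the final selection step in the abstract.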
