Patent search cpc:"G06F17/30666" Page 6

51.

Granted patent
Automatic stop word identification and compensation (status: in force)

Publication number: US07720792B2

Publication date: 2010-05-18

Application number: US11348303

Application date: 2006-02-07

Applicant: Robert Jenson Price

Inventor: Robert Jenson Price

IPC: G06F7/00 , G06F17/00 , G06F17/30

CPC classification number: G06F17/30666 , G06F17/30616

Abstract: Disclosed are methods and computer program products for automatically identifying and compensating for stop words in a text processing system. This automatic stop word compensation allows such operations as performing queries on an abstract mathematical space built using all words from all texts, with the ability to compensate for the skew that the inclusion of the stop words may have introduced into the space. Documents are represented by document vectors in the abstract mathematical space. To compensate for stop words, a weight function is applied to a predetermined component of the document vectors associated with frequently occurring word(s) contained in the documents. The weight function may be applied dynamically during query processing. Alternatively, the weight function may be applied statically to all document vectors.
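
The weighting step in this abstract can be illustrated with a minimal sketch. It assumes the component tied to frequently occurring words is already known (index 0 here) and uses a simple scalar damping factor; the names `apply_weight`, `STOP_COMPONENT`, and `WEIGHT`, and the vectors themselves, are hypothetical and not taken from the patent.

```python
import math

STOP_COMPONENT = 0   # assumed index of the component tied to frequent words
WEIGHT = 0.1         # assumed damping factor; 1.0 would mean no compensation


def apply_weight(vector, component=STOP_COMPONENT, weight=WEIGHT):
    """Return a copy of `vector` with one component scaled by `weight`."""
    scaled = list(vector)
    scaled[component] *= weight
    return scaled


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up document vectors; both are dominated by component 0.
docs = {"doc_a": [9.0, 1.0, 0.0], "doc_b": [9.0, 0.0, 1.0]}
query = [0.0, 1.0, 0.0]

# Dynamic variant: weight each document vector while scoring the query.
dynamic = {name: cosine(query, apply_weight(vec)) for name, vec in docs.items()}

# Static variant: weight all document vectors once, then score as usual.
static_docs = {name: apply_weight(vec) for name, vec in docs.items()}
static = {name: cosine(query, vec) for name, vec in static_docs.items()}
```

Both variants produce the same scores in this sketch. The practical difference is when the cost is paid: once over the whole collection (static) or on every query (dynamic).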

52.

Granted patent
System and method for improved name matching using regularized name forms (status: lapsed)

Publication number: US07599921B2

Publication date: 2009-10-06

Application number: US11681333

Application date: 2007-03-02

Applicant: David Edward Biesenbach , Richard Theodore Gillam , Frankie Elizabeth Patman Maguire , Leonard Arthur Shaefer, Jr. , Charles Kinston Williams

Inventor: David Edward Biesenbach , Richard Theodore Gillam , Frankie Elizabeth Patman Maguire , Leonard Arthur Shaefer, Jr. , Charles Kinston Williams

IPC: G06F17/30

CPC classification number: G06F17/30666 , G06F17/30669 , Y10S707/99933

Abstract: A system and method for improved name matching using regularized name forms is presented. A regularization rule engine uses culture-specific regularization rules to iteratively convert candidate names and query names to canonical forms, yielding regularized candidate names and regularized query names, respectively. The regularization rules are context-sensitive or context-free rules that pertain to a name's originating culture. Subsequently, a name search engine compares the regularized query name with the regularized candidate names and identifies the regularized candidate names that meet a particular regularization matching threshold. In turn, the name search engine selects the candidate names that correspond to the identified regularized candidate names and provides the selected candidate names to a user.
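
The regularize-then-compare flow in this abstract might look roughly like the sketch below. The rule set, the `"generic"` culture key, the fixed-point iteration, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration; the abstract does not specify any of them.

```python
import re
from difflib import SequenceMatcher

# Hypothetical rules keyed by culture; none are taken from the patent.
RULES = {
    "generic": [
        (re.compile(r"ph"), "f"),                 # context-free rewrite
        (re.compile(r"(?<=[aeiou])y\b"), "i"),    # context-sensitive: final y after a vowel
        (re.compile(r"(.)\1"), r"\1"),            # collapse doubled letters
    ]
}


def regularize(name, culture, max_passes=10):
    """Apply the culture's rules repeatedly until the form stops changing."""
    form = name.lower().strip()
    for _ in range(max_passes):
        new_form = form
        for pattern, replacement in RULES[culture]:
            new_form = pattern.sub(replacement, new_form)
        if new_form == form:
            break
        form = new_form
    return form


def search(query, candidates, culture, threshold=0.85):
    """Return the original candidates whose regularized form meets the threshold."""
    reg_query = regularize(query, culture)
    matches = []
    for candidate in candidates:
        reg_candidate = regularize(candidate, culture)
        score = SequenceMatcher(None, reg_query, reg_candidate).ratio()
        if score >= threshold:
            matches.append(candidate)
    return matches
```

For example, `search("Philipp", ["Filip", "Stefan"], "generic")` compares regularized forms but returns the original spelling `"Filip"`, mirroring the abstract's final selection step.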

53.

Granted patent
Locating meaningful stopwords or stop-phrases in keyword-based retrieval systems (status: in force)

Publication number: US07409383B1

Publication date: 2008-08-05

Application number: US10813590

Application date: 2004-03-31

Applicant: Simon Tong , Uri Lerner , Amit Singhal , Paul Haahr , Steven Baker

Inventor: Simon Tong , Uri Lerner , Amit Singhal , Paul Haahr , Steven Baker

IPC: G06F17/30 , G06F7/00 , G06F17/21

CPC classification number: G06F17/30967 , G06F7/24 , G06F17/21 , G06F17/30666 , G06F17/30979 , Y10S707/99933 , Y10S707/99943

Abstract: A stopword detection component detects stopwords (also stop-phrases) in search queries input to keyword-based information retrieval systems. Potential stopwords are initially identified by comparing the terms in the search query to a list of known stopwords. Context data is then retrieved based on the search query and the identified stopwords. In one implementation, the context data includes documents retrieved from a document index. In another implementation, the context data includes categories relevant to the search query. Sets of retrieved context data are compared to one another to determine if they are substantially similar. If the sets of context data are substantially similar, this fact may be used to infer that the removal of the potential stopword(s) is not material to the search. If the sets of context data are not substantially similar, the potential stopword can be considered material to the search and should not be removed from the query.
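
The comparison described in this abstract can be sketched as follows, assuming the context data is a set of document IDs and that "substantially similar" means a Jaccard overlap at or above a threshold. The toy index, the stopword list, and the 0.8 threshold are hypothetical choices, not details from the patent.

```python
KNOWN_STOPWORDS = {"the", "a", "of", "who"}

# Toy inverted index: term -> IDs of documents containing it (made-up data).
INDEX = {
    "the": {1, 2, 3, 4, 5},
    "who": {4, 5, 6},
    "band": {4, 5, 7},
    "matrix": {1, 2},
}


def retrieve(terms):
    """Return IDs of documents that contain every term (empty set for no terms)."""
    postings = [INDEX.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def jaccard(a, b):
    """Overlap of two result sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def removable_stopwords(terms, threshold=0.8):
    """List potential stopwords whose removal leaves the results largely unchanged."""
    baseline = retrieve(terms)
    removable = []
    for term in terms:
        if term not in KNOWN_STOPWORDS:
            continue
        reduced = [t for t in terms if t != term]
        if jaccard(baseline, retrieve(reduced)) >= threshold:
            removable.append(term)
    return removable
```

With this data, "the" can be dropped from `["the", "matrix"]` because the result set is unchanged, while "who" in `["who", "band"]` is kept because removing it changes the results.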

54.

Patent application
Client-server word-breaking framework (status: in force)

Publication number: US20070088677A1

Publication date: 2007-04-19

Application number: US11249623

Application date: 2005-10-13

Applicant: Sanjeev Katariya , William Ramsey

Inventor: Sanjeev Katariya , William Ramsey

IPC: G06F17/30

CPC classification number: G06F17/30666 , G06F17/278 , G06F17/30616 , Y10S707/99933 , Y10S707/99934

Abstract: Word-breaking of a query from a client machine in a client-server environment includes determining whether to use a first word breaking module operable with a client machine in the client-server environment and/or a second word breaking module operable with a server in the client-server environment.

55.

Granted patent
Method and system for information extraction (status: in force)

Publication number: US07194406B2

Publication date: 2007-03-20

Application number: US11032075

Application date: 2005-01-11

Applicant: Eva Ingegord Ejerhed , Peter A. Braroe

Inventor: Eva Ingegord Ejerhed , Peter A. Braroe

IPC: G06F17/27 , G06F17/30

CPC classification number: G06F17/271 , G06F17/27 , G06F17/274 , G06F17/2755 , G06F17/2785 , G06F17/30663 , G06F17/30666 , G06F17/30684 , Y10S707/99933 , Y10S707/99934 , Y10S707/99935

Abstract: A method and a system for extracting information from a natural language text corpus based on a natural language query are disclosed. In the method, the natural language text corpus is analyzed with respect to surface structure of word tokens and surface syntactic roles of constituents, and the analyzed natural language text corpus is then indexed and stored. Furthermore, a natural language query is analyzed with respect to surface structure of word tokens and surface syntactic roles of constituents. From the analyzed natural language query, one or more surface variants are then created, where these surface variants are equivalent to the natural language query with respect to lexical meaning of word tokens and surface syntactic roles of constituents. The surface variants are then compared with the indexed and stored analyzed natural language text corpus, and each portion of text comprising a string of word tokens that matches any one of the surface variants or the natural language query is extracted from the indexed and stored analyzed natural language text corpus.

56.

Patent application
Method and system for responding to requests relating to complex data maintained in a structured form (status: in force)

Publication number: US20060282431A1

Publication date: 2006-12-14

Application number: US11508023

Application date: 2006-08-22

Applicant: Aaron McBride , Rob Rappaport , Jeremy Romero , Christopher Brennan , Robert Williams

Inventor: Aaron McBride , Rob Rappaport , Jeremy Romero , Christopher Brennan , Robert Williams

IPC: G06F17/30

CPC classification number: G06F17/30985 , G06F17/30666 , Y10S707/99933 , Y10S707/99942

Abstract: A method and apparatus for processing user entered input and providing a response in a system for autonomously processing requests includes rules. For each rule, whether the input is recognized is determined. If it is, a response is sent to the user. To determine recognized input, the method attempts to match the rule to a pattern. If a match is not found, the input is not recognized. If a match is found, the input is recognized and the response is sent. Alternatively, the input is conditionally recognized and a statement validator is executed which queries structured data to determine if a logic statement evaluates to true. Depending on how the statement evaluates: i) the input is recognized and the response is sent, ii) the structured data is queried again for the next statement validator, or iii) the input is not recognized and the method continues to the next rule.
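
One possible reading of this rule loop is sketched below. It assumes rules are regular-expression patterns, that statement validators are plain functions querying an in-memory dictionary, and that all validators of a rule must evaluate to true for the input to be recognized. The rule table, the `in_stock` validator, and the data are invented for illustration.

```python
import re

# Hypothetical structured data that validators query.
DATA = {"inventory": {"widget": 3, "gadget": 0}}


def in_stock(match):
    """Statement validator: true if the matched item has stock available."""
    return DATA["inventory"].get(match.group("item"), 0) > 0


ITEM_QUESTION = re.compile(r"do you have (?P<item>\w+)")

# Rules are tried in order; a rule with no validators is recognized on match alone.
RULES = [
    {"pattern": ITEM_QUESTION, "validators": [in_stock],
     "response": "Yes, {item} is in stock."},
    {"pattern": ITEM_QUESTION, "validators": [],
     "response": "Sorry, {item} is unavailable."},
]


def respond(user_input):
    """Return the response of the first recognized rule, or None."""
    text = user_input.lower()
    for rule in RULES:
        match = rule["pattern"].search(text)
        if match is None:
            continue  # no pattern match: input not recognized by this rule
        # Conditionally recognized: each validator is checked in turn;
        # the first failure sends the input on to the next rule.
        if all(validator(match) for validator in rule["validators"]):
            return rule["response"].format(**match.groupdict())
    return None
```

Here `respond("Do you have widget?")` is answered by the first rule, while a question about `gadget` fails the validator and falls through to the second rule.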

57.

Patent application
Automatic stop word identification and compensation (status: in force)

Publication number: US20060224572A1

Publication date: 2006-10-05

Application number: US11348303

Application date: 2006-02-07

Applicant: Robert Price

Inventor: Robert Price

IPC: G06F17/30

CPC classification number: G06F17/30666 , G06F17/30616

Abstract: Computer-based methods for automatically identifying and compensating for stop words contained in documents are described. The method for compensating for stop words includes: generating an abstract mathematical space based on documents included in a collection of documents, wherein each document has a representation in the abstract mathematical space; receiving a user query; generating a representation of the user query in the abstract mathematical space; computing a similarity between the representation of the user query and the representation of each document, wherein computing a similarity between the representation of the user query and the representation of a first document in the collection of documents comprises applying a weighting function to a value associated with a frequently occurring word contained in the first document, thereby automatically compensating for the frequently occurring word contained in the first document; and displaying a result based on the similarity computations.

58.

Granted patent
Methods and systems for enabling speech-based internet searches (status: in force)

Publication number: US06934675B2

Publication date: 2005-08-23

Application number: US09879892

Application date: 2001-06-14

Applicant: Stephen C. Glinski , Michael K. Brown

Inventor: Stephen C. Glinski , Michael K. Brown

IPC: G06F17/30 , G10L15/18 , G10L15/26 , G10L21/06

CPC classification number: G06F17/30666 , G10L15/197 , G10L2015/228

Abstract: Merged “grammars” derived from statistical indicators (e.g., N-grams and cohorts) are used to enable speech-based, Internet searches.

59.

Granted patent
Information coding and retrieval system and method thereof (status: lapsed)

Publication number: US06775663B1

Publication date: 2004-08-10

Application number: US09890365

Application date: 2001-07-30

Applicant: Si Han Kim

Inventor: Si Han Kim

IPC: G06F17/30

CPC classification number: G06F17/30663 , G06F17/30666 , G06F17/30672 , G06F17/30725 , G06F17/30864 , Y10S707/99933 , Y10S707/99934

Abstract: A search engine system with coded information and a search method using the same is disclosed. The system includes a key word input part, a database for storing information as word codes which are not real standard words, and a central processing unit for assigning a word code assigned to a standard word to a word input through the key word input part or a client system, and searching information corresponding to the word code of the input word through the database. When key word(s) relating to information to be searched are input through the information input system, the input words are coded and the search is performed using the word codes through the database, thereby searching the information more precisely. In addition, since a plurality of different words having the same or similar meanings are coded as one standard word code according to a simple coding rule and stored in the database, the process time for searching the information can be greatly reduced.
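
The idea of mapping words with the same or similar meanings to one word code, then searching on codes rather than words, can be sketched as below. The code table, the `W001`-style code format, and the sample documents are hypothetical; the abstract does not describe the actual coding rule.

```python
# Hypothetical table: several words with similar meaning share one code.
WORD_CODES = {
    "car": "W001", "automobile": "W001", "auto": "W001",
    "repair": "W002", "fix": "W002", "mend": "W002",
    "price": "W003", "cost": "W003",
}


def encode(text):
    """Map each known word in `text` to its word code; unknown words are dropped."""
    return {WORD_CODES[word] for word in text.lower().split() if word in WORD_CODES}


# Made-up documents, stored in the database as sets of word codes.
DOCUMENTS = {1: "automobile repair", 2: "car price", 3: "fix auto cost"}
DATABASE = {doc_id: encode(text) for doc_id, text in DOCUMENTS.items()}


def search(query):
    """Return sorted IDs of documents whose codes include every query code."""
    codes = encode(query)
    if not codes:
        return []
    return sorted(doc_id for doc_id, stored in DATABASE.items() if codes <= stored)
```

A query such as `search("mend car")` finds documents 1 and 3 even though neither contains the literal words "mend" or "car", because matching happens on the shared codes.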

60.

Granted patent
Multi-language document search and retrieval system (status: in force)

Publication number: US06654717B2

Publication date: 2003-11-25

Application number: US10080513

Application date: 2002-02-25

Applicant: Wayne Loofbourrow , David Cásseres

Inventor: Wayne Loofbourrow , David Cásseres

IPC: G06F17/27

CPC classification number: G06F17/2755 , G06F17/277 , G06F17/2775 , G06F17/30616 , G06F17/30666

Abstract: A multi-lingual indexing and search system performs tokenization and stemming in a manner which is independent of whether index entries and search terms appear as words in a dictionary. During the tokenization phase of the process, a string of text is separated into individual word tokens, and predetermined types of tokens are eliminated from further processing. The stemming phase of the process reduces words to grammatical stems by removing known word-endings associated with the various languages to be supported. Known word endings are removed from the word tokens without any effort to guarantee that the remaining stem is contained in a dictionary. In a preferred implementation, the stemming process is only applied to nouns.
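
A minimal sketch of the two phases follows, assuming tokens are runs of letters or digits, that purely numeric tokens are the "predetermined type" to eliminate, and that each language has a short list of known endings. The `ENDINGS` table and the `min_stem` guard are assumptions, and the noun-only restriction of the preferred implementation is not modeled.

```python
import re

# Hypothetical per-language lists of known word endings.
ENDINGS = {
    "en": ["es", "s"],
    "de": ["en", "e", "n"],
}

TOKEN_RE = re.compile(r"[^\W_]+")  # runs of letters or digits


def tokenize(text):
    """Split text into lowercase word tokens and drop purely numeric ones."""
    tokens = TOKEN_RE.findall(text.lower())
    return [token for token in tokens if not token.isdigit()]


def stem(token, language, min_stem=3):
    """Strip the longest known ending, with no check that the stem is a real word."""
    for ending in sorted(ENDINGS[language], key=len, reverse=True):
        if token.endswith(ending) and len(token) - len(ending) >= min_stem:
            return token[: -len(ending)]
    return token


def index_terms(text, language):
    """Tokenize, then stem every remaining token."""
    return [stem(token, language) for token in tokenize(text)]
```

Note that `stem("houses", "en")` yields `"hous"`, which is not a dictionary word. That is consistent with the abstract's point that no effort is made to guarantee the remaining stem appears in a dictionary.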
