-
Publication number: US20130151501A1
Publication date: 2013-06-13
Application number: US13761920
Application date: 2013-02-07
Applicant: Tracy Wang, Dimitra Papachristou, Moustafa A. Hammad, Jose Antonio Ramirez-Robredo
Inventor: Tracy Wang, Dimitra Papachristou, Moustafa A. Hammad, Jose Antonio Ramirez-Robredo
IPC: G06F17/30
CPC classification number: G06F17/3087, G06F17/30631
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for index-side synonym expansion are disclosed. Some implementations include actions of obtaining a token sequence for a resource, wherein each token in the token sequence comprises one or more characters. The actions also include selecting a token from the token sequence, wherein the selected token comprises at least one numeric portion having one or more contiguous numeric characters, and at least one non-numeric portion having one or more non-numeric characters. Further actions include generating a new token corresponding to each numeric portion of the selected token, and storing, in a search engine index, data associating the selected token and each of the new tokens as index terms for the resource, wherein the search engine index is accessed to augment search queries.
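The indexing step described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `expand_token` and `index_resource`, the in-memory dictionary standing in for a search engine index, and the choice of ASCII digits as "numeric characters" are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Assumption: a "numeric portion" is a run of one or more ASCII digits.
NUMERIC_RUN = re.compile(r"[0-9]+")


def expand_token(token):
    """Return the index terms for one token.

    A token that mixes numeric and non-numeric characters yields itself
    plus one new token per distinct numeric portion. Any other token
    yields only itself.
    """
    numeric_parts = NUMERIC_RUN.findall(token)
    has_non_numeric = NUMERIC_RUN.sub("", token) != ""
    if not numeric_parts or not has_non_numeric:
        return [token]
    terms = [token]
    for part in numeric_parts:
        if part not in terms:
            terms.append(part)
    return terms


def index_resource(index, resource_id, tokens):
    """Associate every index term of every token with the resource."""
    for token in tokens:
        for term in expand_token(token):
            index[term].add(resource_id)


# Hypothetical toy index: term -> set of resource identifiers.
index = defaultdict(set)
index_resource(index, "doc1", ["canon", "eos5d", "mark"])
```

With this toy index, a query containing the bare number `5` can reach `doc1` even though the resource only contains the mixed token `eos5d`.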
-
Publication number: US08515731B1
Publication date: 2013-08-20
Application number: US12568435
Application date: 2009-09-28
Applicant: Jose Antonio Ramirez Robredo, Dimitra Papachristou
Inventor: Jose Antonio Ramirez Robredo, Dimitra Papachristou
IPC: G06F17/28
CPC classification number: G06F17/2795, G06F17/2854
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for synonym verification. In one aspect, a method includes receiving a term and a candidate synonym for the term. The method further includes generating a term group of one or more text strings and a synonym group of one or more text strings. Each text string in the term group corresponds to a translation of the term into a language, and each text string in the synonym group corresponds to a translation of the candidate synonym into the language. The method further includes determining whether the candidate synonym is a valid synonym for the term from an amount of overlap between the term group of text strings and the synonym group of text strings.
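The overlap test described in this abstract can be illustrated with a small sketch. The abstract does not say how the "amount of overlap" is measured or what cutoff is used, so the Jaccard ratio and the `0.5` threshold below are assumptions. The `TOY_TRANSLATIONS` table is a hypothetical stand-in for a real translation system, and its entries are illustrative values only.

```python
# Hypothetical toy translation table: (word, language code) -> translation.
TOY_TRANSLATIONS = {
    ("car", "es"): "coche",
    ("car", "fr"): "voiture",
    ("car", "de"): "auto",
    ("automobile", "es"): "automóvil",
    ("automobile", "fr"): "voiture",
    ("automobile", "de"): "auto",
    ("cart", "es"): "carro",
    ("cart", "fr"): "chariot",
    ("cart", "de"): "wagen",
}

LANGUAGES = ("es", "fr", "de")


def translation_group(word, languages=LANGUAGES):
    """Collect the translations of a word, one per language when available."""
    return {
        TOY_TRANSLATIONS[(word, lang)]
        for lang in languages
        if (word, lang) in TOY_TRANSLATIONS
    }


def overlap(term, candidate):
    """Jaccard overlap between the two translation groups (an assumed measure)."""
    term_group = translation_group(term)
    synonym_group = translation_group(candidate)
    union = term_group | synonym_group
    if not union:
        return 0.0
    return len(term_group & synonym_group) / len(union)


def is_valid_synonym(term, candidate, threshold=0.5):
    """Accept the candidate when the overlap reaches the assumed threshold."""
    return overlap(term, candidate) >= threshold
```

In this toy data, `car` and `automobile` share two of four distinct translations, so the candidate is accepted, while `car` and `cart` share none and the candidate is rejected.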
-
Publication number: US08375042B1
Publication date: 2013-02-12
Application number: US12942965
Application date: 2010-11-09
Applicant: Tracy Wang, Dimitra Papachristou, Moustafa A. Hammad, Jose Antonio Ramirez-Robredo
Inventor: Tracy Wang, Dimitra Papachristou, Moustafa A. Hammad, Jose Antonio Ramirez-Robredo
IPC: G06F17/30
CPC classification number: G06F17/3087, G06F17/30631
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for index-side synonym expansion. One method includes indexing a token from a resource, including determining that the token comprises a numeric portion and storing data associating the resource with both the particular token and the numeric portion in a search engine index. Another method includes indexing a token from a resource, including normalizing the token by removing a prefix matching a stopword prefix and storing data associating the resource with both the token and the normalized form of the token in a search engine index. Another method includes creating a token blacklist.
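The stopword-prefix normalization mentioned in this abstract might look like the sketch below. The abstract does not list any prefixes, so the entries in `STOPWORD_PREFIXES` are hypothetical examples, and the function names and the dictionary used as an index are assumptions. Only the prefix method is covered here.

```python
from collections import defaultdict

# Hypothetical stopword prefixes, chosen only for illustration.
STOPWORD_PREFIXES = ("l'", "d'", "al-")


def normalize_token(token):
    """Strip the first matching stopword prefix.

    Returns None when no prefix matches or when stripping the prefix
    would leave an empty string.
    """
    for prefix in STOPWORD_PREFIXES:
        if token.startswith(prefix) and len(token) > len(prefix):
            return token[len(prefix):]
    return None


def index_terms(token):
    """Return the token itself plus its normalized form, if it has one."""
    terms = [token]
    normalized = normalize_token(token)
    if normalized is not None:
        terms.append(normalized)
    return terms


def index_resource(index, resource_id, tokens):
    """Associate the resource with both forms of every token."""
    for token in tokens:
        for term in index_terms(token):
            index[term].add(resource_id)


# Hypothetical toy index: term -> set of resource identifiers.
prefix_index = defaultdict(set)
index_resource(prefix_index, "doc7", ["l'amour", "toujours"])
```

Storing both forms means a query for `amour` and a query for `l'amour` can each reach the same resource without any query-time rewriting.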
-
-