-
Publication number: US20160307000A1
Publication date: 2016-10-20
Application number: US12942967
Application date: 2010-11-09
Applicant: Phuong B. Nguyen, Dimitra Papachristou
Inventor: Phuong B. Nguyen, Dimitra Papachristou
IPC: G06F17/30
CPC classification number: G06F21/64, G06F16/3337, G06F17/273, G06F17/2795
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for index-side synonym expansion. One method includes obtaining a token sequence for a resource and indexing a particular token in the token sequence. The indexing includes obtaining a diacritically canonicalized form of the particular token; determining that the diacritically canonicalized form of the particular token is different from the particular token; and storing data associating the resource with both the particular token and the different diacritically canonicalized form of the particular token as index terms for the resource in a search engine.
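The indexing step described in this abstract can be illustrated with a minimal sketch. It assumes that "diacritically canonicalized" can be approximated by Unicode NFD decomposition followed by removal of combining marks; the abstract does not specify the canonicalization rules, so this is only one plausible reading. The names `strip_diacritics` and `index_resource`, and the dictionary-of-sets index, are hypothetical and are not taken from the patent.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(token: str) -> str:
    """Assumed canonicalization: decompose to NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_resource(index, resource_id, tokens):
    """Associate the resource with each token and, when it differs,
    with the token's diacritic-free form as an additional index term."""
    for token in tokens:
        index[token].add(resource_id)
        canonical = strip_diacritics(token)
        # Only store the extra index term when canonicalization changed the token.
        if canonical != token:
            index[canonical].add(resource_id)


# Hypothetical in-memory inverted index: index term -> set of resource ids.
index = defaultdict(set)
index_resource(index, "doc1", ["caf\u00e9", "menu"])  # "caf\u00e9" is "café"
```

With this sketch, a query for either the accented or the unaccented spelling would find `doc1`, while a token such as "menu" that has no diacritics produces only one index term.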
-
Publication number: US09037591B1
Publication date: 2015-05-19
Application number: US13460582
Application date: 2012-04-30
Applicant: Dimitra Papachristou, Phuong B. Nguyen
Inventor: Dimitra Papachristou, Phuong B. Nguyen
IPC: G06F17/30
CPC classification number: G06F17/30312, G06F17/30321, G06F17/30616, G06F17/30864
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for storing, in an index associated with a document, a particular term that occurs in the document, wherein the particular term comprises n words, and wherein n is greater than 1; identifying a substitute term of the particular term; and in response to identifying the substitute term of the particular term, storing, in the index associated with the document, (i) the substitute term of the particular term, and (ii) data indicating that the substitute term spans the n words of the particular term.
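The storage scheme described in this abstract can be sketched roughly as follows. The sketch assumes a positional index in which each posting records a start position and the number of document words it covers; the abstract only says that data indicating the span is stored, so the `Posting` layout, the `SUBSTITUTES` table, and the function `index_document` are all hypothetical illustrations rather than the patented implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class Posting(NamedTuple):
    doc_id: str
    position: int  # index of the first word covered in the document
    span: int      # number of document words the index term covers


# Hypothetical substitute table: multi-word term -> single substitute term.
SUBSTITUTES = {("new", "york", "city"): "nyc"}


def index_document(index, doc_id, tokens, substitutes):
    """Index single words, then each multi-word term (n > 1) found in the
    document together with its substitute, both marked as spanning n words."""
    for pos, tok in enumerate(tokens):
        index[tok].append(Posting(doc_id, pos, 1))
    for phrase, substitute in substitutes.items():
        n = len(phrase)
        if n < 2:
            continue  # the abstract only covers terms of more than one word
        for pos in range(len(tokens) - n + 1):
            if tuple(tokens[pos:pos + n]) == phrase:
                index[" ".join(phrase)].append(Posting(doc_id, pos, n))
                # The substitute is one word but is recorded as spanning n words.
                index[substitute].append(Posting(doc_id, pos, n))


index = defaultdict(list)
index_document(index, "doc1", ["new", "york", "city", "hotels"], SUBSTITUTES)
```

Recording the span alongside the substitute lets a phrase query such as "nyc hotels" treat "hotels" at position 3 as adjacent to "nyc", since the substitute starts at position 0 and covers three words.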
-