-
Publication No.: US20160378808A1
Publication date: 2016-12-29
Application No.: US15186219
Filing date: 2016-06-17
Inventors: MICHAEL JOSEPH HOPCROFT, ROBERT LOVEJOY GOODWIN, FAN WANG, ANDRIJA ANTONIJEVIC, DENIS V. DEYNEKO, UTKARSH JAIN
IPC classification: G06F17/30
CPC classification: G06F16/2237, G06F16/2272, G06F16/2282, G06F16/24542, G06F16/328, G06F16/334, G06F16/93
Abstract: The technology described herein provides for indexing information in a bit vector search index. The bit vector search index comprises a data structure for indexing data about terms from a corpus of documents. The data structure includes a number of bit vectors. Each bit vector comprises an array of bits and corresponds to a different set of terms. Bits in the bit vector are used to represent whether at least one document corresponding to the bit includes at least one term from the set of terms corresponding to the bit vector. The bit vector search index is stored by first indexing information about documents using bit vectors on a first accumulation buffer storage device. When a threshold is satisfied, the information is transferred to bit vectors on a subsequent storage device.
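The indexing scheme in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class name `BitVectorIndex`, the use of `zlib.crc32` to assign terms to bit vectors, the one-bit-per-document layout, and the document-count flush threshold are all assumptions made for illustration. The abstract only states that each bit vector covers a set of terms, that a bit records whether a corresponding document contains any of those terms, and that data moves from an accumulation buffer to subsequent storage once a threshold is satisfied.

```python
import zlib
from typing import Iterable, List


class BitVectorIndex:
    """Toy bit vector index; Python ints serve as growable bit arrays."""

    def __init__(self, num_vectors: int = 64, flush_threshold: int = 2) -> None:
        self.num_vectors = num_vectors
        # Assumption: the threshold is a count of buffered documents.
        self.flush_threshold = flush_threshold
        # Accumulation buffer: one bit vector per term set, bit i = i-th buffered doc.
        self.buffer: List[int] = [0] * num_vectors
        self.buffer_docs = 0
        # Subsequent storage: one bit vector per term set, bit d = document d.
        self.store: List[int] = [0] * num_vectors
        self.store_docs = 0

    def _vector_for(self, term: str) -> int:
        # Assumption: terms hashing to the same value share one bit vector,
        # which is how a vector comes to represent a *set* of terms.
        return zlib.crc32(term.encode("utf-8")) % self.num_vectors

    def add_document(self, terms: Iterable[str]) -> int:
        """Index a document in the buffer and return its document id."""
        doc_id = self.store_docs + self.buffer_docs
        for term in terms:
            self.buffer[self._vector_for(term)] |= 1 << self.buffer_docs
        self.buffer_docs += 1
        if self.buffer_docs >= self.flush_threshold:
            self._flush()
        return doc_id

    def _flush(self) -> None:
        """Move buffered bits into the subsequent storage and clear the buffer."""
        for i in range(self.num_vectors):
            self.store[i] |= self.buffer[i] << self.store_docs
            self.buffer[i] = 0
        self.store_docs += self.buffer_docs
        self.buffer_docs = 0

    def candidates(self, query_terms: Iterable[str]) -> List[int]:
        """AND the bit vectors of all query terms; may include false positives."""
        total = self.store_docs + self.buffer_docs
        result = (1 << total) - 1
        for term in query_terms:
            v = self._vector_for(term)
            result &= self.store[v] | (self.buffer[v] << self.store_docs)
        return [d for d in range(total) if (result >> d) & 1]
```

Because several terms can share one bit vector, `candidates` never misses a document that contains all query terms, but it can return documents that do not; a later stage would have to filter those out.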
-
Publication No.: US20160378769A1
Publication date: 2016-12-29
Application No.: US15186226
Filing date: 2016-06-17
IPC classification: G06F17/30
CPC classification: G06F17/3053, G06F17/30241, G06F17/30324, G06F17/30619, G06F17/30628, G06F17/30699
Abstract: The technology described herein provides for preliminary ranking of matching documents for a search query. A preliminary ranker uses score tables for scoring each matching document based on its relevance to a search query. The score table for a document stores pre-computed data used to derive a frequency of terms and other information in the document. The preliminary ranker uses the score table for each matching document and the terms from the search query to determine a score for each matching document. The lowest scoring documents are removed from further consideration by a final ranker.
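The preliminary ranking step in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented method: the names `build_score_table` and `preliminary_rank`, the choice of a plain term-frequency dictionary as the score table, and the sum-of-frequencies scoring rule are all hypothetical. The abstract only says that a score table holds pre-computed data from which term frequencies can be derived, and that the lowest scoring documents are dropped before the final ranker.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Assumption: a score table is simply a term -> frequency mapping.
ScoreTable = Dict[str, int]


def build_score_table(text: str) -> ScoreTable:
    """Pre-compute term frequencies for one document at indexing time."""
    return dict(Counter(text.lower().split()))


def preliminary_rank(
    query_terms: Iterable[str],
    score_tables: Dict[int, ScoreTable],
    matching_doc_ids: Iterable[int],
    keep: int,
) -> List[int]:
    """Score each matching document and keep only the `keep` best ones."""
    terms = list(query_terms)
    scored = []
    for doc_id in matching_doc_ids:
        table = score_tables[doc_id]
        # Assumption: the score is the summed frequency of the query terms.
        score = sum(table.get(term, 0) for term in terms)
        scored.append((score, doc_id))
    # Highest score first; ties are broken by the smaller document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:keep]]


docs = {
    1: "bit vector search index",
    2: "search search ranking search",
    3: "cooking recipes",
}
tables = {doc_id: build_score_table(text) for doc_id, text in docs.items()}
survivors = preliminary_rank(["search", "ranking"], tables, [1, 2, 3], keep=2)
```

Here `survivors` holds the documents passed on to the final ranker; document 3 contains neither query term and is removed from further consideration.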
-