Systems and methods for using anchor text as parallel corpora for cross-language information retrieval
    1.
    Granted invention patent
    Legal status: In force

    Publication No.: US08631010B1

    Publication date: 2014-01-14

    Application No.: US13474957

    Filing date: 2012-05-18

    IPC classification: G06F17/30

    Abstract: A method may include obtaining, based on a content of a search query, one or more documents in a first language; identifying one or more documents in a second language that contain an anchor that links to the one or more documents in the first language, the second language being different than the first language; and translating one or more terms of the search query into the second language using content included in the one or more documents in the second language.

    Systems and methods for using anchor text as parallel corpora for cross-language information retrieval
    2.
    Granted invention patent
    Legal status: In force

    Publication No.: US07146358B1

    Publication date: 2006-12-05

    Application No.: US09939661

    Filing date: 2001-08-28

    IPC classification: G06F17/30 G06F7/00

    Abstract: A system performs cross-language query translations. The system receives a search query that includes terms in a first language and determines possible translations of the terms of the search query into a second language. The system also locates documents for use as parallel corpora to aid in the translation by: (1) locating documents in the first language that contain references that match the terms of the search query and identify documents in the second language; (2) locating documents in the first language that contain references that match the terms of the query and refer to other documents in the first language and identify documents in the second language that contain references to the other documents; or (3) locating documents in the first language that match the terms of the query and identify documents in the second language that contain references to the documents in the first language. The system may use the second language documents as parallel corpora to disambiguate among the possible translations of the terms of the search query and identify one of the possible translations as a likely translation of the search query into the second language.
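
    Illustrative sketch (not taken from the patent): the disambiguation step described in the abstract could look roughly like the Python below. It assumes the possible translations of each query term are already available and that the second-language documents have been reduced to plain text snippets such as anchor texts. The function name `pick_translation`, the co-occurrence scoring, and the sample data are all assumptions made for illustration.

```python
from itertools import product


def pick_translation(candidates, corpus):
    """Choose one translation per query term.

    candidates: list with one entry per query term; each entry is a list of
                possible second-language translations (assumed lowercase).
    corpus:     second-language text snippets used as the parallel corpus.
    Returns the combination of translations best supported by the corpus.
    """
    tokenized = [set(text.lower().split()) for text in corpus]

    def score(combo):
        # Primary signal: snippets that contain every translated term together.
        full = sum(1 for toks in tokenized if all(t in toks for t in combo))
        # Tie-breaker: total number of individual term hits.
        partial = sum(1 for toks in tokenized for t in combo if t in toks)
        return (full, partial)

    # Enumerate every combination of candidate translations and keep the best.
    return max(product(*candidates), key=score)


# Hypothetical English -> Spanish example for the query "bank interest".
candidates = [["banco", "orilla"], ["interés", "afición"]]
corpus = [
    "tipos de interés del banco central",
    "banco de españa interés",
    "la orilla del río",
    "afición al fútbol",
]
print(pick_translation(candidates, corpus))
```

    Enumerating all combinations is only practical for short queries; a larger system would presumably prune candidates first. That pruning is not described in the abstract and is not shown here.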

    Systems and methods for using anchor text as parallel corpora for cross-language information retrieval
    3.
    Granted invention patent
    Legal status: In force

    Publication No.: US08190608B1

    Publication date: 2012-05-29

    Application No.: US13174209

    Filing date: 2011-06-30

    IPC classification: G06F17/30

    Abstract: A system performs cross-language query translations. The system receives a search query that includes terms in a first language and determines possible translations of the terms of the search query into a second language. The system also locates documents for use as parallel corpora to aid in the translation by: (1) locating documents in the first language that contain references that match the terms of the search query and identify documents in the second language; (2) locating documents in the first language that contain references that match the terms of the query and refer to other documents in the first language and identify documents in the second language that contain references to the other documents; or (3) locating documents in the first language that match the terms of the query and identify documents in the second language that contain references to the documents in the first language. The system may use the second language documents as parallel corpora to disambiguate among the possible translations of the terms of the search query and identify one of the possible translations as a likely translation of the search query into the second language.

    Systems and methods for using anchor text as parallel corpora for cross-language information retrieval
    4.
    Granted invention patent
    Legal status: In force

    Publication No.: US07996402B1

    Publication date: 2011-08-09

    Application No.: US12872755

    Filing date: 2010-08-31

    IPC classification: G06F17/30

    Abstract: A system performs cross-language query translations. The system receives a search query that includes terms in a first language and determines possible translations of the terms of the search query into a second language. The system also locates documents for use as parallel corpora to aid in the translation by: (1) locating documents in the first language that contain references that match the terms of the search query and identify documents in the second language; (2) locating documents in the first language that contain references that match the terms of the query and refer to other documents in the first language and identify documents in the second language that contain references to the other documents; or (3) locating documents in the first language that match the terms of the query and identify documents in the second language that contain references to the documents in the first language. The system may use the second language documents as parallel corpora to disambiguate among the possible translations of the terms of the search query and identify one of the possible translations as a likely translation of the search query into the second language.

    Systems and methods for using anchor text as parallel corpora for cross-language information retrieval
    5.
    Granted invention patent
    Legal status: In force

    Publication No.: US07814103B1

    Publication date: 2010-10-12

    Application No.: US11468674

    Filing date: 2006-08-30

    IPC classification: G06F17/30

    Abstract: A system performs cross-language query translations. The system receives a search query that includes terms in a first language and determines possible translations of the terms of the search query into a second language. The system also locates documents for use as parallel corpora to aid in the translation by: (1) locating documents in the first language that contain references that match the terms of the search query and identify documents in the second language; (2) locating documents in the first language that contain references that match the terms of the query and refer to other documents in the first language and identify documents in the second language that contain references to the other documents; or (3) locating documents in the first language that match the terms of the query and identify documents in the second language that contain references to the documents in the first language. The system may use the second language documents as parallel corpora to disambiguate among the possible translations of the terms of the search query and identify one of the possible translations as a likely translation of the search query into the second language.

    Detecting Duplicate and Near-Duplicate Files
    7.
    Invention patent application
    Legal status: Under examination (published)

    Publication No.: US20120290597A1

    Publication date: 2012-11-15

    Application No.: US13225342

    Filing date: 2011-09-02

    IPC classification: G06F17/30

    CPC classification: G06F17/2211 G06F16/958

    Abstract: Near-duplicate documents may be identified by (a) accepting a set of documents, (b) processing the set of documents to determine a first set of near-duplicate documents using a first document similarity technique, and (c) processing the first set of near-duplicate documents to determine a second set of near-duplicate documents using a second document similarity technique. The first document similarity technique might be token order dependent, and the second document similarity technique might be order independent. The first document similarity technique might be token frequency independent, and the second document similarity technique might be frequency dependent. The first document similarity technique might determine whether two documents are near-duplicates using representations based on a subset of the words or tokens of the documents, and the second document similarity technique might determine whether two documents are near-duplicates using representations based on all of the words or tokens of the documents. The first document similarity technique might use set intersection to determine whether or not documents are near-duplicates, and the second document similarity technique might use random projections to determine whether or not documents are near-duplicates.
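
    Illustrative sketch (not taken from the application): the two-stage filtering in the abstract could be approximated as follows. Stage one here uses word shingles, a sampled subset of their hashes, and set intersection; stage two uses a sign-of-random-projection signature over token counts. The shingle length, sketch size, thresholds, and all function names are assumptions, and comparing two bottom-k sketches directly is only a crude resemblance estimate.

```python
import hashlib
from collections import Counter


def _h(text, seed=0):
    """Deterministic 64-bit hash (Python's built-in hash() is salted per run)."""
    digest = hashlib.sha1(f"{seed}:{text}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shingle_sketch(tokens, k=3, size=32):
    """Stage 1 representation: order dependent, frequency independent, subset based."""
    shingles = {" ".join(tokens[i:i + k]) for i in range(max(1, len(tokens) - k + 1))}
    return set(sorted(_h(s) for s in shingles)[:size])  # keep the smallest hashes


def sketch_resemblance(a, b):
    """Set-intersection similarity between two sketches."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def projection_signature(tokens, bits=64):
    """Stage 2 representation: order independent, frequency dependent, all tokens."""
    counts = Counter(tokens)
    sig = 0
    for bit in range(bits):
        # Sign of the dot product between the count vector and a pseudo-random +/-1 vector.
        total = sum(c if _h(tok, seed=bit) & 1 else -c for tok, c in counts.items())
        if total > 0:
            sig |= 1 << bit
    return sig


def signature_agreement(x, y, bits=64):
    """Fraction of signature bits on which two documents agree."""
    return 1 - bin(x ^ y).count("1") / bits


def near_duplicates(docs, t1=0.5, t2=0.8):
    """docs: {name: text}. Returns pairs that pass both similarity stages."""
    toks = {name: text.lower().split() for name, text in docs.items()}
    sketches = {name: shingle_sketch(t) for name, t in toks.items()}
    names = sorted(docs)
    first = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
             if sketch_resemblance(sketches[a], sketches[b]) >= t1]
    # Only documents that survived stage 1 get the more expensive stage 2 signature.
    sigs = {n: projection_signature(toks[n]) for pair in first for n in pair}
    return [(a, b) for a, b in first if signature_agreement(sigs[a], sigs[b]) >= t2]
```

    Calling `near_duplicates({"a": text_a, "b": text_b})` returns the name pairs that clear both thresholds. Comparing every pair in stage one is quadratic and is used here only to keep the sketch short.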

    Search queries improved based on query semantic information
    8.
    Granted invention patent
    Legal status: In force

    Publication No.: US08055669B1

    Publication date: 2011-11-08

    Application No.: US10377117

    Filing date: 2003-03-03

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/3064

    Abstract: A search query for a search engine may be improved by incorporating alternate terms into the search query that are semantically similar to terms of the search query, taking into account information derived from the search query. An initial set of alternate terms that may be semantically similar to the original terms in the search query is generated. The initial set of alternate terms may be compared to information derived from the original search query. One example of such information is a set of documents retrieved in response to a search performed using the initial search query. One or more of the alternate terms may be added to the original search query based on their relationship to the information derived from the original search query.
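
    Illustrative sketch (not taken from the patent): one simple way to read the abstract is to keep an alternate term only when it also appears in enough of the documents retrieved for the original query. The synonym table, the `min_fraction` threshold, and the function name `expand_query` are assumptions made for illustration.

```python
def expand_query(query_terms, synonyms, retrieved_docs, min_fraction=0.5):
    """Add alternate terms that are supported by the retrieved documents.

    query_terms:    list of original query terms (lowercase).
    synonyms:       dict mapping a term to its initial list of alternate terms.
    retrieved_docs: texts returned for the original query.
    min_fraction:   share of retrieved documents that must contain an alternate.
    """
    doc_tokens = [set(doc.lower().split()) for doc in retrieved_docs]
    expanded = list(query_terms)
    for term in query_terms:
        for alt in synonyms.get(term, []):
            if alt in expanded or not doc_tokens:
                continue
            # Relationship to the derived information: how often the alternate occurs.
            support = sum(alt in toks for toks in doc_tokens) / len(doc_tokens)
            if support >= min_fraction:
                expanded.append(alt)
    return expanded


# Hypothetical data: "car" is supported by the results, "panther" is not.
synonyms = {"jaguar": ["panther", "car"], "speed": ["velocity"]}
docs = [
    "the jaguar car reaches top speed quickly",
    "jaguar car review speed and velocity tests",
    "new jaguar car model",
]
print(expand_query(["jaguar", "speed"], synonyms, docs))
```

    With the sample data, "car" appears in all three retrieved documents and is added, while "panther" (no documents) and "velocity" (one of three) fall below the threshold.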

    High frequency sampling of processor performance counters
    10.
    Granted invention patent
    Legal status: No longer in force

    Publication No.: US5796939A

    Publication date: 1998-08-18

    Application No.: US812899

    Filing date: 1997-03-10

    Abstract: In a computer system, an apparatus is configured to collect performance data of a computer system including a plurality of processors for concurrently executing instructions of a program. A plurality of performance counters are coupled to each processor. The performance counters store performance data generated by each processor while executing the instructions. An interrupt handler executes on each processor, the interrupt handler samples the performance data of the processor in response to interrupts. A first memory includes a hash table associated with each interrupt handler, the hash table stores the performance data sampled by the interrupt handler executing on the processor. A second memory includes an overflow buffer, the overflow buffer stores the performance data while portions of the hash tables are active or full. A third memory includes a user buffer, and means are provided for periodically flushing the performance data from the hash tables and the overflow to the user buffer.
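
    Illustrative sketch (not taken from the patent): the data flow among the hash table, the overflow buffer, and the user buffer can be simulated in a single thread as below. The class name `SampleCollector`, the (pid, pc, event) sample key, and the direct-mapped table layout are assumptions; real interrupt handlers, per-processor tables, and the "active" state during a flush are not modeled.

```python
class SampleCollector:
    """Aggregates samples for one processor (single-threaded simulation)."""

    def __init__(self, buckets=64, overflow_capacity=16):
        self.table = [None] * buckets      # each slot holds [key, count] or None
        self.overflow = []                 # samples that could not use the table
        self.overflow_capacity = overflow_capacity
        self.user_buffer = []              # destination read by user-level tools

    def record(self, pid, pc, event):
        """Called once per sampling interrupt."""
        key = (pid, pc, event)
        slot = hash(key) % len(self.table)
        entry = self.table[slot]
        if entry is None:
            self.table[slot] = [key, 1]    # first sample for this key
        elif entry[0] == key:
            entry[1] += 1                  # repeated sample: just bump the count
        else:
            # Slot is taken by another key: treat it as "full" and spill over.
            if len(self.overflow) >= self.overflow_capacity:
                self.flush()
            self.overflow.append((key, 1))

    def flush(self):
        """Move table entries, then overflow entries, to the user buffer."""
        for slot, entry in enumerate(self.table):
            if entry is not None:
                self.user_buffer.append((entry[0], entry[1]))
                self.table[slot] = None
        self.user_buffer.extend(self.overflow)
        self.overflow.clear()


collector = SampleCollector(buckets=1)     # one bucket forces a collision
collector.record(1, 0x40, "cycles")
collector.record(1, 0x40, "cycles")
collector.record(2, 0x80, "cycles")        # different key, goes to overflow
collector.flush()
print(collector.user_buffer)
```

    Aggregating repeated samples as counts keeps the table small at high sampling rates, which appears to be the motivation for using a hash table rather than a plain log.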
