Systems and methods for using anchor text as parallel corpora for cross-language information retrieval
    1.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08631010B1

    Publication date: 2014-01-14

    Application No.: US13474957

    Filing date: 2012-05-18

    IPC classification: G06F17/30

    Abstract: A method may include obtaining, based on a content of a search query, one or more documents in a first language; identifying one or more documents in a second language that contain an anchor that links to the one or more documents in the first language, the second language being different than the first language; and translating one or more terms of the search query into the second language using content included in the one or more documents in the second language.

    Detecting Duplicate and Near-Duplicate Files
    2.
    Type: Invention application
    Status: Under examination (published)

    Publication No.: US20120290597A1

    Publication date: 2012-11-15

    Application No.: US13225342

    Filing date: 2011-09-02

    IPC classification: G06F17/30

    CPC classification: G06F17/2211 G06F16/958

    Abstract: Near-duplicate documents may be identified by (a) accepting a set of documents, (b) processing the set of documents to determine a first set of near-duplicate documents using a first document similarity technique, and (c) processing the first set of near-duplicate documents to determine a second set of near-duplicate documents using a second document similarity technique. The first document similarity technique might be token order dependent, and the second document similarity technique might be order independent. The first document similarity technique might be token frequency independent, and the second document similarity technique might be frequency dependent. The first document similarity technique might determine whether two documents are near-duplicates using representations based on a subset of the words or tokens of the documents, and the second document similarity technique might determine whether two documents are near-duplicates using representations based on all of the words or tokens of the documents. The first document similarity technique might use set intersection to determine whether or not documents are near-duplicates, and the second document similarity technique might use random projections to determine whether or not documents are near-duplicates.

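    Illustrative sketch (not taken from the patent): the two-stage filtering described in the abstract can be approximated as follows. The sketch assumes word 3-shingles compared by Jaccard set intersection for the first stage (order dependent, frequency independent) and sign bits of hash-derived random projections of the token-frequency vector for the second stage (order independent, frequency dependent). The function names and both thresholds are hypothetical.

```python
import hashlib
from collections import Counter
from itertools import combinations


def shingles(text, k=3):
    """Set of k-token windows: order dependent, frequency independent."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + k]) for i in range(max(len(tokens) - k + 1, 1))}


def jaccard(a, b):
    """Set-intersection similarity |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def _sign(token, bit):
    """Deterministic pseudo-random +1/-1 weight for a (token, bit) pair."""
    digest = hashlib.sha256(f"{bit}:{token}".encode("utf-8")).digest()
    return 1 if digest[0] & 1 else -1


def projection_bits(text, n_bits=64):
    """Sign of n_bits random projections of the token-frequency vector."""
    counts = Counter(text.lower().split())
    return [sum(c * _sign(t, b) for t, c in counts.items()) >= 0
            for b in range(n_bits)]


def bit_agreement(a, b):
    """Fraction of projection bits on which two signatures agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def near_duplicates(docs, stage1=0.3, stage2=0.8):
    """docs maps name -> text; thresholds are assumed values, not from the source."""
    sh = {name: shingles(text) for name, text in docs.items()}
    # Stage 1: keep pairs whose shingle sets overlap enough.
    candidates = [(a, b) for a, b in combinations(docs, 2)
                  if jaccard(sh[a], sh[b]) >= stage1]
    # Stage 2: re-check surviving pairs with random-projection signatures.
    sig = {n: projection_bits(docs[n]) for pair in candidates for n in pair}
    return [(a, b) for a, b in candidates
            if bit_agreement(sig[a], sig[b]) >= stage2]
```

    Running the cheaper set-intersection test first means the second similarity measure is computed only for pairs that survive the first stage.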

    Search queries improved based on query semantic information
    3.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08055669B1

    Publication date: 2011-11-08

    Application No.: US10377117

    Filing date: 2003-03-03

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/3064

    Abstract: A search query for a search engine may be improved by incorporating alternate terms into the search query that are semantically similar to terms of the search query, taking into account information derived from the search query. An initial set of alternate terms that may be semantically similar to the original terms in the search query is generated. The initial set of alternate terms may be compared to information derived from the original search query. One example of such information is a set of documents retrieved in response to a search performed using the initial search query. One or more of the alternate terms may be added to the original search query based on their relationship to the information derived from the original search query.

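    Illustrative sketch (not taken from the patent): one simple reading of the abstract is to generate candidate alternates from a synonym table and keep only those that also occur in documents retrieved for the original query. The synonym table, the `min_docs` support threshold, and the whitespace tokenization are all assumptions made for this example.

```python
def expand_query(query, synonyms, retrieved_docs, min_docs=1):
    """Add alternate terms that are supported by the retrieved documents.

    synonyms: hypothetical mapping term -> list of candidate alternates.
    retrieved_docs: texts returned for the original query.
    """
    terms = query.lower().split()
    doc_tokens = [set(doc.lower().split()) for doc in retrieved_docs]
    added = []
    for term in terms:
        for alt in synonyms.get(term, []):
            # Count how many retrieved documents contain the alternate.
            support = sum(alt in tokens for tokens in doc_tokens)
            if support >= min_docs and alt not in terms and alt not in added:
                added.append(alt)
    return terms + added
```

    For example, with the query "car repair", the alternate "auto" would be kept if it appears in a retrieved document, while "carriage" would be dropped if no retrieved document mentions it.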

    Systems and methods for using anchor text as parallel corpora for cross-language information retrieval
    5.
    Type: Granted invention patent
    Status: In force

    Publication No.: US07146358B1

    Publication date: 2006-12-05

    Application No.: US09939661

    Filing date: 2001-08-28

    IPC classification: G06F17/30 G06F7/00

    Abstract: A system performs cross-language query translations. The system receives a search query that includes terms in a first language and determines possible translations of the terms of the search query into a second language. The system also locates documents for use as parallel corpora to aid in the translation by: (1) locating documents in the first language that contain references that match the terms of the search query and identify documents in the second language; (2) locating documents in the first language that contain references that match the terms of the query and refer to other documents in the first language and identify documents in the second language that contain references to the other documents; or (3) locating documents in the first language that match the terms of the query and identify documents in the second language that contain references to the documents in the first language. The system may use the second language documents as parallel corpora to disambiguate among the possible translations of the terms of the search query and identify one of the possible translations as a likely translation of the search query into the second language.

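    Illustrative sketch (not taken from the patent): the disambiguation step can be pictured as choosing, for each query term, the candidate translation that occurs most often in the second-language texts gathered as parallel corpora. The candidate dictionary, the frequency-count scoring, and the function name are assumptions for this example; locating the linked documents is outside its scope.

```python
from collections import Counter


def translate_query(query_terms, candidates, second_lang_texts):
    """Pick, per term, the candidate seen most often in the second-language texts.

    candidates: hypothetical mapping term -> list of possible translations.
    second_lang_texts: anchor text or content of the linked second-language documents.
    """
    counts = Counter(token
                     for text in second_lang_texts
                     for token in text.lower().split())
    translation = []
    for term in query_terms:
        options = candidates.get(term, [])
        if not options:
            translation.append(term)  # no known translation: keep the term as-is
            continue
        # Ties resolve to the first listed candidate.
        translation.append(max(options, key=lambda option: counts[option]))
    return translation
```

    With the English query terms "bank" and "account" and Spanish texts that mention "banco" and "cuenta", the sketch would prefer "banco" over "orilla" (river bank) for "bank".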

    High frequency sampling of processor performance counters
    6.
    Type: Granted invention patent
    Status: Expired

    Publication No.: US5796939A

    Publication date: 1998-08-18

    Application No.: US812899

    Filing date: 1997-03-10

    Abstract: In a computer system, an apparatus is configured to collect performance data of a computer system including a plurality of processors for concurrently executing instructions of a program. A plurality of performance counters are coupled to each processor. The performance counters store performance data generated by each processor while executing the instructions. An interrupt handler executes on each processor and samples the performance data of the processor in response to interrupts. A first memory includes a hash table associated with each interrupt handler; the hash table stores the performance data sampled by the interrupt handler executing on the processor. A second memory includes an overflow buffer, which stores the performance data while portions of the hash tables are active or full. A third memory includes a user buffer, and means are provided for periodically flushing the performance data from the hash tables and the overflow buffer to the user buffer.

    Scheduling in computer clusters
    7.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08843929B1

    Publication date: 2014-09-23

    Application No.: US12173697

    Filing date: 2008-07-15

    IPC classification: G06F9/455 G06F9/46

    CPC classification: G06F9/5044 G06F9/5033

    Abstract: A computer-implemented method for assigning computing tasks to computers in a group is disclosed. The method includes determining, for each computer in a group of computers, an ability of the computer to execute tasks expected to be received by the group of computers; generating an ability indicator for each computer based on the ability determination for the computer; and assigning an incoming computing task to one of the computers using the ability indicator.

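    Illustrative sketch (not taken from the patent): here the ability indicator is assumed to be the fraction of expected tasks that fit on a machine, and an incoming task is given to the eligible machine with the lowest indicator so that more versatile machines stay free. The resource model (CPU cores and memory), the class names, and the placement policy are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    cpu: int
    mem_gb: int


@dataclass
class Task:
    cpu: int
    mem_gb: int


def fits(machine, task):
    """True if the machine has enough CPU and memory for the task."""
    return machine.cpu >= task.cpu and machine.mem_gb >= task.mem_gb


def ability_indicator(machine, expected_tasks):
    """Fraction of the expected tasks that this machine could execute."""
    if not expected_tasks:
        return 0.0
    return sum(fits(machine, t) for t in expected_tasks) / len(expected_tasks)


def assign(task, machines, expected_tasks):
    """Place the task on the least versatile machine that can still run it."""
    eligible = [m for m in machines if fits(m, task)]
    if not eligible:
        return None
    return min(eligible, key=lambda m: ability_indicator(m, expected_tasks))
```

    The sketch does not track resources consumed by earlier assignments; a fuller version would update each machine's free capacity and recompute the indicators.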

    Hypertext browser assistant
    8.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08560564B1

    Publication date: 2013-10-15

    Application No.: US12178748

    Filing date: 2008-07-24

    IPC classification: G06F17/30

    Abstract: A system facilitates a search by a user. The system detects selection of one or more words in a document currently accessed by the user, generates a search query using the selected word(s), and retrieves a document based on the search query. When the document includes one or more links corresponding to a linked document, the system analyzes each of the links, prefetches the linked documents corresponding to a number of the links, and presents the document to the user. The system receives selection of one of the links and retrieves the linked document corresponding to the selected link. The system identifies one or more pieces of information in the retrieved document, determines a link to a related document for each of the identified pieces of information, and provides the determined links with the related document to the user.

    Voice interface for a search engine
    9.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08380502B1

    Publication date: 2013-02-19

    Application No.: US13273925

    Filing date: 2011-10-14

    IPC classification: G10L15/08 G10L15/20 G06F17/30

    Abstract: A system receives a voice search query from a user, derives recognition hypotheses from the voice search query, and determines scores associated with the recognition hypotheses, the scores being based on a comparison of the recognition hypotheses to previously received search queries. The system discards at least one of the recognition hypotheses that is associated with a first score that is less than a threshold value, and constructs a first query using at least one non-discarded recognition hypothesis, where the at least one first non-discarded recognition hypothesis is associated with a second score that at least meets the threshold value. The system forwards the first query to a search system, receives first results associated with the first query, and provides the first results to the user.

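    Illustrative sketch (not taken from the patent): the score-and-discard step can be pictured by scoring each recognition hypothesis by its relative frequency in a log of earlier queries, dropping hypotheses below a threshold, and combining the survivors. The relative-frequency score, the threshold values, and the choice to join surviving hypotheses with OR are assumptions for this example.

```python
from collections import Counter


def build_query(hypotheses, query_log, threshold):
    """Return a query built from hypotheses whose score meets the threshold.

    Score (assumed): share of past queries that exactly match the hypothesis.
    Returns None when every hypothesis is discarded.
    """
    log_counts = Counter(q.lower() for q in query_log)
    total = sum(log_counts.values())
    kept = []
    for hyp in hypotheses:
        score = log_counts[hyp.lower()] / total if total else 0.0
        if score >= threshold:  # discard hypotheses scoring below the threshold
            kept.append(hyp)
    if not kept:
        return None
    return " OR ".join(f'"{h}"' for h in kept)
```

    A hypothesis that never appeared in the query log scores 0.0 and is discarded for any positive threshold, which filters out acoustically plausible but unlikely transcriptions.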