Language input system for mobile devices
    91.
    Granted invention patent
    Status: in force

    Publication No.: US07277732B2

    Publication date: 2007-10-02

    Application No.: US09843358

    Filing date: 2001-04-24

    IPC classification: H04B1/38

    Abstract: A language system facilitates entry of an input string into a mobile device using discrete keys on a keypad, such as a 10-key keypad. The numeric keys have associated letters of an alphabet. The key input is representative of one or more Chinese phonetic characters. Based on this input string, the language system derives the corresponding Chinese language characters most likely intended by the user. The language system uses multiple search engines and language models to aid in deriving the most probable Chinese language characters. When the language system recognizes possible Chinese language characters, the mobile device displays them so that the user can select among them and/or enter one or more further Chinese phonetic characters. In this manner, the language system adopts a modeless entry methodology that eliminates conventional mode switching between input and selection operations.
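
    The key-to-character step described in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract mentions multiple search engines and language models, while this sketch assumes a single lookup table with made-up unigram probabilities. `LEXICON` and `candidates` are hypothetical names introduced only for illustration.

```python
from itertools import product

# Letter layout printed on a standard 10-key phone keypad.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

# Hypothetical toy lexicon: pinyin syllable -> [(character, probability)].
# The probabilities are invented for this example.
LEXICON = {
    "ni": [("你", 0.6), ("尼", 0.05)],
    "mi": [("米", 0.2), ("密", 0.1)],
}


def candidates(digits):
    """Return characters whose pinyin fits the key sequence, most probable first."""
    # Expand every digit into its letters and enumerate all possible spellings.
    spellings = ("".join(p) for p in product(*(KEYPAD[d] for d in digits)))
    # Keep only spellings that are known syllables, collecting their characters.
    hits = [pair for s in spellings for pair in LEXICON.get(s, [])]
    # Rank by descending probability.
    return [ch for ch, _ in sorted(hits, key=lambda pair: -pair[1])]
```

    With this toy lexicon, `candidates("64")` returns `["你", "米", "密", "尼"]`, since keys 6 and 4 can spell both "ni" and "mi". The user would then pick a character or keep typing, which is the modeless behavior the abstract describes.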

    Training a ranking component
    92.
    Invention patent application
    Status: in force

    Publication No.: US20070136281A1

    Publication date: 2007-06-14

    Application No.: US11326283

    Filing date: 2006-01-05

    IPC classification: G06F17/30

    CPC classification: G06F17/30616

    Abstract: A query and a factoid type selection are received from a user. An index of passages, indexed based on factoids, is accessed, and passages that are related to the query and that have the selected factoid type are retrieved. The retrieved passages are ranked by a calculated score and provided to the user in rank order.
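
    The retrieve-then-rank flow in this abstract can be sketched as follows. The abstract does not say how the score is calculated, so this sketch assumes a simple count of shared query terms. `Passage` and `retrieve` are hypothetical names, not taken from the application.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    factoid_types: set  # factoid types found in this passage, e.g. {"date"}


def retrieve(index, query, factoid_type):
    """Return passages of the selected factoid type, ranked by query-term overlap."""
    terms = set(query.lower().split())
    scored = []
    for passage in index:
        # Filter on the factoid type selected by the user.
        if factoid_type not in passage.factoid_types:
            continue
        # Assumed score: number of query terms that also occur in the passage.
        overlap = len(terms & set(passage.text.lower().split()))
        if overlap:
            scored.append((overlap, passage.text))
    scored.sort(key=lambda pair: -pair[0])
    return [text for _, text in scored]
```

    A real system would use a learned ranking score in place of the term-overlap count, but the filter on factoid type followed by a sort on score mirrors the order of operations in the abstract.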

    Adaptive semantic reasoning engine
    93.
    Invention patent application
    Status: in force

    Publication No.: US20070124263A1

    Publication date: 2007-05-31

    Application No.: US11290076

    Filing date: 2005-11-30

    IPC classification: G06F15/18

    CPC classification: G06F17/30663

    Abstract: Provided is an adaptive semantic reasoning engine that receives a natural language query, which may contain one or more contexts. The query can be broken down into tokens or a set of tokens. A task search can be performed on the token or token set(s) to classify a particular query and/or context and retrieve one or more tasks. The token or token set(s) can be mapped into slots to retrieve one or more task results. A slot-filling goodness may be determined, which can include scoring each task search result and/or ranking the results in a different order than the order in which the tasks were retrieved. The token or token set(s), retrieved tasks, slot-filling goodness, natural language query, context, search result score and/or result ranking can be fed back to the reasoning engine for further processing and/or machine learning.

    Metric for evaluating systems that produce text
    94.
    Invention patent application
    Status: under examination (published)

    Publication No.: US20060247912A1

    Publication date: 2006-11-02

    Application No.: US11115498

    Filing date: 2005-04-27

    IPC classification: G06F17/20

    CPC classification: G06F17/2211 G06F11/3616

    Abstract: A method and apparatus for generating a score for a system that generates text are provided. The method and apparatus identify errors in the text generated by the system and identify errors in a second text generated by a second system. The number of errors that are generated by the system but not by the second system is divided by the number of errors that are generated by the second system but not by the system to generate the score.
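
    The ratio in this abstract is easy to work through by hand. The sketch below assumes each error can be identified by a hashable key (for example, the position of the wrong word in a shared test set), so that errors from the two systems can be compared as sets. The function name and the handling of a zero denominator are assumptions, not part of the application.

```python
def relative_error_score(errors_system, errors_second):
    """Errors unique to the system divided by errors unique to the second system."""
    only_system = set(errors_system) - set(errors_second)
    only_second = set(errors_second) - set(errors_system)
    if not only_second:
        # The abstract does not cover this case; raising is a choice made here.
        raise ValueError("second system has no unique errors; the ratio is undefined")
    return len(only_system) / len(only_second)
```

    For example, if the system errs at positions {1, 2, 3} and the second system errs at {3, 4, 5, 6, 7}, the shared error at position 3 is ignored and the score is 2 / 4 = 0.5. Under this definition a score below 1 means the system makes fewer unique errors than the second system.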

    Method and apparatus for distribution-based language model adaptation
    95.

    Publication No.: US07043422B2

    Publication date: 2006-05-09

    Application No.: US09945930

    Filing date: 2001-09-04

    IPC classification: G06F17/27

    Abstract: A method and apparatus are provided for adapting a language model to a task-specific domain. Under the method and apparatus, the relative frequency of n-grams in a small training set (i.e., a task-specific training data set) and the relative frequency of n-grams in a large training set (i.e., an out-of-domain training data set) are used to weight a distribution count of n-grams in the large training set. The weighted distributions are then used to form a modified language model by identifying probabilities for n-grams from the weighted distributions.

    Unsupervised training for overlapping ambiguity resolution in word segmentation
    96.
    Invention patent application
    Status: under examination (published)

    Publication No.: US20050060150A1

    Publication date: 2005-03-17

    Application No.: US10662502

    Filing date: 2003-09-15

    Applicants: Mu Li, Jianfeng Gao

    Inventors: Mu Li, Jianfeng Gao

    IPC classification: G06F17/27 G06F17/28 G10L15/00

    CPC classification: G06F17/2863 G06F17/2775

    Abstract: A method is provided for resolving overlapping ambiguity strings in unsegmented languages such as Chinese. The method includes segmenting sentences into two possible segmentations and recognizing overlapping ambiguity strings in the sentences. One of the two possible segmentations is selected as a function of probability information. The probability information is derived from unsupervised training data. A method of constructing a knowledge base containing the probability information needed to select one of the segmentations is also provided.
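
    An overlapping ambiguity string is one such as 各国有, which can be split as 各国|有 or as 各|国有. The selection step can be sketched as below. The abstract does not specify the probability model, so this sketch assumes a product of per-word probabilities. The probabilities are invented, and `resolve_overlap` is a hypothetical name.

```python
import math


def resolve_overlap(seg_a, seg_b, word_prob, floor=1e-9):
    """Pick the segmentation whose words have the higher joint probability."""
    def log_score(words):
        # Sum of log probabilities; unseen words fall back to a small floor value.
        return sum(math.log(word_prob.get(w, floor)) for w in words)

    return max((seg_a, seg_b), key=log_score)


# Invented probabilities standing in for values estimated from unlabeled text.
WORD_PROB = {"各国": 0.004, "有": 0.02, "各": 0.003, "国有": 0.001}
```

    Here `resolve_overlap(["各国", "有"], ["各", "国有"], WORD_PROB)` returns `["各国", "有"]`, because 0.004 × 0.02 exceeds 0.003 × 0.001. In the described method the probabilities would come from the knowledge base built from unsupervised training data.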

    Cluster and pruning-based language model compression
    97.
    Granted invention patent
    Status: in force

    Publication No.: US06782357B1

    Publication date: 2004-08-24

    Application No.: US09565608

    Filing date: 2000-05-04

    IPC classification: G06F17/27

    CPC classification: G06F17/2715

    Abstract: Cluster- and pruning-based language model compression is disclosed. In one embodiment, a language model is first clustered, such as by using predictive clustering. The language model after clustering has a larger size than it did before clustering. The language model is then pruned, such as by using entropy-based techniques, such as Stolcke pruning, or by using Rosenfeld pruning or count-cutoff techniques. In one particular embodiment, a word language model is first predictively clustered by a technique described as P(Z|xy)×P(z|xyZ), where a lower-case letter refers to a word, and an upper-case letter refers to the cluster in which the word resides.
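
    The factorization P(Z|xy)×P(z|xyZ) can be made concrete with a small sketch, where Z is the cluster containing word z. The code below assumes unsmoothed relative-frequency estimates from a list of trigrams. It shows only the clustering decomposition, not the pruning step, and `predictive_cluster_prob` is a hypothetical name.

```python
def predictive_cluster_prob(trigrams, cluster_of, x, y, z):
    """Unsmoothed estimate of P(z | x y) computed as P(Z | x y) * P(z | x y Z)."""
    target_cluster = cluster_of[z]
    # All words observed after the context (x, y).
    followers = [w for (a, b, w) in trigrams if (a, b) == (x, y)]
    # Those followers that belong to the same cluster as z.
    in_cluster = [w for w in followers if cluster_of[w] == target_cluster]
    if not in_cluster:
        return 0.0
    p_cluster = len(in_cluster) / len(followers)    # P(Z | x y)
    p_word = in_cluster.count(z) / len(in_cluster)  # P(z | x y Z)
    return p_cluster * p_word
```

    With the trigrams "on a tuesday" (twice), "on a monday", and "on a boat", and the weekday words placed in one cluster, P(DAY | on a) = 3/4 and P(tuesday | on a DAY) = 2/3, so the product is 1/2. Without smoothing this equals the direct estimate of P(tuesday | on a); the two factors are stored as separate models, which is why the clustered model is initially larger than the original.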
