Multi-modal entry of ideogrammatic languages
    1.
    Type: Granted invention patent
    Status: No longer in force

    Publication No.: US07174288B2

    Publication date: 2007-02-06

    Application No.: US10142572

    Filing date: 2002-05-08

    IPC classification: G06F17/28

    Abstract: A method for inputting ideograms into a computer system includes receiving phonetic information related to a desired ideogram to be entered and forming a candidate list of possible ideograms as a function of the phonetic information received. Stroke information, comprising one or more strokes in the desired ideogram, is received in order to obtain the desired ideogram from the candidate list.
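The two-stage selection described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the lexicon, the one-letter stroke codes, and the function names are all hypothetical, and matching strokes as a prefix of the stroke sequence is an assumption (the abstract only says "one or more strokes").

```python
# Hypothetical toy lexicon: phonetic reading -> candidate ideograms.
PHONETIC_INDEX = {
    "ma": ["妈", "马", "吗"],
}

# Illustrative first strokes per ideogram (not an authoritative stroke table).
# Codes: h = horizontal, s = vertical, p = left-falling, z = turning.
STROKES = {
    "妈": "zph",
    "马": "zzh",
    "吗": "szh",
}


def candidates_from_phonetics(reading):
    """Form the candidate list as a function of the phonetic input."""
    return list(PHONETIC_INDEX.get(reading, []))


def filter_by_strokes(candidates, strokes_so_far):
    """Keep candidates whose stroke sequence starts with the strokes received.

    Prefix matching is an assumption made for this sketch.
    """
    return [c for c in candidates
            if STROKES.get(c, "").startswith(strokes_so_far)]


if __name__ == "__main__":
    cands = candidates_from_phonetics("ma")
    print(cands)
    print(filter_by_strokes(cands, "zz"))
```

Each additional stroke narrows the list, so the user may need only a stroke or two after the phonetic input to isolate the intended ideogram.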

    Method and system for dynamically adjusted training for speech recognition
    2.
    Type: Granted invention patent
    Status: No longer in force

    Publication No.: US5963903A

    Publication date: 1999-10-05

    Application No.: US673435

    Filing date: 1996-06-28

    CPC classification: G10L15/063 G10L2015/0635

    Abstract: A method and system for dynamically selecting words for training a speech recognition system. The speech recognition system models each phoneme using a hidden Markov model and represents each word as a sequence of phonemes. The training system ranks each phoneme for each frame according to the probability that the corresponding codeword will be spoken as part of the phoneme. The training system collects spoken utterances for which the corresponding word is known. The training system then aligns the codewords of each utterance with the phoneme that it is recognized to be part of. The training system then calculates an average rank for each phoneme using the aligned codewords for the aligned frames. Finally, the training system selects words for training that contain phonemes with a low rank.
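The rank-averaging and word-selection steps in this abstract might look like the sketch below. It assumes that rank 1 is the best rank, so a "low rank" phoneme is one whose average rank number is large; the data layout, function names, and threshold are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def rank_of(phoneme, probs):
    """1-based rank of `phoneme` when phonemes are sorted by descending probability."""
    ordered = sorted(probs, key=probs.get, reverse=True)
    return ordered.index(phoneme) + 1


def average_ranks(aligned_frames):
    """Average, per phoneme, the rank it received in the frames aligned to it.

    aligned_frames: list of (aligned_phoneme, {phoneme: probability}) pairs,
    one per frame (a hypothetical layout for this sketch).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for phoneme, probs in aligned_frames:
        totals[phoneme] += rank_of(phoneme, probs)
        counts[phoneme] += 1
    return {p: totals[p] / counts[p] for p in totals}


def select_training_words(lexicon, avg_rank, threshold):
    """Pick words containing at least one phoneme ranked worse than `threshold`."""
    return [word for word, phones in lexicon.items()
            if any(avg_rank.get(p, 0) > threshold for p in phones)]


if __name__ == "__main__":
    frames = [
        ("k", {"k": 0.6, "ae": 0.3, "t": 0.1}),
        ("ae", {"k": 0.5, "ae": 0.2, "t": 0.3}),
        ("ae", {"k": 0.1, "ae": 0.3, "t": 0.6}),
        ("t", {"k": 0.2, "ae": 0.1, "t": 0.7}),
    ]
    lexicon = {"cat": ["k", "ae", "t"], "kit": ["k", "ih", "t"]}
    ranks = average_ranks(frames)
    print(ranks)
    print(select_training_words(lexicon, ranks, threshold=2))
```

In this toy data the phoneme "ae" is the one the model handles worst, so only the word containing it is selected for further training.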

    Joint ranking model for multilingual web search
    3.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08326785B2

    Publication date: 2012-12-04

    Application No.: US12241078

    Filing date: 2008-09-30

    CPC classification: G06F17/30675

    Abstract: A classifier is built to rank documents of different languages found in a query based at least in part on similarity to other documents and the relevance of those other documents to the query. A joint ranking model, e.g., based upon a Boltzmann machine, is used to represent the content similarity among documents, and to help determine joint relevance probability for a set of documents. The relevant documents of one language are thus leveraged to improve the relevance estimation for documents of different languages. In one aspect, a hidden layer of units (neurons) represents clusters (corresponding to relevant topics) among the retrieved documents, with an output layer representing the relevant documents and their features, and edges representing a relationship between clusters and documents.

    Processing collocation mistakes in documents
    4.
    Type: Granted invention patent
    Status: In force

    Publication No.: US07574348B2

    Publication date: 2009-08-11

    Application No.: US11177136

    Filing date: 2005-07-08

    IPC classification: G06F17/27

    Abstract: A sentence is accessed and at least one query is generated based on the sentence. At least one query can be compared to text within a collection of documents, for example using a web search engine. Collocation errors in the sentence can be detected and/or corrected based on the comparison of the at least one query and the text within the collection of documents.
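A rough sketch of the query-and-compare idea in this abstract follows. It swaps the web search engine for a tiny in-memory corpus, generates queries as word trigrams, and treats a query with no hits as a suspected collocation error; the corpus, the trigram choice, and every function name here are assumptions made for illustration.

```python
# Hypothetical stand-in for a document collection or web search index.
CORPUS = [
    "she made a decision quickly",
    "he made a decision yesterday",
    "they made a mistake",
    "we took a break",
]


def phrase_count(phrase, corpus=CORPUS):
    """Count whole-word occurrences of `phrase` across the corpus."""
    return sum(f" {doc} ".count(f" {phrase} ") for doc in corpus)


def generate_queries(sentence, n=3):
    """Generate one query per word n-gram in the sentence."""
    words = sentence.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def flag_collocations(sentence, min_count=1):
    """Return the queries that appear fewer than `min_count` times in the corpus."""
    return [q for q in generate_queries(sentence) if phrase_count(q) < min_count]


def suggest(phrase, position, alternatives):
    """Try alternative words at `position`; return the best-attested variant or None."""
    words = phrase.split()
    scored = []
    for alt in alternatives:
        candidate = " ".join(words[:position] + [alt] + words[position + 1:])
        scored.append((phrase_count(candidate), candidate))
    best_count, best = max(scored, key=lambda item: item[0])
    return best if best_count > 0 else None


if __name__ == "__main__":
    print(flag_collocations("she did a decision"))
    print(suggest("did a decision", 0, ["made", "took"]))
```

A real system would need far more evidence than a zero hit count, since rare but correct phrases would also be flagged by this simple rule.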

    Web-based collocation error proofing
    5.
    Type: Invention patent application
    Status: In force

    Publication No.: US20080133444A1

    Publication date: 2008-06-05

    Application No.: US11633788

    Filing date: 2006-12-05

    IPC classification: G06N7/02 G06F17/30 G06F3/048

    Abstract: Collocation errors can be automatically proofed using local and network-based corpora, including the Web. For example, according to one illustrative method, one or more collocations from a text sample are compared with a corpus such as the content of the Web. The collocations are identified for whether they are disfavored in the corpus. Indications are provided via an output device of whether the collocations are disfavored in the corpus. Additional steps may then be taken such as searching for and providing potentially proper word collocations via a user output.

    Compression of logs of language data
    6.
    Type: Invention patent application
    Status: Under examination (published)

    Publication No.: US20050203934A1

    Publication date: 2005-09-15

    Application No.: US10796644

    Filing date: 2004-03-09

    CPC classification: H03M7/30

    Abstract: A method and apparatus for compressing query logs is provided. Multiple levels of user-specifiable compression include character-based compression, token-based compression, and subsumption. An efficient method for performing subsumption is also provided. The compressed query logs are then used to train a statistical process such as a help function for a computer operating system.

    Automatic text generation
    7.
    Type: Invention patent application
    Status: Under examination (published)

    Publication No.: US20050033713A1

    Publication date: 2005-02-10

    Application No.: US10887058

    Filing date: 2004-07-08

    CPC classification: G06F17/2881 G06F9/453

    Abstract: A text generator automatically generating a text document based on the actions of an author on a user interface. To generate the text document the author activates a recording component. The recording component records the author's actions on the user interface. Based on the recorded actions, a text generation component searches a text database and identifies an entry that matches the author's recorded actions. This text is then combined to form a text document, which provides instruction or other information to a user. During the process of generating the text document, the text can be edited using an editor as desired, such as to enhance the comprehensibility of the document.
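The record-then-look-up flow in this abstract could be sketched as below. The text database contents, the (action, control) key format, and the numbered-step output are all hypothetical; the abstract does not specify how entries are matched or how the text is combined.

```python
# Hypothetical text database: (action type, UI control) -> instruction text.
TEXT_DB = {
    ("click", "File menu"): "Click the File menu.",
    ("click", "Save As"): "Click Save As.",
    ("type", "File name"): "Type a name in the File name box.",
}


def generate_document(recorded_actions):
    """Look up each recorded action and combine the matching text into numbered steps.

    Actions with no matching database entry are skipped in this sketch.
    """
    steps = [TEXT_DB[action] for action in recorded_actions if action in TEXT_DB]
    return "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))


if __name__ == "__main__":
    recording = [
        ("click", "File menu"),
        ("hover", "Toolbar"),      # no database entry, so it is skipped
        ("click", "Save As"),
        ("type", "File name"),
    ]
    print(generate_document(recording))
```

The generated draft would then be handed to an editor, matching the abstract's note that the text can be revised for comprehensibility.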

    Method and apparatus for tone-sensitive acoustic modeling
    8.
    Type: Granted invention patent
    Status: No longer in force

    Publication No.: US5884261A

    Publication date: 1999-03-16

    Application No.: US271639

    Filing date: 1994-07-07

    Abstract: Tone-sensitive acoustic models are generated by first generating acoustic vectors which represent the input data. The input data is separated into multiple frames and an acoustic vector is generated for each frame which represents the input data over its corresponding frame. A tone-sensitive parameter is then generated for each of the frames which indicates the tone of the input data at its corresponding frame. Tone-sensitive parameters are generated in accordance with two embodiments. First, a pitch detector may be used to calculate a pitch for each of the frames. If a pitch cannot be detected for a particular frame, then a pitch is created for that frame based on the pitch values of surrounding frames. Second, the cross covariance between the autocorrelation coefficients for each frame and its successive frame may be generated and used as the tone-sensitive parameter. Feature vectors are then created for each frame by appending the tone-sensitive parameter for a frame to the acoustic vector for the same frame. Then, using these feature vectors, acoustic models are created which represent the input data.
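The first embodiment in this abstract (pitch per frame, with gaps filled from surrounding frames, then appended to the acoustic vector) can be illustrated with the sketch below. Averaging the nearest detected pitch on each side is an assumed fill rule, since the abstract only says the pitch is "based on" surrounding frames; the function names and the use of None for undetected pitch are also hypothetical.

```python
def fill_missing_pitch(pitches):
    """Replace undetected pitches (None) using the nearest detected neighbors.

    Assumed rule: average the closest detected value on each side; if only one
    side has a detected value, copy it; if none exists, fall back to 0.0.
    """
    filled = list(pitches)
    for i, pitch in enumerate(pitches):
        if pitch is not None:
            continue
        left = next((pitches[j] for j in range(i - 1, -1, -1)
                     if pitches[j] is not None), None)
        right = next((pitches[j] for j in range(i + 1, len(pitches))
                      if pitches[j] is not None), None)
        neighbors = [v for v in (left, right) if v is not None]
        filled[i] = sum(neighbors) / len(neighbors) if neighbors else 0.0
    return filled


def make_feature_vectors(acoustic_vectors, pitches):
    """Append each frame's tone-sensitive parameter (pitch) to its acoustic vector."""
    return [vec + [pitch]
            for vec, pitch in zip(acoustic_vectors, fill_missing_pitch(pitches))]


if __name__ == "__main__":
    acoustic = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
    pitch = [120.0, None, 140.0, None]   # None marks frames with no detected pitch
    for feature in make_feature_vectors(acoustic, pitch):
        print(feature)
```

The resulting feature vectors are one element longer than the acoustic vectors and would be the input to acoustic model training.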

    Character-based correction arrangement with correction propagation
    9.
    Type: Granted invention patent
    Status: No longer in force

    Publication No.: US5761687A

    Publication date: 1998-06-02

    Application No.: US539342

    Filing date: 1995-10-05

    IPC classification: G06F17/27 G06F17/21

    CPC classification: G06F17/273

    Abstract: A method of correcting a text in a data processing system is described. The method includes the step of locating a first incorrect character in the text. A character list of alternative characters for the first incorrect character is then shown to the user who replaces the first incorrect character with a correct character from the character list. The change of the first incorrect character is then propagated through a remainder of the text in accordance with a matching score and a language probability score of the remainder of the text with respect to the correct character to correct any subsequent incorrect character in the text.

    Continuous Mandarin Chinese speech recognition system having an integrated tone classifier
    10.
    Type: Granted invention patent
    Status: No longer in force

    Publication No.: US5602960A

    Publication date: 1997-02-11

    Application No.: US316257

    Filing date: 1994-09-30

    CPC classification: G10L15/04 G10L25/15

    Abstract: A speech recognition system for continuous Mandarin Chinese speech comprises a microphone, an A/D converter, a syllable recognition system, an integrated tone classifier, and a confidence score augmentor. The syllable recognition system generates N-best theories with initial confidence scores. The integrated tone classifier has a pitch estimator to estimate the pitch of the input once and a long-term tone analyzer to segment the estimated pitch according to the syllables of each of the N-best theories. The long-term tone analyzer performs long-term tonal analysis on the segmented, estimated pitch and generates a long-term tonal confidence signal. The confidence score augmentor receives the initial confidence scores and the long-term tonal confidence signals, modifies each initial confidence score according to the corresponding long-term tonal confidence signal, re-ranks the N-best theories according to the augmented confidence scores, and outputs the N-best theories.
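The confidence augmentation and re-ranking step at the end of this abstract might be sketched as follows. The abstract does not say how the tonal confidence modifies the initial score, so the weighted sum used here, the `tone_weight` parameter, and the tuple layout are all assumptions.

```python
def augment_and_rerank(theories, tone_weight=0.5):
    """Combine initial and tonal confidence, then sort the N-best list.

    theories: list of (text, initial_confidence, tonal_confidence) tuples.
    The weighted sum is an assumed combination rule for this sketch.
    Returns (text, augmented_confidence) pairs, best first.
    """
    augmented = [(text, initial + tone_weight * tonal)
                 for text, initial, tonal in theories]
    return sorted(augmented, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    n_best = [
        ("ma1 ma1", 0.60, 0.10),   # best initial score, weak tonal support
        ("ma3 ma1", 0.55, 0.40),   # lower initial score, strong tonal support
        ("ma4 ma1", 0.30, 0.20),
    ]
    for text, score in augment_and_rerank(n_best):
        print(text, round(score, 3))
```

With these toy numbers the second theory overtakes the first once tonal evidence is included, which is the kind of reordering the abstract describes.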
