Clustering Classes in Language Modeling
    1.
    Patent application
    Clustering Classes in Language Modeling (legal status: active)

    Publication No.: US20160062985A1

    Publication date: 2016-03-03

    Application No.: US14656027

    Filing date: 2015-03-12

    Applicant: Google Inc.

    CPC classification number: G06F17/30707 G06F17/2715 G06F17/2775

    Abstract: This document describes, among other things, a computer-implemented method. The method can include obtaining a plurality of text samples that each include one or more terms belonging to a first class of terms. The plurality of text samples can be classified into a plurality of groups of text samples. Each group of text samples can correspond to a different sub-class of terms. For each of the groups of text samples, a sub-class context model can be generated based on the text samples in the respective group of text samples. Particular ones of the sub-class context models that are determined to be similar can be merged to generate a hierarchical set of context models. Further, the method can include selecting particular ones of the context models and generating a class-based language model based on the selected context models.
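
    The steps in this abstract can be illustrated with a minimal sketch. It is not taken from the patent: the bag-of-words context model, the cosine similarity measure, the greedy merge loop, the 0.5 threshold, and all function names are assumptions chosen only to make the idea concrete.

```python
import math
from collections import Counter
from itertools import combinations


def context_model(samples, term):
    """Count the words that co-occur with `term` in its text samples."""
    counts = Counter()
    for sample in samples:
        counts.update(w for w in sample.lower().split() if w != term)
    return counts


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge_hierarchy(models, threshold=0.5):
    """Greedily merge the most similar pair of models until none is similar enough.

    Returns every node of the resulting hierarchy (leaves and merged nodes),
    keyed by the tuple of sub-class names it covers.
    """
    active = {(name,): model for name, model in models.items()}
    nodes = dict(active)
    while len(active) > 1:
        a, b = max(combinations(active, 2),
                   key=lambda pair: cosine(active[pair[0]], active[pair[1]]))
        if cosine(active[a], active[b]) < threshold:
            break
        merged = active.pop(a) + active.pop(b)  # Counter addition sums the counts
        key = tuple(sorted(a + b))
        active[key] = merged
        nodes[key] = merged
    return nodes


# Toy data: the first class is "day of the week"; each term is its own sub-class.
groups = {
    "monday": ["see you on monday", "meeting on monday morning"],
    "tuesday": ["see you on tuesday", "meeting on tuesday morning"],
    "sunday": ["brunch this sunday", "church this sunday"],
}
models = {term: context_model(samples, term) for term, samples in groups.items()}
nodes = merge_hierarchy(models)
print(sorted(nodes))
```

    In this toy data "monday" and "tuesday" appear in identical contexts and are merged into one node, while "sunday" stays separate. Selecting which nodes feed the class-based language model is left out of the sketch.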

    Language Modeling Using Entities
    2.
    Patent application
    Language Modeling Using Entities (legal status: under examination, published)

    Publication No.: US20150340024A1

    Publication date: 2015-11-26

    Application No.: US14708987

    Filing date: 2015-05-11

    Applicant: Google Inc.

    Abstract: Among other things, this document describes a computer-implemented method. The method can include obtaining a plurality of text samples. For each of one or more text samples in the plurality of text samples, the text sample can be annotated with one or more labels that indicate respective classes to which one or more terms in the text sample are assigned, wherein annotating the text sample comprises determining that at least one term in the text sample corresponds to a first entity in a data structure of interconnected entities and determining a classification of the first entity within the data structure of interconnected entities. The method can include generating a class-based training set of text samples. A class-based language model can be trained using the class-based training set of text samples. A plurality of class-specific language models can be trained.
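
    As a rough illustration of the annotation step, the sketch below replaces terms with class labels to build a class-based training sample. The `ENTITY_TYPES` dictionary is a hypothetical stand-in for the "data structure of interconnected entities"; the class labels and the `annotate` function are likewise assumptions, not the patent's method.

```python
from collections import defaultdict

# Hypothetical toy lookup: entity name -> class of that entity.
ENTITY_TYPES = {
    "paris": "$CITY",
    "berlin": "$CITY",
    "madonna": "$ARTIST",
}


def annotate(sample):
    """Return the class-based form of `sample` plus the (term, class) labels found."""
    tokens = sample.lower().split()
    labels = [ENTITY_TYPES.get(tok) for tok in tokens]
    class_based = " ".join(label or tok for tok, label in zip(tokens, labels))
    found = [(tok, label) for tok, label in zip(tokens, labels) if label]
    return class_based, found


samples = ["flights to Paris", "play Madonna", "weather in Berlin"]

class_training_set = []            # input for the class-based language model
class_terms = defaultdict(list)    # input for the class-specific language models
for sample in samples:
    class_based, found = annotate(sample)
    class_training_set.append(class_based)
    for term, label in found:
        class_terms[label].append(term)

print(class_training_set)
print(dict(class_terms))
```

    The class-based samples would train one model over label sequences, and the per-class term lists would train the class-specific models. Both training steps are omitted here.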

    Clustering classes in language modeling
    3.
    Granted patent
    Clustering classes in language modeling (legal status: active)

    Publication No.: US09529898B2

    Publication date: 2016-12-27

    Application No.: US14656027

    Filing date: 2015-03-12

    Applicant: Google Inc.

    CPC classification number: G06F17/30707 G06F17/2715 G06F17/2775

    Abstract: This document describes, among other things, a computer-implemented method. The method can include obtaining a plurality of text samples that each include one or more terms belonging to a first class of terms. The plurality of text samples can be classified into a plurality of groups of text samples. Each group of text samples can correspond to a different sub-class of terms. For each of the groups of text samples, a sub-class context model can be generated based on the text samples in the respective group of text samples. Particular ones of the sub-class context models that are determined to be similar can be merged to generate a hierarchical set of context models. Further, the method can include selecting particular ones of the context models and generating a class-based language model based on the selected context models.

    USING LANGUAGE MODELS TO CORRECT MORPHOLOGICAL ERRORS IN TEXT
    4.
    Patent application
    USING LANGUAGE MODELS TO CORRECT MORPHOLOGICAL ERRORS IN TEXT (legal status: under examination, published)

    Publication No.: US20150242386A1

    Publication date: 2015-08-27

    Application No.: US14190597

    Filing date: 2014-02-26

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for recognizing speech in an utterance. The methods, systems, and apparatus may include actions of obtaining a candidate transcription including a sequence of words and generating morphological variants of one or more of the words from the candidate transcription. Additional actions may include, for each morphological variant, generating one or more additional candidate transcriptions that each include the morphological variant. Further actions may include generating respective language model scores for the candidate transcription and the one or more additional candidate transcriptions. Additional actions may include selecting a particular transcription from among the candidate transcription and the one or more additional candidate transcriptions, based on the language model scores.
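
    The rescoring loop described in this abstract can be sketched as follows. The variant table `VARIANTS`, the add-one-smoothed bigram model, and the toy corpus are all assumptions made for illustration; the abstract does not specify how variants are generated or which language model is used.

```python
import math
from collections import Counter

# Hypothetical morphological variant table.
VARIANTS = {
    "walk": ["walks", "walked"],
    "walks": ["walk", "walked"],
}


def candidates(transcription):
    """Return the original transcription plus one new candidate per variant."""
    words = transcription.split()
    out = [transcription]
    for i, word in enumerate(words):
        for variant in VARIANTS.get(word, []):
            out.append(" ".join(words[:i] + [variant] + words[i + 1:]))
    return out


def train_bigram(corpus):
    """Collect unigram and bigram counts, with a start-of-sentence marker."""
    unigrams, bigrams = Counter(), Counter()
    for line in corpus:
        tokens = ["<s>"] + line.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def score(text, unigrams, bigrams):
    """Log-probability of `text` under an add-one-smoothed bigram model."""
    tokens = ["<s>"] + text.split()
    vocab = len(unigrams)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(tokens, tokens[1:])
    )


corpus = ["she walks home", "he walks home", "they walk home"]
unigrams, bigrams = train_bigram(corpus)

options = candidates("she walk home")
best = max(options, key=lambda text: score(text, unigrams, bigrams))
print(best)
```

    With this toy corpus the candidate "she walks home" receives the highest language model score, so it is selected over the original "she walk home".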
