Computing numeric representations of words in a high-dimensional space

    Publication number: US09740680B1

    Publication date: 2017-08-22

    Application number: US14715421

    Filing date: 2015-05-18

    Applicant: Google Inc.

    CPC classification number: G06F17/2765 G06F17/2785 G06N99/005 G10L15/06

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for computing numeric representations of words. One of the methods includes obtaining a set of training data, wherein the set of training data comprises sequences of words; training a classifier and an embedding function on the set of training data, wherein training the embedding function comprises obtaining trained values of the embedding function parameters; processing each word in the vocabulary using the embedding function in accordance with the trained values of the embedding function parameters to generate a respective numerical representation of each word in the vocabulary in the high-dimensional space; and associating each word in the vocabulary with the respective numeric representation of the word in the high-dimensional space.
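
The steps named in this abstract (train a classifier jointly with an embedding function, then associate each vocabulary word with its vector) can be illustrated with a small sketch. This is not the patented method: the abstract does not say what the classifier predicts, so the sketch assumes a softmax classifier that predicts the next word from the current word's embedding. The function name `train_embeddings` and the parameters `dim`, `epochs`, `lr`, and `seed` are hypothetical.

```python
import numpy as np


def train_embeddings(sequences, dim=8, epochs=200, lr=0.1, seed=0):
    """Jointly train an embedding table and a softmax classifier.

    Assumption (not from the abstract): the classifier predicts the next
    word in a sequence from the embedding of the current word.
    """
    vocab = sorted({word for seq in sequences for word in seq})
    index = {word: i for i, word in enumerate(vocab)}
    rng = np.random.default_rng(seed)
    E = rng.normal(scale=0.1, size=(len(vocab), dim))  # embedding function parameters
    W = rng.normal(scale=0.1, size=(dim, len(vocab)))  # classifier parameters

    # Training pairs: (current word, next word) for every adjacent pair.
    pairs = [(index[a], index[b]) for seq in sequences for a, b in zip(seq, seq[1:])]

    for _ in range(epochs):
        for src, tgt in pairs:
            h = E[src].copy()                      # embedding of the input word
            logits = h @ W
            probs = np.exp(logits - logits.max())  # numerically stable softmax
            probs /= probs.sum()
            grad = probs
            grad[tgt] -= 1.0                       # gradient of cross-entropy w.r.t. logits
            E[src] -= lr * (W @ grad)              # update the embedding parameters
            W -= lr * np.outer(h, grad)            # update the classifier parameters

    # Associate each word in the vocabulary with its trained vector.
    return {word: E[i].copy() for word, i in index.items()}


if __name__ == "__main__":
    vectors = train_embeddings([["the", "cat", "sat"], ["the", "dog", "sat"]])
    for word, vec in vectors.items():
        print(word, np.round(vec, 3))
```

The classifier weights `W` are discarded after training in this sketch; only the word-to-vector mapping is returned, which mirrors the final "associating" step of the abstract.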

    Classifying Data Objects
    3.
    Invention application
    Status: under examination (published)

    Publication number: US20150178383A1

    Publication date: 2015-06-25

    Application number: US14576907

    Filing date: 2014-12-19

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying data objects. One of the methods includes obtaining data that associates each term in a vocabulary of terms with a respective high-dimensional representation of the term; obtaining classification data for a data object, wherein the classification data includes a respective score for each of a plurality of categories, and wherein each of the categories is associated with a respective category label; computing an aggregate high-dimensional representation for the data object from high-dimensional representations for the category labels associated with the categories and the respective scores; identifying a first term in the vocabulary of terms having a high-dimensional representation that is closest to the aggregate high-dimensional representation; and selecting the first term as a category label for the data object.
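
The relabeling procedure in this abstract can be sketched as follows. The abstract does not specify how the aggregate representation is computed or how "closest" is measured, so this sketch assumes a score-weighted average of the category-label vectors and cosine similarity. The function name `select_label` and the toy vectors are hypothetical.

```python
import numpy as np


def select_label(embeddings, label_scores):
    """Pick the vocabulary term closest to a score-weighted mix of label vectors.

    embeddings:   dict mapping each term to a 1-D NumPy vector
    label_scores: dict mapping each category label to its classifier score

    Assumptions (not from the abstract): the aggregate is a weighted
    average, and closeness is cosine similarity.
    """
    total = sum(label_scores.values())
    aggregate = sum(
        score * embeddings[label] for label, score in label_scores.items()
    ) / total

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    # The term whose vector is most similar to the aggregate becomes the label.
    return max(embeddings, key=lambda term: cosine(embeddings[term], aggregate))


if __name__ == "__main__":
    toy = {
        "cat": np.array([1.0, 0.0]),
        "dog": np.array([0.0, 1.0]),
        "pet": np.array([1.0, 1.0]),
        "car": np.array([-1.0, 0.0]),
    }
    print(select_label(toy, {"cat": 0.5, "dog": 0.5}))
```

Note that the selected term need not be one of the original category labels: in the toy data, equal scores for "cat" and "dog" yield an aggregate that lies nearest to "pet".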


    Computing numeric representations of words in a high-dimensional space
    4.
    Granted invention patent
    Status: in force

    Publication number: US09037464B1

    Publication date: 2015-05-19

    Application number: US13841640

    Filing date: 2013-03-15

    Applicant: Google Inc.

    CPC classification number: G06F17/2765 G06F17/2785 G06N99/005 G10L15/06

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for computing numeric representations of words. One of the methods includes obtaining a set of training data, wherein the set of training data comprises sequences of words; training a classifier and an embedding function on the set of training data, wherein training the embedding function comprises obtaining trained values of the embedding function parameters; processing each word in the vocabulary using the embedding function in accordance with the trained values of the embedding function parameters to generate a respective numerical representation of each word in the vocabulary in the high-dimensional space; and associating each word in the vocabulary with the respective numeric representation of the word in the high-dimensional space.

