Data clustering
    1.
    Granted invention patent

    Publication No.: US11544491B2

    Publication date: 2023-01-03

    Application No.: US16743306

    Filing date: 2020-01-15

    IPC classification: G06K9/62 G06N20/00

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for clustering data are disclosed. In one aspect, a method includes the actions of receiving feature vectors. The actions further include, for a subset of the feature vectors, accessing a first label. The actions further include generating a classifier that is configured to associate a given feature vector with a feature vector of the subset of the feature vectors. The actions further include applying the feature vectors that are not included in the subset of the feature vectors to the classifier. The actions further include generating a dissimilarity matrix. The actions further include, based on the dissimilarity matrix, generating a graph. The actions further include, for each node of the graph, determining a second label. The actions further include, based on the second labels and the first labels, determining a training label for each feature vector.
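
The sequence of actions in the abstract can be illustrated with a minimal sketch. The abstract does not specify the classifier, the dissimilarity measure, the graph construction, or the rule for combining labels, so every concrete choice below is an assumption made for illustration only: a nearest-labeled-vector lookup stands in for the classifier, Euclidean distance for the dissimilarity, a distance threshold for the graph edges, and a neighbourhood majority vote for the second labels. All function names (`nearest_labeled`, `threshold_graph`, `training_labels`) and the `threshold` parameter are hypothetical and do not come from the patent.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance (assumed dissimilarity measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_labeled(vec, vectors, first_labels):
    """Stand-in 'classifier': index of the closest vector in the labeled subset."""
    return min(first_labels, key=lambda i: euclidean(vec, vectors[i]))


def dissimilarity_matrix(vectors):
    """Pairwise dissimilarity between all feature vectors."""
    n = len(vectors)
    return [[euclidean(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


def threshold_graph(dissim, threshold):
    """Assumed graph construction: link nodes whose dissimilarity is below threshold."""
    n = len(dissim)
    return {
        i: [j for j in range(n) if j != i and dissim[i][j] < threshold]
        for i in range(n)
    }


def training_labels(vectors, first_labels, threshold):
    """Return one training label per feature vector.

    first_labels maps indices of the labeled subset to their first label.
    """
    # Apply unlabeled vectors to the classifier; each inherits the label
    # of the labeled vector it is associated with.
    seeds = {}
    for i, vec in enumerate(vectors):
        if i in first_labels:
            seeds[i] = first_labels[i]
        else:
            seeds[i] = first_labels[nearest_labeled(vec, vectors, first_labels)]

    # Build the dissimilarity matrix and derive a graph from it.
    graph = threshold_graph(dissimilarity_matrix(vectors), threshold)

    # Second label per node: majority vote over the node and its neighbours.
    second = {}
    for node, neighbours in graph.items():
        votes = Counter(seeds[j] for j in [node] + neighbours)
        second[node] = votes.most_common(1)[0][0]

    # Assumed combination rule: a first label, where present, takes precedence.
    return [first_labels.get(i, second[i]) for i in range(len(vectors))]
```

As a usage example, calling `training_labels` on six two-dimensional vectors forming two well-separated groups, with one labeled vector per group (`{0: "a", 3: "b"}`) and `threshold=1.0`, assigns each unlabeled vector the label of its own group. A real system would likely use a learned classifier and a more careful graph-based labeling step; this sketch only mirrors the order of the actions.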

    DATA CLUSTERING
    2.
    Invention patent application

    Publication No.: US20210216813A1

    Publication date: 2021-07-15

    Application No.: US16743306

    Filing date: 2020-01-15

    IPC classification: G06K9/62 G06N5/00 G06N20/10

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for clustering data are disclosed. In one aspect, a method includes the actions of receiving feature vectors. The actions further include, for a subset of the feature vectors, accessing a first label. The actions further include generating a classifier that is configured to associate a given feature vector with a feature vector of the subset of the feature vectors. The actions further include applying the feature vectors that are not included in the subset of the feature vectors to the classifier. The actions further include generating a dissimilarity matrix. The actions further include, based on the dissimilarity matrix, generating a graph. The actions further include, for each node of the graph, determining a second label. The actions further include, based on the second labels and the first labels, determining a training label for each feature vector.