Finding Related Articles for a Content Stream Using Iterative Merge-Split Clusters

    Publication No.: US20170193074A1

    Publication Date: 2017-07-06

    Application No.: US14985302

    Filing Date: 2015-12-30

    Applicant: Yahoo! Inc.

    Abstract: Software generates an article signature for each article in a plurality of articles. The software initializes a clustering algorithm with a plurality of initial clusters that are non-overlapping. A centroid signature is generated for each initial cluster from the article signatures of the articles in the initial cluster. The software performs a succession of alternating merges and splits using the centroid signatures to create a plurality of non-overlapping coherent clusters from the plurality of initial clusters. The software identifies an article that is related to a specific article by mapping the article signature for the specific article to the centroid signature for at least one coherent cluster and comparing that article signature to the article signatures of the articles in the coherent cluster, using at least one similarity measure. The software displays the specific article and the related article in proximity to each other in a content stream.
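The abstract does not specify what an article signature is, how merges and splits are triggered, or which similarity measure is used. The sketch below fills those gaps with stated assumptions, purely to make the flow concrete. It assumes that a signature is an L2-normalised bag-of-words vector, that similarity is cosine similarity, that two clusters merge when their centroid similarity reaches a hypothetical `merge_t`, and that a cluster splits around its two least similar members when its mean member-to-centroid similarity falls below a hypothetical `split_t`. All function and parameter names (`signature`, `centroid`, `merge_split`, `related`, `merge_t`, `split_t`, `max_rounds`) are illustrative and are not taken from the patent.

```python
import math
from collections import defaultdict

# Sketch only: the signature format, cosine measure, thresholds and the
# two-seed split rule are assumptions, not the patent's specification.

def signature(text):
    """Hypothetical article signature: L2-normalised term-frequency vector."""
    counts = defaultdict(float)
    for term in text.lower().split():
        counts[term] += 1.0
    norm = math.sqrt(sum(w * w for w in counts.values()))
    return {t: w / norm for t, w in counts.items()} if norm else {}

def cosine(a, b):
    """Cosine similarity, used here as the (assumed) similarity measure."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(member_sigs):
    """Centroid signature: the mean of the member article signatures."""
    total = defaultdict(float)
    for sig in member_sigs:
        for t, w in sig.items():
            total[t] += w
    return {t: w / len(member_sigs) for t, w in total.items()}

def coherence(cluster, sigs):
    """Mean similarity of each member to its cluster centroid."""
    cent = centroid([sigs[i] for i in cluster])
    return sum(cosine(sigs[i], cent) for i in cluster) / len(cluster)

def merge_pass(clusters, sigs, merge_t):
    """Repeatedly merge the most similar cluster pair whose centroids reach merge_t."""
    changed = False
    while len(clusters) > 1:
        cents = [centroid([sigs[i] for i in c]) for c in clusters]
        best, pair = merge_t, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = cosine(cents[a], cents[b])
                if s >= best:
                    best, pair = s, (a, b)
        if pair is None:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
        changed = True
    return changed

def split_pass(clusters, sigs, split_t):
    """Split each incoherent cluster in two around its two least similar members."""
    result, changed = [], False
    for c in clusters:
        if len(c) < 2 or coherence(c, sigs) >= split_t:
            result.append(c)
            continue
        pairs = ((x, y) for i, x in enumerate(c) for y in c[i + 1:])
        a, b = min(pairs, key=lambda p: cosine(sigs[p[0]], sigs[p[1]]))
        left = [m for m in c if cosine(sigs[m], sigs[a]) >= cosine(sigs[m], sigs[b])]
        right = [m for m in c if m not in left]
        if not right:  # degenerate case: nothing separates the seeds
            result.append(c)
            continue
        result += [left, right]
        changed = True
    return result, changed

def merge_split(sigs, initial, merge_t=0.6, split_t=0.8, max_rounds=20):
    """Alternate merge and split passes until a round changes nothing."""
    clusters = [list(c) for c in initial]  # non-overlapping initial clusters
    for _ in range(max_rounds):
        merged = merge_pass(clusters, sigs, merge_t)
        clusters, split = split_pass(clusters, sigs, split_t)
        if not (merged or split):
            break
    return clusters

def related(query_sig, clusters, sigs, exclude=None):
    """Map the query to its nearest centroid, then return the most similar member."""
    cents = [centroid([sigs[i] for i in c]) for c in clusters]
    k = max(range(len(clusters)), key=lambda j: cosine(query_sig, cents[j]))
    candidates = [i for i in clusters[k] if i != exclude]
    return max(candidates, key=lambda i: cosine(query_sig, sigs[i]), default=None)

articles = ["python code compiler", "python code interpreter",
            "soccer match goal", "soccer match referee"]
sigs = [signature(a) for a in articles]
clusters = merge_split(sigs, initial=[[0, 2], [1, 3]])  # deliberately mixed
print(clusters)                                     # [[0, 1], [2, 3]]
print(related(sigs[0], clusters, sigs, exclude=0))  # 1
```

In this toy run the two mixed initial clusters first merge into one cluster, which then fails the coherence test and splits along topic lines into {0, 1} and {2, 3}. Running the merge pass before the split pass lets the split pass repair an over-eager merge, and `max_rounds` caps the loop in case merges and splits oscillate. The final display step (placing the specific article and the related article near each other in the content stream) is not modelled here.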
