EFFICIENTLY INITIALIZING DISTRIBUTED CLUSTERING ON LARGE DATA SETS

    Publication No.: US20190303387A1

    Publication date: 2019-10-03

    Application No.: US16370952

    Filing date: 2019-03-30

    Abstract: Systems and methods for initializing centroids in large datasets before clustering operations begin. The systems and methods can use a random sampling window to speed up centroid initialization. They can be adapted to exploit parallelism and configured to run on multi-node compute clusters. Optionally, initialization can include post-initialization centroid discarding and/or re-assignment operations that adaptively control cluster sizes.
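The abstract does not spell out the procedure, so the following is only an illustrative sketch of one way a random sampling window could speed up centroid initialization. It assumes a k-means++-style seeding rule (distance-squared weighting) applied to a random window of points rather than to the full dataset, followed by an optional step that discards centroids attracting too few points. The function names `init_centroids` and `discard_small`, the parameters `window_size` and `min_size`, and the weighting rule itself are hypothetical choices made for illustration and are not taken from the patent.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k, window_size, seed=None):
    """Pick k initial centroids, scoring only a random window per step.

    Assumption: each new centroid is drawn from a fresh random window with
    probability proportional to its squared distance to the nearest centroid
    chosen so far (k-means++ style, restricted to the window).
    """
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Score window_size sampled points instead of the whole dataset.
        window = rng.sample(points, min(window_size, len(points)))
        weights = [min(sq_dist(p, c) for c in centroids) for p in window]
        if sum(weights) == 0:
            # Every sampled point coincides with a centroid: pick uniformly.
            centroids.append(rng.choice(window))
        else:
            centroids.append(rng.choices(window, weights=weights, k=1)[0])
    return centroids


def discard_small(points, centroids, min_size):
    """Drop centroids whose nearest-point count is below min_size (assumed rule)."""
    counts = [0] * len(centroids)
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: sq_dist(p, centroids[i]))
        counts[nearest] += 1
    return [c for c, n in zip(centroids, counts) if n >= min_size]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
    seeds = init_centroids(data, k=2, window_size=3, seed=0)
    print(seeds, discard_small(data, seeds, min_size=1))
```

In this sketch each seeding step costs on the order of `window_size` times the current number of centroids, rather than scaling with the full dataset size, which is the kind of saving a sampling window would be expected to provide. In a multi-node setting, each node could plausibly score its own window in parallel, though that part is not shown here.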
