Author : Babita Kumari
Date of Publication : 17th November 2017
Abstract: Distributed clustering is a non-trivial challenge in itself, and it carries further constraints: limiting the communication overhead, limiting the number of processors, and deciding the number of parameters required for clustering. There have been several attempts to perform K-means in a distributed environment, as well as some density-based clustering approaches. Every approach has its own advantages and drawbacks. This paper proposes a way to combine a density-based approach with a K-means approach such that very little information is exchanged among the processors. Each processor performs a local clustering based on density concepts and produces a summary of its result. These summaries are then combined through K-means to obtain the global cluster labels. K-means is not run on any portion of the data set itself, but only on the information that the processors provide through local clustering. Clustering can thus be performed with very little information exchanged. We compare the results against a centralized baseline algorithm to show the effectiveness of the approach.
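The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify the density method or the summary format, so the sketch assumes a simplified DBSCAN-style local step, assumes each summary is a (centroid, point count) pair, and assumes a count-weighted K-means for the global step. All function names and the parameters `eps`, `min_pts` and `k` are hypothetical.

```python
import math
import random

def local_density_clusters(points, eps, min_pts):
    """Simplified DBSCAN-style clustering of one processor's points (assumed local step)."""
    n = len(points)
    # Brute-force eps-neighbourhoods; each point counts as its own neighbour.
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n  # None = unvisited, -1 = noise, >= 0 = cluster id
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        labels[i] = cluster_id
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_pts:
                frontier.extend(neighbors[j])  # core point: keep growing the cluster
        cluster_id += 1
    return [[p for p, l in zip(points, labels) if l == c] for c in range(cluster_id)]

def summarize(points, eps, min_pts):
    """Reduce each local cluster to a (centroid, count) pair (assumed summary format)."""
    summaries = []
    for cluster in local_density_clusters(points, eps, min_pts):
        dim = len(cluster[0])
        centroid = tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))
        summaries.append((centroid, len(cluster)))
    return summaries

def weighted_kmeans(summaries, k, iters=50, seed=0):
    """K-means over the summaries only, weighting each centroid by its point count."""
    rng = random.Random(seed)
    centers = [c for c, _ in rng.sample(summaries, k)]

    def nearest(p):
        return min(range(k), key=lambda c: math.dist(p, centers[c]))

    for _ in range(iters):
        assign = [nearest(p) for p, _ in summaries]
        for c in range(k):
            members = [(p, w) for (p, w), a in zip(summaries, assign) if a == c]
            if members:  # an empty cluster keeps its previous center
                total = sum(w for _, w in members)
                dim = len(members[0][0])
                centers[c] = tuple(sum(p[d] * w for p, w in members) / total
                                   for d in range(dim))
    return centers, [nearest(p) for p, _ in summaries]

# Two hypothetical processors, each holding part of two well-separated groups.
site_a = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
site_b = [(x + 0.4, y + 0.4) for x, y in site_a]

# Only the summaries would cross the network, never the raw points.
summaries = summarize(site_a, eps=1.5, min_pts=3) + summarize(site_b, eps=1.5, min_pts=3)
centers, labels = weighted_kmeans(summaries, k=2)
```

In this toy setup, 16 raw points are reduced to four summaries before the global step, which is the sense in which little information is exchanged. Mapping a global label back to the raw points would happen on each processor, using the local cluster that produced the summary.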