
A Hybrid Partitioning Algorithm for Robust Big Data Clustering and Analysis

Y. A. Joarder, Kh. Mustafizur Rahman, Ahsan Ullah

Abstract


Clustering algorithms group data points that belong together. This research has two main aims: to improve K-MEANS clustering quality by using a hybrid partitioning algorithm to eliminate the empty clustering and inefficient data clustering issues, and to compare experimental results between K-MEANS and the proposed hybrid partitioning algorithm. The research aims to ensure high-quality clustering through a single, all-in-one solution to these well-known data mining problems. Although K-MEANS converges fairly quickly, it is not guaranteed to reach a good solution. Its clustering quality depends heavily on the selection of the initial centroids, and as the number of clusters increases it begins to suffer from two issues: ‘Empty Clustering’ and ‘Inefficient Data Clustering’. To address them, we have developed a hybrid partitioning algorithm named Hybrid Robust Clustering with Multiple Solutions (HRCMS). First, it clusters the whole data set. Second, it detects any empty cluster and validates the inefficient data in the inefficient cluster. Finally, it removes the empty clustering and inefficient data clustering issues without any data loss. The proposed algorithm combines the ‘Uplifted Enhanced Fireworks Algorithm’ and the ‘Anointed Cuckoo Search based Evolutionary Algorithm’ with some centroid-calculation heuristics.
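
The empty-cluster issue named in the abstract can be illustrated with a minimal Python sketch. This is not HRCMS and does not implement the fireworks or cuckoo search components, whose details are not given here. It is plain K-MEANS with a generic repair heuristic, assumed purely for illustration: each empty cluster is refilled with the point that lies farthest from its current centroid, so no data point is discarded. All function names (`assign`, `repair_empty`, `kmeans`) are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def assign(points, centroids):
    """Standard K-MEANS assignment step: each point joins its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def repair_empty(clusters, centroids):
    """Illustrative heuristic (an assumption, not the paper's method):
    move the point farthest from its own centroid into each empty cluster.
    Points are only moved, never removed, so there is no data loss."""
    for j, members in enumerate(clusters):
        if members:
            continue
        best = None  # (distance, donor cluster index, point)
        for i, donor in enumerate(clusters):
            if len(donor) < 2:
                continue  # never empty a donor cluster
            for p in donor:
                d = sq_dist(p, centroids[i])
                if best is None or d > best[0]:
                    best = (d, i, p)
        if best is None:
            continue  # fewer points than clusters: nothing to move
        _, i, p = best
        clusters[i].remove(p)
        clusters[j].append(p)
        centroids[j] = p  # reseed the empty cluster at the moved point
    return clusters


def kmeans(points, init_centroids, max_iter=100):
    """K-MEANS with empty-cluster repair after every assignment step."""
    centroids = [tuple(c) for c in init_centroids]
    clusters = assign(points, centroids)
    for _ in range(max_iter):
        clusters = repair_empty(assign(points, centroids), centroids)
        new_centroids = [mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# A poorly chosen third centroid leaves one cluster empty under plain assignment.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
init = [(0, 0), (10, 10), (100, 100)]
print([len(c) for c in assign(points, init)])
print([len(c) for c in kmeans(points, init)[1]])
```

The first printed list shows the cluster sizes after one unrepaired assignment step, the second after the repaired run. The sketch only demonstrates why the choice of initial centroids matters; it says nothing about how HRCMS detects or validates inefficient data.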
