Reporting and analyzing alternative clustering solutions by employing multi-objective genetic algorithm and conducting experiments on cancer data
Peng, P. ; Addam, O. ; Elzohbi, M. ; Ozyer, S. ; Elhajj, Ahmad ; Gao, S. ; Liu, Y. ; Ozyer, T. ; Kaya, M. ; Ridley, Mick J. ; et al.
Publication Date
2014-01
Peer-Reviewed
Yes
Open Access status
closedAccess
Accepted for publication
2013-11-01
Abstract
Clustering is an essential research problem which has received considerable attention in the research
community for decades. It is a challenge because there is no unique solution that fits all problems and
satisfies all applications. We aim to find the most appropriate clustering solution for a given application
domain. Clustering algorithms in general need prior specification of the number of clusters,
and this is hard even for domain experts to estimate, especially in a dynamic environment where the
data changes and/or becomes available incrementally. In this paper, we describe and analyze the
effectiveness of a robust clustering algorithm which integrates a multi-objective genetic algorithm into a
framework capable of producing alternative clustering solutions; it is called Multi-objective K-Means Genetic
Algorithm (MOKGA). We investigate its application for clustering a variety of datasets, including
microarray gene expression data. The reported results are promising. Though we concentrate on gene
expression and mostly cancer data, the proposed approach is general enough and works equally well to cluster other
datasets, as demonstrated by the two datasets Iris and Ruspini. After running MOKGA, a Pareto-optimal
front is obtained, and it gives the optimal number of clusters as a solution set. The achieved clustering
results are then analyzed and validated using several cluster validity techniques proposed in the
literature. As a result, the optimal clusters are ranked for each validity index. We apply majority voting to
decide on the most appropriate set of validity indexes applicable to every tested dataset. The proposed
clustering approach is tested by conducting experiments using seven well-cited benchmark data sets.
The obtained results are compared with those reported in the literature to demonstrate the applicability
and effectiveness of the proposed approach.
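The idea of a Pareto-optimal front of alternative clustering solutions can be illustrated with a minimal sketch. It is not the MOKGA implementation: it assumes, for illustration only, that each candidate solution is summarized by two objectives to be minimized, the number of clusters and a within-cluster variation score. The function name `pareto_front` and the sample values are hypothetical.

```python
def pareto_front(solutions):
    """Return the solutions that no other solution dominates.

    Each solution is a (num_clusters, variation) pair, and both values
    are minimized (an assumption made for this sketch). Solution `o`
    dominates `s` when it is no worse on both objectives and strictly
    better on at least one.
    """
    front = []
    for s in solutions:
        dominated = any(
            o[0] <= s[0] and o[1] <= s[1] and (o[0] < s[0] or o[1] < s[1])
            for o in solutions
        )
        if not dominated:
            front.append(s)
    return sorted(front)


# Hypothetical candidates: (number of clusters, within-cluster variation).
candidates = [(2, 90.0), (3, 40.0), (3, 55.0), (4, 35.0), (5, 36.0)]
front = pareto_front(candidates)
print(front)
```

Each pair left in `front` is one alternative trade-off between fewer clusters and tighter clusters. Choosing among them is a separate step, which the abstract assigns to the cluster validity indexes.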
Version
No full-text in the repository
Citation
Peng P, Addam O, Elzohbi M et al (2014) Reporting and analyzing alternative clustering solutions by employing multi-objective genetic algorithm and conducting experiments on cancer data. Knowledge-Based Systems. 56: 108-122.
Type
Article