
Combining K-MEANS and a genetic algorithm through a novel arrangement of genetic operators for high quality clustering

Islam, Md Zahidul; Estivill-Castro, Vladimir; Rahman, Md Anisur; et al.

2018

Knowledge discovery from data can be broadly categorized into two types: supervised and unsupervised. A supervised knowledge discovery process, such as classification by decision trees, typically requires class labels, which are sometimes unavailable in datasets. Unsupervised knowledge discovery techniques, such as clustering, can handle datasets without class labels. They aim to let the data reveal the groups (i.e., the data elements in each group) and the number of groups. For the ubiquitous task of clustering, K-MEANS is the most widely used algorithm, applied in a broad range of areas to identify groups where intra-group distances are much smaller than inter-group distances. As a representative-based clustering approach, K-MEANS offers an extremely efficient gradient descent approach to the total squared error of representation; however, it not only demands the parameter k, but it also makes assumptions about the similarity of density among the clusters. Therefore, it is profoundly affected by noise. Perhaps more seriously, it can often be attracted to local optima even when embedded in a multi-start scheme. We present an effective genetic algorithm that combines the capacity of genetic operators to conglomerate different solutions of the search space with the exploitation of the hill-climber. We advance a previous genetic-searching approach called GENCLUST with the intervention of fast hill-climbing cycles of K-MEANS, and obtain an algorithm that is faster than its predecessor and achieves clustering results of higher quality. We demonstrate this across a series of 18 commonly researched datasets. © 2017 Elsevier Ltd. All rights reserved.
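The general idea in the abstract, genetic operators that recombine candidate solutions followed by short K-MEANS hill-climbing cycles, can be illustrated with a minimal sketch. This is not GENCLUST or the authors' arrangement of operators: the uniform crossover, the Gaussian mutation, the elitist selection, the fixed k, and every function and parameter name below are assumptions chosen only for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points, centroids):
    """Total squared error: each point is charged to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def kmeans_step(points, centroids):
    """One K-MEANS (Lloyd) iteration: assign points, then recompute the means."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        clusters[nearest].append(p)
    updated = []
    for old, members in zip(centroids, clusters):
        if members:
            updated.append(tuple(sum(m[d] for m in members) / len(members)
                                 for d in range(len(old))))
        else:
            updated.append(old)  # an empty cluster keeps its previous centroid
    return updated

def crossover(parent_a, parent_b, rng):
    """Assumed operator: copy each centroid slot from one of the two parents."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]

def mutate(centroids, rng, scale=0.5, rate=0.2):
    """Assumed operator: add Gaussian noise to a random subset of centroids."""
    return [tuple(x + rng.gauss(0, scale) for x in c) if rng.random() < rate else c
            for c in centroids]

def hybrid_ga_kmeans(points, k, pop_size=10, generations=20, hill_steps=3, seed=0):
    """Genetic search over centroid sets; each child gets a few K-MEANS steps."""
    rng = random.Random(seed)
    population = [rng.sample(points, k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: sse(points, c))
        elite = population[: pop_size // 2]  # elitism: the best half survives
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = mutate(crossover(a, b, rng), rng)
            for _ in range(hill_steps):  # fast hill-climbing cycle
                child = kmeans_step(points, child)
            children.append(child)
        population = elite + children
    return min(population, key=lambda c: sse(points, c))

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    best = hybrid_ga_kmeans(data, k=2)
    print(best, sse(data, best))
```

Unlike the approach described in the abstract, this sketch takes k as an input and does not attempt to discover the number of groups. The fitness used here is the plain total squared error; a different cluster-evaluation criterion could be substituted in `sse` without changing the loop.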

EXPERT SYSTEMS WITH APPLICATIONS

Pages: 402-417 | Volume: 91

ISSN: 0957-4174

Indexed in

SSCI

Publication date

2018

Subject area

Evidence-based management

Country

Australia

Language

English

DOI

10.1016/j.eswa.2017.09.005

Other keywords

EXPRESSION DATA

EISSN

1873-6793

Funding agency

Faculty of Business at Charles Sturt University

Funding information

The authors would like to thank the support provided by their host institutions and in particular the Compact Fund support of the Faculty of Business at Charles Sturt University.

Times cited (WOS)

Citation count last updated

2022-01

Source institutions

Charles Sturt University; Griffith University

Keywords

Clustering; Genetic algorithm; K-MEANS; Data mining; Cluster evaluation
