Academic Journals Database
Disseminating quality controlled scientific knowledge

Feature Selection via Correlation Coefficient Clustering

Author(s): Hui-Huang Hsu | Cheng-Wei Hsieh

Journal: Journal of Software
ISSN 1796-217X

Volume: 5
Issue: 12
Start page: 1371
Date: 2010

Keywords: Feature Selection | Clustering | Correlation Coefficient | Support Vector Machines (SVMs) | Machine Learning | Classification

Feature selection is a fundamental problem in machine learning and data mining: choosing the most problem-related features from a set of collected features is essential. This paper proposes a novel method that uses correlation coefficient clustering to remove similar or redundant features. The collected features are grouped into clusters according to their correlation coefficient values. The most class-dependent feature in each cluster is retained, and the others in the same cluster are removed. The features that remain are thus the most class-related and mutually unrelated ones. The proposed method was applied to two datasets: the disordered protein dataset and the Arrhythmia (ARR) dataset. The experimental results show that the method is superior to other feature selection methods in speed and/or accuracy. Detailed discussions are given in the paper.
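
The general idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the clustering procedure or the class-dependence measure, so the greedy threshold-based grouping, the use of absolute Pearson correlation with the class label as the relevance score, the function name select_features, and the threshold value 0.8 are all assumptions made for illustration.

```python
import numpy as np


def select_features(X, y, threshold=0.8):
    """Keep one representative feature per group of highly correlated features.

    Hypothetical sketch: `threshold` and the greedy grouping are assumptions,
    not details taken from the paper. Constant columns are not handled
    (their correlation coefficient is undefined).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]

    # Absolute Pearson correlation between every pair of features.
    feat_corr = np.abs(np.corrcoef(X, rowvar=False))

    # Assumed class-dependence score: |correlation| of each feature with y.
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )

    unassigned = set(range(n_features))
    selected = []

    # Visit features from most to least class-dependent. Each unassigned
    # feature seeds a cluster of the remaining features correlated with it,
    # so the seed is the most class-dependent member of its cluster.
    for j in map(int, np.argsort(-relevance)):
        if j not in unassigned:
            continue
        cluster = [k for k in unassigned if feat_corr[j, k] >= threshold]
        unassigned.difference_update(cluster)
        selected.append(j)  # retain the seed, drop the rest of the cluster

    return sorted(selected)
```

Calling select_features(X, y) on a samples-by-features matrix X and a numeric label vector y returns the column indices of the retained features. A lower threshold merges more features into each cluster and therefore keeps fewer of them.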