Academic Journals Database
Disseminating quality controlled scientific knowledge

Application of Rank Correlation, Clustering and Classification in Information Security

Author(s): Gleb Beliakov | John Yearwood | Andrei Kelarev

Journal: Journal of Networks
ISSN 1796-2056

Volume: 7
Issue: 6
Start page: 935
Date: 2012

Keywords: consensus functions | clustering | classification | phishing websites

ABSTRACT
This article experimentally investigates a novel application of a clustering technique recently introduced by the authors for using robust and stable consensus functions in information security, where it is often necessary to process large data sets and monitor outcomes in real time, as is required, for example, for intrusion detection. Here we concentrate on one particular application: profiling phishing websites. First, we apply several independent clustering algorithms to a randomized sample of the data to obtain independent initial clusterings; the silhouette index is used to determine the number of clusters. Second, rank correlation is used to select a subset of features for dimensionality reduction. We investigate the effectiveness of the Pearson Linear Correlation Coefficient, the Spearman Rank Correlation Coefficient and the Goodman-Kruskal Correlation Coefficient in this application. Third, we use a consensus function to combine the independent initial clusterings into one consensus clustering. Fourth, we train fast supervised classification algorithms on the resulting consensus clustering so that they can process the whole large data set as well as new data. The precision and recall of the classifiers at the final stage of this scheme are critical to the effectiveness of the whole procedure. We investigated various combinations of correlation coefficients, consensus functions, and supervised classification algorithms.
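The second and third steps of the scheme can be illustrated with a minimal Python sketch. The abstract does not say which quantity the features are correlated against, nor which consensus functions the authors use, so this sketch rests on two assumptions: each feature is scored by its Spearman rank correlation with the labels of one initial clustering, and the consensus is formed by a simple co-association (majority co-membership) rule. The function names `ranks`, `spearman`, `select_features` and `consensus`, and the `threshold` parameter, are hypothetical and do not come from the article.

```python
from itertools import combinations


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson linear correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def select_features(columns, labels, k):
    """Assumed criterion: keep the k features with the largest |rho| vs. labels."""
    scores = [abs(spearman(col, labels)) for col in columns]
    best = sorted(range(len(columns)), key=lambda i: -scores[i])[:k]
    return sorted(best)


def consensus(clusterings, threshold=0.5):
    """Co-association consensus (an assumed stand-in for the authors' functions).

    Two points are linked when they share a cluster in more than `threshold`
    of the initial clusterings; consensus clusters are the connected components.
    """
    n = len(clusterings[0])
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        agree = sum(c[i] == c[j] for c in clusterings) / len(clusterings)
        if agree > threshold:
            parent[find(i)] = find(j)
    label_of_root = {}
    return [label_of_root.setdefault(find(i), len(label_of_root)) for i in range(n)]


# Three toy initial clusterings of five points (labels are arbitrary per run).
initial = [[0, 0, 1, 1, 1], [1, 1, 0, 0, 0], [0, 0, 0, 1, 1]]
print(consensus(initial))
```

In a full pipeline along the lines of the abstract, the consensus labels would then serve as training targets for a fast supervised classifier applied to the whole data set; that stage is omitted here.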