Academic Journals Database
Disseminating quality controlled scientific knowledge

An Empirical Evaluation of Density-Based Clustering Techniques

Author(s): Glory H. Shah | C. K. Bhensdadia | Amit P. Ganatra

Journal: International Journal of Soft Computing & Engineering
ISSN 2231-2307

Volume: 2;
Issue: 1;
Start page: 216;
Date: 2012;

Keywords: DBSCAN | OPTICS | DENCLUE | Spatial Data | Intra Cluster | Inter Cluster.

The emergence of modern techniques for scientific data collection has resulted in the large-scale accumulation of data in diverse fields. Conventional database querying methods are inadequate for extracting useful information from such huge data banks. Cluster analysis is one of the major data analysis methods: it is the art of detecting groups of similar objects in large data sets without the groups being specified in advance by explicit features. Detecting clusters of points is challenging when the clusters differ in size, density and shape. The development of clustering algorithms has received a lot of attention in recent years, and many new clustering algorithms have been proposed. This paper surveys density-based clustering algorithms. DBSCAN [15] is the base algorithm for density-based clustering techniques. One advantage of these techniques is that they do not require the number of clusters to be given a priori, nor do they make any assumption about the density or the variance within the clusters that may exist in the data set. They can detect clusters of different shapes and sizes in large amounts of data containing noise and outliers. OPTICS [14], on the other hand, does not produce a clustering of a data set explicitly; instead, it creates an augmented ordering of the database that represents its density-based clustering structure. This paper compares two density-based clustering methods, DBSCAN [15] and OPTICS [14], on parameters essential for a good clustering algorithm, such as distance type, noise ratio, the run time of the simulations performed and the number of clusters formed. We analyze the algorithms in terms of the parameters essential for creating meaningful clusters. Both algorithms are tested on synthetic data sets of both low and high dimensionality.
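The DBSCAN behavior described above (no preset number of clusters, arbitrarily shaped clusters, explicit noise) can be illustrated with a minimal sketch. It assumes brute-force O(n²) neighborhood queries and Euclidean distance by default, and counts a point as part of its own neighborhood. The names `dbscan`, `region_query`, `eps`, `min_pts` and `dist` are illustrative choices, not taken from the paper; the `dist` argument is a hypothetical hook showing where the "distance type" compared in the paper would plug in.

```python
import math
from collections import deque

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps, dist):
    """Indices of all points within distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts, dist=math.dist):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None = not yet visited
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps, dist)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        # i is a core point: start a new cluster and grow it breadth-first.
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps, dist)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels


# Toy data: two dense groups of four points plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)                             # [0, 0, 0, 0, 1, 1, 1, 1, -1]
print(labels.count(NOISE) / len(labels))  # noise ratio: fraction labeled NOISE
```

On this toy data the sketch finds two clusters and one noise point without being told how many clusters to look for. Passing a different `dist` (for example a Manhattan distance) changes which points fall inside each `eps`-neighborhood, which is one way the distance type can affect the result.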
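The OPTICS output described above, an ordering of the points rather than an explicit clustering, can be sketched as follows. This is a simplified illustration under stated assumptions: Euclidean distance, brute-force neighborhood queries, a binary heap (`heapq`) as the seed list, and a core distance in which the point itself counts as its own nearest neighbor (one common convention). The names `optics`, `eps` and `min_pts` are illustrative, not from the paper. The function returns the processing order and the reachability distance of each point in that order; points never reached from a core point keep an undefined (infinite) reachability.

```python
import heapq
import math

UNDEFINED = math.inf  # reachability or core distance that is not (yet) defined


def optics(points, eps, min_pts):
    """Return (ordering, reachability) where reachability[k] belongs to ordering[k]."""
    n = len(points)
    processed = [False] * n
    reach = [UNDEFINED] * n
    ordering = []

    def neighbors(i):
        # Brute-force eps-neighborhood, including i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    def core_distance(i, nbrs):
        # Distance to the min_pts-th closest point (i itself counts, at distance 0).
        if len(nbrs) < min_pts:
            return UNDEFINED
        return sorted(math.dist(points[i], points[j]) for j in nbrs)[min_pts - 1]

    def update(i, nbrs, core_d, seeds):
        # Lower the reachability of unprocessed neighbors and push them as seeds.
        for j in nbrs:
            if processed[j]:
                continue
            new_r = max(core_d, math.dist(points[i], points[j]))
            if new_r < reach[j]:
                reach[j] = new_r
                heapq.heappush(seeds, (new_r, j))

    def process(i):
        # Append i to the ordering; return its neighbors and core distance.
        nbrs = neighbors(i)
        processed[i] = True
        ordering.append(i)
        return nbrs, core_distance(i, nbrs)

    for p in range(n):
        if processed[p]:
            continue
        nbrs, core_d = process(p)
        if core_d == UNDEFINED:
            continue  # p is not a core point; nothing to expand from it
        seeds = []
        update(p, nbrs, core_d, seeds)
        while seeds:
            _, q = heapq.heappop(seeds)
            if processed[q]:
                continue  # stale heap entry with an outdated, larger reachability
            q_nbrs, q_core = process(q)
            if q_core != UNDEFINED:
                update(q, q_nbrs, q_core, seeds)

    return ordering, [reach[i] for i in ordering]


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
order, reachability = optics(points, eps=1.5, min_pts=3)
print(order)         # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(reachability)  # [inf, 1.0, 1.0, 1.0, inf, 1.0, 1.0, 1.0, inf]
```

Plotted in order, the reachability values form valleys for dense regions separated by peaks: here each group of four points starts at an `inf` peak, and the isolated point stays at `inf`. Turning such an ordering into a flat, DBSCAN-like clustering is a separate extraction step that this sketch does not attempt.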