Academic Journals Database

VSM Based Classification of Data Objects with Individual Treatment of Continuous and Discrete Attributes

Author(s): Komal Kumar Bhatia | Atul Srivastava | Veena Garg

Journal: International Journal of Computer Applications
ISSN 0975-8887

Volume: icwet
Issue: 12
Date: 2012

Keywords: Information retrieval | Vector space Model | Classification | Continuous attributes | Discrete attributes | Classification technique

Classification is a data mining technique for identifying the class membership of a data object. In this paper we present a classification technique that enhances an existing information retrieval method, the Vector Space Model (VSM). The vector space model is applied to text data and is generally used in information retrieval to determine the relevance of a query to web pages. Data objects fall into two communities based on their attributes: those with discrete-valued attributes and those with continuous-valued attributes. Almost every previous attempt in this area has treated the two communities separately. For the sake of classifier scalability, one type (discrete or continuous) is converted to the other, and this conversion can sometimes hamper accuracy. In this paper, by contrast, continuous and discrete attributes are treated individually without tampering with their representation. The paper adapts VSM for classification in the same way it is used to determine query relevance in information retrieval. The results show that the enhanced model performs very well and that its setup time is satisfactory for a large collection of data objects. The paper is organized as follows. Section 1 covers the basic terminology of classification and introduces the vector space model. Section 2 reviews related work in the literature. Section 3 describes model construction for classification, i.e. the simulation of the existing vector space model for information retrieval and the use of this model to classify an unseen data tuple. Section 4 gives pseudocode for VSM classification. Section 5 presents the experiment and an analysis of results through an example. Section 6 concludes the paper and discusses future directions.
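The abstract does not spell out the paper's construction, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes (hypothetically) that each discrete attribute value becomes its own indicator dimension, much like a term in a document vector, that each continuous attribute keeps a single numeric dimension scaled to [0, 1], that each class is represented by the mean of its training vectors, and that an unseen tuple is assigned to the class with the highest cosine similarity. The function names, the attributes `outlook` and `temp`, and the toy data are all made up for illustration.

```python
import math
from collections import defaultdict


def to_vector(row, ranges):
    """Map a data tuple (dict) to a sparse vector.

    Assumption: string values are discrete and become indicator
    dimensions "attr=value"; numeric values are continuous and keep
    one dimension each, min-max scaled with the training ranges.
    """
    vec = {}
    for attr, value in row.items():
        if isinstance(value, str):
            vec[f"{attr}={value}"] = 1.0
        else:
            lo, hi = ranges[attr]
            vec[attr] = (value - lo) / (hi - lo) if hi > lo else 0.0
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train(rows, labels):
    """Compute continuous-attribute ranges and one mean vector per class."""
    ranges = {}
    for row in rows:
        for attr, value in row.items():
            if not isinstance(value, str):
                lo, hi = ranges.get(attr, (value, value))
                ranges[attr] = (min(lo, value), max(hi, value))

    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        counts[label] += 1
        for key, weight in to_vector(row, ranges).items():
            sums[label][key] += weight

    centroids = {
        label: {key: total / counts[label] for key, total in vec.items()}
        for label, vec in sums.items()
    }
    return ranges, centroids


def classify(row, ranges, centroids):
    """Return the class whose mean vector is most similar to the tuple."""
    query = to_vector(row, ranges)
    return max(centroids, key=lambda label: cosine(query, centroids[label]))


# Toy training set: one discrete and one continuous attribute.
rows = [
    {"outlook": "sunny", "temp": 30.0},
    {"outlook": "sunny", "temp": 28.0},
    {"outlook": "rainy", "temp": 15.0},
    {"outlook": "rainy", "temp": 12.0},
]
labels = ["no", "no", "yes", "yes"]

ranges, centroids = train(rows, labels)
print(classify({"outlook": "sunny", "temp": 29.0}, ranges, centroids))
print(classify({"outlook": "rainy", "temp": 14.0}, ranges, centroids))
```

In this sketch the unseen tuple plays the role of the query and each class vector plays the role of a document, which mirrors how VSM ranks pages against a query. Neither attribute type is converted into the other: discrete values stay categorical and continuous values stay numeric.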
