Academic Journals Database
Disseminating quality controlled scientific knowledge

Streamed Sampling on Dynamic data as Support for Classification Model

Author(s): Astried Silvanie | Taufik Djatna | Heru Sukoco

Journal: TELKOMNIKA
ISSN 1693-6930

Volume: 11;
Issue: 4;
Date: 2013;

Keywords: random sample | relative entropy | skewness | Kullback-Leibler divergence | dynamic classification

ABSTRACT
Data mining on dynamically changing data faces several problems, such as an unknown data size and a data skew that is constantly changing. Random sampling is commonly applied to extract a general synopsis from a very large database. In this research, Vitter’s reservoir algorithm is used to retrieve k records from the database and place them in the sample, which then serves as input for the classification task in data mining. The sample is a backing sample, stored as a table containing the id and priority of each record; the priority reflects the probability of how long a record is retained in the sample. Kullback-Leibler divergence is applied to measure the similarity between the population and sample distributions. The results show that samples can be taken randomly and continuously as transactions occur. Kullback-Leibler divergence proved to be a very good measure for maintaining a similar distribution between the population and the sample, with values ranging from 0 to 0.0001. The sample stays up to date with new transactions while preserving a similar skewness. For the classification task, the decision tree model improved significantly when changes occurred.
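The two techniques named in the abstract, reservoir sampling and a Kullback-Leibler check between sample and population, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: records are assumed to arrive as a Python iterable of class labels, the function names `reservoir_sample` and `kl_divergence` are hypothetical, and the id/priority table of the backing sample is not reproduced. The sampler follows Vitter's Algorithm R. The divergence is assumed to be taken from the sample distribution to the population distribution with the natural logarithm; that direction stays finite because every label in the sample also occurs in the population.

```python
import math
import random
from collections import Counter
from typing import Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Vitter's Algorithm R: uniform sample of k items from a stream of unknown length."""
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(item)
        else:
            # Record i (0-based) enters with probability k / (i + 1),
            # replacing a uniformly chosen slot.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def kl_divergence(population: Iterable[Hashable], sample: Iterable[Hashable]) -> float:
    """D_KL(Q || P), Q = sample label distribution, P = population label distribution.

    Assumption: natural log, and every sampled label occurs in the population
    (otherwise P(x) = 0 and the division fails).
    """
    pop_counts = Counter(population)
    smp_counts = Counter(sample)
    n_pop = sum(pop_counts.values())
    n_smp = sum(smp_counts.values())
    total = 0.0
    for label, count in smp_counts.items():
        q = count / n_smp
        p = pop_counts[label] / n_pop
        total += q * math.log(q / p)
    return total


if __name__ == "__main__":
    # Hypothetical skewed population of class labels (70% "A", 30% "B").
    labels = ["A"] * 700 + ["B"] * 300
    sample = reservoir_sample(labels, 100, random.Random(42))
    print(len(sample), kl_divergence(labels, sample))
```

A divergence near 0 indicates that the sample's label distribution (and therefore its skew) matches the population's; re-running the check after each batch of transactions is one way to monitor whether the sample is still representative.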