Academic Journals Database
Disseminating quality controlled scientific knowledge

Classifying DNA repair genes by kernel-based support vector machines

Author(s): Hao Jiang* | Wai-Ki Ching

Journal: Bioinformation
ISSN 0973-2063

Volume: 7;
Issue: 5;
Start page: 257;
Date: 2011;
Original page

Human longevity is a complex phenotype that has a significant genetic predisposition. Like other biological processes, ageing process is governed through the regulation of signaling pathways and transcription factors. The DNA damage theory of ageing suggests that ageing is a consequence of un-repaired DNA damage accumulation. Intensive research has been carried out to elucidate the role of DNA repair systems in the ageing process. Decision Trees and Naive Bayesian Algorithm are two data-mining based classification methods for systematically analyzing data about human DNA repair genes. In this paper we develop a linearly combined kernel with Support Vector Machine (SVM) to analyze the ageing related data. The popular supervised learning algorithm enables better discrimination between ageing-related and non-ageing-related DNA repair genes. The linear combination of linear kernel and polynomial kernel of degree 3 in conjunction with SVM allows better classification accuracy in DNA repair gene data set. Compared to Decision Trees and Naive Bayesian Algorithm, SVM with the proposed kernel can achieve 65% AUC (Area Under ROC Curve) values, in contrast to 51.1% and 52.1% respectively. More importantly, we obtain 5 significant ageing-related genes selected through the training on the whole data set and they are PCNA, PARP, APEX1, MLH1 and XRCC6. Different from the two methods, we can identify another important gene PCNA in the pathways the two methods targeted, while they failed to. And two novel genes PARP, MLH1 are selected as well. The two genes might provide potential insights for biologists in ageing research. SVM is a powerful and robust classification algorithm that can yield higher predictive accuracies. The selection of proper kernel plays a more important role in fulfilling the classification task. The important genes identified not only can target critical pathways related to ageing but also detected genes that may reveal possible related ageing biomarkers.

Tango Rapperswil
Tango Rapperswil

     Save time & money - Smart Internet Solutions