Academic Journals Database
Disseminating quality controlled scientific knowledge

Automating XML Markup using Machine Learning Techniques

Author(s): Shazia Akhtar | Ronan Reilly | John Dunnion

Journal: Journal of Systemics, Cybernetics and Informatics
ISSN 1690-4532

Volume: 2
Issue: 5
Start page: 12
Date: 2004

Keywords: XML | Automatic Markup | Machine Learning | Self-Organizing Map | C5.0

In this paper we present a novel system for automatically marking up text documents in XML. The system combines the Self-Organising Map (SOM) algorithm with an inductive learning algorithm, C5.0. The SOM algorithm clusters XML marked-up documents on a two-dimensional map such that documents with similar content are placed close to each other. The C5.0 algorithm learns markup rules from the nearest SOM neighbours of an unmarked document and applies them to that document. The system is designed to be adaptive, so that it learns from errors in order to improve the markup of the resulting documents. Experiments show that our system provides high accuracy and demonstrate that our approach is practical and feasible.
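
The pipeline described in the abstract can be illustrated with a rough Python sketch. It is not the authors' implementation. It assumes documents are already reduced to small numeric feature vectors, and it substitutes a simple majority vote over the neighbours' tags for C5.0 rule induction, which is not reproduced here. All function names, the `vec` and `tag` fields, the grid size, and the sample data are hypothetical.

```python
import math
import random
from collections import Counter


def best_matching_unit(weights, v):
    """Return the grid position whose weight vector is closest to v."""
    return min(weights, key=lambda node: math.dist(weights[node], v))


def train_som(vectors, rows=3, cols=3, epochs=50, seed=0):
    """Train a tiny SOM; returns a dict mapping (row, col) -> weight vector."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays from 1 towards 0
        lr = 0.5 * frac                      # learning rate
        radius = 1.0 + max(rows, cols) / 2.0 * frac
        for v in vectors:
            bmu = best_matching_unit(weights, v)
            for node, w in weights.items():
                d = math.dist(node, bmu)     # distance on the 2-D grid
                if d <= radius:
                    influence = math.exp(-(d * d) / (2 * radius * radius))
                    for i in range(dim):
                        w[i] += lr * influence * (v[i] - w[i])
    return weights


def nearest_marked_up(weights, docs, query_vec, k=3):
    """Return the k marked-up docs whose map position is nearest the query's."""
    q = best_matching_unit(weights, query_vec)
    ranked = sorted(
        docs,
        key=lambda d: math.dist(best_matching_unit(weights, d["vec"]), q),
    )
    return ranked[:k]


def suggest_tag(neighbours):
    """Stand-in for C5.0 (assumption): majority vote over neighbour tags."""
    return Counter(d["tag"] for d in neighbours).most_common(1)[0][0]


# Hypothetical marked-up collection: feature vector plus a top-level tag.
docs = [
    {"vec": [1.0, 0.0, 0.0], "tag": "poem"},
    {"vec": [0.9, 0.1, 0.0], "tag": "poem"},
    {"vec": [0.8, 0.2, 0.0], "tag": "poem"},
    {"vec": [0.0, 0.0, 1.0], "tag": "letter"},
    {"vec": [0.0, 0.1, 0.9], "tag": "letter"},
    {"vec": [0.0, 0.2, 0.8], "tag": "letter"},
]

som = train_som([d["vec"] for d in docs])
neighbours = nearest_marked_up(som, docs, [0.85, 0.15, 0.0], k=3)
print(suggest_tag(neighbours))
```

A real system along these lines would learn element-level markup rules from the neighbours rather than a single document-level tag; the vote above only shows where the map neighbourhood feeds into the learning step.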