Academic Journals Database
Disseminating quality controlled scientific knowledge

A Comparative Study of Machine Learning Techniques in Classifying Full-Text Arabic Documents versus Summarized Documents

Author(s): Khalil Al-Hindi | Eman Al-Thwaib

Journal: World of Computer Science and Information Technology Journal
ISSN 2221-0741

Volume: 3
Issue: 7
Start page: 126
Date: 2013

Keywords: Text Classification | Text Summarization | Naïve Bayes | k-Nearest Neighbors.

Text classification (TC) is the task of assigning text documents to predefined classes or categories. The need for it arises from the large volume of electronic documents on the web. Classification accuracy depends on both the content of the documents and the classification technique used. Automatic text summarization identifies the set of sentences that are most important for the overall understanding of one or more documents. The need for summarization likewise arises from the large volume of electronic documents and from the need to save processing time. In this research, an automatic text summarizer was used to summarize documents. Two classification methods were used to classify Arabic documents before and after summarization, and the classification accuracy for full documents was then compared with that for summarized documents. The accuracy obtained on full documents is close to that obtained on summarized documents. However, classifying summarized documents requires less memory and less run time than classifying full documents.
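The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe their summarizer, features, or data set, so everything below is an assumption made for illustration. The summarizer is a simple word-frequency sentence ranker, the two classifiers (named after the keywords, Naïve Bayes and k-Nearest Neighbors) are bare-bones versions, and the toy corpus is in English rather than Arabic. All function names (`summarize`, `train_nb`, `predict_knn`, `compare`) are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercased word tokens; the word-character class is Unicode-aware in Python 3.
    return re.findall(r"\w+", text.lower())


def summarize(text, keep=1):
    # Assumed summarizer: keep the `keep` sentences with the highest summed word frequency.
    sentences = [s.strip() for s in re.split(r"[.!?؟]+", text) if s.strip()]
    freq = Counter(tokenize(text))
    ranked = sorted(sentences, key=lambda s: sum(freq[w] for w in tokenize(s)), reverse=True)
    chosen = set(ranked[:keep])
    return ". ".join(s for s in sentences if s in chosen)


def train_nb(docs):
    # Multinomial Naive Bayes counts from (text, label) pairs.
    word_counts, class_counts, vocab = defaultdict(Counter), Counter(), set()
    for text, label in docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict_nb(model, text):
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)  # Laplace smoothing
        for w in tokenize(text):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_knn(docs, text, k=3):
    # Majority vote among the k training documents most similar to the query.
    query = Counter(tokenize(text))
    ranked = sorted(docs, key=lambda d: cosine(Counter(tokenize(d[0])), query), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def accuracy(predict, test_docs):
    return sum(predict(text) == label for text, label in test_docs) / len(test_docs)


def compare(train_docs, test_docs, keep=1):
    # Run both classifiers on full text and on summaries; report accuracy and training size.
    results = {}
    for name, transform in (("full", lambda t: t), ("summarized", lambda t: summarize(t, keep))):
        train = [(transform(t), y) for t, y in train_docs]
        test = [(transform(t), y) for t, y in test_docs]
        model = train_nb(train)
        results[name] = {
            "nb": accuracy(lambda t: predict_nb(model, t), test),
            "knn": accuracy(lambda t: predict_knn(train, t), test),
            "chars": sum(len(t) for t, _ in train),
        }
    return results


# Toy corpus, invented for illustration only.
TRAIN = [
    ("The team won the match. The striker scored two goals. Fans cheered the team.", "sports"),
    ("The coach praised the players. The players trained before the match.", "sports"),
    ("The bank raised interest rates. Markets fell as the bank warned of inflation.", "economy"),
    ("Investors sold shares. The stock market lost value as investors panicked.", "economy"),
]
TEST = [
    ("The players scored late in the match.", "sports"),
    ("Investors fear the bank will raise rates.", "economy"),
]

if __name__ == "__main__":
    for name, row in compare(TRAIN, TEST).items():
        print(name, row)
```

The `chars` field is only a rough stand-in for the memory and run-time savings the abstract reports; on a corpus this small the accuracy figures say nothing about real Arabic text, where tokenization and stemming would also need care.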