Academic Journals Database

A Comparative Study of Root-Based and Stem-Based Approaches for Measuring the Similarity Between Arabic Words for Arabic Text Mining Applications

Author(s): Hanane FROUD | Abdelmonaim LACHKAR | Said ALAOUI OUATIK

Journal: Advanced Computing: An International Journal
ISSN 2229-726X

Volume: 3
Issue: 6
Start page: 55
Date: 2012

Keywords: Arabic Language | Latent Semantic Analysis (LSA) | Similarity Measures | Root and Light Stemmer

ABSTRACT
Representing the semantic information contained in words is needed for any Arabic text mining application. More precisely, the purpose is to better take into account the semantic dependencies between words, as expressed by the co-occurrence frequencies of these words. There have been many proposals to compute similarities between words based on their distributions in contexts. In this paper, we compare and contrast the effect of two preprocessing techniques applied to an Arabic corpus, the Root-based (Stemming) and Stem-based (Light Stemming) approaches, for measuring the similarity between Arabic words with the well-known abstractive model Latent Semantic Analysis (LSA) and a wide variety of distance functions and similarity measures, such as the Euclidean Distance, Cosine Similarity, Jaccard Coefficient, and Pearson Correlation Coefficient. The obtained results show that, on the one hand, a more varied corpus produces more accurate results; on the other hand, the Stem-based approach outperformed the Root-based one, because the latter affects word meanings.
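The abstract names four ways of comparing word vectors. As a minimal sketch (not the authors' code), assuming each word is already represented by a real-valued vector of equal length, such as its row in an LSA-reduced term space, the measures can be written as below. The function names and sample vectors are hypothetical, and the Jaccard function uses the extended (Tanimoto) form for real-valued vectors, which is an assumption since the abstract does not say which variant was used.

```python
import math
from typing import Sequence

# Illustrative sketch of the four measures named in the abstract, applied to
# two word vectors of equal length (e.g. rows of an LSA-reduced term space).


def _check(x: Sequence[float], y: Sequence[float]) -> None:
    # Every measure below compares the vectors component by component.
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")


def _dot(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    # A distance, not a similarity: 0.0 means identical vectors.
    _check(x, y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    # Depends only on the angle between vectors; 0.0 if either is all zeros.
    _check(x, y)
    nx, ny = math.sqrt(_dot(x, x)), math.sqrt(_dot(y, y))
    return _dot(x, y) / (nx * ny) if nx and ny else 0.0


def jaccard_coefficient(x: Sequence[float], y: Sequence[float]) -> float:
    # Extended (Tanimoto) Jaccard for real-valued vectors (assumed variant):
    # x.y / (|x|^2 + |y|^2 - x.y)
    _check(x, y)
    xy = _dot(x, y)
    denom = _dot(x, x) + _dot(y, y) - xy
    return xy / denom if denom else 0.0


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    # Cosine similarity of the mean-centred vectors; ranges over [-1, 1].
    _check(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cx = [a - mx for a in x]
    cy = [b - my for b in y]
    sx, sy = math.sqrt(_dot(cx, cx)), math.sqrt(_dot(cy, cy))
    return _dot(cx, cy) / (sx * sy) if sx and sy else 0.0


if __name__ == "__main__":
    # Hypothetical counts of two words across four documents.
    w1 = [2, 0, 1, 3]
    w2 = [1, 0, 1, 2]
    for name, fn in [
        ("Euclidean distance", euclidean_distance),
        ("Cosine similarity", cosine_similarity),
        ("Jaccard coefficient", jaccard_coefficient),
        ("Pearson correlation", pearson_correlation),
    ]:
        print(f"{name}: {fn(w1, w2):.3f}")
```

For these two hypothetical vectors the script prints about 1.414, 0.982, 0.818 and 0.949. Note that Euclidean distance runs in the opposite direction to the other three: a smaller value means the words are more similar.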