Academic Journals Database
Disseminating quality controlled scientific knowledge

Measuring Semantic Similarity between Words Using Web Documents

Author(s): Sheetal A. Takale | Sushma S. Nandgaonkar

Journal: International Journal of Advanced Computer Sciences and Applications
ISSN 2156-5570

Volume: 1
Issue: 4
Date: 2010

Keywords: Semantic Similarity | Wikipedia | Web Search Engine | Natural Language Processing | Information Retrieval | Web Mining.

Semantic similarity measures play an important role in the extraction of semantic relations and are widely used in Natural Language Processing (NLP) and Information Retrieval (IR). The work proposed here uses web-based metrics to compute the semantic similarity between words or terms and compares the results with the state of the art. For a computer to decide the semantic similarity, it should understand the semantics of the words. A computer, being a syntactic machine, cannot understand semantics, so an attempt is always made to represent the semantics as syntax. Various methods have been proposed to find the semantic similarity between words. Some of these methods use precompiled databases such as WordNet and the Brown Corpus; others are based on Web search engines. The approach presented here differs from these methods: it makes use of snippets returned by Wikipedia or any other encyclopedia, such as Britannica Encyclopedia. The snippets are preprocessed with stop word removal and stemming. For suffix removal, the algorithm by M. F. Porter is used. Luhn's idea is used to extract significant words from the preprocessed snippets. The similarity measures proposed here are based on five association measures from information retrieval, namely the simple matching, Dice, Jaccard, Overlap, and Cosine coefficients. The performance of these methods is evaluated using Miller and Charles' benchmark dataset. The approach gives a correlation value of 0.80, which is higher than that of some existing methods.
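The five association measures named in the abstract can be illustrated with a minimal sketch. It assumes that each word is represented by the set of significant terms extracted from its snippets, and it uses the standard set-based textbook definitions of the coefficients; the abstract does not give the paper's exact formulation, so the function name, the input representation, and the example term sets below are hypothetical.

```python
import math


def association_measures(terms_x, terms_y):
    """Return the five set-based association coefficients for two term sets.

    Assumption: each word is described by a set of significant terms
    (for example, stemmed snippet words with stop words removed).
    """
    x, y = set(terms_x), set(terms_y)
    names = ("simple_matching", "dice", "jaccard", "overlap", "cosine")
    if not x or not y:
        # No evidence for one of the words: treat every measure as zero.
        return {name: 0.0 for name in names}

    common = len(x & y)  # number of shared terms
    return {
        "simple_matching": float(common),
        "dice": 2 * common / (len(x) + len(y)),
        "jaccard": common / len(x | y),
        "overlap": common / min(len(x), len(y)),
        "cosine": common / math.sqrt(len(x) * len(y)),
    }


if __name__ == "__main__":
    # Hypothetical term sets, for illustration only.
    car = {"vehicl", "engin", "road", "wheel"}
    automobile = {"vehicl", "engin", "road", "driver", "motor", "passeng"}
    for name, value in association_measures(car, automobile).items():
        print(f"{name}: {value:.3f}")
```

Simple matching is an unnormalized count, so it grows with the size of the term sets; the other four coefficients normalize the overlap and fall in the range 0 to 1, which makes them easier to compare against human similarity ratings.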