Academic Journals Database
Disseminating quality controlled scientific knowledge

Named Entity Identifier for Malayalam Using Linguistic Principles Employing Statistical Methods

Author(s): Bindu.M.S | Sumam Mary Idicula

Journal: International Journal of Computer Science Issues
ISSN 1694-0784

Volume: 8
Issue: 5
Start page: 185
Date: 2011

Keywords: Malayalam compound word | Finite state transducer | Extended Conditional Random Field | Feature vector | IJCSI

Natural language processing (NLP), which began as a branch of Artificial Intelligence, is a field of computer science and linguistics concerned with the interaction between human language and computers. Major NLP tasks such as Machine Translation (MT), Information Retrieval (IR) and Summarization require extensive knowledge of the language to identify the semantic information in a text effectively. The meaning of a text is determined mainly by its named entities, which are the role-carrying agents in the text. The system presented here is a Named Entity (NE) Identifier built with statistical methods based on linguistic grammar principles. Named entity recognition (NER) for Malayalam is difficult because the words of a named entity carry no distinguishing surface feature, such as the capitalization used in English. NER systems built for other languages are not suitable for Malayalam, since its morphology, syntax and lexical semantics differ from theirs. To test the system, documents containing passages from five different fields were selected from well-known Malayalam newspapers and magazines. Experimental results show average precision, recall and F-measure values of 85.52%, 86.32% and 85.61%, respectively.
