Academic Journals Database
Disseminating quality controlled scientific knowledge

Managing word form variation of text retrieval in practice – Why language technology is not the only cure for better IR performance?

Author(s): Dr. Kimmo Kettunen

Journal: Trends in Information Management
ISSN 0973-4163

Volume: 9
Issue: 1
Start page: 1
Date: 2013

Keywords: Information retrieval | Management of word form variation | Comparison of word form variation management methods | IR performance | Effectiveness | Language technology

ABSTRACT
Purpose: The article discusses, on a general methodological level, the methods that have been used to manage word form variation of single keywords over the history of textual information retrieval (IR). It offers the reader an overall practical guide for choosing between methods for different types of European languages. The methods compared include stemming, lemmatization, truncation, syllabification, unsupervised morphological methods, character n-gramming and generation of inflected word forms.

Methodology/Approach: Based on empirical findings and results achieved by other researchers, the paper discusses the pros and cons of the different keyword variation management methods in a broader context than is usual in IR, where normally only the achieved effectiveness results are considered. The study proposes a list of five criteria for comparing conflation methods in general and offers a heuristic for choosing a suitable conflation method for a specific language.

Findings: Simpler character-based methods could be preferred in IR over very sophisticated linguistic methods. It is also suggested that for morphologically simple languages, such as English, any kind of keyword variation management may be futile, as the increase in IR effectiveness achieved may be very low. Morphologically more complex languages can be conflated quite effectively with the simple methods for present IR search engines.

Paper Type: Meta-analysis
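
As an illustration of the simpler character-based methods named in the abstract, the following is a minimal sketch of truncation and character n-gramming. It is not taken from the article: the function names, the prefix length of 4, the trigram size, the Dice overlap measure and the Finnish example words are all assumptions chosen for illustration only.

```python
def truncate(word, length=4):
    """Truncation: conflate word forms by keeping a fixed-length prefix.
    The prefix length of 4 is an arbitrary, illustrative choice."""
    return word.lower()[:length]


def char_ngrams(word, n=3):
    """Character n-gramming: split a word into overlapping n-character pieces.
    Words no longer than n are returned as a single piece."""
    word = word.lower()
    if len(word) <= n:
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice(a, b):
    """Dice coefficient between two n-gram sets (an assumed similarity measure)."""
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical example: three inflected forms of the Finnish word "talo" (house).
forms = ["talo", "talossa", "talojen"]

# Truncation maps all three forms to the same key.
truncated_keys = {truncate(f) for f in forms}

# N-gramming keeps the forms distinct but lets them match on shared pieces.
overlap = dice(char_ngrams("talossa"), char_ngrams("talojen"))
```

Neither function needs a lexicon or any language-specific rules, which is the practical appeal of such methods compared with lemmatization; the cost is that unrelated words sharing a prefix or several n-grams are matched as well.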