Author(s): Andrei Popescu-Belis
Journal: Traitement Automatique des Langues
ISSN 1248-9433
Volume: 48
Issue: 1
Start page: 67
Date: 2008
Keywords: NLP systems | evaluation | ISO standards | quality characteristics | evaluation metrics
ABSTRACT
Research in natural language processing (NLP) has both scientific and technological dimensions. In both cases, the implemented systems must be evaluated in order to assess the success of a study. This article, grounded in the ISO framework for software evaluation, introduces a typology of NLP systems based on the role of language as input or output data, in order to analyze the central role of evaluation metrics at several stages of the NLP research process. The article focuses on the evaluation metrics that compare the response of a system to a set of correct responses. The analysis of several evaluation examples, in particular the case of machine translation systems, shows the importance of a coherent choice of metrics and of the joint use of several metrics. The article concludes by discussing the influence of the context of use on the set of metrics, and the case of interactive systems.