Academic Journals Database
Disseminating quality controlled scientific knowledge

Best friends or just faking it? Corpus-based extraction of Slovene-Croatian translation equivalents and false friends

Author(s): Darja Fišer | Nikola Ljubešić

Journal: Slovenščina 2.0 : Empirične, Aplikativne in Interdisciplinarne Raziskave
ISSN 2335-2736

Volume: 1
Issue: 1
Start page: 50
Date: 2013

Keywords: automatic bilingual lexicon extraction | distributional semantics | closely related languages | cognates | false friends

In this paper we present a corpus-based approach to the automatic extraction of translation equivalents and false friends for Slovene and Croatian, a pair of closely related languages. While taking advantage of the orthographic similarities between the two languages, the approach relies on a simple but powerful assumption of distributional semantics: words with similar meanings tend to be used in similar contexts in both languages. On the one hand, this enables us to quickly generate a Slovene-Croatian bilingual lexicon from a minimal knowledge source, namely weakly comparable web corpora. On the other hand, it can be used to identify cognates that look similar on the surface but in fact express different concepts in the two languages. The approach is language-independent and therefore attractive for natural language processing tasks, which often lack lexical resources and cannot afford to build them by hand. It is also useful in lexicography and language pedagogy, where it can highlight the lexical characteristics specific to a given language pair or domain.
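The general idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the thresholds (`ortho_min`, `context_min`), the window size, and the toy corpora are all hypothetical, and the sketch assumes that context words can be compared directly by surface form across the two corpora. A realistic setup for two distinct languages would more plausibly map contexts through a seed lexicon.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import sqrt


def context_vector(word, sentences, window=2):
    """Count the tokens occurring within `window` positions of `word`."""
    counts = Counter()
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token != word:
                continue
            lo, hi = max(0, i - window), i + window + 1
            counts.update(
                t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
            )
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def orthographic_similarity(a, b):
    """Surface similarity of two strings, in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def classify_pair(w1, corpus1, w2, corpus2, ortho_min=0.8, context_min=0.3):
    """Label a word pair using surface form first, then context overlap.

    Both thresholds are illustrative assumptions, not published values.
    """
    if orthographic_similarity(w1, w2) < ortho_min:
        return "not a cognate candidate"
    sim = cosine(context_vector(w1, corpus1), context_vector(w2, corpus2))
    if sim >= context_min:
        return "translation equivalent candidate"
    return "false friend candidate"


# Toy corpora with placeholder tokens (not real Slovene or Croatian data).
corpus_a = [["x", "foo", "y"], ["x", "foo", "z"]]
corpus_b = [["x", "foo", "y"], ["p", "fooo", "q"]]

same_context = classify_pair("foo", corpus_a, "foo", corpus_b)
different_context = classify_pair("foo", corpus_a, "fooo", corpus_b)
dissimilar_form = classify_pair("foo", corpus_a, "x", corpus_b)
```

In this sketch the orthographic check acts as a cheap filter for cognate candidates, and the context comparison then separates pairs that are used alike from pairs that merely look alike.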