Author(s): Somnuk Sinthupoun | Ohm Sornil
Journal: International Journal of Computer Science and Information Security
ISSN 1947-5500
Volume: 7;
Issue: 1;
Start page: 95;
Date: 2010;
Keywords: Thai Language | Elementary Discourse Unit | Rhetorical Structure Tree | Discourse Relation | Journal of Computer Science | IJCSIS | USA
ABSTRACT
A rhetorical structure tree (RS tree) is a representation of discourse relations among elementary discourse units (EDUs). An RS tree is useful in many text processing tasks that exploit relationships among EDUs, such as text understanding, summarization, and question answering. The Thai language, with its distinctive linguistic characteristics, requires its own RS tree construction technique. This paper proposes an approach to Thai RS tree construction that consists of three major steps: EDU segmentation, Thai RS tree construction, and discourse relation (DR) identification. Two hidden Markov models derived from grammatical rules are used to segment EDUs; a clustering technique whose similarity measure is derived from Thai semantic rules is used to construct a Thai RS tree; and a decision tree whose features are extracted from the rules is used to determine the DR between EDUs. The proposed technique is evaluated on three Thai corpora. The results show effectiveness of 94.90% for Thai RS tree construction and 82.81% for DR identification.
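The tree-construction step described in the abstract can be illustrated with a minimal sketch of bottom-up clustering over already-segmented EDUs. This is not the authors' implementation. The abstract does not specify the similarity measure beyond saying it is derived from Thai semantic rules, so the `jaccard` token-overlap function here is a stand-in. Restricting merges to adjacent units is also an assumption. The names `Node`, `build_tree`, and `bracket` are hypothetical, and English tokens are used only to keep the example readable.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """A node in a binary tree over EDUs; a leaf carries one EDU's index."""
    tokens: frozenset
    index: Optional[int] = None          # set only on leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def jaccard(a: frozenset, b: frozenset) -> float:
    """Placeholder similarity (token overlap). ASSUMPTION: the paper's
    measure is derived from Thai semantic rules and is not reproduced here."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_tree(
    edus: List[List[str]],
    sim: Callable[[frozenset, frozenset], float] = jaccard,
) -> Node:
    """Repeatedly merge the most similar pair of ADJACENT units until one
    root remains. ASSUMPTION: adjacency-only merging is an illustrative
    choice, not something stated in the abstract."""
    if not edus:
        raise ValueError("at least one EDU is required")
    nodes = [Node(frozenset(tokens), index=i) for i, tokens in enumerate(edus)]
    while len(nodes) > 1:
        # Index of the left member of the best adjacent pair (leftmost on ties).
        best = max(
            range(len(nodes) - 1),
            key=lambda i: sim(nodes[i].tokens, nodes[i + 1].tokens),
        )
        left, right = nodes[best], nodes[best + 1]
        merged = Node(left.tokens | right.tokens, left=left, right=right)
        nodes[best:best + 2] = [merged]
    return nodes[0]


def bracket(node: Node) -> str:
    """Render the tree as nested brackets of EDU indices."""
    if node.is_leaf():
        return str(node.index)
    return f"({bracket(node.left)} {bracket(node.right)})"


edus = [
    ["rain", "fell", "heavily"],
    ["rain", "flooded", "road"],
    ["traffic", "stopped"],
    ["traffic", "road", "closed"],
]
print(bracket(build_tree(edus)))  # ((0 1) (2 3))
```

In the pipeline the abstract describes, the leaves would come from the HMM-based segmenter, and each internal node would then be labeled with a discourse relation by the decision tree. Neither of those steps is modeled in this sketch.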