Academic Journals Database
Disseminating quality controlled scientific knowledge

Keyword Extraction Based Summarization of Categorized Kannada Text Documents

Author(s): Jayashree.R | Srikanta Murthy.K | Sunny.K

Journal: International Journal on Soft Computing
ISSN 2229-7103

Volume: 2
Issue: 4
Start page: 81
Date: 2011

Keywords: Summary | Keywords | GSS coefficient | Term Frequency (TF) | Inverse Document Frequency (IDF) | Rank of sentence

ABSTRACT
The internet has caused enormous growth in the number of documents available online. Summaries of documents can help find the right information and are particularly effective when the document base is very large. Keywords are closely associated with a document: they reflect its content and act as indices for it. In this work, we present a method to produce extractive summaries of documents in the Kannada language, given a limit on the number of sentences. The algorithm extracts keywords from pre-categorized Kannada documents collected from online resources. We use two feature selection techniques to obtain features from documents: we combine scores obtained by the GSS (Galavotti, Sebastiani, Simi) coefficient and IDF (Inverse Document Frequency) methods with TF (Term Frequency) to extract keywords, and later use these keywords for summarization based on the rank of each sentence. In the current implementation, a document from a given category is selected from our database, and a summary is generated with the number of sentences specified by the user.
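
The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the GSS, IDF, and TF scores are combined or how sentence rank is computed, so the following are assumptions made only for illustration: a keyword score of TF x (IDF + GSS), a sentence rank equal to the sum of its keyword scores, and whitespace tokenization. The function names (gss, idf, keyword_scores, summarize) and the corpus layout (a list of (word set, category) pairs) are hypothetical and are not taken from the paper. The GSS coefficient uses its standard definition, P(t,c)P(~t,~c) - P(t,~c)P(~t,c), estimated from document counts.

```python
import math
from collections import Counter


def gss(term, category, docs):
    """GSS coefficient P(t,c)P(~t,~c) - P(t,~c)P(~t,c), estimated from document counts."""
    n = len(docs)
    a = sum(1 for words, cat in docs if term in words and cat == category)
    b = sum(1 for words, cat in docs if term in words and cat != category)
    c = sum(1 for words, cat in docs if term not in words and cat == category)
    d = n - a - b - c  # documents with neither the term nor the category
    return (a * d - b * c) / (n * n)


def idf(term, docs):
    """Inverse document frequency log(N / df); 0.0 for terms unseen in the corpus."""
    df = sum(1 for words, _ in docs if term in words)
    return math.log(len(docs) / df) if df else 0.0


def keyword_scores(doc_words, category, docs):
    """Assumed combination (not from the paper): TF * (IDF + GSS)."""
    counts = Counter(doc_words)
    total = len(doc_words)
    return {
        term: (count / total) * (idf(term, docs) + gss(term, category, docs))
        for term, count in counts.items()
    }


def summarize(sentences, category, docs, max_sentences):
    """Rank sentences by summed keyword score; return the top ones in document order."""
    tokenized = [s.split() for s in sentences]  # assumption: whitespace tokenization
    doc_words = [w for sent in tokenized for w in sent]
    scores = keyword_scores(doc_words, category, docs)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sum(scores[w] for w in tokenized[i]),
        reverse=True,
    )
    chosen = sorted(ranked[:max_sentences])  # restore original sentence order
    return [sentences[i] for i in chosen]
```

Calling summarize(sentences, "sports", docs, 2) on a small categorized corpus would return the two highest-ranked sentences in their original order. Because the rank is a plain sum, longer sentences are favored; normalizing by sentence length is one possible variation, again an assumption rather than something the abstract specifies.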