Author(s): B. A. Ojokoh
Journal: International Journal of Computer Science Research and Application
ISSN 2012-9564
Volume: 02;
Issue: 03;
Start page: 02;
Date: 2012;
Keywords: Online news | Information extraction | RSS feeds | Title | HTML | Document Object Model | Search Engine
ABSTRACT
With the growth of the Internet and related tools, the volume of online resources has grown exponentially. Paradoxically, this growth has made finding, extracting and aggregating relevant information more difficult. Finding and browsing news is now one of the most important Internet activities. This paper presents a hybrid method for extracting the contents of online news articles. The method combines RSS feeds with HTML Document Object Model (DOM) tree extraction. The approach is simple, and it effectively addresses the problems of heterogeneous news layouts and changing content that affect many existing methods. Experimental results on selected news sites show that the approach extracts news article contents automatically, effectively and consistently. The proposed method can also be adopted for other news sites.
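The abstract does not spell out how the RSS feed and the DOM tree are combined. One plausible reading, sketched here in Python with only the standard library, is that the feed supplies each article's title and link, and the title is then used to locate the article body in the page's DOM tree. This is an illustrative assumption, not the paper's actual algorithm: the names `read_feed`, `extract_article`, `Node` and `TreeBuilder`, the sample feed and page, and the `news.example` address are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample data standing in for a real feed and a real news page.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example News</title>
  <item>
    <title>City opens new library</title>
    <link>http://news.example/library</link>
  </item>
</channel></rss>"""

SAMPLE_HTML = """<html><body>
<div id="nav"><p>Home | Sports | Politics</p></div>
<div id="story">
  <h1>City opens new library</h1>
  <p>The city council opened a new public library on Monday.</p>
  <p>It holds more than 50,000 books.</p>
</div>
<div id="footer"><p>Copyright Example News</p></div>
</body></html>"""


def read_feed(rss_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [(item.findtext("title", "").strip(), item.findtext("link", "").strip())
            for item in root.iter("item")]


class Node:
    """A minimal DOM node: tag name, parent, children and directly owned text."""

    def __init__(self, tag, parent=None):
        self.tag, self.parent = tag, parent
        self.children, self.text = [], ""

    def full_text(self):
        parts = [self.text.strip()] + [c.full_text() for c in self.children]
        return " ".join(p for p in parts if p)


class TreeBuilder(HTMLParser):
    """Build a simplified DOM tree from HTML (attributes are ignored)."""

    VOID = {"br", "img", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in self.VOID:  # void elements never receive children
            self.current = node

    def handle_endtag(self, tag):
        # Close up to the nearest open element with this tag; ignore strays.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def iter_nodes(node):
    """Yield the node and all of its descendants in document order."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def extract_article(html_text, title):
    """Return the <p> texts that share a container with the feed title."""
    builder = TreeBuilder()
    builder.feed(html_text)
    # Assumption: the headline from the feed appears verbatim in the page body.
    anchor = next((n for n in iter_nodes(builder.root)
                   if n.tag != "title" and title in n.text), None)
    if anchor is None:
        return []
    # Climb from the headline until an ancestor directly holds paragraphs.
    container = anchor.parent
    while container is not None and not any(c.tag == "p" for c in container.children):
        container = container.parent
    if container is None:
        return []
    return [c.full_text() for c in container.children if c.tag == "p"]


if __name__ == "__main__":
    for title, link in read_feed(SAMPLE_RSS):
        # A real pipeline would download `link`; the sketch reuses SAMPLE_HTML.
        print(title, link, extract_article(SAMPLE_HTML, title))
```

Because the headline comes from the feed rather than from site-specific markup, the navigation and footer paragraphs in the sample page are skipped without any per-site rules, which is the kind of layout independence the abstract claims. A real page would need more robust matching (whitespace, entities, nested markup) than this sketch provides.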