Academic Journals Database
Disseminating quality controlled scientific knowledge

A framework for dynamic indexing from hidden web

Author(s): Hasan Mahmud | Moumie Soulemane | Mohammad Rafiuzzaman

Journal: International Journal of Computer Science Issues
ISSN 1694-0784

Volume: 8
Issue: 5
Start page: 249
Date: 2011

Keywords: Dynamic web pages | crawler | hidden web | index | Hadoop | IJCSI

ABSTRACT
The proliferation of database-driven dynamic websites means that many web pages are generated on the fly, which makes them too sophisticated for most search engines to index. In an attempt to crawl the contents of dynamic web pages, we have developed a simple approach to indexing the large volume of dynamic content hidden behind search forms. Our key contributions in this paper are the design and implementation of a simple framework to index dynamic web pages and the use of the Hadoop MapReduce framework to update and maintain the index. In our approach, starting from an initial URL, our crawler downloads both static and dynamic web pages, detects form interfaces, adaptively selects keywords to generate the most promising search results, automatically fills in the search form interfaces, submits the dynamic URL, and processes the results until some conditions are satisfied.
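
The form-detection and form-filling steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `FormFinder`, the function `dynamic_urls`, and the example URL are hypothetical names chosen for illustration. The sketch assumes GET forms whose action carries no query string of its own, and it takes the keywords as a fixed list from the caller rather than selecting them adaptively.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormFinder(HTMLParser):
    """Collects each <form> with its action, method, and named text inputs."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # form being parsed, or None outside a form

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            }
        elif tag == "input" and self._current is not None:
            # Only free-text inputs are treated as keyword slots (an assumption).
            input_type = (attrs.get("type") or "text").lower()
            if input_type in ("text", "search") and attrs.get("name"):
                self._current["fields"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def dynamic_urls(page_url, html, keywords):
    """Build one candidate dynamic URL per (GET form, keyword) pair."""
    finder = FormFinder()
    finder.feed(html)
    urls = []
    for form in finder.forms:
        # POST forms and forms without text inputs are skipped in this sketch.
        if form["method"] != "get" or not form["fields"]:
            continue
        target = urljoin(page_url, form["action"])
        for keyword in keywords:
            # Every text field receives the same keyword (a simplification).
            query = urlencode({name: keyword for name in form["fields"]})
            urls.append(f"{target}?{query}")
    return urls
```

For a page at the hypothetical address http://example.com/catalog/index.html containing a GET form with action "/search" and a text input named "q", calling `dynamic_urls` with the keywords "hadoop" and "hidden web" yields http://example.com/search?q=hadoop and http://example.com/search?q=hidden+web. A crawler would then fetch each of these URLs and pass the result pages to the indexing stage.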