Design of Improved Web Crawler By Analysing Irrelevant Result

Author(s): Prashant Dahiwale | Dr. M.M. Raghuwanshi | Dr. Latesh Malik

Journal: International Journal of Computer Science and Mobile Computing
ISSN 2320-088X

Volume: 2;
Issue: 8;
Start page: 243;
Date: 2013;

Keywords: URL | focused crawler | classifier | relevance prediction | links | search engine | ranking

A key issue in designing a focused Web crawler is how to determine whether an unvisited URL is relevant to the search topic. Effective relevance prediction can help the crawler avoid downloading and visiting many irrelevant pages. In this work, we propose a new learning-based approach to improve relevance prediction in focused Web crawlers. For this study, we chose a Naïve Bayes classifier as the base prediction model, although it can easily be replaced by a different prediction model. The performance of a focused crawler depends mostly on the richness of links in the specific topic being searched, and focused crawling usually relies on a general Web search engine to provide starting points.
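The abstract does not say which features the Naïve Bayes model uses, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes, for illustration, that each unvisited link is represented by lowercase word tokens taken from its URL and anchor text. A two-class multinomial Naïve Bayes model with Laplace smoothing is trained on already-visited links labelled relevant or irrelevant, and the crawl frontier is a priority queue ordered by the predicted probability of relevance. The names `tokenize`, `NaiveBayesRelevance`, and `Frontier` and the example URLs are hypothetical.

```python
import heapq
import math
import re
from collections import Counter


def tokenize(url: str, anchor_text: str) -> list[str]:
    """Hypothetical features: lowercase word tokens from the URL and anchor text."""
    return re.findall(r"[a-z0-9]+", (url + " " + anchor_text).lower())


class NaiveBayesRelevance:
    """Two-class multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocab: set[str] = set()

    def train(self, tokens: list[str], relevant: bool) -> None:
        """Record one visited link whose relevance label is already known."""
        self.word_counts[relevant].update(tokens)
        self.doc_counts[relevant] += 1
        self.vocab.update(tokens)

    def _log_score(self, tokens: list[str], label: bool) -> float:
        total_docs = sum(self.doc_counts.values())
        # Smoothed log prior, so a class with no examples never yields log(0).
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(self.word_counts[label].values())
        # The extra +1 reserves probability mass for words never seen in training.
        denom = total_words + len(self.vocab) + 1
        for t in tokens:
            score += math.log((self.word_counts[label][t] + 1) / denom)
        return score

    def prob_relevant(self, tokens: list[str]) -> float:
        """Posterior P(relevant | tokens), normalised stably from log scores."""
        a = self._log_score(tokens, True)
        b = self._log_score(tokens, False)
        m = max(a, b)
        ea, eb = math.exp(a - m), math.exp(b - m)
        return ea / (ea + eb)


class Frontier:
    """Priority queue of unvisited URLs, highest predicted relevance first."""

    def __init__(self, model: NaiveBayesRelevance) -> None:
        self.model = model
        self.heap: list[tuple[float, int, str]] = []
        self.seen: set[str] = set()
        self.counter = 0  # tie-breaker so equal scores pop in insertion order

    def push(self, url: str, anchor_text: str) -> None:
        if url in self.seen:
            return  # each URL is scored and queued at most once
        self.seen.add(url)
        p = self.model.prob_relevant(tokenize(url, anchor_text))
        heapq.heappush(self.heap, (-p, self.counter, url))  # min-heap, so negate
        self.counter += 1

    def pop(self) -> tuple[str, float]:
        neg_p, _, url = heapq.heappop(self.heap)
        return url, -neg_p


# Usage with hypothetical training links.
model = NaiveBayesRelevance()
model.train(tokenize("http://example.com/crawler-design", "focused web crawler"), True)
model.train(tokenize("http://example.com/hotel-booking", "reservation system deals"), False)

frontier = Frontier(model)
frontier.push("http://example.com/cheap-hotels", "hotel deals")
frontier.push("http://example.com/crawler-ranking", "crawler relevance ranking")
url, p = frontier.pop()  # the crawler-ranking link is expected to come out first
print(url, round(p, 3))
```

Because `Frontier` only calls `prob_relevant`, any other classifier exposing the same method could be substituted without changing the crawl loop, which mirrors the abstract's point that the base prediction model is interchangeable. In a full crawler, the seed URLs would be pushed from a general search engine's results, and each fetched page would be labelled and fed back through `train`.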