Web Pages Retrieval by Using Proposed Focused Crawler

Abstract

A focused crawler is designed to visit only the part of the web that contains documents related to a particular topic. Its objective is to identify promising links that lead to the target documents and to avoid branches that do not lead to the required topic. There are several motivations for designing a focused crawler: fetching relevant data from the web with simplified data indexing, personalizing human-computer interaction, making the system adaptive to each user, providing a tool that can change the search strategy, keeping the collected web pages fresh, and filtering links so that the crawl stays focused on the user’s preferences. In this paper, we explain two methods for retrieving web pages: a traditional crawler and the proposed focused crawler. We conduct several experiments, and the results show that the proposed focused crawler is more efficient than the traditional crawler at retrieving the desired web pages.
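
As an illustration only, the contrast between the two crawling strategies can be sketched in Python on a small in-memory link graph. The toy graph, the topic term set, the anchor-text scoring function, and the `min_score` threshold are all assumptions made for this sketch. It is not the crawler proposed in this paper, and it performs no network access.

```python
import heapq
from collections import deque

# Hypothetical toy web: page -> list of (anchor text, target page).
TOY_WEB = {
    "seed": [("football news", "sports"), ("machine learning", "ml")],
    "sports": [("tennis results", "tennis")],
    "tennis": [],
    "ml": [("focused crawler", "crawler")],
    "crawler": [],
}

# Assumed topic description: a plain set of keywords.
TOPIC = {"crawler", "learning", "search"}


def relevance(text):
    """Fraction of topic terms that occur in the given text."""
    return len(set(text.split()) & TOPIC) / len(TOPIC)


def traditional_crawl(seed, budget):
    """Breadth-first crawl: follows every link in discovery order."""
    queue, seen, visited = deque([seed]), {seed}, []
    while queue and len(visited) < budget:
        page = queue.popleft()
        visited.append(page)
        for _anchor, target in TOY_WEB[page]:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


def focused_crawl(seed, budget, min_score=0.1):
    """Best-first crawl: follows the highest-scoring links first and
    drops links whose anchor text scores below min_score."""
    heap, seen, visited = [(0.0, 0, seed)], {seed}, []
    counter = 1  # tie-breaker so equal scores keep discovery order
    while heap and len(visited) < budget:
        _, _, page = heapq.heappop(heap)
        visited.append(page)
        for anchor, target in TOY_WEB[page]:
            score = relevance(anchor)
            if target in seen or score < min_score:
                continue  # already queued, or judged off-topic
            seen.add(target)
            # heapq is a min-heap, so the score is negated.
            heapq.heappush(heap, (-score, counter, target))
            counter += 1
    return visited


print(traditional_crawl("seed", 3))
print(focused_crawl("seed", 3))
```

With a budget of three pages, the breadth-first crawl spends one fetch on the off-topic sports page, whereas the best-first crawl reaches the on-topic page two links away from the seed. Scoring links by anchor text is only one possible heuristic; a real focused crawler could also score the content of the parent page or use a trained classifier.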