Accelerated focused crawling through online relevance feedback

Chakrabarti, Soumen; Punera, Kunal; Mallela, Subramanyam (2002) Accelerated focused crawling through online relevance feedback. In: WWW '02 Proceedings of the 11th International Conference on World Wide Web, May 7-11, Honolulu, Hawaii, USA.

Official URL: http://dl.acm.org/citation.cfm?id=511466

Abstract

The organization of HTML into a tag tree structure, which is rendered by browsers as roughly rectangular regions with embedded text and HREF links, greatly helps surfers locate and click on links that best satisfy their information need. Can an automatic program emulate this human behavior and thereby learn to predict the relevance of an unseen HREF target page w.r.t. an information need, based on information limited to the HREF source page? Such a capability would be of great interest in focused crawling and resource discovery, because it can fine-tune the priority of unvisited URLs in the crawl frontier, and reduce the number of irrelevant pages which are fetched and discarded.
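
To make the idea of prioritizing unvisited URLs in the crawl frontier concrete, here is a minimal sketch in Python. It is not the method of the paper: the class `CrawlFrontier` and the function `score_link` are hypothetical names introduced for illustration, and the scorer is a placeholder keyword overlap over the text around a link, standing in for whatever learned relevance predictor a real focused crawler would use.

```python
import heapq
from itertools import count


class CrawlFrontier:
    """Priority queue of unvisited URLs, highest estimated relevance first."""

    def __init__(self):
        self._heap = []       # entries are (-score, tie_breaker, url)
        self._best = {}       # url -> best score currently queued
        self._tie = count()   # tie-breaker so equal scores pop in insertion order

    def push(self, url, score):
        # Keep only the best score per URL; older heap entries become stale
        # and are skipped lazily in pop().
        if score > self._best.get(url, float("-inf")):
            self._best[url] = score
            heapq.heappush(self._heap, (-score, next(self._tie), url))

    def pop(self):
        # Return the queued URL with the highest score, ignoring stale entries.
        while self._heap:
            neg_score, _, url = heapq.heappop(self._heap)
            if self._best.get(url) == -neg_score:
                del self._best[url]
                return url, -neg_score
        raise IndexError("frontier is empty")

    def __len__(self):
        return len(self._best)


def score_link(context_tokens, topic_terms):
    """Placeholder scorer (an assumption, not the paper's learner):
    fraction of tokens near the link that are topic terms."""
    tokens = [t.lower() for t in context_tokens]
    if not tokens:
        return 0.0
    return sum(t in topic_terms for t in tokens) / len(tokens)


if __name__ == "__main__":
    topic = {"focused", "crawler"}
    frontier = CrawlFrontier()
    frontier.push("http://example.org/a", score_link(["focused", "crawler", "survey"], topic))
    frontier.push("http://example.org/b", score_link(["holiday", "photos"], topic))
    # The same target seen again from a more promising source page.
    frontier.push("http://example.org/b", score_link(["crawler"], topic))
    while len(frontier):
        print(frontier.pop())
```

A crawler built on this sketch would fetch whatever `pop()` returns, score the links on the fetched page, and push them back. Tracking already-visited URLs is left to the caller.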

Item Type: Conference or Workshop Item (Paper)
Source: Copyright of this article belongs to WWW '02 Proceedings of the 11th International Conference, Association for Computing Machinery.
Keywords: Focused Crawling; Document Object Model; Reinforcement Learning
ID Code: 100103
Deposited On: 12 Feb 2018 12:28
Last Modified: 12 Feb 2018 12:28
