Research on Focused Hidden Web Crawler

With the rapid development of the World Wide Web, a tremendous amount of information is "hidden" in the Hidden Web, and its volume is growing rapidly. This information can be accessed only through the query interfaces provided by Web databases: the data are returned as dynamically generated Web pages when a user submits a query. Because these pages are poorly structured and the Hidden Web is both unstable and very large, integrating this abundant information automatically and using it effectively is a very challenging task. For the same reason, the Hidden Web is hard to crawl, and crawling it has become a new research direction in information retrieval.

This paper introduces the causes of the Hidden Web's formation and its characteristics, and contrasts Hidden Web crawlers with traditional crawlers to show their similarities and differences. On that basis, it analyzes the key techniques in designing a Hidden Web crawler and the steps a crawler follows when crawling Web pages. The paper then proposes a new method for focused Hidden Web information retrieval. It presents a generic operational model of Hidden Web information retrieval, describes its key techniques, and introduces a focused crawling technique together with a new heuristic query selection algorithm designed in this work. With these techniques, crawling becomes more efficient and more accurate. The experimental results indicate that the new solution performs considerably better than the previous approach.
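
The abstract does not spell out the paper's heuristic query selection algorithm, so the following is only an illustration of how a focused Hidden Web crawler can choose its next query. It combines a simple topic filter with a greedy, frequency-based keyword selector in the spirit of adaptive query selection; this is a swapped-in generic technique, not the authors' algorithm. All names (`is_on_topic`, `select_next_query`, `crawl`), the `min_overlap` threshold, the toy data, and the in-memory list standing in for a Web database's query interface are hypothetical assumptions.

```python
from collections import Counter


def is_on_topic(doc, topic_terms, min_overlap=2):
    """Hypothetical focus filter: a record is on-topic if it shares at least
    `min_overlap` terms with the topic vocabulary."""
    return len(doc & topic_terms) >= min_overlap


def select_next_query(downloaded, issued, topic_terms):
    """Greedy heuristic: among terms that occur in on-topic downloaded records,
    return the not-yet-issued term with the highest document frequency.
    Ties are broken alphabetically; returns None when no candidate is left."""
    df = Counter()
    for doc in downloaded:
        if is_on_topic(doc, topic_terms):
            df.update(doc - issued)  # each record counts a term at most once
    if not df:
        return None
    return min(df, key=lambda term: (-df[term], term))


def crawl(database, seed_query, topic_terms, max_queries=10):
    """Simulated crawl. `database` is a list of term sets standing in for a
    Web database whose query form returns every record containing the term."""
    issued_order = []   # queries in the order they were sent
    issued = set()
    downloaded = []     # records retrieved so far (term sets)
    seen_ids = set()
    query = seed_query
    while query is not None and len(issued_order) < max_queries:
        issued_order.append(query)
        issued.add(query)
        for record_id, record in enumerate(database):
            if query in record and record_id not in seen_ids:
                seen_ids.add(record_id)
                downloaded.append(record)
        query = select_next_query(downloaded, issued, topic_terms)
    return issued_order, seen_ids


# Toy data: record 2 is reachable but off-topic; record 3 is only reachable
# through off-topic terms.
TOY_DATABASE = [
    {"hidden", "web", "crawler"},
    {"web", "form", "query"},
    {"crawler", "cake", "recipe"},
    {"cake", "oven"},
]
TOPIC_TERMS = {"hidden", "web", "crawler", "form", "query"}

if __name__ == "__main__":
    queries, found = crawl(TOY_DATABASE, "crawler", TOPIC_TERMS)
    print(queries)        # ['crawler', 'hidden', 'web', 'form', 'query']
    print(sorted(found))  # [0, 1, 2]
```

Here record 2 is fetched by the seed query but fails the focus filter, so its terms (`cake`, `recipe`) are never issued and record 3 is never retrieved. Ranking candidates by document frequency in the on-topic sample is only one possible heuristic; a fuller selector would also estimate how many new records each query is likely to return relative to its cost.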

Most Viewed Articles in the Networks Category:

  1. Design and Realization of Task Scheduling Algorithm in Grid Environment
  2. Research on Trust Model in P2P Based on Improved Chord Protocol
  3. Design and Implement of VPN with Dynamic Password
  4. The Research of Task Scheduling in Computational Grid Based on DCG3A
  5. Research on Scheduling Disciplines with Self-Similar Traffic Input
  6. Research on Extension of Network Management Functions and System Realization
  7. Research of Incentive Model in P2P Network
  8. Research on Grid Resource Scheduling Model with Three-level and Algorithm
  9. Research on the Replica Selection Strategies in Spatial Information Grid
  10. Research on IP Multicast Access to SUPANET Multicast Management


© 2004-2009 Information-Technology-Articles.com - All Rights Reserved Worldwide.