GeoWeb Crawler: An Extensible and Scalable Web Crawling Framework for Discovering Geospatial Web Resources

Cited by: 12
Authors
Huang, Chih-Yuan [1]
Chang, Hao [2]
Affiliations
[1] Natl Cent Univ, Ctr Space & Remote Sensing Res, Taoyuan 320, Taiwan
[2] Natl Cent Univ, Dept Civil Engn, Taoyuan 320, Taiwan
Keywords
Geospatial Web; resource discovery; Web crawler; Open Geospatial Consortium; SYSTEM
DOI
10.3390/ijgi5080136
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the advance of World Wide Web (WWW) technology, people can easily share content on the Web, including geospatial data and web services. As a result, "big geospatial data management" issues have started to attract attention. Among these issues, this research focuses on discovering distributed geospatial resources. Because resources are scattered across the WWW, users cannot efficiently find the resources they are interested in. While the WWW has Web search engines that address web resource discovery, we envision that the geospatial Web (i.e., GeoWeb) also requires GeoWeb search engines. To realize a GeoWeb search engine, one of the first steps is to proactively discover GeoWeb resources on the WWW. Hence, in this study, we propose the GeoWeb Crawler, an extensible Web crawling framework that can find various types of GeoWeb resources, such as Open Geospatial Consortium (OGC) web services, Keyhole Markup Language (KML) files, and Environmental Systems Research Institute, Inc. (ESRI) Shapefiles. In addition, we apply distributed computing to improve the performance of the GeoWeb Crawler. The results show that, for 10 targeted resource types, the GeoWeb Crawler discovered 7351 geospatial services and 194,003 datasets. The proposed GeoWeb Crawler framework is thus shown to be extensible and scalable, and able to provide a comprehensive index of the GeoWeb.
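The idea of an extensible framework that recognizes several GeoWeb resource types can be illustrated with a small sketch. This is not the authors' implementation: the registry `DETECTORS`, the `detector` decorator, and the `classify` function are hypothetical names, and the detection rules (file extensions for KML and Shapefiles, a `GetCapabilities` request for OGC services) are simplifying assumptions. The point is only that supporting a new resource type means registering one more detector.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical registry: resource type name -> detector function.
DETECTORS = {}

def detector(name):
    """Register a detector under a resource type name (illustrative helper)."""
    def register(func):
        DETECTORS[name] = func
        return func
    return register

@detector("KML")
def is_kml(url):
    # Assumption: a KML file is recognized by its file extension.
    return urlparse(url).path.lower().endswith(".kml")

@detector("ESRI Shapefile")
def is_shapefile(url):
    # Assumption: a Shapefile is recognized by its .shp extension.
    return urlparse(url).path.lower().endswith(".shp")

# A few OGC service identifiers, as used in the SERVICE query parameter.
OGC_SERVICES = {"wms", "wfs", "wcs", "wmts", "csw", "sos", "wps"}

@detector("OGC web service")
def is_ogc_service(url):
    # Assumption: an OGC service is recognized by a GetCapabilities request.
    # Parameter names are compared case-insensitively.
    query = {
        key.lower(): [value.lower() for value in values]
        for key, values in parse_qs(urlparse(url).query).items()
    }
    is_capabilities = "getcapabilities" in query.get("request", [])
    known_service = any(s in OGC_SERVICES for s in query.get("service", []))
    return is_capabilities and known_service

def classify(url):
    """Return the names of all registered resource types matching the URL."""
    return [name for name, matches in DETECTORS.items() if matches(url)]
```

For example, `classify("http://example.org/ows?SERVICE=WMS&REQUEST=GetCapabilities")` would match only the OGC detector, while an ordinary HTML page URL would match none. A real crawler would also need to fetch and validate the response rather than rely on the URL alone.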
Pages: 21