DWSpyder: A new schema extraction method for a deep web integration system

Authors
Saissi, Yasser [1]
Zellou, Ahmed [1]
Adri, Ali [1]
Affiliations
[1] ENSIAS, Mohammed V University in Rabat, Rabat, Morocco
Keywords
Websites - Clustering algorithms - Search engines - Integration
DOI
10.1504/IJWET.2019.102872
Abstract
The deep web is the large part of the web that search engines do not index. Deep web sources are accessible only through their associated access forms. We wish to use a web integration system to access deep web sources and all of their information. Implementing such a system requires the schema description of each web source. This paper addresses how to extract the schema describing an inaccessible deep web source. We propose the DWSpyder method, which extracts the schema describing a deep web source despite its inaccessibility. DWSpyder starts with a static analysis of the source's access forms to extract the first elements of the associated schema description. Its second step is a dynamic analysis of these access forms, using queries to enrich the schema description. DWSpyder also uses a clustering algorithm to identify the possible values of deep web form fields with undefined sets of values. All of the extracted information is used by DWSpyder to automatically generate deep web source schema descriptions. © 2019 Inderscience Enterprises Ltd.
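As an illustration of the static analysis step described in the abstract, the sketch below parses an access form and records each field together with any predefined set of values. It is not the DWSpyder implementation: the class `FormFieldExtractor`, the function `extract_form_schema`, and the sample form are hypothetical names and data chosen for this example, and it uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class FormFieldExtractor(HTMLParser):
    """Collect form field names and any predefined value sets (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}     # field name -> list of predefined values (empty if free text)
        self._select = None  # name of the <select> element currently being parsed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "input" and name:
            values = self.fields.setdefault(name, [])
            # Radio buttons and checkboxes enumerate their possible values.
            if attrs.get("type") in ("radio", "checkbox") and attrs.get("value"):
                values.append(attrs["value"])
        elif tag == "select" and name:
            self._select = name
            self.fields.setdefault(name, [])
        elif tag == "option" and self._select and attrs.get("value"):
            # Options with an empty value (e.g. "Any") are skipped.
            self.fields[self._select].append(attrs["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None


def extract_form_schema(html):
    """Return a mapping of field name -> predefined values found in the form markup."""
    parser = FormFieldExtractor()
    parser.feed(html)
    return parser.fields


# Made-up access form, used only to exercise the sketch.
SAMPLE_FORM = """
<form action="/search" method="get">
  <input type="text" name="title">
  <select name="category">
    <option value="">Any</option>
    <option value="book">Book</option>
    <option value="dvd">DVD</option>
  </select>
  <input type="radio" name="format" value="new">
  <input type="radio" name="format" value="used">
</form>
"""

schema = extract_form_schema(SAMPLE_FORM)
print(schema)
```

Fields that come back with an empty list (here `title`) correspond to the form fields with undefined sets of values; according to the abstract, those are the ones DWSpyder handles through query-based dynamic analysis and clustering, which this sketch does not attempt.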
Pages: 122-150