DWSpyder: A new schema extraction method for a deep web integration system

Authors
Saissi, Yasser [1]
Zellou, Ahmed [1]
Adri, Ali [1]
Affiliations
[1] ENSIAS, Mohammed V University in Rabat, Rabat, Morocco
Keywords
Websites; Clustering algorithms; Search engines; Integration
DOI
10.1504/IJWET.2019.102872
Abstract
The deep web is the large part of the web that search engines do not index. Deep web sources are accessible only through their associated access forms. We wish to use a web integration system to access deep web sources and all of their information. Implementing such a system requires the schema description of each web source. This paper addresses the problem of extracting the schema that describes an inaccessible deep web source. We propose the DWSpyder method, which extracts the schema describing a deep web source despite its inaccessibility. DWSpyder starts with a static analysis of the source's access forms to extract the first elements of the associated schema description. Its second step is a dynamic analysis of these access forms, using queries to enrich the schema description. DWSpyder also uses a clustering algorithm to identify the possible values of deep web form fields with undefined sets of values. DWSpyder uses all of the extracted information to automatically generate deep web source schema descriptions. © 2019 Inderscience Enterprises Ltd.
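The static-analysis step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `FormFieldExtractor`, the helpers `extract_form_schema` and `undefined_fields`, and the sample form are all hypothetical names introduced here, and the sketch assumes that "static analysis" means reading field names, input types, and enumerated values directly from the form's HTML. It uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class FormFieldExtractor(HTMLParser):
    """Hypothetical helper: collect form fields with their type and listed values."""

    def __init__(self):
        super().__init__()
        self.fields = {}      # field name -> {"type": str, "values": list of str}
        self._select = None   # name of the <select> currently being parsed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "input" and name:
            input_type = attrs.get("type") or "text"
            field = self.fields.setdefault(name, {"type": input_type, "values": []})
            # Radio buttons and checkboxes enumerate their values across several tags.
            if input_type in ("radio", "checkbox") and attrs.get("value"):
                field["values"].append(attrs["value"])
        elif tag == "select" and name:
            self.fields[name] = {"type": "select", "values": []}
            self._select = name
        elif tag == "option" and self._select is not None:
            value = attrs.get("value")
            if value:  # skip placeholder options with an empty or missing value
                self.fields[self._select]["values"].append(value)

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None


def extract_form_schema(html):
    """Return a first, partial schema description read from the form markup."""
    parser = FormFieldExtractor()
    parser.feed(html)
    parser.close()
    return parser.fields


def undefined_fields(schema):
    """Fields whose value set cannot be read from the markup (e.g. free text)."""
    return [name for name, field in schema.items() if not field["values"]]


# Made-up access form, for illustration only.
SAMPLE_FORM = """
<form action="/search" method="get">
  <input type="text" name="title">
  <select name="category">
    <option value="">Any</option>
    <option value="books">Books</option>
    <option value="music">Music</option>
  </select>
  <input type="radio" name="condition" value="new">
  <input type="radio" name="condition" value="used">
</form>
"""

schema = extract_form_schema(SAMPLE_FORM)
print(schema)
print(undefined_fields(schema))
```

In this sketch, fields such as `title` come back with an empty value list. These correspond to the "undefined sets of values" that the abstract says are handled by the later query-based analysis and clustering, which are not shown here.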
Pages: 122-150