An Effective Schema Extraction Algorithm on the Deep Web

Authors
Qiang, Bao-hua [1, 2]
Xi, Jian-qing [1]
Zhang, Long [2]
Affiliations
[1] South China Univ Technol, Sch Engn & Comp Sci, Guangzhou 510641, Peoples R China
[2] Southwest Univ, Coll Comp & Informat Sci, Chongqing 400715, Peoples R China
Keywords
Deep Web; schema extraction algorithm; query interface; grouping patterns
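DOI
Not available
Chinese Library Classification (CLC) number
TN [Electronic technology, communication technology]
Discipline classification code
0809
Abstract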
The Deep Web, a complex entity that contains information from a variety of source types, has attracted considerable attention in recent years. To unlock the vast content of the Deep Web, effective approaches for extracting, indexing, and searching the query interfaces of dynamic web pages need careful study. Building on our previously proposed grouping patterns and pre-clustering algorithm, this paper presents an effective schema extraction algorithm. Three metrics, (LCA) precision, (LCA) recall, and (LCA) F1, are used to evaluate the performance of the schema extraction algorithm. The experimental results indicate that our algorithm markedly improves schema extraction for query interfaces on the Deep Web and avoids inconsistencies between the subsets produced by the pre-clustering algorithm and those produced by the schema extraction algorithm.
Pages: 10976 / +
Page count: 2