Schema Extraction for Deep Web Query Interfaces Using Heuristics Rules

Cited by: 0
Author
Chichang Jou
Affiliation
[1] Tamkang University, Department of Information Management
Source
Keywords
Deep web; Query interface; Schema extraction; XML; Heuristic rules; String similarity;
DOI
Not available
Abstract
With the growing popularity of the World Wide Web, the volume of data held in web databases has increased tremendously. This deep web content, hidden behind query interfaces, is of much better quality than content on the surface web. To obtain deep web data, users must fill in query conditions in an HTML query interface and click the submit button. Many applications that rely on deep web content, such as named entity attribute collection, topic-focused crawling, and heterogeneous data integration, depend on understanding the schema of these query interfaces. The schema needs to cover the mappings between input elements and labels, the data types of valid input values, and the range constraints on those values. In addition, to extract the hidden data, the schema needs to include form submission information, such as cookies and action types. We design and implement a Heuristics-based deep web query interface Schema Extraction system (HSE). In HSE, texts surrounding elements are collected as candidate labels. We propose a string similarity function and use a dynamic similarity threshold to cleanse the candidate labels. Elements, candidate labels, and new lines in the query interface are then streamlined to produce its Interface Expression (IEXP). By combining the user’s view and the designer’s view, with the aid of semantic information, we build heuristic rules to extract the schema from the IEXP of query interfaces in the ICQ dataset. These rules are constructed by utilizing (1) the characteristics of labels and elements, and (2) the spatial, group, and range relationships of labels and elements. Supplemented with form submission information, the extracted schemas are stored in XML format so that they can be used in further applications, such as schema matching and merging for federated query interface integration. Experimental results on the TEL-8 dataset show that HSE performs effectively.
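The label-cleansing and XML-storage steps described in the abstract can be illustrated with a rough sketch. It is not the HSE implementation: `difflib.SequenceMatcher` stands in for the paper's string similarity function, the "dynamic" threshold (the mean candidate score with a fixed floor) is an assumed rule, and the names `similarity`, `pick_label`, `schema_to_xml`, `FLOOR`, and the XML element names are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional
import xml.etree.ElementTree as ET

FLOOR = 0.4  # assumed minimum similarity; not a value from the paper


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; NOT the function proposed in the paper."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pick_label(element_name: str, candidates: List[str]) -> Optional[str]:
    """Return the candidate label most similar to the element name,
    or None if no candidate clears the threshold."""
    if not candidates:
        return None
    scores = [similarity(element_name, c) for c in candidates]
    # Hypothetical dynamic threshold: mean score of this element's
    # candidates, but never lower than FLOOR.
    threshold = max(FLOOR, sum(scores) / len(scores))
    best_score, best = max(zip(scores, candidates))
    return best if best_score >= threshold else None


def schema_to_xml(action: str, method: str, fields: Dict[str, List[str]]) -> str:
    """Serialize element-to-label mappings plus form submission info as XML."""
    root = ET.Element("interface", action=action, method=method)
    for name, candidates in fields.items():
        field = ET.SubElement(root, "field", name=name)
        field.set("label", pick_label(name, candidates) or "")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(schema_to_xml(
        "/search", "get",
        {"author_name": ["Author name:", "Search", "Help"]},
    ))
```

A real extractor would also need the spatial, group, and range relationships mentioned above; this sketch covers only the mapping between elements and labels.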
Pages: 163-174
Page count: 11
Related papers
  • [1] Schema Extraction for Deep Web Query Interfaces Using Heuristics Rules
    Jou, Chichang
    [J]. INFORMATION SYSTEMS FRONTIERS, 2019, 21 (01) : 163 - 174
  • [2] Correction to: Schema Extraction for Deep Web Query Interfaces Using Heuristics Rules
    Chichang Jou
    [J]. Information Systems Frontiers, 2020, 22 : 273 - 273
  • [3] Heuristics-Based Schema Extraction for Deep Web Query Interfaces
    Jou, Chichang
    Cheng, Yucheng
    [J]. 2017 IEEE 18TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IEEE IRI 2017), 2017, : 389 - 396
  • [4] Correction to: Schema Extraction for Deep Web Query Interfaces Using Heuristics Rules (vol 21, pg 163, 2019)
    Jou, Chichang
    [J]. INFORMATION SYSTEMS FRONTIERS, 2020, 22 (01) : 273 - 273
  • [5] Effective Schema Extraction of Query Interfaces on the Deep Web
    Qiang, Bao-hua
    Xi, Jian-qing
    Chen, Ling
    [J]. FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 2, PROCEEDINGS, 2008, : 291 - +
  • [6] Schema Extraction of Deep Web Query Interface
    Wang, Ying
    Peng, Tao
    Zuo, Wanli
    Zhu, Huifeng
    [J]. WISM: 2009 INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS AND MINING, PROCEEDINGS, 2009, : 391 - 395
  • [7] Schema matching across query interfaces on the Deep Web
    He, Zhongtian
    Hong, Jun
    Bell, David
    [J]. SHARING DATA, INFORMATION AND KNOWLEDGE, PROCEEDINGS, 2008, 5071 : 51 - 62
  • [8] The Discovery and Extraction of Query Interfaces Based on Deep Web
    Yang Daowen
    Liu Quan
    Cui Zhiming
    Fu Yuchen
    [J]. 2009 WRI WORLD CONGRESS ON SOFTWARE ENGINEERING, VOL 1, PROCEEDINGS, 2009, : 507 - 511
  • [9] Holistic Schema Matching for Web query interfaces
    Su, Weifeng
    Wang, Jiying
    Lochovsky, Frederick
    [J]. ADVANCES IN DATABASE TECHNOLOGY - EDBT 2006, 2006, 3896 : 77 - 94
  • [10] Query Interface Schema Extracting from Deep Web using Ontology
    Sun, Yong
    Wang, Shang
    Li, Zhenyuan
    Liu, Chang
    Peng, Tao
    Qiu, Yuhang
    [J]. 2021 INTERNATIONAL CONFERENCE ON IMAGE, VIDEO PROCESSING, AND ARTIFICIAL INTELLIGENCE, 2021, 12076