Ontology-based HTML to XML conversion

Cited by: 0
Authors
Li, SJ [1]
Ou, WJ
Yu, JQ
Affiliations
[1] Wuhan Univ, Sch Comp, Wuhan 430072, Peoples R China
[2] Chinese Acad Sci, Lab Comp Sci, Inst Software, Beijing 100080, Peoples R China
[3] Huazhong Univ Sci & Technol, Coll Comp Sci & Technol, Wuhan 430074, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Current wrapper approaches break down when extracting data from differently structured and frequently changing Web pages. To tackle this challenge, this paper defines a domain-specific ontology, automatically captures the semantic hierarchy in Web pages by exploiting both structural information and common formatting information, and recognizes and extracts data through ontology-based semantic matching, without relying on page-specific formatting. The approach adapts to differently structured and frequently changing Web pages within a domain of interest.
Pages: 888 - 893
Page count: 6
Related papers
50 in total
  • [41] An XML approach to semantically extract data from HTML tables
    Liu, JX
    Ao, ZY
    Park, HH
    Chen, YF
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2005, 3588 : 696 - 705
  • [42] HTML conversion tools: The good, the bad, and the ugly
    Laurent, S
    McKee, C
    45TH ANNUAL CONFERENCE ON IMAGINATION, INNOVATION AND COMMUNICATION, 1998, : 319 - 319
  • [43] Managing knowledge on the Web - Extracting ontology from HTML Web
    Du, Timon C.
    Li, Feng
    King, Irwin
    DECISION SUPPORT SYSTEMS, 2009, 47 (04) : 319 - 331
  • [44] Information extraction from HTML tables base on domain ontology
    Hsiao, SL
    Chou, SC
    Chang, LP
    IKE'03: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE ENGINEERING, VOLS 1 AND 2, 2003, : 70 - 76
  • [45] Ontology development for the semantic Web: An HTML form-based reverse engineering approach
    Benslimane, Sidi Mohamed
    Benslimane, Djamal
    Malki, Mimoun
    Maamar, Zakaria
    Thiran, Philippe
    Amaghar, Youssef
    Hacid, Mohand-Said
    JOURNAL OF WEB ENGINEERING, 2007, 6 (02) : 143 - 164
  • [46] Using XML metadata to enable the automatic generation and processing of HTML FORMS from XML documents
    Dubey, AK
    Chueh, HC
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2001, : 894 - 894
  • [47] Bridging the gap between SGML and HTML: The potential of XML for technical communicators
    Ray, DS
    Ray, EJ
    TECHNICAL COMMUNICATION, 1998, 45 (03) : 427 - 432
  • [48] A dataflow approach to efficient change detection of HTML/XML documents in WebVigiL
    Sanka, Anoop
    Chamakura, Shravan
    Chakravarthy, Sharma
    COMPUTER NETWORKS, 2006, 50 (10) : 1547 - 1563
  • [49] PDF to HTML conversion Having a usable web document
    Bhatti, M. Afzal
    Ahmad, Adeel
    2006 1ST INTERNATIONAL CONFERENCE ON DIGITAL INFORMATION MANAGEMENT, 2006, : 289 - +
  • [50] Efficient formalism-only parsing of XML/HTML using the calculus
    Jackson, QT
    ACM SIGPLAN NOTICES, 2003, 38 (02) : 29 - 35