Ontology-based HTML to XML conversion

Authors
Li, SJ [1]
Ou, WJ
Yu, JQ
Affiliations
[1] Wuhan Univ, Sch Comp, Wuhan 430072, Peoples R China
[2] Chinese Acad Sci, Lab Comp Sci, Inst Software, Beijing 100080, Peoples R China
[3] Huazhong Univ Sci & Technol, Coll Comp Sci & Technol, Wuhan 430074, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Current wrapper approaches break down when extracting data from Web pages that are structured differently and change frequently. To address this, this paper defines a domain-specific ontology, automatically captures the semantic hierarchy of Web pages by exploiting both structural information and common formatting information, and recognizes and extracts data through ontology-based semantic matching rather than page-specific formatting. As a result, the approach adapts to differently structured and frequently changing Web pages within a domain of interest.
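To make the matching idea concrete, here is a minimal Python sketch under loudly stated assumptions. It is not the authors' algorithm: the toy `ONTOLOGY` (a hypothetical book domain mapping each concept to synonym labels), the helper names `TextCollector`, `match_concept`, `extract`, and `to_xml`, and the rule that a text fragment containing a colon carries a label are all illustrative choices. It also flattens the page into text fragments instead of reconstructing the semantic hierarchy the paper describes. What it does show is the core point of the abstract: because fields are identified by matching labels against domain concepts rather than by position or page-specific markup, two differently structured pages yield the same XML record.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical toy ontology for a "book" domain: concept -> synonym labels.
# A real domain ontology would be far richer (hierarchy, value types, etc.).
ONTOLOGY = {
    "title": {"title", "book title", "name"},
    "author": {"author", "authors", "written by"},
    "price": {"price", "cost", "our price"},
    "isbn": {"isbn", "isbn-10", "isbn-13"},
}


class TextCollector(HTMLParser):
    """Collects non-empty visible text fragments in document order."""

    def __init__(self):
        super().__init__()
        self.fragments = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip:
            self.fragments.append(text)


def match_concept(label):
    """Return the ontology concept whose synonyms contain the label, else None."""
    key = label.strip().rstrip(":").strip().lower()
    for concept, synonyms in ONTOLOGY.items():
        if key in synonyms:
            return concept
    return None


def extract(html):
    """Map 'Label: value' or 'Label:' + next fragment onto ontology concepts."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    frags = parser.fragments
    record = {}
    i = 0
    while i < len(frags):
        frag = frags[i]
        if ":" in frag:
            label, _, rest = frag.partition(":")
            concept = match_concept(label)
            if concept is not None:
                value = rest.strip()
                if not value and i + 1 < len(frags):
                    value = frags[i + 1]  # value sits in the next fragment
                    i += 1
                record.setdefault(concept, value)
        i += 1
    return record


def to_xml(record, root_tag="book"):
    """Serialize the record as XML, emitting elements in ontology order."""
    root = ET.Element(root_tag)
    for concept in ONTOLOGY:
        if concept in record:
            ET.SubElement(root, concept).text = record[concept]
    return ET.tostring(root, encoding="unicode")


# Two differently structured pages describing the same book.
page_a = ("<table><tr><td><b>Book Title:</b></td><td>XML Basics</td></tr>"
          "<tr><td>Written by:</td><td>J. Doe</td></tr>"
          "<tr><td>Our Price:</td><td>$20</td></tr></table>")
page_b = ("<div><span>Author: J. Doe</span><h2>Name:</h2>"
          "<p>XML Basics</p><p>Cost: $20</p></div>")

# Both print <book><title>XML Basics</title><author>J. Doe</author><price>$20</price></book>
print(to_xml(extract(page_a)))
print(to_xml(extract(page_b)))
```

Matching on normalized labels keeps the extractor indifferent to whether a page uses tables or nested `div`s; a practical system would also need fuzzier label matching and checks on value types.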
Pages: 888-893
Page count: 6
Related papers
50 in total
  • [21] XML and HTML
    阮树银
    Journal of Wuhu Institute of Technology (芜湖职业技术学院学报), 2004, (03) : 37 - 38
  • [22] Lurching toward Babel: HTML, CSS, and XML
    Korpela, J
    COMPUTER, 1998, 31 (07) : 103 - +
  • [23] Logical structure analysis: From HTML to XML
    Lee, Min-Hyung
    Kim, Yeon-Seok
    Lee, Kyong-Ho
    COMPUTER STANDARDS & INTERFACES, 2007, 29 (01) : 109 - 124
  • [24] A typed representation for HTML and XML documents in Haskell
    Thiemann, P
    JOURNAL OF FUNCTIONAL PROGRAMMING, 2002, 12 (4-5) : 435 - 468
  • [25] Wikipedia HTML Structure Analysis for Ontology Construction
    Zarrad, Rim
    Doggaz, Narjes
    Zagrouba, Ezzedine
    KNOWLEDGE ORGANIZATION, 2018, 45 (02) : 108 - 124
  • [26] Automatic translation of HTML laws and regulations into an XML repository
    Psaila, G
    Brugali, D
    ISAS/CITSA 2004: International Conference on Cybernetics and Information Technologies, Systems and Applications and 10th International Conference on Information Systems Analysis and Synthesis, Vol 1, Proceedings: COMMUNICATIONS, INFORMATION TECHNOLOGIES AND COMPUTING, 2004, : 252 - 256
  • [27] WebVigiL: User profile-based change detection for HTML/XML documents
    Pandrangi, N
    Jacob, J
    Sanka, A
    Chakravarthy, S
    NEW HORIZONS IN INFORMATION MANAGEMENT, 2003, 2712 : 38 - 57
  • [28] Multipurpose Web publishing using HTML, XML, and CSS
    Lie, HW
    Saarela, J
    COMMUNICATIONS OF THE ACM, 1999, 42 (10) : 95 - 101
  • [29] A heuristic approach for converting HTML documents to XML documents
    Lim, SJ
    Ng, YK
    COMPUTATIONAL LOGIC - CL 2000, 2000, 1861 : 1182 - 1196
  • [30] Research and Realization of Custom Form Realization Method Based on HTML and XML Technology
    Wang Qianqian
    Wang Yong
    Chen Xin
    Wang Ying
    COMPUTER AND INFORMATION TECHNOLOGY, 2014, 519-520 : 231 - +