Wrapping Web data into XML

Cited by: 0
Authors
Han, W [1 ]
Buttler, D [1 ]
Pu, C [1 ]
Affiliations
[1] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
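Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract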
The vast majority of online information is part of the World Wide Web. In order to use this information for more than human browsing, web pages in HTML must be converted into a format meaningful to software programs. Wrappers have been a useful technique to convert HTML documents into semantically meaningful XML files. However, developing wrappers is slow and labor-intensive. Further, frequent changes to the HTML documents typically require frequent changes in the wrappers. This paper describes XWRAP Elite, a tool to automatically generate robust wrappers. XWRAP breaks down the conversion process into three steps. First, discover where the data is located in an HTML page and separate the data into individual objects. Second, decompose objects into data elements. Third, mark objects and elements in an output format. XWRAP Elite automates the first two steps and minimizes human involvement in marking output data. Our experience shows that XWRAP is able to create useful wrapper software for a wide variety of real-world HTML documents.
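The three steps named in the abstract can be illustrated with a small hand-written wrapper. This is only a sketch under strong assumptions, not XWRAP Elite's generated code or its discovery algorithm: it assumes the data sits in an HTML table, treats each `<tr>` as an object and each `<td>` as a data element, and takes the output names from the caller. The names `RowExtractor`, `wrap`, `object_tag`, and `element_tags` are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class RowExtractor(HTMLParser):
    """Steps 1 and 2 (assumed layout): each <tr> is an object, each <td> an element."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell strings per <tr>
        self._cell = None   # text fragments of the <td> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def wrap(html, object_tag, element_tags):
    """Step 3: mark objects and elements with caller-supplied XML names."""
    parser = RowExtractor()
    parser.feed(html)
    parser.close()
    root = ET.Element(object_tag + "s")
    for cells in parser.rows:
        if not cells:
            continue  # skip rows without <td> cells, such as header rows
        obj = ET.SubElement(root, object_tag)
        for name, value in zip(element_tags, cells):
            ET.SubElement(obj, name).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical input page used only for this illustration.
page = """
<table>
  <tr><th>Title</th><th>Year</th></tr>
  <tr><td>Wrapping Web data into XML</td><td>2001</td></tr>
  <tr><td>Monitoring XML data on the Web</td><td>2001</td></tr>
</table>
"""
print(wrap(page, "paper", ["title", "year"]))
```

In this sketch the human supplies only the output names in step 3, which loosely mirrors the division of labor the abstract describes. A fixed rule like "rows are objects" is exactly what breaks when a page layout changes, which is the maintenance cost that motivates generating wrappers automatically.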
Pages: 33-38
Number of pages: 6
Related papers
  • [1] Wrapping web data into XML
    Han, Wei
    Buttler, David
    Pu, Calton
    SIGMOD Record (ACM Special Interest Group on Management of Data), 2001, 30 (03): : 33 - 38
  • [2] Wrapping web pages into XML documents
    Fu, T
    ADVANCES IN WEB-AGE INFORMATION MANAGEMENT: PROCEEDINGS, 2004, 3129 : 419 - 428
  • [3] Wrapping web data islands
    Corchuelo, Rafael
    Arjona, Jose L.
    Ruiz, David
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2008, 14 (11) : 1808 - 1810
  • [4] XML data model of web based on XML
    Department of Computer Science, Xiaogan University, Xiaogan 432000, China
    Journal of Computational Information Systems, 2008, 4 (01): : 323 - 328
  • [5] XML structures data for the Web
    Becker, M
    COMPUTERS IN PHYSICS, 1998, 12 (04): : 310 - 311
  • [6] Monitoring XML data on the Web
    Nguyen, B
    Abiteboul, S
    Cobena, G
    Preda, M
    SIGMOD RECORD, 2001, 30 (02) : 437 - 448
  • [7] XML Data Compression in Web Publishing
    Qiu, Ruiheng
    Hu, Wei
    Tang, Zhi
    Lu, Xiaoqing
    Zhang, Lei
    IMAGING AND PRINTING IN A WEB 2.0 WORLD III, 2012, 8302
  • [8] Translating XML web data into ontologies
    An, Y
    Mylopoulos, J
    ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS 2005: OTM 2005 WORKSHOPS, PROCEEDINGS, 2005, 3762 : 967 - 976
  • [9] Data on the Web: From relations to semistructured data and XML
    Wiley, DL
    ECONTENT, 2000, 23 (04) : 93 - 93
  • [10] Making XML Signatures Immune to XML Signature Wrapping Attacks
    Mainka, Christian
    Jensen, Meiko
    Lo Iacono, Luigi
    Schwenk, Joerg
    CLOUD COMPUTING AND SERVICES SCIENCE, CLOSER 2012, 2013, 367 : 151 - 167