Document transformation system from papers to XML data based on pivot XML document method

Cited by: 0
Authors
Ishitani, Y [1]
Affiliation
[1] Toshiba Co Ltd, Corp Res & Dev Ctr, Saiwai Ku, Kawasaki, Kanagawa 2128582, Japan
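Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract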
This paper proposes a new method for document transformation using OCR to generate various XML documents from printed documents. The proposed method adopts a hierarchical transformation strategy based on a pivot XML document. Firstly, document elements such as title, authors, abstract, headings, paragraphs, lists, captions, tables and figures are extracted from document images. Secondly, the hierarchical structure of document elements is extracted and is described using a DOM tree. Thirdly, this document structure is converted into a pivot XML document described as an XHTML document by an XML parser. Finally, this pivot XML document is transformed into the target XML document by the XML parser with XSLT scripts or specific programs. Experimental results show the method is effective in transforming printed documents to various XML documents.
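The last two stages described in the abstract (building a pivot document, then transforming it into a target format) can be illustrated with a minimal Python sketch. This is not the paper's implementation. The element list, the tag mapping, and the `<article>` target format are all hypothetical. The pivot here is a simplified XHTML-like tree without the XHTML namespace, and the final transformation is done by a small program rather than an XSLT script (the abstract allows either).

```python
import xml.etree.ElementTree as ET

# Hypothetical output of the layout-analysis stage: (element type, text) pairs.
ELEMENTS = [
    ("title", "Sample Paper"),
    ("author", "A. Author"),
    ("abstract", "A short abstract."),
    ("heading", "Introduction"),
    ("paragraph", "First paragraph."),
    ("paragraph", "Second paragraph."),
]

# Assumed mapping from element types to XHTML-like tags in the pivot document.
PIVOT_TAGS = {
    "title": "h1",
    "author": "p",
    "abstract": "p",
    "heading": "h2",
    "paragraph": "p",
}


def build_pivot(elements):
    """Build a pivot XHTML-like tree; the element type is kept in 'class'."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    for kind, text in elements:
        node = ET.SubElement(body, PIVOT_TAGS[kind], {"class": kind})
        node.text = text
    return html


def pivot_to_target(pivot):
    """Transform the pivot tree into a hypothetical <article> target format."""
    article = ET.Element("article")
    front = ET.SubElement(article, "front")
    section = None
    for node in pivot.find("body"):
        kind = node.get("class")
        if kind in ("title", "author", "abstract"):
            ET.SubElement(front, kind).text = node.text
        elif kind == "heading":
            # Each heading opens a new section in the target document.
            section = ET.SubElement(article, "section", {"title": node.text})
        elif kind == "paragraph":
            # Paragraphs before any heading attach directly to the article.
            parent = section if section is not None else article
            ET.SubElement(parent, "para").text = node.text
    return article


pivot = build_pivot(ELEMENTS)
target = pivot_to_target(pivot)
print(ET.tostring(target, encoding="unicode"))
```

Because every target format is generated from the same pivot tree, supporting another output format in this sketch would only require another function like `pivot_to_target`, leaving the extraction and pivot-building steps unchanged.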
Pages: 250-255
Page count: 6