Document transformation system from papers to XML data based on pivot XML document method

Authors
Ishitani, Y [1]
Affiliation
[1] Toshiba Co Ltd, Corp Res & Dev Ctr, Saiwai Ku, Kawasaki, Kanagawa 2128582, Japan
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper proposes a new OCR-based document transformation method that generates various XML documents from printed documents. The method adopts a hierarchical transformation strategy built around a pivot XML document. First, document elements such as the title, authors, abstract, headings, paragraphs, lists, captions, tables, and figures are extracted from document images. Second, the hierarchical structure of these elements is extracted and described as a DOM tree. Third, this document structure is converted by an XML parser into a pivot XML document expressed in XHTML. Finally, the pivot XML document is transformed into the target XML document by the XML parser, using XSLT scripts or specific programs. Experimental results show that the method is effective in transforming printed documents into various XML documents.
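The last two stages of the pipeline in the abstract (structure to pivot document, pivot document to target document) can be illustrated with a minimal sketch. It is not the author's implementation: the `EXTRACTED` list stands in for OCR and layout-analysis output, the pivot is an XHTML-like tree with the namespace omitted for brevity, the `<article>` target vocabulary is invented for illustration, and the final step uses a small Python function (the "specific programs" route) rather than an XSLT script.

```python
import xml.etree.ElementTree as ET

# Hypothetical output of the element-extraction step: (element type, text)
# pairs in reading order. A real system would derive these from OCR results.
EXTRACTED = [
    ("title", "A Sample Paper"),
    ("authors", "A. Author"),
    ("abstract", "This is the abstract."),
    ("heading", "1 Introduction"),
    ("paragraph", "First paragraph."),
    ("heading", "2 Method"),
    ("paragraph", "Second paragraph."),
]


def build_pivot(elements):
    """Build an XHTML-like pivot tree; each heading opens a new section <div>."""
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    body = ET.SubElement(html, "body")
    container = body  # elements before the first heading attach to <body>
    for kind, text in elements:
        if kind == "title":
            ET.SubElement(head, "title").text = text
        elif kind == "heading":
            container = ET.SubElement(body, "div", {"class": "section"})
            ET.SubElement(container, "h1").text = text
        else:
            # Keep the logical element type in the class attribute.
            ET.SubElement(container, "p", {"class": kind}).text = text
    return html


def pivot_to_target(pivot):
    """Transform the pivot tree into an assumed <article> target vocabulary."""
    article = ET.Element("article")
    ET.SubElement(article, "title").text = pivot.findtext("head/title")
    # Front matter: <p> elements that sit directly under <body>.
    for p in pivot.findall("body/p"):
        ET.SubElement(article, p.get("class")).text = p.text
    # Sections: one <section> per <div>, named after its heading.
    for div in pivot.findall("body/div"):
        section = ET.SubElement(article, "section", {"name": div.findtext("h1")})
        for p in div.findall("p"):
            ET.SubElement(section, "para").text = p.text
    return article


pivot = build_pivot(EXTRACTED)
target = pivot_to_target(pivot)
print(ET.tostring(target, encoding="unicode"))
```

Because every target format is derived from the same pivot tree, supporting another target vocabulary in this sketch would mean adding another function like `pivot_to_target`, leaving the extraction and pivot-building steps unchanged.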
Pages: 250-255
Page count: 6