Automatic Table Detection and Retention from Scanned Document Images via Analysis of Structural Information

Authors
Ranka, Varsha [1]
Patil, Shubham [1]
Patni, Shubham [1]
Raut, Tushar [1]
Mehrotra, Kapil [2]
Gupta, Manish Kumar [2]
Affiliations
[1] PICT, Dept Comp Engn, Pune, Maharashtra, India
[2] Ctr Dev Adv Comp, Pune, Maharashtra, India
Keywords
Optical Character Recognition; Table detection; Table Retention; Layout analysis; Document Analysis and Recognition
DOI
Not available
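Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809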
Abstract
Automatic table detection has long been a topic of debate in the field of Document Analysis and Recognition (DAR). Digital documents are more efficient than their printed counterparts for storage, maintenance and republishing. Because tables are non-textual objects of a document, they prevent OCR systems from digitizing a document accurately and distort the layout and structure of the digitized output. No available algorithm or method solves this problem for all possible types of tables. This paper tackles the problem of table detection and retention by proposing a bi-modular approach based on the structural information of tables. This structural information includes bounding lines, row/column separators and the space between columns. Through analysis of these properties, our experiments on a dataset of more than 600 images containing more than 829 tables detected 90% of the tables correctly.
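One of the structural cues named in the abstract, the space between columns, can be illustrated with a vertical projection profile. The sketch below is not the authors' method; it is a minimal illustration under stated assumptions: the image is already binarized into a list of rows with 1 for ink and 0 for background, and the function name `column_gaps` and the `min_gap` threshold are hypothetical.

```python
from typing import List, Tuple


def column_gaps(image: List[List[int]], min_gap: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) pixel-column ranges, end exclusive, that contain no ink.

    Hypothetical helper for illustration only; `image` is assumed to be a
    binarized region with 1 = ink and 0 = background.
    """
    if not image:
        return []
    width = len(image[0])
    # Vertical projection profile: count of ink pixels in each pixel column.
    profile = [sum(row[x] for row in image) for x in range(width)]

    gaps: List[Tuple[int, int]] = []
    start = None  # first column of the current ink-free run, if any
    for x, count in enumerate(profile):
        if count == 0:
            if start is None:
                start = x
        elif start is not None:
            # An ink column ends the run; keep it only if it is wide enough.
            if x - start >= min_gap:
                gaps.append((start, x))
            start = None
    # Close a run that reaches the right edge of the region.
    if start is not None and width - start >= min_gap:
        gaps.append((start, width))
    return gaps
```

Wide ink-free runs that persist across many text lines are candidate column separators; narrow ones are usually ordinary inter-character or inter-word spacing, which is why the sketch filters by `min_gap`.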
Pages: 244-249
Number of pages: 6