Automatic Table Detection and Retention from Scanned Document Images via Analysis of Structural Information

Cited by: 0
Authors
Ranka, Varsha [1]
Patil, Shubham [1]
Patni, Shubham [1]
Raut, Tushar [1]
Mehrotra, Kapil [2]
Gupta, Manish Kumar [2]
Affiliations
[1] PICT, Dept Comp Engn, Pune, Maharashtra, India
[2] Ctr Dev Adv Comp, Pune, Maharashtra, India
Keywords
Optical Character Recognition; Table Detection; Table Retention; Layout Analysis; Document Analysis and Recognition
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809
Abstract
Automatic table detection has long been a topic of debate in the field of Document Analysis and Recognition (DAR). Digital documents are more efficient than their printed counterparts for storage, maintenance, and republishing. Because tables are non-textual objects, they prevent OCR systems from digitizing a document perfectly and distort the layout and structure of the digitized result. No available algorithm or method solves this problem for all possible types of tables. This paper tackles table detection and retention with a bi-modular approach based on the structural information of tables: bounding lines, row/column separators, and the space between columns. Through analysis of these properties, our experiments on a dataset of more than 600 images containing more than 829 tables detected 90% of the tables correctly.
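One of the structural cues named in the abstract, the space between columns, can be illustrated with a vertical projection profile. The sketch below is not the authors' method; it is a minimal illustration that assumes a binarized image given as rows of 0/1 values (1 marks ink). The function name `column_gaps` and the `min_gap` threshold are hypothetical.

```python
def column_gaps(image, min_gap=2):
    """Return (start, end) column ranges that contain no ink in any row.

    `image` is a list of equal-length rows of 0/1 values, where 1 marks ink.
    Only gaps lying between two inked columns are reported, so the left and
    right page margins are skipped.
    """
    if not image:
        return []
    width = len(image[0])
    # Vertical projection profile: number of ink pixels in each column.
    profile = [sum(row[x] for row in image) for x in range(width)]
    inked = [x for x, count in enumerate(profile) if count > 0]
    if not inked:
        return []
    gaps = []
    run_start = None
    # Scan only between the first and last inked columns.
    for x in range(inked[0], inked[-1] + 1):
        if profile[x] == 0:
            if run_start is None:
                run_start = x  # an empty run begins here
        elif run_start is not None:
            # An inked column closes the current empty run.
            if x - run_start >= min_gap:
                gaps.append((run_start, x - 1))
            run_start = None
    return gaps


# Two text columns separated by three empty pixel columns.
sample = [
    [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0],
]
print(column_gaps(sample))  # [(3, 5)]
```

A real detector would likely combine such gaps with line detection for bounding lines and row/column separators; the threshold here is arbitrary and would need tuning to the scan resolution.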
Pages: 244-249
Number of pages: 6