Automatic Table Detection and Retention from Scanned Document Images via Analysis of Structural Information

Cited by: 0
Authors
Ranka, Varsha [1]
Patil, Shubham [1]
Patni, Shubham [1]
Raut, Tushar [1]
Mehrotra, Kapil [2]
Gupta, Manish Kumar [2]
Affiliations
[1] PICT, Dept Comp Engn, Pune, Maharashtra, India
[2] Ctr Dev Adv Comp, Pune, Maharashtra, India
Keywords
Optical Character Recognition; Table detection; Table Retention; Layout analysis; Document Analysis and Recognition;
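DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification code
0808; 0809;
Abstract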
The problem of automatic table detection has long been a topic of debate in the field of Document Analysis and Recognition (DAR). Digital documents are more efficient than their printed counterparts for storage, maintenance, and republishing. Because tables are non-textual objects in a document, they prevent an OCR system from digitizing the document perfectly and distort the layout and structure of the digitized output. No available algorithm or method solves this problem for all possible types of tables. This paper tackles the problem of table detection and retention by proposing a bi-modular approach based on the structural information of tables. This structural information includes bounding lines, row/column separators, and the space between columns. Through analysis of these properties, our experiments on a dataset of more than 600 images containing more than 829 tables detected 90% of the tables correctly.
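As a rough illustration of one structural cue named in the abstract, the space between columns, the sketch below finds blank vertical runs in a binarized image using a vertical projection profile. This is not the authors' algorithm, which the abstract does not specify; the function name `column_gaps`, the `min_width` threshold, and the toy image are all assumptions made for illustration.

```python
from typing import List, Tuple


def column_gaps(binary_rows: List[List[int]], min_width: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges (end exclusive) that contain no ink.

    binary_rows: image as rows of 0 (background) / 1 (ink) values.
    min_width:   hypothetical threshold; narrower gaps are treated as
                 inter-character spacing rather than column separators.
    """
    if not binary_rows:
        return []
    width = len(binary_rows[0])
    # Vertical projection profile: number of ink pixels in each column.
    profile = [sum(row[x] for row in binary_rows) for x in range(width)]

    gaps = []
    start = None  # start index of the blank run currently being scanned
    for x, ink in enumerate(profile):
        if ink == 0 and start is None:
            start = x
        elif ink != 0 and start is not None:
            if x - start >= min_width:
                gaps.append((start, x))
            start = None
    # Close a blank run that reaches the right edge of the image.
    if start is not None and width - start >= min_width:
        gaps.append((start, width))
    return gaps


# Toy 3x12 "image" (assumed data): three text columns separated by blank runs.
toy = [
    [1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1],
]
gaps = column_gaps(toy)
```

In a real pipeline, blank runs that stay aligned across several consecutive text lines would be a stronger hint of a table than a single gap, and ruled tables would also need line detection for bounding lines and row/column separators; neither is attempted here.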
Pages: 244-249
Page count: 6