Automatic Table Detection and Retention from Scanned Document Images via Analysis of Structural Information

Cited by: 0

Authors:

Ranka, Varsha [1]

Patil, Shubham [1]

Patni, Shubham [1]

Raut, Tushar [1]

Mehrotra, Kapil [2]

Gupta, Manish Kumar [2]

Affiliations:

[1] PICT, Dept Comp Engn, Pune, Maharashtra, India

[2] Ctr Dev Adv Comp, Pune, Maharashtra, India

Source:

2017 FOURTH INTERNATIONAL CONFERENCE ON IMAGE INFORMATION PROCESSING (ICIIP) | 2017

Keywords:

Optical Character Recognition; Table detection; Table Retention; Layout analysis; Document Analysis and Recognition;

DOI:

Not available

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

The problem of automatic table detection has long been a topic of debate in the field of Document Analysis and Recognition (DAR). Digital documents are more efficient than their printed counterparts for storage, maintenance and republishing. Because tables are non-textual objects in a document, they prevent OCR systems from digitizing a document perfectly and distort the layout and structure of the digitized document. No available algorithm or method solves this problem for all possible types of tables. This paper tackles the problem of table detection and retention by proposing a bi-modular approach based on the structural information of tables. This structural information includes bounding lines, row/column separators and the space between columns. Through analysis of these properties, our experiments on a dataset of more than 600 images containing more than 829 tables detected 90% of the tables correctly.


Pages: 244-249

Page count: 6
