A Robust Technique for Character String Extraction from Complex Document Images

Cited by: 0

Authors:

Chen, Yen-Lin [1]

Affiliation:

[1] Univ E Asia, Dept Comp Sci & Informat Engn, Taichung 41354, Taiwan

Source:

INTERNATIONAL SYMPOSIUM OF INFORMATION TECHNOLOGY 2008, VOLS 1-4, PROCEEDINGS: COGNITIVE INFORMATICS: BRIDGING NATURAL AND ARTIFICIAL KNOWLEDGE | 2008

Keywords:

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This study proposes a new technique for segmenting and extracting character strings from a variety of real-life complex document images. The technique first decomposes the document image into distinct object planes, separating homogeneous objects: textual regions of interest, non-text objects such as graphics and pictures, and background textures. A text extraction procedure is then applied to the resulting planes to extract character strings with different characteristics from the corresponding planes. The document image is processed regionally and adaptively according to its local features, so the detailed characteristics of the extracted textual objects are well preserved, especially small characters with thin strokes. Experimental results and comparisons with the existing technique demonstrate the effectiveness and advantages of the proposed approach in extracting character strings with various illuminations, sizes, and font styles from various types of complex document images.


Pages: 1742-1750

Page count: 9
