A Robust Technique for Character String Extraction from Complex Document Images

Times cited: 0
Author
Chen, Yen-Lin [1]
Affiliation
[1] Univ E Asia, Dept Comp Sci & Informat Engn, Taichung 41354, Taiwan
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This study proposes a new technique for segmenting and extracting character strings from a variety of real-life complex document images. The technique first decomposes the document image into distinct object planes, extracting and separating homogeneous objects: textual regions of interest, non-text objects such as graphics and pictures, and background textures. A text extraction procedure is then applied to the resulting planes to extract character strings with different characteristics from their corresponding planes. Because the document image is processed regionally and adaptively according to its local features, the detailed characteristics of the extracted textual objects are well preserved, especially those of small characters with thin strokes. Experimental results and comparisons with existing techniques demonstrate the effectiveness and advantages of the proposed approach in extracting character strings with various illumination conditions, sizes, and font styles from various types of complex document images.
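The abstract outlines a two-stage pipeline (decompose the image into object planes region by region, then extract character strings from each plane) without giving the actual algorithm. The Python sketch below is only a hypothetical, heavily simplified stand-in to make that structure concrete; it is not the author's multi-plane segmentation method. It assumes a grayscale image given as a list of rows of 0-255 integers, substitutes per-block Otsu thresholding for the paper's plane decomposition, assumes that stroke pixels are the minority class in a text-bearing block, and uses an area filter on 4-connected components as a crude proxy for text/non-text discrimination. All names (`decompose_planes`, `text_boxes`, `extract_text`) and parameters (`block`, `min_contrast`, `min_area`, `max_area`) are illustrative assumptions.

```python
from collections import deque

BACKGROUND, DARK, LIGHT = 0, 1, 2  # hypothetical plane labels


def otsu_threshold(values):
    """Otsu's threshold for 0-255 ints; values <= threshold form the dark class."""
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg = w_bg = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def decompose_planes(img, block=8, min_contrast=40):
    """Label each pixel BACKGROUND, DARK or LIGHT, one block (region) at a time."""
    h, w = len(img), len(img[0])
    planes = [[BACKGROUND] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            coords = [(y, x) for y in range(by, min(by + block, h))
                      for x in range(bx, min(bx + block, w))]
            vals = [img[y][x] for y, x in coords]
            if max(vals) - min(vals) < min_contrast:
                continue  # low-contrast block: left in the background plane
            t = otsu_threshold(vals)  # local, adaptive threshold for this block
            dark = [p for p, v in zip(coords, vals) if v <= t]
            light = [p for p, v in zip(coords, vals) if v > t]
            # Assumed heuristic: strokes are the minority class of a text block,
            # so both dark-on-light and light-on-dark text land in an object plane.
            objects, label = (dark, DARK) if len(dark) <= len(light) else (light, LIGHT)
            for y, x in objects:
                planes[y][x] = label
    return planes


def text_boxes(planes, label, min_area=4, max_area=400):
    """4-connected components of one plane, kept if their area looks text-sized.

    Returns bounding boxes as (top, left, bottom, right), inclusive.
    """
    h, w = len(planes), len(planes[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx] or planes[sy][sx] != label:
                continue
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            area, top, left, bottom, right = 0, sy, sx, sy, sx
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and planes[ny][nx] == label):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if min_area <= area <= max_area:
                boxes.append((top, left, bottom, right))
    return boxes


def extract_text(img, block=8, min_contrast=40):
    """Decompose into planes, then extract candidate character boxes per plane."""
    planes = decompose_planes(img, block, min_contrast)
    return {"dark": text_boxes(planes, DARK), "light": text_boxes(planes, LIGHT)}
```

For example, a 16 by 16 white image containing one black bar six pixels tall and three wide yields that bar's box in the `"dark"` plane, and inverting the image moves the same box to the `"light"` plane. A faithful implementation would replace the threshold heuristic and the area filter with the paper's own plane decomposition and text-region criteria, which the abstract does not specify.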
Pages: 1742-1750
Number of pages: 9
Related papers (50 records in total)
  • [1] EXTRACTION OF BINARY CHARACTER GRAPHICS IMAGES FROM GRAYSCALE DOCUMENT IMAGES
    KAMEL, M
    ZHAO, A
    CVGIP-GRAPHICAL MODELS AND IMAGE PROCESSING, 1993, 55 (03): 203 - 217
  • [2] Character String Extraction from Scene Images by Eliminating Non-character Elements
    Takagi, Noboru
    Chen, Jianjun
    2014 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), 2014: 3685 - 3690
  • [3] Text extraction from complex document images using the multi-plane segmentation technique
    Chen, Yen-Lin
    Wu, Bing-Fei
    2006 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-6, PROCEEDINGS, 2006: 3540 - +
  • [4] Robust Text Line, Word And Character Extraction From Telugu Document Image
    Koppula, Vijaya Kumar
    Atul, Negi
    Garain, Utpal
    2009 SECOND INTERNATIONAL CONFERENCE ON EMERGING TRENDS IN ENGINEERING AND TECHNOLOGY (ICETET 2009), 2009: 24 - +
  • [5] Robust Document Image Binarization Technique for Degraded Document Images
    Su, Bolan
    Lu, Shijian
    Tan, Chew Lim
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2013, 22 (04): 1408 - 1417
  • [6] EXTRACTION OF INCLINED CHARACTER STRINGS FROM UNFORMED DOCUMENT IMAGES USING THE CONFIDENCE VALUE OF A CHARACTER RECOGNIZER
    TAKIZAWA, K
    ARITA, D
    MINOH, M
    IKEDA, K
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 1994, E77D (07): 839 - 845
  • [7] Character string extraction from color documents
    Hase, H
    Shinokawa, T
    Yoneda, M
    Suen, CY
    PATTERN RECOGNITION, 2001, 34 (07): 1349 - 1365
  • [8] A Robust Segmentation Technique for Line, Word and Character Extraction from Kannada Text in Low Resolution Display Board Images
    Angadi, S. A.
    Kodabagi, M. M.
    2014 FIFTH INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING (ICSIP 2014), 2014: 42 - 49
  • [9] A Robust Segmentation Technique for Line, Word and Character Extraction from Kannada Text in Low Resolution Display Board Images
    Angadi, S. A.
    Kodabagi, M. M.
    INTERNATIONAL JOURNAL OF IMAGE AND GRAPHICS, 2014, 14 (1-2)
  • [10] Robust Character Segmentation and Recognition Schemes for Multilingual Indian Document Images
    Sahare, Parul
    Dhok, Sanjay B.
    IETE TECHNICAL REVIEW, 2019, 36 (02): 209 - 222