A Robust Technique for Character String Extraction from Complex Document Images

Author
Chen, Yen-Lin [1]
Affiliation
[1] Univ E Asia, Dept Comp Sci & Informat Engn, Taichung 41354, Taiwan
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A new technique for segmenting and extracting character strings from a variety of real-life complex document images is proposed in this study. The proposed technique first decomposes the document image into distinct object planes, separating homogeneous objects such as textual regions of interest, non-text objects (graphics and pictures), and background textures. A text extraction procedure is then applied to the resulting planes to extract character strings with different characteristics from each plane. Because the document image is processed regionally and adaptively according to its local features, the detailed characteristics of the extracted textual objects are well preserved, especially for small characters with thin strokes. Experimental results and comparisons with existing techniques show that the proposed approach is effective at extracting character strings under various illuminations, sizes, and font styles from many types of complex document images.
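The abstract gives no algorithmic detail, so the following is only a rough illustration of the general idea of regional, adaptive processing followed by per-plane extraction. It is not the paper's method. The function names `split_into_planes` and `component_boxes`, the block size, the midpoint threshold, and the `min_contrast` parameter are all assumptions made for this sketch: each block is thresholded against its own local intensity range to form a "dark object" plane, and connected components in that plane are reported as candidate character boxes.

```python
from collections import deque


def split_into_planes(image, block=4, min_contrast=30):
    """Return a boolean 'dark plane' mask, thresholding each block locally.

    Hypothetical stand-in for a plane decomposition: every block uses the
    midpoint of its own min/max intensity, so a faint stroke on a bright
    region and a dark stroke on a dim region are both picked up.
    """
    h, w = len(image), len(image[0])
    dark = [[False] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            ys = range(by, min(by + block, h))
            xs = range(bx, min(bx + block, w))
            values = [image[y][x] for y in ys for x in xs]
            lo, hi = min(values), max(values)
            if hi - lo < min_contrast:
                continue  # nearly flat block: treat it as background only
            threshold = (lo + hi) / 2
            for y in ys:
                for x in xs:
                    dark[y][x] = image[y][x] < threshold
    return dark


def component_boxes(mask):
    """Return (top, left, bottom, right) boxes of 4-connected True regions."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy grayscale page: a bright left half and a dim right half,
# each containing one vertical stroke of different darkness.
image = [[200] * 4 + [100] * 4 for _ in range(4)]
for y in range(4):
    image[y][1] = 120  # faint stroke on the bright region
for y in (1, 2):
    image[y][5] = 30   # dark stroke on the dim region

boxes = component_boxes(split_into_planes(image, block=4))
```

In this toy image a single global threshold between 100 and 120 would mark the whole dim half as foreground, whereas the per-block threshold isolates both strokes. A real system would also need to separate text from graphics and textures, which this sketch does not attempt.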
Pages: 1742-1750
Number of pages: 9