Text Line Extraction in Document Images

Cited by: 0
Authors
Wang, Liuan [1 ]
Fan, Wei [1 ]
Sun, Jun [1 ]
Naoi, Satoshi [1 ]
Tanaka, Hiroshi [2 ]
Affiliations
[1] Fujitsu Res & Dev Ctr CO LTD, Beijing, Peoples R China
[2] Fujitsu Labs Ltd, Kawasaki, Kanagawa, Japan
Keywords
generic text line extraction; MSER; hierarchical edge reconstruction and cut; text line energy minimization; SCENE; REGION;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text line extraction in document images is an important prerequisite for many content-based image understanding applications. In this paper, we propose an accurate and robust method for generic text line extraction that can be applied to a wide range of document image categories, diverse languages, and text lines with different orientations. First, candidate connected components are extracted from the document image using Maximally Stable Extremal Regions (MSER), with noise filtered by AdaBoost and a Convolutional Neural Network (CNN). Then, coarse text lines are generated through hierarchical edge reconstruction and cut, based on the local linearity of text lines in the document spanning tree. Finally, for accurate text line extraction, the cut multi-components are reconnected based on text line energy minimization in terms of text line consistency and fitting error. Experimental results on a multilingual test dataset demonstrate the effectiveness and robustness of the proposed method, which yields higher performance than state-of-the-art methods.
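The coarse grouping idea in the abstract (a spanning tree over candidate components whose edges are then cut) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, components are reduced to centroid points, and a simple edge-length threshold (`ratio` times the median edge length) stands in for the paper's local-linearity criterion, which the abstract does not specify.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j) edges."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n
    parent = [-1] * n
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        # Relax distances of the remaining vertices through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    parent[v] = u
    return edges


def cut_long_edges(points, edges, ratio=2.0):
    """Assumed cut rule: drop edges longer than ratio * median edge length."""
    if not edges:
        return []
    lengths = sorted(math.dist(points[i], points[j]) for i, j in edges)
    median = lengths[len(lengths) // 2]
    return [(i, j) for i, j in edges
            if math.dist(points[i], points[j]) <= ratio * median]


def group_components(n, edges):
    """Union-find over the kept edges; each group is one coarse text line."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two horizontal rows of component centroids, 50 pixels apart.
centroids = [(0, 0), (10, 0), (20, 0), (30, 0),
             (0, 50), (10, 50), (20, 50), (30, 50)]
tree = minimum_spanning_tree(centroids)
kept = cut_long_edges(centroids, tree)
coarse_lines = group_components(len(centroids), kept)
```

In this toy layout the single inter-row edge is much longer than the intra-row edges, so cutting it leaves one group per row. A length threshold alone would not handle skewed or curved lines, which is presumably why the paper relies on local linearity and a later energy-minimization step.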
Pages: 191-195
Page count: 5
Related papers
50 in total
  • [31] Text Line Detection in Document Images: Towards a Support System for the Blind
    Nassu, Bogdan Tomoyuki
    Minetto, Rodrigo
    Soares de Oliveira, Luiz Eduardo
    [J]. 2013 12TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2013, : 638 - 642
  • [32] Automatic Extraction of Text and Non-text Information Directly from Compressed Document Images
    Javed, Mohammed
    Nagabhushan, P.
    Chaudhuri, Bidyut B.
    [J]. PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS (HIS 2016), 2017, 552 : 38 - 46
  • [33] Text line extraction from handwritten document pages based on line contour estimation
    Sarkar, Ram
    Halder, Sougata
    Malakar, Samir
    Das, Nibaran
    Basu, Subhadip
    Nasipuri, Mita
    [J]. 2012 THIRD INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION & NETWORKING TECHNOLOGIES (ICCCNT), 2012,
  • [34] SKEW CORRECTION AND LINE EXTRACTION IN BINARIZED PRINTED TEXT IMAGES
    Li, Wei
    Breier, Matthias
    Merhof, Dorit
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2015, : 472 - 476
  • [35] State Estimation in a Document Image and Its Application in Text Block Identification and Text Line Extraction
    Koo, Hyung Il
    Cho, Nam Ik
    [J]. COMPUTER VISION-ECCV 2010, PT II, 2010, 6312 : 421 - +
  • [36] Text Extraction from Historical Document Images by the Combination of Several Thresholding Techniques
    Sari, Toufik
    Kefali, Abderrahmane
    Bahi, Halima
    [J]. ADVANCES IN MULTIMEDIA, 2014, 2014 (2014)
  • [37] Text extraction method for historical Tibetan document images based on block projections
    Duan L.-J.
    Zhang X.-Q.
    Ma L.-L.
    Wu J.
    [J]. Optoelectronics Letters, 2017, 13 (6) : 457 - 461
  • [38] Novel data representation for text extraction from multispectral historical document images
    Hedjam, Rachid
    Cheriet, Mohamed
    [J]. 11TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR 2011), 2011, : 172 - 176
  • [39] Text extraction from gray scale document images using edge information
    Yuan, Q
    Tan, CL
    [J]. SIXTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, PROCEEDINGS, 2001, : 302 - 306
  • [40] Extraction of Text Regions from Complex Background in Document Images by Multilevel Clustering
    Hoai Nam Vu
    Tuan Anh Tran
    Seop, Na In
    Kim, Soo Hyung
    [J]. INTERNATIONAL JOURNAL OF NETWORKED AND DISTRIBUTED COMPUTING, 2016, 4 (01) : 11 - 21