Text Line Extraction in Document Images

Cited by: 0
Authors
Wang, Liuan [1 ]
Fan, Wei [1 ]
Sun, Jun [1 ]
Naoi, Satoshi [1 ]
Tanaka, Hiroshi [2 ]
Affiliations
[1] Fujitsu Res & Dev Ctr CO LTD, Beijing, Peoples R China
[2] Fujitsu Labs Ltd, Kawasaki, Kanagawa, Japan
Keywords
generic text line extraction; MSER; hierarchical edge reconstruction and cut; text line energy minimization; SCENE; REGION;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text line extraction in document images is an important prerequisite for many content-based image understanding applications. In this paper, we propose an accurate and robust method for generic text line extraction that can be applied to a wide range of document image categories, diverse languages, and text lines with different orientations. First, candidate connected components are extracted from the document image using Maximally Stable Extremal Regions (MSER), with noise filtered out by AdaBoost and a Convolutional Neural Network (CNN). Then, coarse text lines are generated by hierarchical edge reconstruction and cut according to the local linearity of text lines in the document spanning tree. Finally, for accurate text line extraction, the cut multi-components are reconnected by minimizing a text line energy defined in terms of text line consistency and fitting error. Experimental results on a multilingual test dataset demonstrate the effectiveness and robustness of the proposed method, which yields higher performance than state-of-the-art methods.
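The middle stage of the pipeline described in the abstract (a spanning tree over connected components that is cut into coarse lines, plus a fitting error per line) can be illustrated with a rough sketch. This is not the authors' implementation: the function names, the `max_gap` parameter, and the use of component centroids are all assumptions made for illustration, and a simple edge-length threshold stands in for the paper's local-linearity cut, which the abstract does not specify.

```python
import math
from collections import defaultdict


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over the points.

    Returns a list of (i, j, distance) edges. The points are assumed to be
    connected-component centroids (a hypothetical input format).
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [(math.inf, -1)] * n  # (distance to the tree, parent index)
    best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i][0])
        in_tree[u] = True
        if best[u][1] >= 0:
            edges.append((best[u][1], u, best[u][0]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v][0]:
                    best[v] = (d, u)
    return edges


def coarse_lines(points, max_gap):
    """Cut tree edges longer than max_gap and return the remaining groups.

    Assumption: the length threshold is a stand-in for the local-linearity
    criterion mentioned in the abstract.
    """
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, d in minimum_spanning_tree(points):
        if d <= max_gap:
            parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i in range(len(points)):
        groups[find(i)].append(i)
    return sorted(groups.values())


def fitting_error(points):
    """Mean squared perpendicular distance to the total-least-squares line.

    This equals the smaller eigenvalue of the 2x2 scatter matrix divided by n.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    trace = sxx + syy
    det = sxx * syy - sxy * sxy
    disc = max(trace * trace / 4 - det, 0.0)
    return (trace / 2 - math.sqrt(disc)) / n
```

With two horizontal rows of centroids spaced 10 units apart and separated vertically by 50 units, `coarse_lines(points, 15)` would split the tree at the long inter-row edge and return one group per row. A reconnection step in the spirit of the abstract could then merge two groups only when the merged group's `fitting_error` stays small, although the actual energy function used in the paper is not given here.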
Pages: 191 - 195
Number of pages: 5
Related papers
50 in total
  • [21] Extraction of text words in document images based on a statistical characterization
    Chen, S
    Haralick, RM
    Phillips, IT
    [J]. JOURNAL OF ELECTRONIC IMAGING, 1996, 5 (01) : 25 - 36
  • [22] AUTOMATIC TEXT EXTRACTION, REMOVAL AND INPAINTING OF COMPLEX DOCUMENT IMAGES
    Chen, Yen-Lin
    [J]. INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2012, 8 (1A) : 303 - 327
  • [23] Text region extraction from quality degraded document images
    Abirami, S.
    Manjula, D.
    [J]. PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PROCEEDINGS, 2007, 4815 : 519 - 527
  • [24] Text Extraction from Document Images using Edge Information
    Grover, Sachin
    Arora, Kushal
    Mitra, Suman K.
    [J]. 2009 ANNUAL IEEE INDIA CONFERENCE (INDICON 2009), 2009, : 582 - +
  • [25] An effective method for text line segmentation in historical document images
    Tien-Nam Nguyen
    Burie, Jean-Christophe
    Thi-Lan Le
    Schweyer, Anne-Valerie
    [J]. 2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 1593 - 1599
  • [26] DENSE PREDICTION FOR TEXT LINE SEGMENTATION IN HANDWRITTEN DOCUMENT IMAGES
    Quang Nhat Vo
    Lee, GueeSang
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2016, : 3264 - 3268
  • [27] Text region extraction and text segmentation on camera-captured document style images
    Song, YJ
    Kim, KC
    Choi, YW
    Byun, HR
    Kim, SH
    Chi, SY
    Jang, DK
    Chung, YK
    [J]. EIGHTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS 1 AND 2, PROCEEDINGS, 2005, : 172 - 176
  • [28] Gabor filter based text extraction from digital document images
    Qiao, Yu-Long
    Li, Meng
    Lu, Zhe-Ming
    Sun, Sheng-He
    [J]. IIH-MSP: 2006 INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, PROCEEDINGS, 2006, : 297 - +
  • [29] Learning to Detect Tables in Document Images using Line and Text Information
    Thong Huynh-Van
    Trinh Le Ba Khanh
    Tuan Anh Tran
    Khuong Nguyen-An
    Yang, Hyung-Jeong
    Kim, Soo-Hyung
    [J]. 2ND INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND SOFT COMPUTING (ICMLSC 2018), 2015, : 151 - 155
  • [30] Text Line Segmentation in Handwritten Document Images Using Tensor Voting
    Toan Dinh Nguyen
    Gueesang Lee
    [J]. IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2011, E94A (11) : 2434 - 2441