Text Line Extraction in Document Images

Authors
Wang, Liuan [1]
Fan, Wei [1]
Sun, Jun [1]
Naoi, Satshi [1]
Tanaka, Hiroshi [2]
Affiliations
[1] Fujitsu Res & Dev Ctr CO LTD, Beijing, Peoples R China
[2] Fujitsu Labs Ltd, Kawasaki, Kanagawa, Japan
Keywords
generic text line extraction; MSER; hierarchical edge reconstruction and cut; text line energy minimization; SCENE; REGION
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text line extraction in document images is an important prerequisite for many content-based image understanding applications. In this paper, we propose an accurate and robust method for generic text line extraction, which can be applied to a wide range of document image categories, diverse languages, and text lines with different orientations. First, candidate connected components are extracted from the document image using Maximally Stable Extremal Regions (MSER), with noise filtered out by AdaBoost and a Convolutional Neural Network (CNN). Then, coarse text lines are generated by hierarchical edge reconstruction and cut according to the local linearity of text lines in the document spanning tree. Finally, for accurate text line extraction, the cut multi-components are reconnected by text line energy minimization in terms of text line consistency and fitting error. Experimental results on a multilingual test dataset demonstrate the effectiveness and robustness of the proposed method, which yields higher performance than state-of-the-art methods.
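The spanning-tree stage mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the component centroids have already been obtained (for example from an MSER detector), builds a Euclidean minimum spanning tree over them, and cuts edges with a simple length rule that stands in for the paper's local-linearity criterion. The function names and the `cut_factor` parameter are hypothetical.

```python
import math
from statistics import median


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (i, j, length) edges over point indices.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n
    best_from = [-1] * n
    in_tree[0] = True
    for j in range(1, n):
        best_dist[j] = math.dist(points[0], points[j])
        best_from[j] = 0
    edges = []
    for _ in range(n - 1):
        # Pick the closest point that is not yet in the tree.
        j = min((k for k in range(n) if not in_tree[k]),
                key=lambda k: best_dist[k])
        edges.append((best_from[j], j, best_dist[j]))
        in_tree[j] = True
        # Relax distances of the remaining points against the new tree node.
        for k in range(n):
            if not in_tree[k]:
                d = math.dist(points[j], points[k])
                if d < best_dist[k]:
                    best_dist[k] = d
                    best_from[k] = j
    return edges


def coarse_text_lines(points, cut_factor=2.0):
    """Group centroids into coarse lines by cutting long spanning-tree edges.

    Assumption: an edge longer than cut_factor times the median edge length
    is treated as a link between different lines. The paper instead cuts
    edges by local linearity, which this sketch does not reproduce.
    """
    edges = minimum_spanning_tree(points)
    if not edges:
        return [[i] for i in range(len(points))]
    threshold = cut_factor * median(e[2] for e in edges)
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j, length in edges:
        if length <= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Calling `coarse_text_lines` on a list of `(x, y)` centroids returns lists of point indices, one list per coarse line. A length-based cut only separates lines whose spacing clearly exceeds the character spacing, so a fuller treatment would need an orientation-aware rule and a reconnection step such as the energy minimization the abstract describes.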
Pages: 191-195
Number of pages: 5