Text Extraction for Historical Tibetan Document Images Based on Connected Component Analysis and Corner Point Detection

Cited by: 5
Authors
Zhang, Xiqun [1,2]
Duan, Lijuan [1,3]
Ma, Longlong [4]
Wu, Jian [4]
Affiliations
[1] Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China
[2] Beijing Key Lab Trusted Comp, Beijing, Peoples R China
[3] Beijing Key Lab Integrat & Anal Large Scale Strea, Beijing, Peoples R China
[4] Chinese Acad Sci, Inst Software, Chinese Informat Proc Lab, Beijing, Peoples R China
Source
COMPUTER VISION, PT II | 2017 / Vol. 772
Keywords
Historical Tibetan document; Text extraction; Connected components; Corner point; Layout analysis; Segmentation
DOI
10.1007/978-981-10-7302-1_45
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a text extraction method for historical Tibetan document images. Text extraction is treated as a text area detection and localization problem. First, the historical Tibetan document image is preprocessed to correct uneven illumination, skew, and noise, and is then binarized. Second, the regions of interest in historical Tibetan documents are classified into three categories using connected components (CCs). Each image is divided into equal-sized grid cells, which are filtered according to the CC categories and the corner point density. The remaining grid cells are used to compute vertical and horizontal grid projections. Third, the approximate location of the text area is detected by analyzing these projections. Finally, the text area is extracted precisely by refining the bounding box of the approximate text area. Experiments on a dataset of historical Tibetan document images demonstrate the effectiveness of the proposed method.
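The abstract names the stages but not their parameters, so the following is only a minimal pure-Python sketch of the grid filtering, projection, and bounding-box stages under loudly stated assumptions. It takes an already binarized image (preprocessing is omitted) and a list of corner points from any detector (for example Harris). The three CC categories used here (noise / text / large, split by pixel count), `MIN_TEXT_PIXELS`, `MAX_TEXT_PIXELS`, the grid size `cell`, `min_corner_density`, the "longest run of non-empty grid rows/columns" rule, and the `refine_box` step are all hypothetical choices for illustration, not the authors' method.

```python
from collections import deque

# HYPOTHETICAL size thresholds for the three CC categories (noise / text / large).
MIN_TEXT_PIXELS, MAX_TEXT_PIXELS = 3, 200

def label_components(img):
    """4-connected CC labeling of a binary image (list of rows, 1 = ink).
    Returns a label map (0 = background) and {label: pixel count}."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    sizes = {}
    for y in range(h):
        for x in range(w):
            if not img[y][x] or labels[y][x]:
                continue
            lab = len(sizes) + 1
            labels[y][x] = lab
            queue, count = deque([(y, x)]), 0
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                count += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = lab
                        queue.append((ny, nx))
            sizes[lab] = count
    return labels, sizes

def classify(size):
    """HYPOTHETICAL three-way split by pixel count; the paper's criteria may differ."""
    if size < MIN_TEXT_PIXELS:
        return "noise"
    return "text" if size <= MAX_TEXT_PIXELS else "large"

def filter_grid(img, corners, cell, min_corner_density):
    """Keep a cell x cell grid cell only if it contains a text-class CC pixel and
    its corner density (corners per pixel) reaches min_corner_density."""
    labels, sizes = label_components(img)
    rows, cols = len(img) // cell, len(img[0]) // cell  # partial edge cells ignored
    counts = [[0] * cols for _ in range(rows)]
    for x, y in corners:  # corner points given as (x, y)
        if y // cell < rows and x // cell < cols:
            counts[y // cell][x // cell] += 1
    keep = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            has_text = any(labels[y][x] and classify(sizes[labels[y][x]]) == "text"
                           for y in range(r * cell, (r + 1) * cell)
                           for x in range(c * cell, (c + 1) * cell))
            keep[r][c] = has_text and counts[r][c] / cell ** 2 >= min_corner_density
    return keep

def longest_run(profile, min_value):
    """Inclusive (start, end) of the longest run with profile >= min_value, or None."""
    best, start = None, None
    for i, v in enumerate(profile + [min_value - 1]):  # sentinel closes a final run
        if v >= min_value:
            start = i if start is None else start
        elif start is not None:
            if best is None or i - start > best[1] - best[0] + 1:
                best = (start, i - 1)
            start = None
    return best

def approximate_text_box(keep, cell, min_cells=1):
    """Grid projections -> approximate box (x0, y0, x1, y1), end-exclusive pixels."""
    rows = longest_run([sum(r) for r in keep], min_cells)        # horizontal projection
    cols = longest_run([sum(c) for c in zip(*keep)], min_cells)  # vertical projection
    if rows is None or cols is None:
        return None
    return cols[0] * cell, rows[0] * cell, (cols[1] + 1) * cell, (rows[1] + 1) * cell

def refine_box(img, box):
    """HYPOTHETICAL refinement: shrink the grid-aligned box to the ink it contains."""
    x0, y0, x1, y1 = box
    ink = [(x, y) for y in range(y0, y1) for x in range(x0, x1) if img[y][x]]
    if not ink:
        return box
    xs, ys = [p[0] for p in ink], [p[1] for p in ink]
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1

def extract_text_box(img, corners, cell=4, min_corner_density=1 / 16):
    box = approximate_text_box(filter_grid(img, corners, cell, min_corner_density), cell)
    return None if box is None else refine_box(img, box)

# Tiny synthetic page: four 2x2 "glyphs" in the top-left 8x8 region,
# one isolated noise pixel at (10, 10), and hand-placed corner points (x, y).
page = [[0] * 12 for _ in range(12)]
for gy, gx in ((1, 1), (1, 5), (5, 1), (5, 5)):
    for y in (gy, gy + 1):
        for x in (gx, gx + 1):
            page[y][x] = 1
page[10][10] = 1
corners = [(1, 1), (5, 1), (1, 5), (5, 5), (9, 1), (10, 10)]
print(extract_text_box(page, corners))  # (1, 1, 7, 7)
```

In the demo, the cell holding only a corner point at (9, 1) and the cell holding only the noise pixel are both discarded, so the projections isolate the top-left text block. Picking the single longest run of non-empty grid rows and columns assumes one main text block per page; pages with several separated text regions would need a different projection analysis.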
Pages: 545-555
Page count: 11