Automatic Extraction of Text Regions from Document Images by Multilevel Thresholding and K-means Clustering

Cited by: 0
Authors
Hoai Nam Vu [1]
Tuan Anh Tran [1]
Na, In Seop [1]
Kim, Soo Hyung [1]
Affiliations
[1] Chonnam Natl Univ, Dept Comp Sci, 77 Yongbong Ro, Kwangju 500757, South Korea
Keywords
Multilevel; K-means; Connected Component
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Textual data plays an important role in a number of applications such as image database indexing, document understanding, and image-based web searching. The goal of automatic text extraction from real-life document images, without a character recognition module, is to identify image regions that contain only text. These textual regions can then either serve as input to an optical character recognition application or be highlighted to draw the user's attention. In this paper we propose a method consisting of three stages: preprocessing, which improves the contrast of the grayscale image; multi-level thresholding, which separates textual regions from non-textual objects such as graphics, pictures, and complex backgrounds; and a heuristic filter and a recursive filter for localizing text within the textual regions. In many of these applications it is not necessary to identify all text regions; therefore, we emphasize identifying important text regions with relatively large size and high contrast. Experimental results on a dataset of real-life images demonstrate that the proposed method is effective in identifying textual regions with varying illumination, size, and font against various types of background.
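The abstract gives no implementation details, so the following is only a minimal sketch of how such a pipeline could be arranged, not the authors' method. It assumes dark text on a lighter background, uses a linear contrast stretch for preprocessing, approximates multilevel thresholding with 1-D k-means over gray levels, and replaces the heuristic and recursive filters with a single connected-component size test. All function names and the parameters `k` and `min_area` are hypothetical.

```python
from collections import deque


def stretch_contrast(img):
    """Linearly rescale gray levels to 0..255 (assumed preprocessing step)."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in img]


def kmeans_levels(img, k=3, iters=20):
    """1-D k-means on gray values; label 0 is the darkest cluster."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    # Evenly spaced initial centers keep the clusters ordered dark to light.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]

    def nearest(p):
        return min(range(k), key=lambda c: abs(p - centers[c]))

    for _ in range(iters):
        sums, counts = [0.0] * k, [0] * k
        for p in flat:
            j = nearest(p)
            sums[j] += p
            counts[j] += 1
        updated = [sums[j] / counts[j] if counts[j] else centers[j]
                   for j in range(k)]
        if updated == centers:
            break
        centers = updated
    return [[nearest(p) for p in row] for row in img]


def connected_components(mask):
    """4-connected components as (x0, y0, x1, y1, pixel_count) tuples."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            x0 = x1 = x
            y0 = y1 = y
            count = 0
            while queue:
                cy, cx = queue.popleft()
                count += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1, count))
    return boxes


def extract_text_candidates(img, k=3, min_area=3):
    """Keep dark components whose area is at least min_area (toy heuristic)."""
    labels = kmeans_levels(stretch_contrast(img), k)
    mask = [[label == 0 for label in row] for row in labels]
    return [box for box in connected_components(mask) if box[4] >= min_area]
```

Here `img` is a list of rows of gray values in 0..255. Each returned tuple is a bounding box plus a pixel count, so raising `min_area` discards small specks and keeps only the larger dark regions, in the spirit of the abstract's emphasis on large, high-contrast text.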
Pages: 329-334
Number of pages: 6