Automatic Extraction of Text Regions from Document Images by Multilevel Thresholding and K-means Clustering

Cited by: 0
Authors
Hoai Nam Vu [1]
Tuan Anh Tran [1]
Na, In Seop [1]
Kim, Soo Hyung [1]
Affiliation
[1] Chonnam Natl Univ, Dept Comp Sci, 77 Yongbong Ro, Kwangju 500757, South Korea
Keywords
Multilevel; K-mean; Connected Component;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Textual data plays an important role in applications such as image database indexing, document understanding, and image-based web searching. The goal of automatic text extraction from real-life document images, without a character recognition module, is to identify image regions that contain only text. These text regions can then either be fed to an optical character recognition application or highlighted to draw the user's attention. In this paper we propose a method consisting of three stages: preprocessing, which improves the contrast of the grayscale image; multilevel thresholding, which separates text regions from non-text objects such as graphics, pictures, and complex backgrounds; and a heuristic filter and a recursive filter, which localize text within the text regions. In many of these applications it is not necessary to identify all the text regions, so we emphasize identifying important text regions with relatively large size and high contrast. Experimental results on a dataset of real-life images demonstrate that the proposed method is effective in identifying text regions with various illuminations, sizes, and fonts against various types of background.
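
The abstract does not say how multilevel thresholding and K-means are combined, so the following is only a minimal sketch under one common assumption: the gray levels of the image are clustered with 1-D K-means, and the midpoints between adjacent cluster centers serve as thresholds. The function names `kmeans_thresholds` and `quantize`, the default `k = 3`, and the evenly spaced initialization are illustrative choices, not details taken from the paper.

```python
from typing import List, Sequence


def kmeans_thresholds(pixels: Sequence[int], k: int = 3, iters: int = 50) -> List[float]:
    """Cluster 8-bit gray levels with 1-D k-means; return k-1 thresholds.

    Hypothetical helper: thresholds are midpoints between sorted centers.
    """
    if not pixels:
        raise ValueError("pixels must not be empty")

    # Work on the 256-bin histogram so each iteration costs O(256 * k).
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    lo, hi = min(pixels), max(pixels)
    # Assumed initialization: centers spread evenly over the observed range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]

    for _ in range(iters):
        sums = [0.0] * k
        counts = [0] * k
        for g in range(256):
            if hist[g] == 0:
                continue
            # Assign gray level g to its nearest center.
            j = min(range(k), key=lambda c: abs(g - centers[c]))
            sums[j] += g * hist[g]
            counts[j] += hist[g]
        # Empty clusters keep their previous center.
        new_centers = [sums[j] / counts[j] if counts[j] else centers[j] for j in range(k)]
        if new_centers == centers:
            break
        centers = new_centers

    centers.sort()
    return [(centers[i] + centers[i + 1]) / 2 for i in range(k - 1)]


def quantize(pixels: Sequence[int], thresholds: Sequence[float]) -> List[int]:
    """Map each pixel to the index of the intensity band it falls into."""
    return [sum(p > t for t in thresholds) for p in pixels]
```

With `k = 3`, one band would typically hold dark text strokes, one the mid-tone graphics, and one the bright background; which band corresponds to text depends on the image and is not determined by this sketch. The later stages mentioned in the abstract (heuristic and recursive filtering of candidate regions) are not covered here.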
Pages: 329-334
Number of pages: 6