Automatic Extraction of Text Regions from Document Images by Multilevel Thresholding and K-means Clustering

Cited by: 0
Authors
Hoai Nam Vu [1]
Tuan Anh Tran [1]
Na, In Seop [1]
Kim, Soo Hyung [1]
Affiliations
[1] Chonnam Natl Univ, Dept Comp Sci, 77 Yongbong Ro, Kwangju 500757, South Korea
Keywords
Multilevel; K-mean; Connected Component;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Textual data plays an important role in a number of applications such as image database indexing, document understanding, and image-based web searching. The goal of automatic text extraction from real-life document images, without a character recognition module, is to identify image regions that contain only text. These textual regions can then be passed to an optical character recognition application or highlighted to draw the user's attention. In this paper we propose a method consisting of three stages: preprocessing, which improves the contrast of the grayscale image; multi-level thresholding, which separates textual regions from non-textual objects such as graphics, pictures, and complex backgrounds; and a heuristic filter and a recursive filter, which localize text within the textual regions. In many of these applications it is not necessary to identify all text regions; we therefore emphasize identifying important text regions with relatively large size and high contrast. Experimental results on real-life dataset images demonstrate that the proposed method is effective in identifying textual regions with various illumination, sizes, and fonts against various types of background.
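The abstract does not say how the threshold levels are chosen, so the following is only an illustrative sketch, not the authors' implementation. It assumes that k-means is run on the gray levels of the image and that the midpoints between adjacent cluster centers serve as the multilevel thresholds. The function names (`kmeans_1d`, `thresholds_from_centers`, `quantize`), the evenly spread initialization, and the toy image are all hypothetical choices made for the example.

```python
def kmeans_1d(values, k, iters=50):
    """Cluster scalar gray levels into k groups and return the sorted centers."""
    lo, hi = min(values), max(values)
    # Assumption: deterministic start with centers spread evenly over the range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            # Assign each gray level to its nearest center.
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Recompute each center as the mean of its group (keep it if the group is empty).
        new_centers = [
            sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


def thresholds_from_centers(centers):
    """Use the midpoints between adjacent centers as the k-1 thresholds."""
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


def quantize(image, thresholds):
    """Label each pixel with the index of the intensity band it falls into."""
    return [[sum(p > t for t in thresholds) for p in row] for row in image]


# Toy 4x4 grayscale image: bright background, a gray picture area, dark strokes.
image = [
    [250, 252, 248, 251],
    [250, 20, 25, 249],
    [130, 128, 22, 250],
    [131, 129, 127, 252],
]
values = [p for row in image for p in row]
centers = kmeans_1d(values, k=3)
thresholds = thresholds_from_centers(centers)
labels = quantize(image, thresholds)
```

In this toy case the darkest band (label 0) collects the stroke-like pixels. In a full pipeline of the kind the abstract outlines, connected components of such a band would then be passed to the filtering stage, which this sketch does not attempt to reproduce.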
Pages: 329-334
Number of pages: 6