Automatic Extraction of Text Regions from Document Images by Multilevel Thresholding and K-means Clustering

Cited by: 0
Authors
Hoai Nam Vu [1]
Tuan Anh Tran [1]
Na, In Seop [1]
Kim, Soo Hyung [1]
Affiliations
[1] Chonnam Natl Univ, Dept Comp Sci, 77 Yongbong Ro, Kwangju 500757, South Korea
Keywords
Multilevel; K-mean; Connected Component;
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Textual data plays an important role in a number of applications such as image database indexing, document understanding, and image-based web searching. The goal of automatically extracting real-life text from document images without a character recognition module is to identify image regions that contain only text. These textual regions can then either serve as input to an optical character recognition application or be highlighted to focus the user's attention. In this paper we propose a method consisting of three stages: preprocessing, which improves the contrast of the grayscale image; multilevel thresholding, which separates textual regions from non-textual objects such as graphics, pictures, and complex backgrounds; and a heuristic filter and a recursive filter, which localize text within the textual regions. In many of these applications it is not necessary to identify all text regions; therefore, we emphasize identifying important text regions of relatively large size and high contrast. Experimental results on a dataset of real-life images demonstrate that the proposed method is effective in identifying textual regions with varied illumination, size, and font against various types of background.
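The abstract gives no implementation details, so the following Python sketch is only an illustration of the general pipeline it names, not the authors' method. It assumes dark text on a lighter background, uses a one-dimensional k-means over gray levels as a stand-in for multilevel thresholding, and applies a simple size and aspect-ratio rule as the heuristic filter. The recursive filter is not reproduced. All function names and the parameters `k`, `min_area`, and `max_aspect` are hypothetical choices made for this example.

```python
from collections import deque

import numpy as np


def stretch_contrast(gray):
    """Preprocessing: linearly rescale intensities to the range 0..255."""
    g = gray.astype(np.float64)
    lo, hi = g.min(), g.max()
    if hi == lo:
        return np.zeros_like(g)
    return (g - lo) / (hi - lo) * 255.0


def kmeans_levels(gray, k=3, iters=20):
    """1-D k-means on pixel intensities; label 0 is the darkest level."""
    values = gray.ravel()
    # Evenly spaced initial centers stay sorted during the updates.
    centers = np.linspace(values.min(), values.max(), k)
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = values[labels == j].mean()
    labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
    return labels.reshape(gray.shape), centers


def connected_components(mask):
    """Return (area, top, left, bottom, right) per 4-connected component."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    boxes = []
    for r in range(h):
        for c in range(w):
            if not mask[r, c] or seen[r, c]:
                continue
            queue = deque([(r, c)])
            seen[r, c] = True
            area, top, left, bottom, right = 0, r, c, r, c
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            boxes.append((area, top, left, bottom, right))
    return boxes


def candidate_text_boxes(gray, k=3, min_area=4, max_aspect=10.0):
    """Keep bounding boxes of dark components that pass a simple heuristic."""
    levels, _ = kmeans_levels(stretch_contrast(gray), k)
    mask = levels == 0  # assumption: text occupies the darkest gray level
    kept = []
    for area, top, left, bottom, right in connected_components(mask):
        height, width = bottom - top + 1, right - left + 1
        aspect = max(height, width) / min(height, width)
        if area >= min_area and aspect <= max_aspect:
            kept.append((top, left, bottom, right))
    return kept


# Synthetic example: a dark block, a mid-gray "picture", and one noise pixel.
image = np.full((20, 20), 200, dtype=np.uint8)
image[2:5, 3:7] = 10
image[10:16, 10:16] = 120
image[18, 18] = 10
boxes = candidate_text_boxes(image)
```

On this synthetic image the mid-gray square falls into a separate gray level and the single dark pixel is rejected by the minimum-area rule, so only the dark block remains as a candidate. Real document images would likely need a different `k` and filter thresholds.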
Pages: 329-334
Number of pages: 6