Automatic Extraction of Text Regions from Document Images by Multilevel Thresholding and K-means Clustering

Cited by: 0

Authors:

Hoai Nam Vu [1]

Tuan Anh Tran [1]

Na, In Seop [1]

Kim, Soo Hyung [1]

Affiliations:

[1] Chonnam Natl Univ, Dept Comp Sci, 77 Yongbong Ro, Kwangju 500757, South Korea

Source:

2015 IEEE/ACIS 14TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE (ICIS) | 2015

Keywords:

Multilevel; K-mean; Connected Component;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory and methods];

Discipline classification code:

081202 ;

Abstract:

Textual data plays an important role in applications such as image database indexing, document understanding, and image-based web searching. The goal of automatically extracting real-life text from document images without a character recognition module is to identify image regions that contain only text. These textual regions can then either serve as input to an optical character recognition application or be highlighted to focus the user's attention. In this paper we propose a method consisting of three stages: preprocessing, which improves the contrast of the grayscale image; multilevel thresholding, which separates textual regions from non-textual objects such as graphics, pictures, and complex backgrounds; and a heuristic filter and a recursive filter, which localize text within the textual regions. In many of these applications it is not necessary to identify all text regions; we therefore emphasize identifying important text regions with relatively large size and high contrast. Experimental results on a dataset of real-life images demonstrate that the proposed method is effective in identifying textual regions under various illumination conditions, sizes, and fonts, and against various types of background.
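The abstract does not spell out how the multilevel thresholding stage is computed, so the following is only an illustrative sketch of one way K-means can produce multiple intensity thresholds for a grayscale image. It is not the authors' implementation: the function names `kmeans_1d` and `multilevel_threshold`, the evenly spaced initialisation, and the choice of `k=3` are all assumptions made for the example, and the preprocessing and filtering stages are not shown.

```python
import numpy as np


def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's k-means on scalar values; returns sorted cluster centers.

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values, dtype=np.float64).ravel()
    # Assumed initialisation: spread the centers evenly over the value range.
    centers = np.linspace(values.min(), values.max(), k)
    for _ in range(iters):
        # Assign every value to its nearest center (N x k distance table,
        # which is fine for a small example but memory-hungry for big images).
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = values[labels == j]
            if members.size:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return np.sort(centers)


def multilevel_threshold(gray, k=3):
    """Quantise a grayscale image into k intensity levels (0 = darkest)."""
    gray = np.asarray(gray, dtype=np.float64)
    centers = kmeans_1d(gray, k)
    # Place each threshold halfway between neighbouring cluster centers.
    thresholds = (centers[:-1] + centers[1:]) / 2.0
    return np.digitize(gray, thresholds), thresholds


if __name__ == "__main__":
    # Tiny synthetic "image" with dark, mid-gray, and bright pixels.
    gray = np.array([[10, 12, 11, 200],
                     [120, 118, 205, 198],
                     [9, 122, 119, 202]])
    levels, thresholds = multilevel_threshold(gray, k=3)
    print(levels)
    print(thresholds)
```

In a pipeline like the one the abstract outlines, each resulting level could be treated as a candidate layer and passed to connected-component analysis and filtering; which layer holds the text would depend on the document and is left open here.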


Pages: 329-334

Number of pages: 6
