Document Image Dataset Indexing and Compression Using Connected Components Clustering

Cited by: 0
Authors
Chatbri, Houssem [1]
Kameyama, Keisuke [2]
Affiliations
[1] Univ Tsukuba, Dept Comp Sci, Grad Sch Syst & Informat Engn, Tsukuba, Ibaraki 305, Japan
[2] Univ Tsukuba, Fac Engn Informat & Syst, Tsukuba, Ibaraki 305, Japan
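DOI
Not available
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808; 0809
Abstract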
We present a method for indexing and compressing document image datasets by clustering connected components. The method extracts connected components from each image in the dataset and clusters them to build a hash table that serves as a compressed index of the dataset. Clustering is based on component similarity, which is estimated by comparing shape features extracted from the components. The hash table is then saved to a text file, which can be further compressed with any available compression method. Components are encoded in the hash table in a storage-efficient way, using their contour points plus a reduced set of interior points that is sufficient for component reconstruction. We evaluate the method's indexing and compression performance on four document image datasets. Experimental results show that indexing significantly improves efficiency in document image retrieval. In addition, a comparative evaluation against two compression standards, the ZIP and XZ formats, shows competitive performance. Our compression rates are below 20%, and the compression errors are very low, on the order of 10^-6 % per image.
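The pipeline described in the abstract (extract components, group them in a hash table, write the table as text, compress the text) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and it simplifies in several ways that are assumptions of this example: components are grouped only when their pixel sets match exactly after translation, whereas the paper clusters by shape-feature similarity; every pixel is stored rather than contour points plus a reduced set of interior points; and JSON is used as the text format. All function names (`connected_components`, `shape_key`, `build_index`, `compress_index`, `decompress_index`, `reconstruct`) are hypothetical.

```python
import json
import lzma
from collections import deque


def connected_components(image):
    """Return the 8-connected foreground components of a binary image.

    `image` is a list of rows of 0/1 values. Each component is returned
    as a sorted list of (row, col) pixel coordinates.
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if image[r][c] != 1 or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(sorted(pixels))
    return components


def shape_key(pixels):
    """Serialize a component relative to its bounding-box origin.

    Assumption: exact pixel match stands in for the paper's
    shape-feature similarity.
    """
    top = min(y for y, _ in pixels)
    left = min(x for _, x in pixels)
    return ";".join(f"{y - top},{x - left}" for y, x in pixels)


def build_index(images):
    """Map each distinct shape to the [image id, top, left] places it occurs."""
    index = {}
    for image_id, image in enumerate(images):
        for pixels in connected_components(image):
            top = min(y for y, _ in pixels)
            left = min(x for _, x in pixels)
            index.setdefault(shape_key(pixels), []).append([image_id, top, left])
    return index


def compress_index(index):
    """Write the hash table as JSON text, then compress it with XZ (LZMA)."""
    return lzma.compress(json.dumps(index, sort_keys=True).encode("utf-8"))


def decompress_index(blob):
    """Inverse of compress_index."""
    return json.loads(lzma.decompress(blob).decode("utf-8"))


def reconstruct(index, image_id, height, width):
    """Redraw one image of known size using only the index."""
    image = [[0] * width for _ in range(height)]
    for key, occurrences in index.items():
        offsets = [tuple(map(int, p.split(","))) for p in key.split(";")]
        for occ_id, top, left in occurrences:
            if occ_id == image_id:
                for dy, dx in offsets:
                    image[top + dy][left + dx] = 1
    return image


# Two tiny "pages": the same L-shaped component appears three times in total.
page_a = [[1, 1, 0, 0, 1, 1],
          [1, 0, 0, 0, 1, 0],
          [0, 0, 0, 0, 0, 0]]
page_b = [[0, 0, 0, 0, 0, 0],
          [0, 1, 1, 0, 0, 1],
          [0, 1, 0, 0, 0, 0]]
index = build_index([page_a, page_b])
blob = compress_index(index)
```

Because each repeated shape is stored once and only its positions are listed per occurrence, the table doubles as an index: looking up a shape key returns every image and location where that component appears. On inputs this small the XZ container overhead outweighs any saving, so the sketch shows the data flow rather than realistic compression rates.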
Pages: 267-270
Page count: 4