Document Image Dataset Indexing and Compression Using Connected Components Clustering

Cited by: 0
Authors
Chatbri, Houssem [1]
Kameyama, Keisuke [2]
Affiliations
[1] Univ Tsukuba, Dept Comp Sci, Grad Sch Syst & Informat Engn, Tsukuba, Ibaraki 305, Japan
[2] Univ Tsukuba, Fac Engn Informat & Syst, Tsukuba, Ibaraki 305, Japan
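Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract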
We present a method for indexing and compressing document image datasets by clustering connected components. The method extracts connected components from each image in the dataset and clusters them to build a hash table that serves as a compressed index of the dataset. Clustering is based on component similarity, which is estimated by comparing shape features extracted from the components. The hash table is then saved to a text file, which can be further compressed with any available compression method. Component encoding in the hash table is storage-efficient: each component is stored using its contour points and a reduced number of interior points that are sufficient to reconstruct it. We evaluate the method's indexing and compression performance on four document image datasets. Experimental results show that indexing significantly improves efficiency when used in document image retrieval. In addition, a comparative evaluation against two compression standards, the ZIP and XZ formats, shows competitive performance. Our compression rates are below 20%, and the compression errors are very low, on the order of 10^(-6)% per image.
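The pipeline described in the abstract (component extraction, clustering into a hash table, text serialization, general-purpose compression) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: images are tiny binary grids, components are found with 8-connectivity, and "clustering" is simplified to grouping components whose pixel shapes match exactly after translation, as a stand-in for the shape-feature similarity the abstract mentions. Every pixel is stored rather than contour points plus a reduced set of interior points. All function names (`connected_components`, `shape_key`, `build_index`, `compress_index`, `reconstruct`) are hypothetical.

```python
import json
import lzma
from collections import deque


def connected_components(image):
    """Return the 8-connected foreground components of a binary image.

    `image` is a list of rows of 0/1 values; each component is a list of
    (row, col) pixel coordinates.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def shape_key(pixels):
    """Translate a component to its bounding-box origin.

    Returns (key, top, left). Assumption: exact shape equality replaces
    the feature-based similarity used for clustering in the paper.
    """
    top = min(y for y, _ in pixels)
    left = min(x for _, x in pixels)
    offsets = sorted((y - top, x - left) for y, x in pixels)
    key = ";".join(f"{y},{x}" for y, x in offsets)
    return key, top, left


def build_index(images):
    """Map each distinct shape to the [image_id, top, left] places it occurs."""
    index = {}
    for image_id, image in enumerate(images):
        for pixels in connected_components(image):
            key, top, left = shape_key(pixels)
            index.setdefault(key, []).append([image_id, top, left])
    return index


def compress_index(index):
    """Serialize the table as JSON text, then compress it in the XZ format."""
    return lzma.compress(json.dumps(index).encode("utf-8"))


def reconstruct(index, image_id, rows, cols):
    """Redraw one image by stamping every stored shape at its occurrences."""
    image = [[0] * cols for _ in range(rows)]
    for key, occurrences in index.items():
        offsets = [tuple(map(int, p.split(","))) for p in key.split(";")]
        for occ_id, top, left in occurrences:
            if occ_id == image_id:
                for y, x in offsets:
                    image[top + y][left + x] = 1
    return image


# Two toy "pages": the same L-shaped component appears three times in total.
page0 = [
    [1, 1, 0, 0, 1, 1],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
page1 = [
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
]
index = build_index([page0, page1])
restored = json.loads(lzma.decompress(compress_index(index)).decode("utf-8"))
```

In this toy setup the table stores the repeated shape once and records only where it occurs, which is the source of both the index (look up a shape to find the images containing it) and the storage saving. Because matching here is exact, `reconstruct(restored, 0, 3, 6)` reproduces `page0` without error; a similarity-based clustering such as the one in the abstract would instead trade a small reconstruction error for fewer table entries.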
Pages: 267-270
Page count: 4