Document Image Dataset Indexing and Compression Using Connected Components Clustering

Cited by: 0
Authors
Chatbri, Houssem [1]
Kameyama, Keisuke [2]
Affiliations
[1] Univ Tsukuba, Dept Comp Sci, Grad Sch Syst & Informat Engn, Tsukuba, Ibaraki 305, Japan
[2] Univ Tsukuba, Fac Engn Informat & Syst, Tsukuba, Ibaraki 305, Japan
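Keywords
DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract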
We present a method for document image dataset indexing and compression by clustering of connected components. Our method extracts connected components from each dataset image and clusters them to build a hash table that serves as a compressed index of the dataset. Clustering is based on component similarity, which is estimated by comparing shape features extracted from the components. The hash table is then saved in a text file, which is further compressed using any available compression method. Component encoding in the hash table is storage-efficient: each component is represented by its contour points and a reduced number of interior points that are sufficient for its reconstruction. We evaluate our method's indexing and compression performance using four document image datasets. Experimental results show that indexing significantly improves efficiency when used in document image retrieval. In addition, a comparative evaluation against two compression standards, namely the ZIP and XZ formats, shows competitive performance. Our compression rates are below 20%, and the compression errors are very low, on the order of 10^-6 % per image.
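The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the Jaccard overlap used as a stand-in shape feature, the greedy clustering rule, the 0.9 threshold, and the JSON text layout are all hypothetical. It also stores every pixel of each cluster representative, whereas the paper encodes contour points plus a reduced set of interior points.

```python
from collections import deque
import json
import lzma


def connected_components(image):
    """Return the 8-connected foreground components as sorted (row, col) lists."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(sorted(pixels))
    return components


def normalize(pixels):
    """Shift a component so its bounding box starts at (0, 0)."""
    top = min(y for y, _ in pixels)
    left = min(x for _, x in pixels)
    return frozenset((y - top, x - left) for y, x in pixels), (top, left)


def similarity(a, b):
    """Jaccard overlap of two normalized pixel sets (assumed shape feature)."""
    return len(a & b) / len(a | b)


def build_index(images, threshold=0.9):
    """Greedy clustering: join the first cluster whose representative matches."""
    clusters = []
    for image_id, image in enumerate(images):
        for pixels in connected_components(image):
            shape, (top, left) = normalize(pixels)
            for cluster in clusters:
                if similarity(shape, cluster["shape"]) >= threshold:
                    cluster["occurrences"].append((image_id, top, left))
                    break
            else:  # no similar cluster found: start a new one
                clusters.append({"shape": shape,
                                 "occurrences": [(image_id, top, left)]})
    return clusters


def serialize(clusters):
    """Write the table as JSON text, then compress it in the XZ format."""
    table = {str(i): {"shape": sorted(c["shape"]),
                      "occurrences": c["occurrences"]}
             for i, c in enumerate(clusters)}
    return lzma.compress(json.dumps(table).encode("utf-8"))


# Two tiny binary images sharing one L-shaped component.
image_a = [[1, 1, 0, 0, 1, 1, 0],
           [1, 0, 0, 0, 1, 0, 0],
           [0, 0, 0, 0, 0, 0, 0]]
image_b = [[0, 0, 0, 0, 0],
           [0, 1, 1, 0, 0],
           [0, 1, 0, 0, 1]]
index = build_index([image_a, image_b])
blob = serialize(index)
```

In this toy example the three L-shaped components collapse into a single table entry with three occurrences, which is where the space saving comes from. The same table also serves as an index, since each entry lists the images in which a given shape appears.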
Pages: 267-270
Page count: 4