An improved pattern matching technique for lossy/lossless compression of binary printed Farsi and Arabic textual images

Cited by: 8
Authors
Grailu, Hadi [1]
Lotfizad, Mojtaba [1]
Sadoghi-Yazdi, Hadi [2]
Affiliations
[1] Tarbiat Modares Univ, Engn Dept, Tehran, Iran
[2] Tarbiat Moallem Univ Sabzevar, Engn Dept, Sabzevar, Iran
Keywords
Libraries; Languages;
DOI
10.1108/17563780910939273
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Purpose - This paper proposes a lossy/lossless compression method for binary textual images, based on an improved pattern matching (PM) technique.

Design/methodology/approach - In the Farsi/Arabic script, unlike the printed Latin script, letters usually attach to one another and produce a wide variety of patterns. Hence, some patterns are full or partial subsets of others. Two new ideas are proposed here. First, the number of library prototypes is reduced by detecting and then removing fully or partially similar prototypes. Second, a new pattern encoding scheme is proposed for all types of patterns, including text and graphics. The new encoding scheme has two operation modes, chain coding and soft PM, selected according to the ratio of the pattern area to its chain code effective length. To encode the number sequences, the authors have modified the multi-symbol QM-coder. The proposed method has three levels of lossy compression, each of which further increases the compression ratio. The first level applies processing in the chain code domain, such as omission of small patterns and holes, omission of the inner holes of characters, and smoothing of the pattern boundaries. The second level applies the selective pixel reversal technique. The third level prioritizes the residual patterns for encoding according to their degree of compactness.

Findings - Experimental results show that, at 300 dpi, the compression performance of the proposed method exceeds that of the best existing binary textual image compression methods by a factor of 1.6-3 in the lossy case and 1.3-2.4 in the lossless case. The maximum compression ratios are achieved for Farsi and Arabic textual images.

Research limitations/implications - Only binary printed typeset textual images are considered.

Practical implications - The proposed method achieves a high compression ratio, which suits archiving and storage applications.

Originality/value - To the best of the authors' knowledge, existing textual image compression methods and standards have not so far exploited the full or partial similarity of prototypes to increase the compression ratio for any script. Also, the idea of combining boundary description methods with run-length and arithmetic coding techniques has not so far been used.
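
The mode selection described in the abstract (chain coding versus soft PM, chosen from the ratio of pattern area to chain code effective length) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not give the threshold, the direction of the decision, or the definition of "effective length", so the sketch assumes all three: boundary_length() uses the count of ink/background pixel edges as a stand-in for chain code length, threshold=1.0 is arbitrary, and high-ratio (solid) patterns are assumed to go to chain coding. The function names are hypothetical.

```python
from typing import List

Pattern = List[List[int]]  # rows of 0/1 pixels; 1 = ink (assumed convention)


def area(pattern: Pattern) -> int:
    """Number of ink pixels in the pattern."""
    return sum(sum(row) for row in pattern)


def boundary_length(pattern: Pattern) -> int:
    """Count unit edges between an ink pixel and background or the bitmap border.

    Assumption: used here as a proxy for the chain code effective length,
    which the abstract does not define.
    """
    h = len(pattern)
    w = len(pattern[0]) if h else 0
    edges = 0
    for y in range(h):
        for x in range(w):
            if not pattern[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = ny < 0 or ny >= h or nx < 0 or nx >= w
                if outside or not pattern[ny][nx]:
                    edges += 1
    return edges


def choose_mode(pattern: Pattern, threshold: float = 1.0) -> str:
    """Pick an encoding mode from the area-to-boundary-length ratio.

    Assumption: a large ratio (solid pattern, short boundary relative to
    its area) favours chain coding; a small ratio favours soft PM.
    The threshold value is illustrative only.
    """
    length = boundary_length(pattern)
    if length == 0:  # empty pattern: nothing to chain-code
        return "soft_pm"
    ratio = area(pattern) / length
    return "chain_coding" if ratio >= threshold else "soft_pm"


if __name__ == "__main__":
    thin_stroke = [[1, 1, 1, 1, 1]]
    solid_block = [[1] * 8 for _ in range(8)]
    print(choose_mode(thin_stroke), choose_mode(solid_block))
```

Under these assumptions, a one-pixel-high stroke of five pixels (area 5, boundary length 12) falls below the threshold, while an 8 x 8 solid block (area 64, boundary length 32) lies above it.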
Pages: 120-147
Number of pages: 28