Document image retrieval through word shape coding

Cited by: 48
Authors
Lu, Shijian [1]
Li, Linlin [2]
Tan, Chew Lim [2]
Affiliations
[1] Agcy Sci Technol & Res, Inst Infocomm Res, Singapore 119613, Singapore
[2] Natl Univ Singapore, Sch Comp, Dept Comp Sci, Singapore 117543, Singapore
Keywords
document image retrieval; document image analysis; word shape coding
DOI
10.1109/TPAMI.2008.89
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a document retrieval technique that is capable of searching document images without optical character recognition (OCR). The proposed technique retrieves document images by a new word shape coding scheme, which captures the document content through annotating each word image by a word shape code. In particular, we annotate word images by using a set of topological shape features including character ascenders/descenders, character holes, and character water reservoirs. With the annotated word shape codes, document images can be retrieved by either query keywords or a query document image. Experimental results show that the proposed document image retrieval technique is fast, efficient, and tolerant to various types of document degradation.
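The idea in the abstract, annotating each word with a code built from coarse shape features and then retrieving by matching codes, can be sketched on plain text. This is only a toy illustration under stated assumptions: the letter classes `ASCENDERS`, `DESCENDERS`, and `HOLES`, the two-symbol code per character, and the helper names `char_code`, `word_code`, `build_index`, and `search` are all hypothetical and are not the paper's coding scheme. The sketch works on characters rather than word images and leaves out water reservoirs entirely.

```python
from collections import defaultdict

# Assumed, hand-written letter classes for a typical lowercase Latin typeface.
# These tables are illustrative only, not taken from the paper.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")
HOLES = set("abdegopq")


def char_code(ch: str) -> str:
    """Code one letter as a zone symbol (A, D or x) plus a hole flag (o or -)."""
    zone = "A" if ch in ASCENDERS else "D" if ch in DESCENDERS else "x"
    hole = "o" if ch in HOLES else "-"
    return zone + hole


def word_code(word: str) -> str:
    """Concatenate the per-letter codes, ignoring non-letter characters."""
    return "".join(char_code(c) for c in word.lower() if c.isalpha())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each word shape code to the ids of the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            code = word_code(word)
            if code:  # skip tokens with no letters
                index[code].add(doc_id)
    return index


def search(index: dict[str, set[str]], keyword: str) -> list[str]:
    """Return ids of documents holding a word with the keyword's shape code."""
    return sorted(index.get(word_code(keyword), set()))


# Hypothetical mini-corpus standing in for recognized word images.
DOCS = {
    "doc1": "the dog ran",
    "doc2": "a big bag",
    "doc3": "no match here",
}

if __name__ == "__main__":
    print(word_code("dog"))
    print(search(build_index(DOCS), "dog"))
```

In this toy scheme "dog" and "bag" receive the same code, so a query for "dog" also returns `doc2`. Coarse shape codes trade some ambiguity for not needing character recognition, and a richer feature set would narrow such collisions.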
Pages: 1913-1918
Number of pages: 6
Related papers
  • [1] FARSI/ARABIC DOCUMENT IMAGE RETRIEVAL THROUGH SUB-LETTER SHAPE CODING
    Bahmani, Zahra
    Azmi, Reza
    2011 3RD INTERNATIONAL CONFERENCE ON COMPUTER TECHNOLOGY AND DEVELOPMENT (ICCTD 2011), VOL 3, 2012: 661-665
  • [2] Word shape recognition for image-based document retrieval
    Huang, WH
    Tan, CL
    Sung, SY
    Xu, Y
    2001 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL I, PROCEEDINGS, 2001: 1114-1117
  • [3] Scanned english document retrieval based on OCR and word shape coding
    Xia, Yong
    Dai, Ru-Wei
    Xiao, Bai-Hua
    Wang, Chun-Heng
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2009, 22 (03): 488-493
  • [4] Document Specific Sparse Coding for Word Retrieval
    Shekhar, Ravi
    Jawahar, C. V.
    2013 12TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2013: 643-647
  • [5] Document Image Coding for Processing and Retrieval
    Omid E. Kia
    David S. Doermann
    Journal of VLSI signal processing systems for signal, image and video technology, 1998, 20: 121-135
  • [6] Document image coding for processing and retrieval
    Kia, OE
    Doermann, DS
    JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 1998, 20 (1-2): 121-135
  • [7] Document image coding for processing and retrieval
    Natl Inst of Standards and, Technology, Gaithersburg, United States
    J VLSI Signal Process Syst Signal Image Video Technol, 1-2 (121-135):
  • [8] Retrieval of machine-printed Latin documents through Word Shape Coding
    Lu, Shijian
    Tan, Chew Lim
    PATTERN RECOGNITION, 2008, 41 (05): 1799-1809
  • [9] Large scale document image retrieval by automatic word annotation
    Sankar, K. Pramod
    Manmatha, R.
    Jawahar, C. V.
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2014, 17 (01): 1-17
  • [10] Large scale document image retrieval by automatic word annotation
    K. Pramod Sankar
    R. Manmatha
    C. V. Jawahar
    International Journal on Document Analysis and Recognition (IJDAR), 2014, 17: 1-17