A Fast Appearance-Based Full-Text Search Method for Historical Newspaper Images

Cited by: 5
Authors
Terasawa, Kengo [1]
Shima, Takahiro [1]
Kawashima, Toshio [1]
Affiliations
[1] Future Univ Hakodate, Grad Sch Syst Informat Sci, Hakodate, Hokkaido 0418655, Japan
Keywords
string matching; word spotting; historical document images; Locality-Sensitive Pseudo-Code; Boyer-Moore-Horspool algorithm;
DOI
10.1109/ICDAR.2011.277
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a fast appearance-based full-text search method for historical newspaper images. Because historical newspapers differ from recent ones in image quality, typefaces, and language usage, optical character recognition (OCR) does not provide sufficient quality. Instead of an OCR approach, we adopted an appearance-based approach: characters are matched to one another by their shapes. Assuming proper character segmentation and proper feature description, the full-text search problem reduces to a sequence matching problem over feature vectors. To increase computational efficiency, we adopted a pseudo-code expression called LSPC (Locality-Sensitive Pseudo-Code), a compact sketch of a feature vector that retains a good deal of its information. Experimental results showed that our method can retrieve a query string from a text of over eight million characters within a second. In addition, we expect that more sophisticated algorithms can be designed for LSPC. As an example, we established the Extended Boyer-Moore-Horspool algorithm, which reduces the computational cost further, especially as the query string becomes longer.
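The abstract's reduction of full-text search to sequence matching can be illustrated with a minimal sketch. It assumes each segmented character has already been mapped to a single integer code and that two characters match only when their codes are equal. The function name `horspool_search` and the integer encoding are hypothetical. This is the standard Boyer-Moore-Horspool algorithm, not the paper's LSPC encoding or its Extended Boyer-Moore-Horspool variant, neither of which is specified in this record.

```python
from typing import Dict, List, Sequence


def horspool_search(text: Sequence[int], pattern: Sequence[int]) -> List[int]:
    """Return every start index where `pattern` occurs in `text`.

    Assumption: each element is an integer code standing in for one
    character image, and matching is exact code equality.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Bad-character table: for each code in the pattern (last position
    # excluded), the distance from its rightmost occurrence to the end.
    shift: Dict[int, int] = {
        code: m - 1 - i for i, code in enumerate(pattern[:-1])
    }

    hits: List[int] = []
    pos = 0
    while pos <= n - m:
        # Compare the window against the pattern from right to left.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(pos)
        # Shift by the table entry for the window's last code; codes that
        # do not appear in the table allow a full-length jump.
        pos += shift.get(text[pos + m - 1], m)
    return hits


codes = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 1, 4, 1, 5]
print(horspool_search(codes, [1, 4, 1, 5]))  # [1, 11]
```

Because the shift can be as large as the pattern length, longer queries tend to skip more of the text per step, which is consistent with the abstract's remark that the cost reduction grows with query length.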
Pages: 1379-1383
Page count: 5