A Fast Appearance-Based Full-Text Search Method for Historical Newspaper Images

Cited by: 5
Authors
Terasawa, Kengo [1 ]
Shima, Takahiro [1 ]
Kawashima, Toshio [1 ]
Affiliations
[1] Future Univ Hakodate, Grad Sch Syst Informat Sci, Hakodate, Hokkaido 0418655, Japan
Keywords
string matching; word spotting; historical document images; Locality-Sensitive Pseudo-Code; Boyer-Moore-Horspool algorithm;
DOI
10.1109/ICDAR.2011.277
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a fast appearance-based full-text search method for historical newspaper images. Because historical newspapers differ from recent newspapers in image quality, typefaces, and language usage, optical character recognition (OCR) does not provide sufficient quality. Instead of an OCR approach, we adopted an appearance-based approach; that is, we matched characters to characters by their shapes. Assuming proper character segmentation and proper feature description, the full-text search problem is reduced to a sequence matching problem of feature vectors. To increase computational efficiency, we adopted a pseudo-code expression called LSPC (Locality-Sensitive Pseudo-Code), which is a compact sketch of a feature vector that retains a good deal of its information. Experimental results showed that our method can retrieve a query string from a text of over eight million characters within a second. In addition, we predict that more sophisticated algorithms could be designed for LSPC. As an example, we established the Extended Boyer-Moore-Horspool algorithm, which can reduce the computational cost further, especially when the query string becomes longer.
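As a rough illustration of the sequence-matching step described in the abstract, the sketch below runs the classical Boyer-Moore-Horspool algorithm over a sequence of integer codes, one code per segmented character. It is not the authors' Extended Boyer-Moore-Horspool algorithm and it does not compute LSPC: the function name `horspool_search`, the use of plain integers as stand-ins for pseudo-codes, and the exact-equality comparison are all assumptions made for this example. The method in the paper matches on appearance similarity, which exact equality does not capture.

```python
from typing import Dict, List, Sequence


def horspool_search(text: Sequence[int], pattern: Sequence[int]) -> List[int]:
    """Return every start index where `pattern` occurs in `text`.

    Classical Boyer-Moore-Horspool over integer codes. Each integer is a
    hypothetical stand-in for the pseudo-code of one character image.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Bad-character table: for each symbol in pattern[:-1], the distance
    # from its last occurrence to the end of the pattern.
    shift: Dict[int, int] = {}
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i

    hits: List[int] = []
    pos = 0
    while pos <= n - m:
        # Compare the window against the pattern from right to left.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(pos)
        # Slide by the shift of the symbol aligned with the pattern's last
        # position; symbols absent from the table allow a full-length jump.
        pos += shift.get(text[pos + m - 1], m)
    return hits


if __name__ == "__main__":
    codes = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 1, 4, 1, 5]
    query = [1, 4, 1, 5]
    print(horspool_search(codes, query))
```

Because symbols that do not occur in the query allow a jump of the full query length, the number of windows examined tends to fall as the query grows, which is consistent with the abstract's remark about longer query strings.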
Pages: 1379-1383
Number of pages: 5