A Fast Appearance-Based Full-Text Search Method for Historical Newspaper Images

Times cited: 5
Authors
Terasawa, Kengo [1]
Shima, Takahiro [1]
Kawashima, Toshio [1]
Affiliation
[1] Future University Hakodate, Graduate School of Systems Information Science, Hakodate, Hokkaido 041-8655, Japan
Keywords
string matching; word spotting; historical document images; Locality-Sensitive Pseudo-Code; Boyer-Moore-Horspool algorithm
DOI
10.1109/ICDAR.2011.277
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a fast appearance-based full-text search method for historical newspaper images. Because historical newspapers differ from recent newspapers in image quality, typefaces, and language usage, optical character recognition (OCR) does not provide sufficient quality. Instead of an OCR approach, we adopted an appearance-based approach; that is, we matched characters one to one by their shapes. Assuming proper character segmentation and proper feature description, the full-text search problem reduces to a sequence matching problem over feature vectors. To increase computational efficiency, we adopted a pseudo-code expression called LSPC (Locality-Sensitive Pseudo-Code), a compact sketch of a feature vector that retains much of its information. Experimental results showed that our method can retrieve a query string from a text of over eight million characters within one second. In addition, we expect that more sophisticated search algorithms can be designed for LSPC. As an example, we established an Extended Boyer-Moore-Horspool algorithm that reduces the computational cost further, especially as the query string becomes longer.
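The abstract does not say how LSPC is computed or how the Extended Boyer-Moore-Horspool algorithm differs from the original. The following is only a rough sketch of the general pipeline it describes: encode each segmented character's feature vector as a compact code, then search the resulting code sequence for the query's code sequence. Two stand-ins are assumed here and are not the authors' method. First, random-hyperplane sign hashing replaces LSPC. Second, the plain (not extended) Boyer-Moore-Horspool algorithm is used, with exact code equality. The names `make_hyperplanes`, `pseudo_code`, `horspool_search` and the toy glyph vectors are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def make_hyperplanes(dim: int, bits: int, seed: int = 0) -> List[List[float]]:
    """Hypothetical stand-in for LSPC: draw `bits` random Gaussian hyperplanes."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def pseudo_code(vec: Sequence[float], planes: List[List[float]]) -> int:
    """Pack the sign of the projection onto each hyperplane into one integer.

    Nearby feature vectors tend to share most sign bits (a locality-sensitive
    sketch); this is an assumed substitute, not the paper's LSPC definition.
    """
    code = 0
    for i, plane in enumerate(planes):
        if sum(p * v for p, v in zip(plane, vec)) >= 0.0:
            code |= 1 << i
    return code


def horspool_search(text: Sequence[int], pattern: Sequence[int]) -> List[int]:
    """Standard Boyer-Moore-Horspool over integer codes; returns all match starts."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Bad-character table: for each code in pattern[0..m-2], the distance from
    # its last occurrence to the pattern's final position. Unseen codes shift by m.
    shift: Dict[int, int] = {}
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i
    hits: List[int] = []
    pos = 0
    while pos <= n - m:
        # Compare right to left, as in Horspool's algorithm.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(pos)
        # Shift by the table entry of the text code under the pattern's last slot.
        pos += shift.get(text[pos + m - 1], m)
    return hits


if __name__ == "__main__":
    planes = make_hyperplanes(dim=8, bits=16, seed=42)
    # Hypothetical per-character feature vectors standing in for real glyph features.
    rng = random.Random(1)
    glyphs = {c: [rng.uniform(-1.0, 1.0) for _ in range(8)] for c in "ABCDE"}
    page = "ABCDEABCAB"
    text_codes = [pseudo_code(glyphs[c], planes) for c in page]
    query_codes = [pseudo_code(glyphs[c], planes) for c in "CAB"]
    print(horspool_search(text_codes, query_codes))
```

Exact code equality is a simplification. Real scans of the same character rarely give identical codes, so a practical matcher would need some tolerance, such as a Hamming-distance threshold on the codes. The Horspool skip table assumes exact matches, so adding tolerance would change how far the search may safely shift.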
Pages: 1379-1383
Number of pages: 5