Enabling Indexing and Retrieval of Historical Arabic Manuscripts through Template Matching Based Word Spotting

Authors
Faisal, Tayyeba [1]
AlMaadeed, Somaya [1]
Affiliations
[1] Qatar Univ, Dept Comp Sci & Engn, Doha, Qatar
Keywords
word spotting; template matching; correlation similarity; historical; Arabic; handwritten documents; system
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We present a holistic, segmentation-free, query-by-example word spotting technique based on template matching and apply it to a dataset of historical Arabic handwritten manuscript images. First, the document images and the query word images are pre-processed to separate text from the noisy background and are converted to binary form. A pixel-based approach then computes the similarity between the pre-processed query word template and the document images using the correlation similarity measure. Slight variations in font size are tolerated by adjusting the similarity threshold. Our robust pre-processing algorithm significantly enhances the performance of the learning-free, template-matching-based word spotting approach. The proposed technique is simple and efficient because it involves no time-consuming learning steps. Experiments on a historical Arabic dataset yield promising results. The technique generates the locations at which a query word image occurs, which is the fundamental step towards building searchable indexes for historical manuscripts.
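The pipeline described in the abstract (binarize, slide the query over the page, score each position by correlation, keep positions above a threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the global-threshold binarization stands in for their unspecified pre-processing algorithm, the Pearson correlation coefficient is one plausible reading of "correlation similarity", and the names `binarize`, `correlation`, `spot_word`, and `min_score` are hypothetical.

```python
import numpy as np


def binarize(gray, threshold=128):
    """Map dark (ink) pixels to 1 and light background to 0.

    A fixed global threshold is an assumption made for illustration;
    the abstract does not specify the pre-processing algorithm.
    """
    return (np.asarray(gray) < threshold).astype(np.float64)


def correlation(a, b):
    """Pearson correlation coefficient between two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    # A constant patch (for example, blank background) has zero variance.
    return 0.0 if denom == 0 else float((a * b).sum() / denom)


def spot_word(page, query, min_score=0.8):
    """Slide the query over the page and return (row, col, score) hits.

    A position is reported when its correlation with the query is at
    least min_score; lowering min_score tolerates more variation.
    """
    page_h, page_w = page.shape
    query_h, query_w = query.shape
    hits = []
    for r in range(page_h - query_h + 1):
        for c in range(page_w - query_w + 1):
            window = page[r:r + query_h, c:c + query_w]
            score = correlation(window, query)
            if score >= min_score:
                hits.append((r, c, score))
    return hits


# Toy data: a white "page" with the same dark 3x3 pattern placed twice.
word = np.array([[0, 255, 0],
                 [0, 0, 0],
                 [255, 0, 255]], dtype=np.uint8)
page_gray = np.full((12, 20), 255, dtype=np.uint8)
page_gray[2:5, 4:7] = word
page_gray[7:10, 13:16] = word

hits = spot_word(binarize(page_gray), binarize(word), min_score=0.99)
print([(r, c) for r, c, _ in hits])
```

The exhaustive double loop keeps the sketch readable but scales with page area times query area; a practical system would more likely rely on an FFT-based or library-provided template matcher. The returned top-left coordinates are the kind of occurrence locations that a searchable index could store.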
Pages: 57-63
Number of pages: 7