Prior Segmentation of Old Arabic Manuscripts by Separator Word Spotting

Authors
Aouadi, Nabil [1]
Echi, Afef Kacem [1]
Affiliations
[1] Univ Tunis, LaTICE ENSIT, 5 Ave Taha Hussein, BP 56 Bab Menara, Tunis 1008, Tunisia
Keywords
Segmentation; Hough Generalized Transform; Word Spotting; Convex Point Theory; Skeleton; Baseline; angular variation; SYSTEM
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Segmenting old Arabic manuscripts is a challenging problem because of their low quality, the complexity of Arabic script, and the variety of writing styles. This work aims to preprocess these manuscripts so that they can be correctly segmented into independent words for text recognition. The idea is to spot separator words, detach them from neighboring words if necessary, and use them to segment text lines into words. To locate separator words in these document images, we propose a word spotting method based on the Generalized Hough Transform. The method operates on points obtained through convex point theory. Within a window centered on the cluster of votes for the separator word, it detects all connections below the text-line baseline, analyzes the morphology of the terminal letter, and tries to separate touching or overlapping components. We tested the proposed system on Arabic historical manuscripts from the 19th century onwards conserved in the Tunisian National Archives. Experiments show very encouraging results.
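The voting step that the abstract attributes to the Generalized Hough Transform can be illustrated with a minimal sketch. It makes strong simplifying assumptions: it handles translation only, it skips the gradient-orientation indexing of a full R-table, and it takes arbitrary feature points rather than points chosen by convex point theory. The names `build_r_table`, `vote`, and `spot` are hypothetical and are not taken from the paper.

```python
from collections import Counter

def build_r_table(template_points, reference):
    """Store the displacement from each template point to the reference point.

    Assumption: a single-bin R-table (no gradient-orientation index).
    """
    ref_row, ref_col = reference
    return [(ref_row - row, ref_col - col) for (row, col) in template_points]

def vote(image_points, r_table, shape):
    """Let every image point vote for each reference location it could imply."""
    height, width = shape
    accumulator = Counter()
    for (row, col) in image_points:
        for (d_row, d_col) in r_table:
            cand = (row + d_row, col + d_col)
            # Discard votes that fall outside the image.
            if 0 <= cand[0] < height and 0 <= cand[1] < width:
                accumulator[cand] += 1
    return accumulator

def spot(image_points, template_points, reference, shape):
    """Return the most-voted reference location and its vote count."""
    accumulator = vote(image_points, build_r_table(template_points, reference), shape)
    if not accumulator:
        return None, 0
    return accumulator.most_common(1)[0]

# Hypothetical toy data: a 3-point template, shifted by (5, 7) in the image,
# plus one unrelated noise point at (0, 0).
template = [(0, 0), (0, 2), (2, 1)]
image = [(5, 7), (5, 9), (7, 8), (0, 0)]
peak, votes = spot(image, template, reference=(1, 1), shape=(12, 12))
print(peak, votes)  # (6, 8) 3
```

The peak of the accumulator plays the role of the "group of votes" around which, according to the abstract, a window is centered for the subsequent analysis of connections below the baseline.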
Pages: 31-36
Page count: 6