Prior Segmentation of Old Arabic Manuscripts by Separator Word Spotting

Authors
Aouadi, Nabil [1]
Echi, Afef Kacem [1]
Affiliations
[1] Univ Tunis, LaTICE ENSIT, 5 Ave Taha Hussein, BP 56 Bab Menara, Tunis 1008, Tunisia
Keywords
Segmentation; Generalized Hough Transform; Word Spotting; Convex Point Theory; Skeleton; Baseline; Angular variation; System
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Because of the low quality of old manuscripts, the complexity of Arabic script, and the variety of writing styles, segmenting old Arabic manuscripts is a challenging problem. This work aims to preprocess these manuscripts so that they can be correctly segmented into independent words for text recognition. The idea is to spot separator words, detach them from neighboring words if necessary, and use them to segment text lines into words. To locate separator words in these document images, we propose a word spotting method based on the Generalized Hough Transform, applied using points given by convex point theory. Within a window centered on the cluster of votes for the separator word, the method detects all connections below the text-line baseline, analyzes the morphology of the terminal letter, and tries to separate touching or overlapping components. We tested the proposed system on Arabic historical manuscripts from the 19th century onwards, conserved in the Tunisian National Archives. Experiments show very encouraging results.
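The voting step behind Generalized Hough Transform spotting can be illustrated with a minimal sketch. It is not the authors' implementation: it handles translation only, omits the gradient-direction indexing of the full transform, and assumes the feature points (for example, the convex points mentioned in the abstract) have already been extracted. The function names `build_r_table`, `ght_votes`, and `spot` are hypothetical.

```python
from collections import Counter


def build_r_table(template_points, reference):
    """Store the displacement from each template point to the reference point."""
    rx, ry = reference
    return [(rx - x, ry - y) for x, y in template_points]


def ght_votes(image_points, r_table):
    """Let every image point vote for each reference location it could imply."""
    votes = Counter()
    for x, y in image_points:
        for dx, dy in r_table:
            votes[(x + dx, y + dy)] += 1
    return votes


def spot(image_points, template_points, reference):
    """Return the most-voted reference location and its vote count."""
    r_table = build_r_table(template_points, reference)
    votes = ght_votes(image_points, r_table)
    location, score = votes.most_common(1)[0]
    return location, score


# Hypothetical feature points of a separator-word template and its reference point.
template = [(0, 0), (1, 0), (2, 1), (0, 2)]
reference = (1, 1)

# The same shape shifted by (10, 5), plus two unrelated points acting as noise.
image = [(x + 10, y + 5) for x, y in template] + [(3, 3), (20, 1)]

location, score = spot(image, template, reference)
print(location, score)
```

The peak of the vote accumulator marks where the template's reference point most plausibly lies in the image. In the pipeline described in the abstract, a window centered on such a cluster of votes would then be examined for connections below the baseline.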
Pages: 31-36
Number of pages: 6