Window-Based Feature Extraction Framework for Machine-Printed/Handwritten and Arabic/Latin Text Discrimination

Cited by: 0
Authors
Mezghani, Anis [1]
Slimane, Fouad [2]
Kanoun, Slim [3]
Kherallah, Monji [1]
Affiliations
[1] Univ Sfax, REs Grp Intelligent Machines Lab, Sfax, Tunisia
[2] Tech Univ Carolo Wilhelmina Braunschweig, Inst Commun Technol IFN, Braunschweig, Germany
[3] Univ Sfax, ISIMS, MIRACL Lab, Sfax, Tunisia
Keywords
Heterogeneous documents; writing type identification; script identification; GMM; sliding window; writer identification; classification; images; system; script
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a new technique that classifies text by writing type (machine-printed or handwritten) and by script (Arabic or Latin) in order to identify text extracted from heterogeneous document images. These documents contain English, French and Arabic text, mixing handwritten and machine-printed content. To identify each text-line or word image, we compute 23 features on a fixed-length sliding window and use Gaussian Mixture Models (GMMs) for classification. The framework has been tested on machine-printed and handwritten text blocks, text lines and words extracted from different document images of the Maurdor database. Experimental results show the effectiveness of the proposed system for writing type and script identification.
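The abstract names two stages: features computed on a fixed-length sliding window, and GMM-based classification. The following is a minimal, dependency-free sketch of that kind of pipeline, not the authors' implementation. Assumptions: the three features in `window_features` (ink density, vertical centre of mass, transitions per column) are illustrative placeholders, not the paper's 23 features; the window width (8) and shift (4) are hypothetical; and the decision rule (one diagonal-covariance GMM per class, with per-window log-likelihoods summed and the best-scoring class chosen) is a common GMM scoring scheme that the abstract does not confirm. All function and class names are hypothetical.

```python
import math
from typing import Dict, List

Image = List[List[int]]  # hypothetical input: binary image, 1 = ink, 0 = background


def sliding_windows(image: Image, width: int, shift: int) -> List[Image]:
    """Cut a text image into fixed-width column windows, left to right."""
    n_cols = len(image[0])
    return [[row[x:x + width] for row in image]
            for x in range(0, n_cols - width + 1, shift)]


def window_features(win: Image) -> List[float]:
    """Three ILLUSTRATIVE features; the paper's 23 features are not reproduced here."""
    h, w = len(win), len(win[0])
    ink_rows = [y for y in range(h) for x in range(w) if win[y][x]]
    density = len(ink_rows) / (h * w)  # fraction of ink pixels
    # Vertical centre of mass of the ink, normalised to [0, 1] (0.5 if empty).
    centre = sum(ink_rows) / (len(ink_rows) * (h - 1)) if ink_rows and h > 1 else 0.5
    # Mean number of ink/background transitions per column.
    transitions = sum(win[y][x] != win[y + 1][x]
                      for x in range(w) for y in range(h - 1)) / w
    return [density, centre, transitions]


def image_features(images: List[Image], width: int = 8, shift: int = 4) -> List[List[float]]:
    """One feature vector per window, pooled over all given images."""
    return [window_features(w) for img in images for w in sliding_windows(img, width, shift)]


class DiagonalGMM:
    """Diagonal-covariance Gaussian mixture trained with plain EM."""

    def __init__(self, n_components: int = 2, n_iter: int = 30, var_floor: float = 1e-3):
        self.k, self.n_iter, self.var_floor = n_components, n_iter, var_floor

    @staticmethod
    def _log_gauss(x: List[float], mean: List[float], var: List[float]) -> float:
        return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                          for xi, m, v in zip(x, mean, var))

    def _component_logs(self, x: List[float]) -> List[float]:
        return [math.log(w) + self._log_gauss(x, m, v)
                for w, m, v in zip(self.weights, self.means, self.vars)]

    def log_likelihood(self, x: List[float]) -> float:
        logs = self._component_logs(x)
        top = max(logs)  # log-sum-exp for numerical stability
        return top + math.log(sum(math.exp(l - top) for l in logs))

    def fit(self, data: List[List[float]]) -> "DiagonalGMM":
        n, d = len(data), len(data[0])
        k = min(self.k, n)
        cols = list(zip(*data))
        mean_all = [sum(c) / n for c in cols]
        var_all = [max(sum((v - m) ** 2 for v in c) / n, self.var_floor)
                   for c, m in zip(cols, mean_all)]
        # Deterministic initialisation: evenly spaced samples as component means.
        self.means = [list(data[j * n // k]) for j in range(k)]
        self.vars = [list(var_all) for _ in range(k)]
        self.weights = [1.0 / k] * k
        for _ in range(self.n_iter):
            # E-step: posterior responsibility of each component for each sample.
            resp = []
            for x in data:
                logs = self._component_logs(x)
                top = max(logs)
                p = [math.exp(l - top) for l in logs]
                s = sum(p)
                resp.append([q / s for q in p])
            # M-step: re-estimate weights, means and floored variances.
            for j in range(k):
                nj = sum(r[j] for r in resp) + 1e-12
                self.weights[j] = nj / n
                self.means[j] = [sum(r[j] * x[i] for r, x in zip(resp, data)) / nj
                                 for i in range(d)]
                self.vars[j] = [max(sum(r[j] * (x[i] - self.means[j][i]) ** 2
                                        for r, x in zip(resp, data)) / nj, self.var_floor)
                                for i in range(d)]
        return self


def classify(image: Image, models: Dict[str, DiagonalGMM],
             width: int = 8, shift: int = 4) -> str:
    """Pick the class whose GMM gives the highest summed window log-likelihood."""
    feats = image_features([image], width, shift)
    scores = {label: sum(g.log_likelihood(f) for f in feats)
              for label, g in models.items()}
    return max(scores, key=scores.get)
```

In a typical setup, one model would be trained per class of interest (for example one per writing type, or one per script) on windows from labelled text lines, and `classify` would then label unseen lines or words. In practice, scikit-learn's `sklearn.mixture.GaussianMixture` with `covariance_type="diag"` and its `score_samples` method would replace the hand-written EM; it is spelled out here only to keep the sketch self-contained.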
Pages: 329-335
Page count: 7