Window-Based Feature Extraction Framework for Machine-Printed/Handwritten and Arabic/Latin Text Discrimination

Times cited: 0
Authors
Mezghani, Anis [1 ]
Slimane, Fouad [2 ]
Kanoun, Slim [3 ]
Kherallah, Monji [1 ]
Affiliations
[1] Univ Sfax, REs Grp Intelligent Machines Lab, Sfax, Tunisia
[2] Tech Univ Carolo Wilhelmina Braunschweig, Inst Commun Technol IFN, Braunschweig, Germany
[3] Univ Sfax, ISIMS, MIRACL Lab, Sfax, Tunisia
Keywords
Heterogeneous documents; writing type identification; script identification; GMM; sliding window; WRITER IDENTIFICATION; CLASSIFICATION; IMAGES; SYSTEM; SCRIPT
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a new technique that classifies text by writing type and by script in order to identify the texts extracted from heterogeneous document images. These documents contain English, French and Arabic text, mixing handwritten and machine-printed writing types. To identify each text-line or word image, we compute 23 features on a fixed-length sliding window. Gaussian Mixture Models (GMMs) are used to perform the classification. The framework has been tested on machine-printed and handwritten text blocks, text lines and words extracted from different document images of the Maurdor database. Experimental results show the effectiveness of the proposed system for writing type and script identification.
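The abstract describes the pipeline only at a high level: features are computed on a fixed-length sliding window, and GMMs perform the classification. The sketch below illustrates that general technique under loudly stated assumptions. The window width and step, the three features (ink density, mean inked row, inked height), the use of one diagonal-covariance GMM per class with two components, and the synthetic toy "text lines" are all hypothetical choices for illustration. They are not the paper's 23 features or its actual configuration, and no Maurdor data is involved. Only NumPy is used.

```python
import numpy as np


def window_features(img, width=8, step=4):
    """One feature vector per position of a fixed-width window sliding
    left to right over a binary text image (1 = ink).

    ASSUMPTION: these three features are illustrative placeholders,
    NOT the 23 features used in the paper."""
    h, w = img.shape
    feats = []
    for x in range(0, max(w - width, 0) + 1, step):
        win = img[:, x:x + width]
        rows = np.flatnonzero(win.any(axis=1))      # rows containing ink
        if rows.size:
            centre = rows.mean() / h                 # mean inked row, normalised
            extent = (rows[-1] - rows[0] + 1) / h    # inked height, normalised
        else:
            centre, extent = 0.5, 0.0
        feats.append([win.mean(), centre, extent])   # win.mean() = ink density
    return np.asarray(feats, dtype=float)


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True)), axis=axis)


class DiagGMM:
    """Gaussian mixture with diagonal covariances, trained by EM."""

    def __init__(self, k=2, iters=50, reg=1e-3, seed=0):
        self.k, self.iters, self.reg, self.seed = k, iters, reg, seed

    def _log_joint(self, X):
        # log(w_j) + log N(x | mu_j, diag(var_j)) for every (sample, component) pair
        diff = X[:, None, :] - self.means[None]
        log_pdf = -0.5 * (np.log(2 * np.pi * self.vars)[None]
                          + diff ** 2 / self.vars[None]).sum(axis=2)
        return log_pdf + np.log(self.weights)[None]

    def fit(self, X):
        n = X.shape[0]
        rng = np.random.default_rng(self.seed)
        self.means = X[rng.choice(n, self.k, replace=False)]   # random samples as initial means
        self.vars = np.tile(X.var(axis=0) + self.reg, (self.k, 1))
        self.weights = np.full(self.k, 1.0 / self.k)
        for _ in range(self.iters):
            log_j = self._log_joint(X)
            resp = np.exp(log_j - logsumexp(log_j, axis=1)[:, None])  # E-step: responsibilities
            nk = resp.sum(axis=0) + 1e-10                              # M-step below
            self.weights = nk / n
            self.means = resp.T @ X / nk[:, None]
            second = resp.T @ X ** 2 / nk[:, None]
            # reg keeps variances away from zero (avoids degenerate components)
            self.vars = np.maximum(second - self.means ** 2, 0.0) + self.reg
        return self

    def score(self, X):
        """Average per-window log-likelihood of a sequence of feature vectors."""
        return logsumexp(self._log_joint(X), axis=1).mean()


def classify(img, models):
    """Pick the class whose GMM gives the highest average window log-likelihood."""
    feats = window_features(img)
    return max(models, key=lambda label: models[label].score(feats))


def toy_line(rng, band):
    """HYPOTHETICAL synthetic 'text line': random ink confined to rows band[0]:band[1]."""
    img = np.zeros((32, 128), dtype=np.uint8)
    lo, hi = band
    img[lo:hi] = rng.random((hi - lo, 128)) < 0.4
    return img


rng = np.random.default_rng(1)
bands = {"printed": (12, 20), "handwritten": (4, 28)}   # toy stand-ins for two classes
models = {
    label: DiagGMM(k=2).fit(np.vstack([window_features(toy_line(rng, b)) for _ in range(20)]))
    for label, b in bands.items()
}
print(classify(toy_line(rng, bands["handwritten"]), models))
```

Each class gets its own GMM trained on the pooled window features of its training images, and a test image is assigned to the class whose model explains its windows best. Averaging the per-window log-likelihood, rather than summing it, keeps text lines and words of different lengths on a comparable scale; the same scheme would apply to script classes (Arabic vs. Latin) by training one model per script.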
Pages: 329-335
Number of pages: 7
Related papers
46 in total
  • [21] An Agent-Based System for Printed/Handwritten Text Discrimination
    Cloppet, Florence
    Moraitis, Pavlos
    Vincent, Nicole
    PRINCIPLES AND PRACTICE OF MULTI-AGENT SYSTEMS (PRIMA 2017), 2017, 10621 : 180 - 197
  • [22] Discrimination of Handwritten and Machine Printed Text In Scanned Document Images based on Rough Set Theory
    Narayan, Surabhi
    Gowda, Sahana D.
    PROCEEDINGS OF THE 2012 WORLD CONGRESS ON INFORMATION AND COMMUNICATION TECHNOLOGIES, 2012, : 590 - 594
  • [23] A graph-based segmentation and feature extraction framework for Arabic text recognition
    Elgammal, AM
    Ismail, MA
    SIXTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, PROCEEDINGS, 2001, : 622 - 626
  • [24] A HMM-Based Arabic/Latin Handwritten/Printed Identification System
    Rouhou, Ahmed Cheikh
    Abdelhedi, Zeineb
    Kessentini, Yousri
    PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS (HIS 2016), 2017, 552 : 298 - 307
  • [25] A window-based feature extraction method in document copy detection
    Li, Xu
    Liu, Guo-Hua
    Ma, Flui-Dong
    PROCEEDINGS OF THE FIRST INTERNATIONAL SYMPOSIUM ON DATA, PRIVACY, AND E-COMMERCE, 2007, : 215 - +
  • [26] A New Segmentation Framework for Arabic Handwritten Text Using Machine Learning Techniques
    Saleem, Saleem Ibraheem
    Abdulazeez, Adnan Mohsin
    Orman, Zeynep
    CMC-COMPUTERS MATERIALS & CONTINUA, 2021, 68 (02): : 2727 - 2754
  • [27] Window-Based Feature Extraction Framework for Multi-Sensor Data: A Posture Recognition Case Study
    Grzegorowski, Marek
    Stawicki, Sebastian
    PROCEEDINGS OF THE 2015 FEDERATED CONFERENCE ON COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2015, 5 : 397 - 405
  • [28] Fractal-Based System for Arabic/Latin, Printed/Handwritten Script Identification
    Ben Moussa, S.
    Zahour, A.
    Benabdelhafid, A.
    Alimi, A. M.
    19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 2008, : 2643 - 2646
  • [29] HMM Based Keyword Spotting System in Printed/Handwritten Arabic/Latin Documents with Identification Stage
    Rouhou, Ahmed Cheikh
    Kessentini, Yousri
    Kanoun, Slim
    IMAGE ANALYSIS AND RECOGNITION, ICIAR 2019, PT I, 2019, 11662 : 309 - 320
  • [30] Window-Based Feature Extraction Method using XGBoost for Time Series Classification of Solar Flares
    McGuire, Dan
    Sauteraud, Renan
    Midya, Vishal
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 5836 - 5843