A Zone Classification Approach for Arabic Documents using Hybrid Features

Authors
Hesham, Amany M. [1]
Abdou, Sherif [1]
Badr, Amr [1]
Rashwan, Mohsen [2]
Al-Barhamtoshy, Hassanin M. [3]
Affiliations
[1] Cairo Univ, Fac Comp & Informat, Cairo, Egypt
[2] Cairo Univ, Fac Engn, Cairo, Egypt
[3] King Abdulaziz Univ, Comp & Informat Technol, Jeddah, Saudi Arabia
Keywords
segmentation; layout analysis; texture features; connected component analysis; Arabic script; genetic algorithms
DOI
Not available
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Zone segmentation and classification are important steps in document layout analysis. Segmentation decomposes a scanned document into zones, which must then be classified as text or non-text so that only text zones are passed to a recognition engine. This eliminates the garbage output that results from sending non-text zones to the engine. This paper proposes a framework for zone segmentation and classification. Zones are segmented using morphological operations and connected component analysis. Features are then extracted from each zone to classify it as text or non-text. The features are a hybrid of texture-based and connected-component-based features. Effective features are selected using a genetic algorithm, and the selected features are fed into a linear SVM classifier for zone classification. System evaluation shows that the proposed zone classification works well on multi-font and multi-size documents with a variety of layouts, even on historical documents.
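The segmentation step described in the abstract (a morphological operation followed by connected component analysis) can be illustrated with a minimal sketch. The abstract does not give the structuring element, the connectivity, or the actual feature set, so everything below is an assumption made for illustration: a rectangular dilation, 4-connectivity, the hypothetical helper names `dilate`, `connected_components`, `segment_zones` and `zone_features`, and two toy features (ink density and component count). Genetic-algorithm feature selection and the linear SVM are not shown.

```python
from collections import deque

def dilate(img, rx, ry):
    """Binary dilation with a (2*ry+1) x (2*rx+1) rectangle (assumed shape)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for yy in range(max(0, y - ry), min(h, y + ry + 1)):
                    for xx in range(max(0, x - rx), min(w, x + rx + 1)):
                        out[yy][xx] = 1
    return out

def connected_components(img):
    """Inclusive bounding boxes (top, left, bottom, right) of 4-connected regions."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                top, left, bottom, right = y, x, y, x
                while queue:  # breadth-first flood fill of one region
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes

def segment_zones(img, rx=1, ry=0):
    """Merge nearby ink by dilation, then take each merged blob as one zone."""
    return connected_components(dilate(img, rx, ry))

def zone_features(img, box):
    """Toy hybrid features: ink density and connected component count in the zone."""
    top, left, bottom, right = box
    crop = [row[left:right + 1] for row in img[top:bottom + 1]]
    area = (bottom - top + 1) * (right - left + 1)
    density = sum(map(sum, crop)) / area
    return density, len(connected_components(crop))

# Tiny binary "page": two separate marks on the first row, a solid block below.
page = [
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 1],
]
zones = segment_zones(page)
features = [zone_features(page, z) for z in zones]
```

In this toy page the horizontal dilation joins the two marks on the first row into a single zone, while the solid block forms a second zone. A zone with several small components and moderate density is the kind of pattern a text/non-text classifier could exploit, although the features actually used in the paper are not listed in the abstract.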
Pages: 158-162
Page count: 5