An efficient, font independent word and character segmentation algorithm for printed Arabic text

Cited by: 11
Authors
Qaroush, Aziz [1 ]
Jaber, Bassam [1 ]
Mohammad, Khader [1 ]
Washaha, Mahdi [1 ]
Maali, Eman [1 ]
Nayef, Nibal [2 ]
Affiliations
[1] Birzeit Univ, Dept Elect & Comp Engn, Birzeit, Palestine
[2] Univ La Rochelle, L3i, La Rochelle, France
Keywords
Arabic OCR; Word segmentation; Character segmentation; Cursive script; Segmentation techniques; Baseline; Projection profile; Recognition; Line
DOI
10.1016/j.jksuci.2019.08.013
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Character segmentation is a necessary and the most critical stage in an Arabic OCR system, and it has attracted the interest of a wide range of researchers. However, the cursive nature of Arabic script poses extra challenges that need further investigation. A reliable and efficient Arabic OCR system that is independent of font variations is therefore highly desirable. In this paper, an indirect, font-independent word and character segmentation algorithm for printed Arabic text is investigated. The proposed algorithm takes a binary line image as input and produces as output a set of binary images, each consisting of one character or ligature. Segmentation is performed at two levels. At the first level, word segmentation is performed by applying a vertical projection to the input line image and using the interquartile range (IQR) method to differentiate between gaps between words and gaps within words. At the second level, a projection profile method is used together with a set of font-independent statistical and topological features to identify the correct segmentation points among all potential points. The APTI dataset was used to test the proposed algorithm with a variety of font types, sizes, and styles. The algorithm was evaluated on 1800 lines (approximately 24,816 words) and achieved an average accuracy of 97.7% for word segmentation and 97.51% for character segmentation. © 2019 The Authors. Production and hosting by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
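The first segmentation level described in the abstract (a vertical projection followed by an IQR test on gap widths) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the exact IQR rule, so the Tukey upper fence (Q3 + 1.5 × IQR) used here is an assumption, and the function names `interior_gaps`, `word_gaps` and `segment_words` are hypothetical. The input is assumed to be a binary NumPy array in which ink pixels are 1 and background pixels are 0.

```python
import numpy as np


def interior_gaps(line):
    """Return (start, end) column ranges, end exclusive, that contain no ink.

    Only gaps lying between the first and last inked columns are reported,
    so the left and right page margins are ignored.
    """
    profile = line.sum(axis=0)  # vertical projection: ink pixels per column
    ink_cols = np.flatnonzero(profile > 0)
    if ink_cols.size == 0:
        return []
    gaps, start = [], None
    for x in range(ink_cols[0], ink_cols[-1] + 1):
        if profile[x] == 0:
            if start is None:
                start = x  # a new run of empty columns begins
        elif start is not None:
            gaps.append((start, x))  # the run ended just before column x
            start = None
    return gaps


def word_gaps(gaps):
    """Keep the gaps whose width is unusually large for this line.

    ASSUMPTION: a gap separates two words when its width exceeds the
    Tukey upper fence Q3 + 1.5 * IQR of all gap widths in the line.
    """
    widths = np.array([end - start for start, end in gaps])
    if widths.size == 0:
        return []
    q1, q3 = np.percentile(widths, [25, 75])
    threshold = q3 + 1.5 * (q3 - q1)
    return [gap for gap, w in zip(gaps, widths) if w > threshold]


def segment_words(line):
    """Return (start, end) column spans of the words found in a line image."""
    ink_cols = np.flatnonzero(line.sum(axis=0) > 0)
    if ink_cols.size == 0:
        return []
    spans, left = [], int(ink_cols[0])
    for start, end in word_gaps(interior_gaps(line)):
        spans.append((left, start))  # word ends where the wide gap begins
        left = end                   # next word starts after the gap
    spans.append((left, int(ink_cols[-1]) + 1))
    return spans
```

Calling `segment_words(line)` on a binarized text line returns column spans that can be used to crop each word, for example `line[:, start:end]`. The spans are ordered left to right in image coordinates, so for Arabic text the reading order is the reverse of the list order. The second level (character segmentation with statistical and topological features) is not sketched here because the abstract does not specify those features.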
Pages: 1330-1344
Page count: 15