Urdu text in natural scene images: a new dataset and preliminary text detection

Authors
Ali, Hazrat [1]
Iqbal, Khalid [2]
Mujtaba, Ghulam [3]
Fayyaz, Ahmad [3]
Bulbul, Mohammad Farhad [4]
Karam, Fazal Wahab [3]
Zahir, Ali [3]
Affiliations
[1] COMSATS Univ Islamabad, Dept Elect & Comp Engn, Abbottabad Campus, Abbottabad, Pakistan
[2] COMSATS Univ Islamabad, Dept Comp Sci, Attock Campus, Attock, Pakistan
[3] COMSATS Univ Islamabad, Dept Elect & Comp Engn, Abbottabad Campus, Abbottabad, Pakistan
[4] Jashore Univ Sci & Technol, Dept Math, Jashore, Bangladesh
Keywords
Urdu; Text detection; MSER; Localization; Recognition
DOI
10.7717/peerj-cs.717
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text detection in natural scene images for content analysis is an interesting task. The research community has made substantial progress on English and Mandarin text detection, but Urdu text extraction in natural scene images has not been well addressed. This work first introduces a new dataset of Urdu text in natural scene images, comprising 500 standalone images acquired from real scenes. Second, the channel-enhanced Maximally Stable Extremal Region (MSER) method is applied to extract candidate Urdu text regions from an image. A two-stage filtering mechanism then eliminates non-text candidates: the first stage separates text from noise based on geometric properties, and the second stage uses a trained support vector machine classifier to discard the remaining non-text candidate regions. The surviving candidate regions are linked using centroid-based vertical and horizontal distances. The resulting text lines are further analyzed by a separate classifier based on histogram of oriented gradients (HOG) features to remove non-text regions. Extensive experiments on the locally developed dataset evaluate the performance, and the results show good performance on test set images. The dataset will be made available for research use. To the best of our knowledge, this work is the first of its kind for the Urdu language; it provides a dataset for free research use and a baseline for the task of Urdu text extraction.
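The geometric filtering and centroid-based linking steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no thresholds or linking rule, so the `Box` type, the functions `passes_geometry` and `link_by_centroid`, and every numeric threshold below are hypothetical choices made for illustration. The MSER extraction, SVM, and HOG stages are left out, and candidate regions are assumed to arrive as axis-aligned bounding boxes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of one candidate region (hypothetical type)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def centroid(self):
        return (self.x + self.w / 2.0, self.y + self.h / 2.0)


def passes_geometry(box, min_area=30.0, min_aspect=0.1, max_aspect=10.0):
    """Keep a candidate only if its area and aspect ratio look text-like.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if box.w <= 0 or box.h <= 0:
        return False
    aspect = box.w / box.h
    return box.w * box.h >= min_area and min_aspect <= aspect <= max_aspect


def link_by_centroid(boxes, max_dx=40.0, max_dy=10.0):
    """Group boxes whose centroids are close both horizontally and vertically.

    Uses a small union-find so that chains of neighbouring boxes end up in
    one group. max_dx and max_dy are assumed distance limits in pixels.
    """
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        cxi, cyi = boxes[i].centroid
        for j in range(i + 1, len(boxes)):
            cxj, cyj = boxes[j].centroid
            if abs(cxi - cxj) <= max_dx and abs(cyi - cyj) <= max_dy:
                parent[find(i)] = find(j)

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    # Order groups top-to-bottom, then left-to-right.
    return sorted(groups.values(),
                  key=lambda g: (min(b.y for b in g), min(b.x for b in g)))


if __name__ == "__main__":
    candidates = [Box(0, 0, 20, 20), Box(30, 2, 20, 20), Box(60, 0, 20, 20),
                  Box(0, 100, 20, 20), Box(5, 5, 1, 1)]
    kept = [b for b in candidates if passes_geometry(b)]
    print([len(group) for group in link_by_centroid(kept)])
```

In this toy input the 1x1 box is rejected by the area check, the three boxes along the top row chain into one group, and the box at y = 100 forms its own group. A real pipeline would tune the distance limits to the script and image resolution.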
Pages: 17
Source
PeerJ Computer Science, 2021, 7: 1-17