Adapting multilingual vision language transformers for low-resource Urdu optical character recognition (OCR)

Cited by: 1
Authors
Cheema, Musa Dildar Ahmed [1 ]
Shaiq, Mohammad Daniyal [1 ]
Mirza, Farhaan [2 ]
Kamal, Ali [1 ]
Naeem, M. Asif [1 ]
Affiliations
[1] Natl Univ Comp & Emerging Sci, Dept Artificial Intelligence & Data Sci, Islamabad, Pakistan
[2] Auckland Univ Technol, Sch Comp Engn & Math Sci, Auckland, New Zealand
Keywords
Document analysis; OCR; Urdu OCR; Multilingual; Transformer-based models; Performance evaluation;
DOI
10.7717/peerj-cs.1964
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
In the realm of digitizing written content, low-resource languages pose notable challenges. These languages, often lacking comprehensive linguistic resources, require specialized attention to develop robust systems for accurate optical character recognition (OCR). This article addresses the significance of focusing on such languages and introduces ViLanOCR, a bilingual OCR system tailored for Urdu and English. Unlike existing systems, which struggle with the intricacies of low-resource languages, ViLanOCR leverages advanced multilingual transformer-based language models to achieve superior performance. The proposed approach is evaluated using the character error rate (CER) metric and achieves state-of-the-art results on the Urdu UHWR dataset, with a CER of 1.1%. The experimental results demonstrate the effectiveness of the proposed approach, surpassing state-of-the-art baselines in Urdu handwriting digitization.
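Since the abstract reports results in terms of the character error rate (CER), the following is a minimal, generic Python sketch of how CER is conventionally computed: the character-level edit distance between the prediction and the reference, normalized by the reference length. This illustrates the metric only; it is not code from the paper, and the sample strings are hypothetical.

# Character error rate (CER) as conventionally defined:
# CER = edit_distance(prediction, reference) / len(reference).
# Generic illustration of the metric, not the authors' implementation.

def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance via dynamic programming (one row at a time)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def cer(prediction: str, reference: str) -> float:
    """Fraction of character edits needed to turn the prediction into the reference."""
    return levenshtein(prediction, reference) / max(len(reference), 1)

# Hypothetical example; works on any Unicode text, including Urdu script.
reference = "اردو تحریر"
prediction = "اردو تحریر"
print(f"CER = {cer(prediction, reference):.3f}")  # 0.000 for an exact match

A CER of 1.1%, as reported on UHWR, thus means roughly one character edit per hundred reference characters.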
Pages: 24