Adapting multilingual vision language transformers for low-resource Urdu optical character recognition (OCR)

Cited by: 1
Authors
Cheema, Musa Dildar Ahmed [1]
Shaiq, Mohammad Daniyal [1]
Mirza, Farhaan [2]
Kamal, Ali [1]
Naeem, M. Asif [1]
Affiliations
[1] Natl Univ Comp & Emerging Sci, Dept Artificial Intelligence & Data Sci, Islamabad, Pakistan
[2] Auckland Univ Technol, Sch Comp Engn & Math Sci, Auckland, New Zealand
Keywords
Document analysis; OCR; Urdu OCR; Multilingual; Transformer based models; Performance evaluation;
DOI
10.7717/peerj-cs.1964
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the realm of digitizing written content, the challenges posed by low-resource languages are noteworthy. These languages, often lacking comprehensive linguistic resources, require specialized attention to develop robust systems for accurate optical character recognition (OCR). This article addresses the significance of focusing on such languages and introduces ViLanOCR, an innovative bilingual OCR system tailored for Urdu and English. Unlike existing systems, which struggle with the intricacies of low-resource languages, ViLanOCR leverages advanced multilingual transformer-based language models to achieve superior performance. The proposed approach is evaluated using the character error rate (CER) metric and achieves state-of-the-art results on the Urdu UHWR dataset, with a CER of 1.1%. The experimental results demonstrate the effectiveness of the proposed approach, surpassing state-of-the-art baselines in Urdu handwriting digitization.
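For readers unfamiliar with the metric named in the abstract, CER is conventionally defined as the character-level edit distance between the recognized text and the ground truth, divided by the length of the ground truth. The sketch below illustrates that conventional definition only; it is not taken from the article, and the function names `levenshtein` and `character_error_rate` are hypothetical. It counts Unicode code points, so any text normalization the authors may have applied to Urdu script is an open assumption here.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `hypothesis` into `reference`."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference.
    Assumption: characters are Unicode code points, with no normalization."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Illustrative strings only, not data from the article.
    print(character_error_rate("kitten", "sitting"))
```

Under this definition, a CER of 1.1% corresponds to roughly 11 character edits per 1,000 ground-truth characters. Because insertions are counted, the value can exceed 1.0 when the recognized text is much longer than the reference.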
Pages: 24
Related papers
50 in total
  • [31] Cross-Lingual Language Modeling for Low-Resource Speech Recognition
    Xu, Ping
    Fung, Pascale
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (06) : 1134 - 1144
  • [32] A General Procedure for Improving Language Models in Low-Resource Speech Recognition
    Liu, Qian
    Zhang, Wei-Qiang
    Liu, Jia
    Liu, Yao
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2019, : 428 - 433
  • [33] Language-universal phonetic encoder for low-resource speech recognition
    Feng, Siyuan
    Tu, Ming
    Xia, Rui
    Huang, Chuanzeng
    Wang, Yuxuan
    INTERSPEECH 2023, 2023, : 1429 - 1433
  • [34] Language-Adversarial Transfer Learning for Low-Resource Speech Recognition
    Yi, Jiangyan
    Tao, Jianhua
    Wen, Zhengqi
    Bai, Ye
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (03) : 621 - 630
  • [35] Non-Linear Pairwise Language Mappings for Low-Resource Multilingual Acoustic Model Fusion
    Farooq, Muhammad Umar
    Narayana, Darshan Adiga Haniya
    Hain, Thomas
    INTERSPEECH 2022, 2022, : 4850 - 4854
  • [36] Lexicon-based fine-tuning of multilingual language models for low-resource language sentiment analysis
    Dhananjaya, Vinura
    Ranathunga, Surangika
    Jayasena, Sanath
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2024, 9 (05) : 1116 - 1125
  • [37] Optimizing Multilingual Sentiment Analysis in Low-Resource Languages with Adaptive Pretraining and Strategic Language Selection
    Raychawdhary, Nilanjana
    Das, Amit
    Bhattacharya, Sutanu
    Dozier, Gerry
    Seals, Cheryl D.
    2024 IEEE 3RD INTERNATIONAL CONFERENCE ON COMPUTING AND MACHINE INTELLIGENCE, ICMI 2024, 2024,
  • [39] A neural approach for inducing multilingual resources and natural language processing tools for low-resource languages
    Zennaki, O.
    Semmar, N.
    Besacier, L.
    NATURAL LANGUAGE ENGINEERING, 2019, 25 (01) : 43 - 67
  • [40] Multilingual Speech Corpus in Low-Resource Eastern and Northeastern Indian Languages for Speaker and Language Identification
    Basu, Joyanta
    Khan, Soma
    Roy, Rajib
    Basu, Tapan Kumar
    Majumder, Swanirbhar
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2021, 40 (10) : 4986 - 5013