Adapting multilingual vision language transformers for low-resource Urdu optical character recognition (OCR)

Times cited: 1

Authors:

Cheema, Musa Dildar Ahmed [1]

Shaiq, Mohammad Daniyal [1]

Mirza, Farhaan [2]

Kamal, Ali [1]

Naeem, M. Asif [1]

Affiliations:

[1] Natl Univ Comp & Emerging Sci, Dept Artificial Intelligence & Data Sci, Islamabad, Pakistan

[2] Auckland Univ Technol, Sch Comp Engn & Math Sci, Auckland, New Zealand

Source:

PEERJ COMPUTER SCIENCE | 2024 / Vol. 10

Keywords:

Document analysis; OCR; Urdu OCR; Multilingual; Transformer based models; Performance evaluation;

DOI:

10.7717/peerj-cs.1964

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Low-resource languages pose notable challenges for the digitization of written content. Because these languages often lack comprehensive linguistic resources, building robust and accurate optical character recognition (OCR) systems for them requires specialized attention. This article highlights the importance of focusing on such languages and introduces ViLanOCR, an innovative bilingual OCR system tailored to Urdu and English. Unlike existing systems, which struggle with the intricacies of low-resource languages, ViLanOCR leverages multilingual transformer-based language models to achieve superior performance. The proposed approach is evaluated with the character error rate (CER) metric and achieves state-of-the-art results on the Urdu UHWR dataset, with a CER of 1.1%. The experimental results demonstrate the effectiveness of the approach, which surpasses state-of-the-art baselines in Urdu handwriting digitization.
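CER is conventionally defined as the character-level edit distance between the predicted and reference transcriptions (substitutions S, deletions D, insertions I) divided by the number of reference characters N, i.e. CER = (S + D + I) / N. The abstract does not state how ViLanOCR tokenizes or normalizes text before scoring, so the following is only a minimal sketch of the standard definition, not the authors' evaluation code. The function names are illustrative, and characters are assumed to be Unicode code points (so Urdu diacritics count as separate characters).

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions and insertions
    separating `ref` and `hyp` (dynamic-programming edit distance)."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: ref character missing from hyp
                curr[j - 1] + 1,     # insertion: extra character in hyp
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = (S + D + I) / N, where N is the number of reference characters."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # 3 edits against a 6-character reference
    print(character_error_rate("kitten", "sitting"))  # 0.5
```

A reported figure such as 1.1% corresponds to a CER of 0.011. Many evaluations aggregate over a whole test set by summing edit distances and reference lengths before dividing, rather than averaging per-line CERs; which aggregation the authors used is not stated here.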

Pages: 24
