A deep learning approach for Named Entity Recognition in Urdu language

Cited by: 1
Authors
Anam, Rimsha [1]
Waqas Anwar, Muhammad [1, 2]
Hasan Jamal, Muhammad [1]
Ijaz Bajwa, Usama [1]
de la Torre Diez, Isabel [3]
Silva Alvarado, Eduardo [4, 5, 6]
Soriano Flores, Emmanuel [4, 7, 8]
Ashraf, Imran [9]
Affiliations
[1] COMSATS Univ Islamabad, Dept Comp Sci, Lahore, Pakistan
[2] Govt Coll Univ, Dept Comp Sci, Lahore, Pakistan
[3] Univ Valladolid, Dept Signal Theory Commun & Telemat Engn, Valladolid, Spain
[4] Univ Europea Atlantico, Santander, Spain
[5] Univ Int Iberoamer Arecibo, Arecibo, PR USA
[6] Univ Int Cuanza, Cuito, Bie, Angola
[7] Univ Int Iberoamericana Campeche, Mexico City, DF, Mexico
[8] Fdn Univ Int Colombia Bogota, Bogota, Colombia
[9] Yeungnam Univ, Dept Informat & Commun Engn, Gyongsan, South Korea
Source
PLOS ONE | 2024 / Volume 19 / Issue 03
Keywords
EXTRACTION;
DOI
10.1371/journal.pone.0300725
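Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes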
07 ; 0710 ; 09 ;
Abstract
Named Entity Recognition (NER) is a natural language processing task that has been widely explored for many languages over the past decade, but it remains under-researched for Urdu because of the language's rich morphology and other linguistic complexities. Existing state-of-the-art studies on Urdu NER use various deep learning approaches that select features automatically through word embeddings. This paper presents a deep learning approach for Urdu NER that uses FastText and Floret word embeddings to capture the surrounding context of each word for improved feature extraction. Publicly available pre-trained FastText and Floret embeddings for Urdu are used to generate feature vectors for four benchmark Urdu datasets. These features are then used as input to train various combinations of Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), Gated Recurrent Unit (GRU), and Conditional Random Field (CRF) models. The results show that the proposed approach significantly outperforms existing state-of-the-art studies on Urdu NER, achieving an F-score of up to 0.98 when using BiLSTM+GRU with Floret embeddings. Error analysis shows a low classification error rate, ranging from 1.24% to 3.63% across the datasets, which indicates the robustness of the proposed approach.
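The abstract reports an F-score and a classification error rate without stating how either is computed. As a minimal sketch only, assuming token-level scoring in which every tag other than an "outside" tag counts as an entity, the two quantities could be computed as follows. The function name, the tag names, and the sample sequences are hypothetical illustrations and are not taken from the paper or its datasets.

```python
from typing import Dict, Sequence


def token_level_scores(gold: Sequence[str], pred: Sequence[str],
                       outside: str = "O") -> Dict[str, float]:
    """Token-level precision, recall and F-score over entity tags,
    plus the overall fraction of misclassified tokens.

    Assumption: scoring is per token and micro-averaged over all entity
    tags; the paper may use a different protocol.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("sequences must not be empty")

    pairs = list(zip(gold, pred))
    # True positive: an entity tag was predicted and it matches the gold tag.
    true_pos = sum(1 for g, p in pairs if p != outside and g == p)
    predicted = sum(1 for p in pred if p != outside)  # tokens tagged as entities
    actual = sum(1 for g in gold if g != outside)     # tokens that are entities

    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    # Error rate counts every mismatching token, including outside tokens.
    error_rate = sum(1 for g, p in pairs if g != p) / len(pairs)

    return {"precision": precision, "recall": recall,
            "f_score": f_score, "error_rate": error_rate}


# Hypothetical eight-token sentence with three gold entity tokens.
gold_tags = ["PERSON", "O", "LOCATION", "O", "O", "ORGANIZATION", "O", "O"]
pred_tags = ["PERSON", "O", "O", "O", "LOCATION", "ORGANIZATION", "O", "O"]
scores = token_level_scores(gold_tags, pred_tags)
print(scores)
```

In this toy case two of the three predicted entity tokens are correct and two of the three gold entity tokens are found, so precision, recall, and F-score are all 2/3, while two of eight tokens are mislabeled, giving an error rate of 0.25. Entity-level (span-based) scoring would generally give different numbers than this token-level sketch.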
Pages: 21