A deep learning approach for Named Entity Recognition in Urdu language

Authors
Anam, Rimsha [1]
Waqas Anwar, Muhammad [1, 2]
Hasan Jamal, Muhammad [1]
Ijaz Bajwa, Usama [1]
de la Torre Diez, Isabel [3]
Silva Alvarado, Eduardo [4, 5, 6]
Soriano Flores, Emmanuel [4, 7, 8]
Ashraf, Imran [9]
Affiliations
[1] COMSATS Univ Islamabad, Dept Comp Sci, Lahore, Pakistan
[2] Govt Coll Univ, Dept Comp Sci, Lahore, Pakistan
[3] Univ Valladolid, Dept Signal Theory Commun & Telemat Engn, Valladolid, Spain
[4] Univ Europea Atlantico, Santander, Spain
[5] Univ Int Iberoamer Arecibo, Arecibo, PR USA
[6] Univ Int Cuanza, Cuito, Bie, Angola
[7] Univ Int Iberoamericana Campeche, Mexico City, DF, Mexico
[8] Fdn Univ Int Colombia Bogota, Bogota, Colombia
[9] Yeungnam Univ, Dept Informat & Commun Engn, Gyongsan, South Korea
Source
PLOS ONE | 2024 / Vol. 19 / No. 3
Keywords
EXTRACTION;
DOI
10.1371/journal.pone.0300725
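Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Subject classification code
07; 0710; 09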
Abstract
Named Entity Recognition (NER) is a natural language processing task that has been widely explored for many languages over the past decade, but it remains under-researched for Urdu because of the language's rich morphology and complexity. Existing state-of-the-art studies on Urdu NER use various deep learning approaches with automatic feature selection based on word embeddings. This paper presents a deep learning approach for Urdu NER that uses FastText and Floret word embeddings to capture the contextual information of each word from its surrounding context, improving feature extraction. Pre-trained FastText and Floret embeddings, which are publicly available for Urdu, are used to generate feature vectors for four benchmark Urdu datasets. These features are then used to train various deep learning models that combine Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), Gated Recurrent Unit (GRU), and Conditional Random Field (CRF) layers. The results show that the proposed approach significantly outperforms similar existing state-of-the-art studies on Urdu NER, achieving an F-score of up to 0.98 with BiLSTM+GRU and Floret embeddings. Error analysis shows a low classification error rate, ranging from 1.24% to 3.63% across the datasets, which indicates the robustness of the proposed approach.
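FastText and Floret both build a word's vector from character n-grams. This suits a morphologically rich language like Urdu, because inflected or unseen word forms still share subwords with known words. The following is a minimal sketch, under stated assumptions, of how per-token feature vectors could be produced in a Floret-style way. The whole word and its n-grams are each hashed into several rows of one compact table, and those rows are averaged. The table here is random rather than pre-trained. The names `SubwordTable`, `char_ngrams`, `fnv1a`, and `sentence_features`, the sizes (`buckets`, `dim`, `hash_count`), and the n-gram range of 3 to 5 are hypothetical illustration choices. They are not the paper's configuration and not the fastText or floret API.

```python
import random
from typing import List


def char_ngrams(word: str, min_n: int = 3, max_n: int = 5) -> List[str]:
    """Character n-grams of `word` wrapped in the boundary markers '<' and '>'."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def fnv1a(text: str, seed: int = 0) -> int:
    """32-bit FNV-1a hash of the UTF-8 bytes; XOR-ing a seed into the offset basis
    yields several distinct hash functions (an illustrative choice)."""
    h = 2166136261 ^ seed
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF
    return h


class SubwordTable:
    """Toy Floret-style table (hypothetical): the word and each of its n-grams are
    hashed into `hash_count` rows of one shared matrix (random here, pre-trained in practice)."""

    def __init__(self, buckets: int = 1000, dim: int = 8, hash_count: int = 2, seed: int = 13):
        rng = random.Random(seed)
        self.buckets = buckets
        self.dim = dim
        self.hash_count = hash_count
        self.rows = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(buckets)]

    def _row_ids(self, key: str) -> List[int]:
        # One row index per hash function.
        return [fnv1a(key, seed=k) % self.buckets for k in range(self.hash_count)]

    def vector(self, word: str) -> List[float]:
        # Average every row selected by the full word and its character n-grams.
        keys = [f"<{word}>"] + char_ngrams(word)
        total = [0.0] * self.dim
        hits = 0
        for key in keys:
            for r in self._row_ids(key):
                for j, value in enumerate(self.rows[r]):
                    total[j] += value
                hits += 1
        return [x / hits for x in total]


def sentence_features(table: SubwordTable, tokens: List[str]) -> List[List[float]]:
    """One feature vector per token: the input sequence for a downstream tagger."""
    return [table.vector(tok) for tok in tokens]


if __name__ == "__main__":
    table = SubwordTable()
    # Hypothetical Urdu tokens; any string, seen or unseen, gets a vector.
    feats = sentence_features(table, ["عمران", "لاہور", "گیا"])
    print(len(feats), len(feats[0]))  # 3 8
```

In a pipeline like the one the abstract describes, each token's vector would form one time step of input to a BiLSTM, GRU, or CRF-based tagger. Because the vector is composed from subwords, a token absent from any training vocabulary still receives a representation. Sharing a single hashed table for words and subwords keeps the embedding matrix small, at the cost of occasional hash collisions.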
Pages: 21