Urdu Named Entity Recognition: Corpus Generation and Deep Learning Applications

Cited by: 37
Authors
Kanwal, Safia [1]
Malik, Kamran [1]
Shahzad, Khurram [1]
Aslam, Faisal [1]
Nawaz, Zubair [1]
Affiliations
[1] Univ Punjab, Coll Informat Technol, Old Campus, Lahore, Pakistan
Keywords
Resource poor languages; deep learning; Urdu NER corpus; Word2vec; fastText; word embeddings;
DOI
10.1145/3329710
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Named Entity Recognition (NER) plays a pivotal role in various natural language processing tasks, such as machine translation and automatic question-answering systems. Recognizing the importance of NER, a plethora of NER techniques for Western and Asian languages have been developed. However, despite there being over 490 million Urdu language speakers worldwide, NER resources for Urdu are either non-existent or inadequate. To fill this gap, this article makes four key contributions. First, we have developed the largest Urdu NER corpus, which contains 926,776 tokens and 99,718 carefully annotated NEs. The developed corpus contains at least twice as many manually tagged NEs as any of the existing Urdu NER corpora. Second, we have generated six new word embeddings using three different techniques, fastText, Word2vec, and GloVe, on two corpora of Urdu text. These are the only publicly available embeddings for the Urdu language, besides the recently released Urdu word embeddings by Facebook. Third, we have pioneered the application of deep learning techniques, NN and RNN, to Urdu named entity recognition. Finally, we have run 32 different experiments, each over 10 folds, using combinations of a traditional supervised learning technique and deep learning techniques, seven types of word embeddings, and two different Urdu NER datasets. Based on the analysis of the results, several valuable insights are provided about the effectiveness of deep learning techniques, the impact of word embeddings, and variations of datasets.
Pages: 13