Urdu Named Entity Recognition: Corpus Generation and Deep Learning Applications

Cited by: 37
Authors
Kanwal, Safia [1]
Malik, Kamran [1]
Shahzad, Khurram [1]
Aslam, Faisal [1]
Nawaz, Zubair [1]
Affiliation
[1] Univ Punjab, Coll Informat Technol, Old Campus, Lahore, Pakistan
Keywords
Resource poor languages; deep learning; Urdu NER corpus; Word2vec; fastText; word embeddings
DOI
10.1145/3329710
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Named Entity Recognition (NER) plays a pivotal role in various natural language processing tasks, such as machine translation and automatic question-answering systems. Recognizing the importance of NER, researchers have developed a plethora of NER techniques for Western and Asian languages. However, although there are over 490 million Urdu speakers worldwide, NER resources for Urdu are either non-existent or inadequate. To fill this gap, this article makes four key contributions. First, we have developed the largest Urdu NER corpus, which contains 926,776 tokens and 99,718 carefully annotated NEs. The corpus contains at least twice as many manually tagged NEs as any existing Urdu NER corpus. Second, we have generated six new word embeddings using three different techniques, fastText, Word2vec, and GloVe, on two corpora of Urdu text. These are the only publicly available embeddings for the Urdu language, besides the recently released Urdu word embeddings by Facebook. Third, we have pioneered the application of deep learning techniques, NN and RNN, to Urdu named entity recognition. Finally, we have performed 10 folds of 32 different experiments using combinations of a traditional supervised learning technique and deep learning techniques, seven types of word embeddings, and two different Urdu NER datasets. Based on the analysis of the results, we provide several valuable insights into the effectiveness of deep learning techniques, the impact of word embeddings, and variations across datasets.
Pages: 13