Named Entity Recognition and Classification for Punjabi Shahmukhi

Cited by: 13
Authors
Ahmad, Muhammad Tayyab [1, 2]
Malik, Muhammad Kamran [1, 2]
Shahzad, Khurram [1, 2]
Aslam, Faisal [1, 2]
Iqbal, Asif [1, 2]
Nawaz, Zubair [1, 2]
Bukhari, Faisal [1, 2]
Affiliations
[1] Punjab Univ Coll Informat Technol, Lahore, Pakistan
[2] Univ Punjab, Punjab Univ Coll Informat Technol, New Campus, Lahore, Pakistan
Keywords
Low-resource languages; Asian languages; Punjabi; Shahmukhi; named entity recognition
DOI
10.1145/3383306
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Named entity recognition (NER) refers to the identification of proper nouns in natural language text and their classification into named entity types, such as person, location, and organization. Due to the widespread applications of NER, numerous NER techniques and benchmark datasets have been developed for both Western and Asian languages. Even though the Shahmukhi script of the Punjabi language is used by nearly three-fourths of Punjabi speakers worldwide, Gurmukhi has been the main focus of research activities. Specifically, a benchmark NER corpus for Shahmukhi is non-existent, which has thwarted the commencement of NER research for the Shahmukhi script. To this end, this article presents the development and specifications of the first-ever NER corpus for Shahmukhi. The newly developed corpus is composed of 318,275 tokens and 16,300 named entities, including 11,147 persons, 3,140 locations, and 2,013 organizations. To establish the strength of our corpus, we have compared its specifications with those of its Gurmukhi counterparts. Furthermore, we have demonstrated the usability of our corpus using five supervised learning techniques, including two state-of-the-art deep learning techniques. The results are compared, and valuable insights about the behavior of the most effective technique are discussed.
Pages: 13