Bridging the Kuwaiti Dialect Gap in Natural Language Processing

Cited by: 0
Authors
Husain, Fatemah [1]
Alostad, Hana [2]
Omar, Halima [3]
Affiliations
[1] Kuwait Univ, Sabah AlSalem Univ City Alshadadiya, Coll Life Sci, Informat Sci Dept, Safat 13060, Kuwait
[2] Gulf Univ Sci & Technol, Coll Arts & Sci, Comp Sci Dept, Hawally 32093, Kuwait
[3] Kuwait Univ, Sabah AlSalem Univ City Alshadadiya, Coll Life Sci, Commun Disorders Sci Dept, Safat 13060, Kuwait
Source
IEEE Access | 2024 / Vol. 12
Keywords
Natural language processing; Sentiment analysis; Labeling; Linguistics; Annotations; Cleaning; Text categorization; Zero-shot learning; Machine learning; weak supervision; zero-shot language model; sentiment analysis; Arabic language; machine learning; Kuwaiti dialect;
DOI
10.1109/ACCESS.2024.3364367
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Available dialectal Arabic linguistic resources offer very limited coverage of Arabic dialects, particularly the Kuwaiti dialect. This shortage hampers researchers in Natural Language Processing (NLP) and limits the development of advanced linguistic analysis and processing tools for the Kuwaiti dialect. Many other low-resource Arabic dialects remain unexplored because of the difficulty of recruiting annotators for dataset labeling. This paper proposes a weakly supervised classification system, "q8SentiLabeler", to address the problem of recruiting human annotators. In addition, we developed a large dataset of over 16.6k posts for sentiment analysis in the Kuwaiti dialect. The dataset covers several themes and timeframes to limit bias that might affect its content. Furthermore, we evaluated the dataset using multiple traditional machine-learning classifiers and advanced deep-learning language models. The results demonstrate the potential of "q8SentiLabeler" to replace human annotators, with 93% pairwise percent agreement and a Cohen's Kappa coefficient of 0.87. Using the ARBERT model on our dataset, we achieved 89% accuracy.
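The two agreement measures named in the abstract, pairwise percent agreement and Cohen's Kappa, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the ten toy labels below are hypothetical and chosen only to show how the two numbers relate, with Kappa correcting the raw agreement for the agreement expected by chance.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which the two label sequences agree."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa: (p_o - p_e) / (1 - p_e) for two annotators."""
    n = len(labels_a)
    p_o = percent_agreement(labels_a, labels_b)  # observed agreement
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: product of each annotator's marginal label rates.
    p_e = sum(counts_a[lab] * counts_b[lab]
              for lab in set(counts_a) | set(counts_b)) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy labels, NOT taken from the paper's dataset.
human = ["pos", "neg", "neu", "pos", "neg", "pos", "neu", "neg", "pos", "neg"]
auto = ["pos", "neg", "neu", "pos", "neg", "neu", "neu", "neg", "pos", "pos"]

agreement = percent_agreement(human, auto)
kappa = cohens_kappa(human, auto)
print(f"percent agreement: {agreement:.2f}")
print(f"Cohen's Kappa:     {kappa:.2f}")
```

Kappa is always at most the raw percent agreement when chance agreement is positive, which is why the abstract reports both figures side by side.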
Pages: 27709-27722
Page count: 14