Bridging the Kuwaiti Dialect Gap in Natural Language Processing

Cited by: 0
Authors
Husain, Fatemah [1]
Alostad, Hana [2]
Omar, Halima [3]
Affiliations
[1] Kuwait Univ, Sabah AlSalem Univ City Alshadadiya, Coll Life Sci, Informat Sci Dept, Safat 13060, Kuwait
[2] Gulf Univ Sci & Technol, Coll Arts & Sci, Comp Sci Dept, Hawally 32093, Kuwait
[3] Kuwait Univ, Sabah AlSalem Univ City Alshadadiya, Coll Life Sci, Commun Disorders Sci Dept, Safat 13060, Kuwait
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Natural language processing; Sentiment analysis; Labeling; Linguistics; Annotations; Cleaning; Text categorization; Zero-shot learning; Machine learning; weak supervision; zero-shot language model; sentiment analysis; Arabic language; machine learning; Kuwaiti dialect;
DOI
10.1109/ACCESS.2024.3364367
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Dialectal Arabic linguistic resources are very limited in their coverage of Arabic dialects, particularly the Kuwaiti dialect. This shortage hampers researchers in Natural Language Processing (NLP) and limits the development of advanced linguistic analysis and processing tools for the Kuwaiti dialect. Many other low-resource Arabic dialects also remain unexplored in research because of the challenges of recruiting annotators for dataset labeling. This paper proposes "q8SentiLabeler", a weakly supervised classification system that addresses the problem of recruiting human annotators. In addition, we developed a large dataset of over 16.6k posts for sentiment analysis in the Kuwaiti dialect. The dataset covers several themes and timeframes to avoid bias that might affect its content. Furthermore, we evaluated the dataset using multiple traditional machine-learning classifiers and advanced deep-learning language models. Results demonstrate the potential of "q8SentiLabeler" to replace human annotators, with 93% pairwise percent agreement and a Cohen's Kappa coefficient of 0.87. Using the ARBERT model on our dataset, we achieved 89% accuracy.
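The two agreement measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code; the function names and the toy labels are assumptions made only for illustration. Pairwise percent agreement is the fraction of items on which two annotators assign the same label, and Cohen's Kappa corrects that fraction for the agreement expected by chance.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which the two annotators agree."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(labels_a)
    p_o = percent_agreement(labels_a, labels_b)
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: product of each annotator's marginal label rates.
    p_e = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in set(counts_a) | set(counts_b)
    )
    if p_e == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data: a human annotator vs. an automatic labeler.
human = ["pos", "neg", "pos", "neg"]
labeler = ["pos", "neg", "neg", "neg"]
agreement = percent_agreement(human, labeler)
kappa = cohens_kappa(human, labeler)
```

On this toy data the two raters agree on three of four items, and the chance-corrected score is lower than the raw agreement, which is why both figures are usually reported together.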
Pages: 27709-27722
Page count: 14
Related papers
50 in total
  • [11] LinguApp: Bridging the Language Gap
    Gomez Parra, Ma Elena
    PROCEEDINGS OF THE 11TH INNOVATION IN LANGUAGE LEARNING INTERNATIONAL CONFERENCE, 2018, : 133 - 135
  • [12] Tunisian Dialect Sentiment Analysis: A Natural Language Processing-based Approach
    Mulki, Hala
    Haddad, Hatem
    Ali, Chedi Bechikh
    Babaoglu, Ismail
    COMPUTACION Y SISTEMAS, 2018, 22 (04): : 1223 - 1232
  • [13] A case of bridging the language gap
    Jones, Ashley
    ECONTENT, 2008, 31 (06) : 44 - 46
  • [14] TIIARA: A Language Tool for Bridging the Language Gap
    Khashman, Nouf
    Menard, Elaine
    Dorey, Jonathan
    DESIGN, USER EXPERIENCE, AND USABILITY: NOVEL USER EXPERIENCES, PT II, 2016, 9747 : 386 - 395
  • [15] Bridging the Semantic Gap with SQL Query Logs in Natural Language Interfaces to Databases
    Baik, Christopher
    Jagadish, H. V.
    Li, Yunyao
    2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019), 2019, : 374 - 385
  • [16] Sentiment analysis dataset in Moroccan dialect: bridging the gap between Arabic and Latin scripted dialect
    Jbel, Mouad
    Jabrane, Mourad
    Hafidi, Imad
    Metrane, Abdulmutallib
    LANGUAGE RESOURCES AND EVALUATION, 2024,
  • [17] Processing natural language without natural language processing
    Brill, E
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, PROCEEDINGS, 2003, 2588 : 360 - 369
  • [18] Bridging the gap between language and action
    Takenobu, T
    Tomofumi, K
    Suguru, S
    Manabu, O
    INTELLIGENT VIRTUAL AGENTS, 2003, 2792 : 127 - 135
  • [19] Invited Forum: Bridging the "Language Gap"
    Avineri, Netta
    Johnson, Eric
    Brice-Heath, Shirley
    McCarty, Teresa
    Ochs, Elinor
    Kremer-Sadlik, Tamar
    Blum, Susan
    Zentella, Ana Celia
    Rosa, Jonathan
    Flores, Nelson
    Alim, H. Samy
    Paris, Django
    JOURNAL OF LINGUISTIC ANTHROPOLOGY, 2015, 25 (01) : 66 - 86
  • [20] Bridging the language gap for simulation resources
    Reeves, Andrew
    Auerbach, Marc
    Kou, Maybelle
    Sanseau, Elizabeth
    Hamann, Magnus
    Roland, Damian
    BMJ SIMULATION & TECHNOLOGY ENHANCED LEARNING, 2021, 7 (05): : 444 - 446