Bridging the Kuwaiti Dialect Gap in Natural Language Processing

Cited by: 0
Authors
Husain, Fatemah [1]
Alostad, Hana [2]
Omar, Halima [3]
Affiliations
[1] Kuwait Univ, Sabah AlSalem Univ City Alshadadiya, Coll Life Sci, Informat Sci Dept, Safat 13060, Kuwait
[2] Gulf Univ Sci & Technol, Coll Arts & Sci, Comp Sci Dept, Hawally 32093, Kuwait
[3] Kuwait Univ, Sabah AlSalem Univ City Alshadadiya, Coll Life Sci, Commun Disorders Sci Dept, Safat 13060, Kuwait
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Natural language processing; Sentiment analysis; Labeling; Linguistics; Annotations; Cleaning; Text categorization; Zero-shot learning; Machine learning; weak supervision; zero-shot language model; Arabic language; Kuwaiti dialect
DOI
10.1109/ACCESS.2024.3364367
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Existing dialectal Arabic linguistic resources cover few Arabic dialects, and the Kuwaiti dialect is particularly underserved. This shortage hampers researchers in Natural Language Processing (NLP) and limits the development of advanced linguistic analysis and processing tools for the Kuwaiti dialect. Many other low-resource Arabic dialects also remain unexplored because recruiting annotators for dataset labeling is difficult. This paper proposes "q8SentiLabeler", a weakly supervised classification system intended to address the problem of recruiting human annotators. In addition, we developed a large dataset of over 16.6k posts for sentiment analysis in the Kuwaiti dialect. The dataset covers several themes and timeframes to reduce bias that might affect its content. Furthermore, we evaluated the dataset using multiple traditional machine-learning classifiers and advanced deep-learning language models. Results demonstrate the potential of "q8SentiLabeler" to replace human annotators, with 93% pairwise percent agreement and a Cohen's Kappa coefficient of 0.87. Using the ARBERT model on our dataset, we achieved 89% accuracy.
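The abstract reports two agreement measures between the automatic labeler and human annotators: pairwise percent agreement and Cohen's Kappa. As a rough illustration of how these two statistics are conventionally computed, the sketch below uses only the Python standard library. The function names and the two short label lists are hypothetical and made up for this example; they are not the paper's data, and this is not the authors' evaluation code.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which the two label sequences agree."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = percent_agreement(labels_a, labels_b)
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over all labels.
    p_expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    if p_expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # so this sketch returns 1.0 by convention (an assumption).
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy labels (NOT from the paper): a human annotator
# versus a weak-supervision labeler on eight posts.
human = ["pos", "neg", "neg", "pos", "neu", "neg", "pos", "neu"]
weak = ["pos", "neg", "neg", "pos", "neu", "pos", "pos", "neu"]

if __name__ == "__main__":
    print(f"percent agreement: {percent_agreement(human, weak):.3f}")
    print(f"Cohen's kappa:     {cohens_kappa(human, weak):.3f}")
```

Kappa is always at most the raw agreement rate when chance agreement is non-zero, which is why the abstract's Kappa value (0.87) sits below its percent agreement (93%).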
Pages: 27709-27722
Page count: 14