Bridging the Kuwaiti Dialect Gap in Natural Language Processing

Cited by: 0
Authors
Husain, Fatemah [1]
Alostad, Hana [2]
Omar, Halima [3]
Affiliations
[1] Kuwait Univ, Sabah AlSalem Univ City Alshadadiya, Coll Life Sci, Informat Sci Dept, Safat 13060, Kuwait
[2] Gulf Univ Sci & Technol, Coll Arts & Sci, Comp Sci Dept, Hawally 32093, Kuwait
[3] Kuwait Univ, Sabah AlSalem Univ City Alshadadiya, Coll Life Sci, Commun Disorders Sci Dept, Safat 13060, Kuwait
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Natural language processing; Sentiment analysis; Labeling; Linguistics; Annotations; Cleaning; Text categorization; Zero-shot learning; Machine learning; Weak supervision; Zero-shot language model; Arabic language; Kuwaiti dialect
DOI
10.1109/ACCESS.2024.3364367
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Available dialectal Arabic linguistic resources cover few Arabic dialects, and coverage of the Kuwaiti dialect is especially limited. This shortage hampers researchers in the Natural Language Processing (NLP) field and limits the development of advanced linguistic analysis and processing tools for the Kuwaiti dialect. Many other low-resource Arabic dialects also remain unexplored in research because of the difficulty of recruiting annotators for dataset labeling. This paper proposes a weakly supervised classification system, "q8SentiLabeler", to address the problem of recruiting human annotators. In addition, we developed a large dataset of over 16.6k posts for sentiment analysis in the Kuwaiti dialect. The dataset covers several themes and timeframes to reduce bias in its content. Furthermore, we evaluated the dataset using multiple traditional machine-learning classifiers and advanced deep-learning language models. Results demonstrate the potential of "q8SentiLabeler" to replace human annotators, with 93% pairwise percent agreement and a Cohen's Kappa coefficient of 0.87. Using the ARBERT model on our dataset, we achieved 89% accuracy.
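The two agreement measures reported in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the label strings, and the ten toy labels are hypothetical and serve only to show how pairwise percent agreement and Cohen's Kappa are usually computed for two label sequences, here standing in for a human annotator and an automatic labeler.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which the two label sequences agree."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both raters pick the same label
    # if each labels independently according to their own label frequencies.
    p_expected = sum(
        counts_a[label] * counts_b[label] for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data, not taken from the paper's dataset.
human = ["pos", "neg", "neu", "pos", "neg", "pos", "neu", "neg", "pos", "neg"]
auto = ["pos", "neg", "neu", "pos", "neg", "neu", "neu", "neg", "pos", "pos"]

agreement = percent_agreement(human, auto)
kappa = cohens_kappa(human, auto)
print(f"percent agreement: {agreement:.2f}")
print(f"Cohen's Kappa:     {kappa:.2f}")
```

Kappa is lower than raw agreement because it discounts the matches two labelers would produce by chance, which is why the abstract reports both figures.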
Pages: 27709-27722
Page count: 14