Arabic Fake News Detection in Social Media Context Using Word Embeddings and Pre-trained Transformers

Cited by: 2
Authors
Azzeh, Mohammad [1 ]
Qusef, Abdallah [1 ]
Alabboushi, Omar [1 ]
Affiliations
[1] Princess Sumaya Univ Technol, King Hussain Sch Comp Sci, Amman, Jordan
Keywords
Arabic fake news detection; Natural language processing; BERT; CAMeLBERT; AraBERT; ARBERT; MARBERT; AraELECTRA
DOI
10.1007/s13369-024-08959-x
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject Classification Codes
07; 0710; 09
Abstract
The rapid spread of fake news in many languages on social platforms has become a global scourge that threatens societal security and governments. Fake news is usually written to deceive readers and convince them that false information is true; stopping its spread has therefore become a priority for governments and societies. Building fake news detection models for Arabic comes with its own set of challenges and limitations, among them: 1) a lack of annotated data, 2) dialectal variation, where each dialect can differ significantly in vocabulary, grammar, and syntax, 3) morphological complexity, with complex word formations and root-and-pattern morphology, 4) semantic ambiguity, which makes it hard for models to accurately discern the intent and context of a given piece of information, 5) cultural context, and 6) diacritics. The objective of this paper is twofold. First, we design a large corpus of annotated fake news data for the Arabic language, collected from multiple sources so that it covers different dialects and cultures. Second, we build fake news detection models by training machine learning classifiers as heads over fine-tuned large language models pre-trained on Arabic, such as ARBERT, AraBERT, and CAMeLBERT, as well as the popular word embedding technique AraVec. The results show that the text representations produced by the CAMeLBERT transformer are the most accurate, as all models built on them achieve outstanding evaluation results. We also find that deep learning classifiers built over the transformers generally outperform classical machine learning classifiers. Finally, we could not draw a stable conclusion about which model works best with each text representation method, because each evaluation measure favors a different model.
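The second objective described above, training a classical classifier head on representations produced by a pre-trained transformer, can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: random vectors stand in for the 768-dimensional sentence embeddings that a fine-tuned Arabic transformer such as CAMeLBERT would produce for each article, and a logistic-regression head plays the role of the classical machine learning classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Stand-in for the per-article embeddings a fine-tuned Arabic transformer
# (e.g. a 768-dim CAMeLBERT [CLS] vector) would produce; the two classes
# are given slightly shifted means so a separating direction exists.
n_articles, dim = 400, 768
real = rng.normal(loc=0.2, scale=1.0, size=(n_articles // 2, dim))
fake = rng.normal(loc=-0.2, scale=1.0, size=(n_articles // 2, dim))
X = np.vstack([real, fake])
y = np.array([0] * (n_articles // 2) + [1] * (n_articles // 2))  # 0 = real, 1 = fake

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Classifier "head": any classical model trained on the frozen embeddings.
head = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
accuracy = head.score(X_te, y_te)
print(f"held-out accuracy: {accuracy:.2f}")
```

In the paper's actual pipeline, the synthetic `X` would be replaced by embeddings extracted from the fine-tuned transformer, and the head could equally be a small deep network, which the abstract reports generally worked better than classical classifiers.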
Pages: 14