A Transformer-Based Approach to Multilingual Fake News Detection in Low-Resource Languages

Cited by: 14
Authors
De, Arkadipta [1]
Bandyopadhyay, Dibyanayan [2]
Gain, Baban [2]
Ekbal, Asif [2]
Affiliations
[1] Indian Institute of Technology Hyderabad, Hyderabad, India
[2] Indian Institute of Technology Patna, Patna, Bihar, India
Keywords
Fake news detection; low-resource languages; multilingual; Hindi; Swahili; Indonesian; Vietnamese
DOI
10.1145/3472619
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Fake news classification has attracted considerable attention from researchers in artificial intelligence, natural language processing, and machine learning (ML). Most current work on fake news detection targets English, which limits its usefulness outside the English-literate population. Although multilingual web content has grown, fake news classification in low-resource languages remains a challenge because annotated corpora and tools are not available. This article proposes an effective neural model based on multilingual Bidirectional Encoder Representations from Transformers (BERT) for domain-agnostic multilingual fake news classification. A wide variety of experiments, including language-specific and domain-specific settings, are conducted. The proposed model achieves high accuracy in domain-specific and domain-agnostic experiments, and it also outperforms the current state-of-the-art models. We perform experiments in zero-shot settings to assess the effectiveness of language-agnostic feature transfer across different languages, with encouraging results. Cross-domain transfer experiments are also performed to assess language-independent feature transfer of the model. We also offer a multilingual, multidomain fake news detection dataset covering five languages and seven domains that could be useful for research and development in resource-scarce scenarios.
Pages: 20
Related papers
50 in total
  • [1] Combating Fake News in "Low-Resource" Languages: Amharic Fake News Detection Accompanied by Resource Crafting
    Gereme, Fantahun
    Zhu, William
    Ayall, Tewodros
    Alemu, Dagmawi
    [J]. INFORMATION, 2021, 12 (01) : 1 - 9
  • [2] Fake news detection in low-resource languages: A novel hybrid summarization
    Alghamdi, Jawaher
    Lin, Yuqing
    Luo, Suhuai
    [J]. KNOWLEDGE-BASED SYSTEMS, 2024, 296
  • [3] Fake news detection based on news content and social contexts: a transformer-based approach
    Raza, Shaina
    Ding, Chen
    [J]. International Journal of Data Science and Analytics, 2022, 13 : 335 - 362
  • [4] Fake news detection based on news content and social contexts: a transformer-based approach
    Raza, Shaina
    Ding, Chen
    [J]. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2022, 13 (04) : 335 - 362
  • [5] Automatic Fake News Detection in Political Platforms - A Transformer-based Approach
    Raza, Shaina
    [J]. CASE 2021: THE 4TH WORKSHOP ON CHALLENGES AND APPLICATIONS OF AUTOMATED EXTRACTION OF SOCIO-POLITICAL EVENTS FROM TEXT (CASE), 2021, : 68 - 78
  • [6] Introduction to Special Issue on Misinformation, Fake News and Rumor Detection in Low-Resource Languages
    Kumar, Akshi
    Esposito, Christian
    Karras, Dimitrios A.
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (01)
  • [7] Comparing Transformer-Based Machine Translation Models for Low-Resource Languages of Colombia and Mexico
    Angel, Jason
    Manuel Meque, Abdul Gafar
    Maldonado-Sifuentes, Christian
    Sidorov, Grigori
    Gelbukh, Alexander
    [J]. ADVANCES IN SOFT COMPUTING, MICAI 2023, PT II, 2024, 14392 : 95 - 105
  • [8] A Hybrid Transformer-Based Model for Optimizing Fake News Detection
    Al-Quayed, Fatima
    Javed, Danish
    Jhanjhi, N.Z.
    Humayun, Mamoona
    Alnusairi, Thanaa S.
    [J]. IEEE Access, 2024, 12 : 160822 - 160834
  • [9] Extending Multilingual BERT to Low-Resource Languages
    Wang, Zihan
    Karthikeyan, K.
    Mayhew, Stephen
    Roth, Dan
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 2649 - 2656
  • [10] A transformer-based architecture for fake news classification
    Mehta, Divyam
    Dwivedi, Aniket
    Patra, Arunabha
    Kumar, M. Anand
    [J]. Social Network Analysis and Mining, 2021, 11