An empiric validation of linguistic features in machine learning models for fake news detection

Cited by: 2
Authors
Puraivan, Eduardo [1, 2]
Venegas, Rene [3]
Riquelme, Fabian [2 ]
Affiliations
[1] Univ Vina del Mar, Escuela Ciencias, Vina del Mar, Chile
[2] Univ Valparaiso, Escuela Ingn Informat, Valparaiso, Chile
[3] Pontificia Univ Catolica Valparaiso, Inst Literatura & Ciencias Lenguaje, Valparaiso, Chile
Keywords
Fake news; Mass media; Natural language processing; Linguistic features; Machine learning
DOI
10.1016/j.datak.2023.102207
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The diffusion of fake news is a growing problem with a high and negative social impact. There are several approaches to address the detection of fake news. This work focuses on a hybrid approach based on functional linguistic features and machine learning. There are several recent works with this approach. However, there are no clear guidelines on which linguistic features are most appropriate nor how to justify their use. Furthermore, many classification results are modest compared to recent advances in natural language processing. Our proposal considers 88 features organized in surface information, part of speech, discursive characteristics, and readability indices. On a database of 42,677 news items, we show that the classification results outperform previous work, even outperforming state-of-the-art techniques such as BERT, reaching 99.99% accuracy. A proper selection of linguistic features is crucial for interpretability as well as the performance of the models. In this sense, our proposal contributes to the intentional selection of linguistic features, overcoming current technical issues. We identified 32 features that show differences between the types of news. The results are highly competitive in the classification and simple to implement and interpret.
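To make the feature groups named in the abstract more concrete, the following is a minimal sketch of what a few surface and readability features might look like in Python. It is an illustration only and not the authors' implementation: the function name `surface_features`, the regex-based tokenization, and the particular six features are assumptions chosen for brevity, and the readability value is an approximation of the Automated Readability Index that counts only letters and apostrophes as characters.

```python
import re


def surface_features(text):
    """Return a few illustrative surface/readability features for one text.

    Hypothetical helper: the paper's 88 features are not reproduced here.
    """
    # Naive sentence split on terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive word tokenization: runs of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)

    # Guard against division by zero on empty input.
    n_sent = max(len(sentences), 1)
    n_words = max(len(words), 1)
    n_chars = sum(len(w) for w in words)

    return {
        "n_words": len(words),
        "n_sentences": len(sentences),
        "avg_word_len": n_chars / n_words,
        "avg_sent_len": len(words) / n_sent,
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
        # Approximate Automated Readability Index (ARI).
        "ari": 4.71 * (n_chars / n_words) + 0.5 * (len(words) / n_sent) - 21.43,
    }


if __name__ == "__main__":
    print(surface_features("Fake news spreads fast. Is it true?"))
```

Feature dictionaries of this kind could then be stacked into a matrix and passed to any standard classifier; which classifier and which features the authors actually used is not stated in this record.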
Pages: 16
Related papers
50 in total
  • [1] Machine Learning Models for Fake News Detection: A Review
    Gowthami, Dasari
    Gupta, Ananya
    Sharma, Monika
    Kumar, Tapas
    Mongia, Shweta
    Singh, Niharika
    Proceedings of the 2022 11th International Conference on System Modeling and Advancement in Research Trends, SMART 2022, 2022: 947-951
  • [2] Comparison of Various Machine Learning Models for Accurate Detection of Fake News
    Poddar, Karishnu
    Amali, Geraldine Bessie D.
    Umadevi, K. S.
    2019 INNOVATIONS IN POWER AND ADVANCED COMPUTING TECHNOLOGIES (I-PACT), 2019
  • [3] A benchmark study of machine learning models for online fake news detection
    Khan, Junaed Younus
    Khondaker, Md. Tawkat Islam
    Afroz, Sadia
    Uddin, Gias
    Iqbal, Anindya
    MACHINE LEARNING WITH APPLICATIONS, 2021, 4
  • [4] A Machine Learning approach for Fake News Detection
    Bisen, Wani H.
    Paunikar, Anuragini
    Thakur, Bharat
    Garg, Anushka
    Nangliya, Khushbu
    INTERNATIONAL JOURNAL OF NEXT-GENERATION COMPUTING, 2022, 13 (05): 1050-1056
  • [5] Explainable Machine Learning for Fake News Detection
    Reis, Julio C. S.
    Correia, Andre
    Murai, Fabricio
    Veloso, Adriano
    Benevenuto, Fabricio
    PROCEEDINGS OF THE 11TH ACM CONFERENCE ON WEB SCIENCE (WEBSCI'19), 2019: 17-26
  • [6] Linguistic features based framework for automatic fake news detection
    Garg, Sonal
    Sharma, Dilip Kumar
    COMPUTERS & INDUSTRIAL ENGINEERING, 2022, 172
  • [7] Evaluating Fake News Detection Models from Explainable Machine Learning Perspectives
    Alharbi, Raed
    Vu, Minh N.
    Thai, My T.
    IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2021), 2021
  • [8] Fake news detection on Pakistani news using machine learning and deep learning
    Kishwar, Azka
    Zafar, Adeel
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 211
  • [9] WELFake: Word Embedding Over Linguistic Features for Fake News Detection
    Verma, Pawan Kumar
    Agrawal, Prateek
    Amorim, Ivone
    Prodan, Radu
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2021, 8 (04): 881-893
  • [10] Which machine learning paradigm for fake news detection?
    Katsaros, Dimitrios
    Stavropoulos, George
    Papakostas, Dimitrios
    2019 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2019), 2019: 383-387