An empiric validation of linguistic features in machine learning models for fake news detection

Cited by: 2
Authors
Puraivan, Eduardo [1, 2]
Venegas, Rene [3 ]
Riquelme, Fabian [2 ]
Affiliations
[1] Univ Vina del Mar, Escuela Ciencias, Vina del Mar, Chile
[2] Univ Valparaiso, Escuela Ingn Informat, Valparaiso, Chile
[3] Pontificia Univ Catolica Valparaiso, Inst Literatura & Ciencias Lenguaje, Valparaiso, Chile
Keywords
Fake news; Mass media; Natural language processing; Linguistic features; Machine learning;
DOI
10.1016/j.datak.2023.102207
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The diffusion of fake news is a growing problem with a high and negative social impact. There are several approaches to address the detection of fake news. This work focuses on a hybrid approach based on functional linguistic features and machine learning. There are several recent works with this approach. However, there are no clear guidelines on which linguistic features are most appropriate nor how to justify their use. Furthermore, many classification results are modest compared to recent advances in natural language processing. Our proposal considers 88 features organized in surface information, part of speech, discursive characteristics, and readability indices. On a 42 677 news database, we show that the classification results outperform previous work, even outperforming state-of-the-art techniques such as BERT, reaching 99.99% accuracy. A proper selection of linguistic features is crucial for interpretability as well as the performance of the models. In this sense, our proposal contributes to the intentional selection of linguistic features, overcoming current technical issues. We identified 32 features that show differences between the type of news. The results are highly competitive in the classification and simple to implement and interpret.
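
As a rough illustration of what "surface information" features can look like, the sketch below computes a handful of text statistics with the Python standard library. It is not the authors' feature set: the function name `surface_features`, the five features chosen, and the simple regex tokenization are all assumptions made for this example, and the record does not say which 88 features the paper uses.

```python
import re


def surface_features(text: str) -> dict:
    """Compute a few illustrative surface-level features of a text.

    Hypothetical helper: the feature names and the tokenization rules
    are assumptions for this sketch, not the paper's definitions.
    """
    # Split on runs of sentence-ending punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Treat runs of ASCII letters and apostrophes as words (English only).
    words = re.findall(r"[A-Za-z']+", text.lower())

    n_words = len(words)
    n_sentences = len(sentences)

    return {
        "n_words": n_words,
        "n_sentences": n_sentences,
        "mean_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "mean_sentence_length": n_words / n_sentences if n_sentences else 0.0,
        # Share of distinct words, a common lexical-diversity measure.
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
    }


if __name__ == "__main__":
    sample = "Fake news spreads fast. Real news spreads slowly!"
    print(surface_features(sample))
```

A feature dictionary of this kind could be assembled per article and passed to any standard classifier; which features actually separate the two types of news is the question the paper addresses.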
Pages: 16