Exploiting Textual Information for Fake News Detection

Cited by: 2
Authors
Kasseropoulos, Dimitrios Panagiotis [1]
Koukaras, Paraskevas [1]
Tjortjis, Christos [1]
Affiliations
[1] Hellen Univ, Sch Sci & Technol Int, Data Min & Analyt Res Grp, 14th Km Thessaloniki N Moudania, Thessaloniki 57001, Greece
Keywords
Fake news; Machine Learning (ML); Artificial Neural Networks (ANN); Natural Language Processing (NLP); Association Rules Mining (ARM); SOCIAL MEDIA; CLASSIFICATION; SELECTION;
DOI
10.1142/S0129065722500587
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
"Fake news" refers to the deliberate dissemination of news with the intent to deceive and mislead the public. This paper assesses the accuracy of several Machine Learning (ML) algorithms, using a style-based technique that relies on textual information extracted from news, such as part-of-speech counts. To extend previously proposed style-based techniques, a new method of enhancing a linguistic feature set is proposed. It combines Named Entity Recognition (NER) with the Frequent Pattern (FP) Growth association rule mining algorithm, aiming to provide better insight into the papers' sentence-level structure. Recursive feature elimination was used to identify a subset of the highest-performing linguistic characteristics, which turned out to align with the literature. Using pre-trained word embeddings, document embeddings and weighted document embeddings were constructed, the latter using each word's TF-IDF value as the weight factor. The document embeddings were combined with the linguistic features, providing a variety of training/test feature sets. For each model, the best-performing feature set was identified, and the model's hyperparameters were fine-tuned to improve accuracy. The ML algorithms' results were compared with two Neural Networks: a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. The results indicate that the CNN outperformed all other methods in terms of accuracy when accompanied by pre-trained word embeddings, yet the Support Vector Machine (SVM) performs almost as well with a wider variety of input feature sets. Although the style-based technique scores lower accuracy, it provides explainable results about the author's writing style decisions. Our work points out how new technologies and combinations of existing techniques can enhance the style-based approach, capturing more information.
Pages: 18
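The abstract mentions weighted document embeddings built from pre-trained word embeddings, with each word's TF-IDF value as the weight. The following is only a minimal sketch of that general idea, not the authors' implementation. The toy two-dimensional vectors in `EMBEDDINGS`, the smoothed IDF formula, and the function names are all assumptions made for illustration; real work would load actual pre-trained vectors.

```python
import math
from collections import Counter

# Hypothetical toy "pre-trained" word vectors (assumption: stand-ins for
# real pre-trained embeddings, which the abstract does not name).
EMBEDDINGS = {
    "shocking": [0.9, 0.1],
    "report":   [0.2, 0.8],
    "claims":   [0.5, 0.5],
    "official": [0.1, 0.9],
}

def tfidf_weights(docs):
    """Return one {word: tf-idf} dict per tokenized document.

    Uses relative term frequency and a smoothed IDF,
    log((1 + N) / (1 + df)) + 1, which is one common variant (assumption).
    """
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            w: (c / len(doc)) * (math.log((1 + n_docs) / (1 + doc_freq[w])) + 1)
            for w, c in counts.items()
        })
    return weights

def weighted_doc_embedding(doc_weights, embeddings):
    """TF-IDF-weighted average of word vectors; unknown words are skipped."""
    known = {w: wt for w, wt in doc_weights.items() if w in embeddings}
    dim = len(next(iter(embeddings.values())))
    total = sum(known.values())
    if total == 0:
        # No known words: fall back to a zero vector.
        return [0.0] * dim
    return [
        sum(wt * embeddings[w][i] for w, wt in known.items()) / total
        for i in range(dim)
    ]

docs = [["shocking", "report", "claims"], ["official", "report"]]
doc_vectors = [weighted_doc_embedding(w, EMBEDDINGS) for w in tfidf_weights(docs)]
```

In this sketch, a word that appears in fewer documents (such as "official") pulls the document vector more strongly toward its own embedding than a word shared by every document (such as "report"). An unweighted document embedding would be the plain mean of the same word vectors.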