Exploiting Textual Information for Fake News Detection

Cited by: 2

Authors:

Kasseropoulos, Dimitrios Panagiotis [1]

Koukaras, Paraskevas [1]

Tjortjis, Christos [1]

Affiliations:

[1] Hellen Univ, Sch Sci & Technol Int, Data Min & Analyt Res Grp, 14th Km Thessaloniki N Moudania, Thessaloniki 57001, Greece

Source:

INTERNATIONAL JOURNAL OF NEURAL SYSTEMS | 2022 / Vol. 32 / Issue 12

Keywords:

Fake news; Machine Learning (ML); Artificial Neural Networks (ANN); Natural Language Processing (NLP); Association Rules Mining (ARM); SOCIAL MEDIA; CLASSIFICATION; SELECTION;

DOI:

10.1142/S0129065722500587

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

"Fake news" refers to the deliberate dissemination of news with the purpose of deceiving and misleading the public. This paper assesses the accuracy of several Machine Learning (ML) algorithms, using a style-based technique that relies on textual information extracted from news, such as part-of-speech counts. To expand the already proposed style-based techniques, a new method of enhancing a linguistic feature set is proposed. It combines Named Entity Recognition (NER) with the Frequent Pattern (FP) Growth association rule mining algorithm, aiming to provide better insight into the papers' sentence-level structure. Recursive feature elimination was used to identify a subset of the highest performing linguistic characteristics, which turned out to align with the literature. Using pre-trained word embeddings, document embeddings and weighted document embeddings were constructed, the latter using each word's TF-IDF value as the weight factor. The document embeddings were mixed with the linguistic features, providing a variety of training/test feature sets. For each model, the best performing feature set was identified and its hyperparameters were fine-tuned to improve accuracy. The ML algorithms' results were compared with two Neural Networks: a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. The results indicate that the CNN outperformed all other methods in terms of accuracy when accompanied by pre-trained word embeddings, yet SVM performs almost as well with a wider variety of input feature sets. Although the style-based technique scores lower accuracy, it provides explainable results about the author's writing style decisions. Our work points out how new technologies and combinations of existing techniques can enhance the style-based approach by capturing more information.
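
The weighted document embeddings mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional vectors in `EMBEDDINGS`, the function names, and the specific TF-IDF variant (relative term frequency times `ln(N / df) + 1`) are all assumptions made for illustration. A plain document embedding is taken here as the mean of the word vectors, and the weighted one as the TF-IDF-weighted mean.

```python
import math
from collections import Counter

# Hypothetical toy "pre-trained" word vectors (illustration only, not real data).
EMBEDDINGS = {
    "president": [0.9, 0.1, 0.0],
    "shocking": [0.0, 0.8, 0.3],
    "signs": [0.4, 0.2, 0.1],
    "bill": [0.7, 0.0, 0.2],
    "secret": [0.1, 0.9, 0.4],
}


def tfidf_weights(docs):
    """Return one {word: tf-idf} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df) + 1.
    """
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * (math.log(n_docs / doc_freq[word]) + 1.0)
            for word, count in counts.items()
        })
    return weights


def document_embedding(doc, word_weights=None):
    """Weighted mean of the word vectors in `doc`.

    Without `word_weights`, raw token counts are used as weights,
    which reduces to the plain mean over all in-vocabulary tokens.
    """
    if word_weights is None:
        word_weights = Counter(doc)
    dim = len(next(iter(EMBEDDINGS.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, weight in word_weights.items():
        vector = EMBEDDINGS.get(word)
        if vector is None:  # skip out-of-vocabulary words
            continue
        total = [t + weight * v for t, v in zip(total, vector)]
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # all-zero vector when no word is in the vocabulary
    return [t / weight_sum for t in total]


docs = [
    ["president", "signs", "bill"],
    ["shocking", "secret", "president", "bill"],
]
weights = tfidf_weights(docs)
plain = [document_embedding(doc) for doc in docs]
weighted = [document_embedding(doc, w) for doc, w in zip(docs, weights)]
```

In this toy corpus, "signs" occurs in only one document, so its TF-IDF weight is higher and the weighted embedding of the first document shifts toward the vector for "signs" relative to the plain mean.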


Pages: 18

Related documents: 50 in total

[21] Multimodal Fake News Detection
Segura-Bedmar, Isabel
Alonso-Bartolome, Santiago
INFORMATION, 2022, 13 (06)
[22] Enhancing Information Integrity: Machine Learning Methods for Fake News Detection
Sahu, Shruti
Bansal, Poonam
Kumari, Ritika
FOURTH CONGRESS ON INTELLIGENT SYSTEMS, VOL 1, CIS 2023, 2024, 868 : 247 - 257
[23] A Novel Approach Towards Fake News Detection: Deep Learning Augmented with Textual Entailment Features
Saikh, Tanik
Anand, Amit
Ekbal, Asif
Bhattacharyya, Pushpak
NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2019), 2019, 11608 : 345 - 358
[24] Albanian Fake News Detection
Canhasi, Ercan
Shijaku, Rexhep
Berisha, Erblin
ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (05)
[25] SceneFND: Multimodal fake news detection by modelling scene context information
Zhang, Guobiao
Giachanou, Anastasia
Rosso, Paolo
JOURNAL OF INFORMATION SCIENCE, 2024, 50 (02) : 355 - 367
[26] Leveraging Supplementary Information for Multi-Modal Fake News Detection
Ho, Chia-Chun
Dai, Bi-Ru
2023 INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGIES FOR DISASTER MANAGEMENT, ICT-DM, 2023, : 50 - 54
[27] A Tool for Fake News Detection
Al Asaad, Bashar
Erascu, Madalina
2018 20TH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING (SYNASC 2018), 2019, : 379 - 386
[28] Fake news detection on Twitter
Sharma, Srishti
Saraswat, Mala
Dubey, Anil Kumar
INTERNATIONAL JOURNAL OF WEB INFORMATION SYSTEMS, 2022, 18 (5/6) : 388 - 412
[29] Exploiting Textual Source Information for Epidemiosurveillance
Arsevska, Elena, 1600, Springer Verlag (478):
[30] Exploiting Textual Source Information for Epidemiosurveillance
Arsevska, Elena
Roche, Mathieu
Lancelot, Renaud
Hendrikx, Pascal
Dufour, Barbara
METADATA AND SEMANTICS RESEARCH, MTSR 2014, 2014, 478 : 359 - 361
