WELFake: Word Embedding Over Linguistic Features for Fake News Detection

Cited by: 104
Authors
Verma, Pawan Kumar [1, 2]
Agrawal, Prateek [2, 3]
Amorim, Ivone [4, 5]
Prodan, Radu [3]
Affiliations
[1] GLA Univ, Dept Comp Engn & Applicat, Mathura 281406, India
[2] Lovely Profess Univ, Sch Comp Sci & Engn, Phagwara 144411, India
[3] Univ Klagenfurt, Inst Informat Technol, A-9020 Klagenfurt, Austria
[4] MOG Technol, P-4470605 Moreira, Portugal
[5] Univ Porto, CMUP Math Res Ctr, P-4099002 Porto, Portugal
Funding
European Union Horizon 2020
Keywords
Social networking (online); Linguistics; Data models; Bit error rate; Feature extraction; Training; Vegetation; Bidirectional encoder representations from transformer (BERT); convolutional neural network (CNN); fake news; linguistic feature; machine learning (ML); text classification; voting classifier; word embedding (WE); DECEPTION; CUES;
DOI
10.1109/TCSS.2021.3068519
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Social media is a popular medium for the dissemination of real-time news all over the world. Easy and quick information proliferation is one of the reasons for its popularity. A large number of users of different age groups, genders, and societal beliefs are engaged in social media websites. Despite these favorable aspects, a significant disadvantage comes in the form of fake news, as people usually read and share information without caring about its genuineness. Therefore, it is imperative to research methods for the authentication of news. To address this issue, this article proposes a two-phase benchmark model named WELFake, based on word embedding (WE) over linguistic features, for fake news detection using machine learning classification. The first phase preprocesses the data set and validates the veracity of news content by using linguistic features. The second phase merges the linguistic feature sets with WE and applies voting classification. To validate its approach, this article also carefully designs a novel WELFake data set with approximately 72 000 articles, which incorporates different data sets to generate an unbiased classification output. Experimental results show that the WELFake model classifies news as real or fake with 96.73% accuracy, which improves the overall accuracy by 1.31% compared to bidirectional encoder representations from transformers (BERT) and 4.25% compared to convolutional neural network (CNN) models. Our frequency-based model, which focuses on analyzing writing patterns, outperforms predictive-based related works implemented using the Word2vec WE method by up to 1.73%.
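The two phases described in the abstract (linguistic features, then merging with frequency-based word features and voting) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the specific features, the vocabulary, and the function names are hypothetical and are not taken from the paper, whose actual feature sets and classifiers are not listed in this record.

```python
from collections import Counter


def linguistic_features(text):
    """Compute a few illustrative writing-pattern features.

    These four features are a hypothetical choice, not the paper's feature set.
    """
    words = text.split()
    n_words = len(words)
    return {
        "n_words": n_words,
        "n_exclaim": text.count("!"),
        "avg_word_len": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "upper_ratio": sum(w.isupper() for w in words) / n_words if n_words else 0.0,
    }


def count_features(text, vocabulary):
    """Frequency-based word features: raw term counts over a fixed vocabulary."""
    counts = Counter(w.lower().strip(".,!?") for w in text.split())
    return {f"tf_{term}": counts[term] for term in vocabulary}


def merge_features(text, vocabulary):
    """Merge the linguistic features with the word-frequency features."""
    return {**linguistic_features(text), **count_features(text, vocabulary)}


def hard_vote(predictions):
    """Return the majority label among the classifiers' predictions."""
    return Counter(predictions).most_common(1)[0][0]
```

In use, `merge_features(article_text, vocabulary)` would produce one feature dictionary per article, each classifier would be trained on those merged features, and `hard_vote` would combine their per-article labels, for example `hard_vote(["fake", "real", "fake"])`.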
Pages: 881-893
Page count: 13