WELFake: Word Embedding Over Linguistic Features for Fake News Detection

Cited by: 104
Authors
Verma, Pawan Kumar [1 ,2 ]
Agrawal, Prateek [2 ,3 ]
Amorim, Ivone [4 ,5 ]
Prodan, Radu [3 ]
Affiliations
[1] GLA Univ, Dept Comp Engn & Applicat, Mathura 281406, India
[2] Lovely Profess Univ, Sch Comp Sci & Engn, Phagwara 144411, India
[3] Univ Klagenfurt, Inst Informat Technol, A-9020 Klagenfurt, Austria
[4] MOG Technol, P-4470605 Moreira, Portugal
[5] Univ Porto, CMUP Math Res Ctr, P-4099002 Porto, Portugal
Funding
European Union Horizon 2020;
Keywords
Social networking (online); Linguistics; Data models; Bit error rate; Feature extraction; Training; Vegetation; Bidirectional encoder representations from transformer (BERT); convolutional neural network (CNN); fake news; linguistic feature; machine learning (ML); text classification; voting classifier; word embedding (WE); DECEPTION; CUES;
DOI
10.1109/TCSS.2021.3068519
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Social media is a popular medium for disseminating real-time news worldwide, in large part because information spreads easily and quickly. Its users span a wide range of age groups, genders, and societal beliefs. Despite these favorable aspects, a significant disadvantage comes in the form of fake news, as people usually read and share information without checking whether it is genuine. Therefore, it is imperative to research methods for authenticating news. To address this issue, this article proposes a two-phase benchmark model named WELFake, based on word embedding (WE) over linguistic features, for fake news detection using machine learning classification. The first phase preprocesses the data set and validates the veracity of news content by using linguistic features. The second phase merges the linguistic feature sets with WE and applies voting classification. To validate its approach, this article also carefully designs a novel WELFake data set of approximately 72,000 articles, which incorporates different data sets to generate an unbiased classification output. Experimental results show that the WELFake model classifies news as real or fake with 96.73% accuracy, which improves the overall accuracy by 1.31% compared to bidirectional encoder representations from transformer (BERT) and by 4.25% compared to convolutional neural network (CNN) models. Our frequency-based model, which focuses on analyzing writing patterns, outperforms predictive-based related works implemented using the Word2vec WE method by up to 1.73%.
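The two phases described in the abstract (linguistic features, then merging them with a frequency-based word representation and applying voting classification) can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration: the four linguistic features, the three toy base classifiers, the function names, and the four example sentences are hypothetical stand-ins, since the abstract does not list the feature sets or classifiers actually used.

```python
from collections import Counter
import math

PUNCT = ".,!?;:\"'()"

def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    stripped = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in stripped if w]

def linguistic_features(text):
    """Hypothetical writing-pattern features (not the paper's feature set)."""
    words = tokenize(text)
    avg_len = sum(len(w) for w in words) / len(words) if words else 0.0
    n_upper = sum(1 for w in text.split() if len(w) > 1 and w.isupper())
    return [float(len(words)), avg_len, float(text.count("!")), float(n_upper)]

def build_vocab(texts):
    return sorted({w for t in texts for w in tokenize(t)})

def count_vector(text, vocab):
    """Frequency-based representation: raw term counts over a fixed vocabulary."""
    counts = Counter(tokenize(text))
    return [float(counts[w]) for w in vocab]

def featurize(text, vocab):
    """Merge linguistic features with the count vector into one feature row."""
    return linguistic_features(text) + count_vector(text, vocab)

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class NearestNeighbor:
    def fit(self, X, y):
        self.X, self.y = X, y
        return self
    def predict(self, x):
        return min(zip(self.X, self.y), key=lambda p: distance(p[0], x))[1]

class NearestCentroid:
    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, l in zip(X, y) if l == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self
    def predict(self, x):
        return min(self.centroids, key=lambda l: distance(self.centroids[l], x))

class ExclamationStump:
    """One-feature threshold on exclamation count (feature index 2).
    Assumes the 'fake' class uses more exclamation marks than 'real'."""
    def fit(self, X, y):
        fake = [x[2] for x, l in zip(X, y) if l == "fake"]
        real = [x[2] for x, l in zip(X, y) if l == "real"]
        self.threshold = (sum(fake) / len(fake) + sum(real) / len(real)) / 2
        return self
    def predict(self, x):
        return "fake" if x[2] > self.threshold else "real"

def majority_vote(votes):
    """Hard voting: return the label predicted by most base classifiers."""
    return Counter(votes).most_common(1)[0][0]

def train_voting_model(texts, labels):
    vocab = build_vocab(texts)
    X = [featurize(t, vocab) for t in texts]
    models = [cls().fit(X, labels)
              for cls in (NearestNeighbor, NearestCentroid, ExclamationStump)]
    def predict(text):
        x = featurize(text, vocab)
        return majority_vote([m.predict(x) for m in models])
    return predict

# Toy training data, invented for this sketch only.
texts = [
    "SHOCKING!!! You will not believe what they are hiding!",
    "BREAKING!! Miracle cure that doctors hate!",
    "The city council approved the annual budget on Tuesday.",
    "Researchers published the study in a peer-reviewed journal.",
]
labels = ["fake", "fake", "real", "real"]

predict = train_voting_model(texts, labels)
print(predict("UNBELIEVABLE!!! Secret cure they are hiding!"))
```

With three voters and two labels, hard voting cannot tie, which is why the sketch uses an odd number of base classifiers. A realistic pipeline would replace the toy classifiers and the four training sentences with trained models and a labeled corpus.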
Pages: 881-893
Number of pages: 13
Related papers
50 in total
  • [21] Novel approach for predicting fake news stance detection using large word embedding blending and customized CNN model
    Altamimi, Abdulaziz
    PLOS ONE, 2024, 19 (12)
  • [22] It's All in the Embedding! Fake News Detection Using Document Embeddings
    Truica, Ciprian-Octavian
    Apostol, Elena-Simona
    MATHEMATICS, 2023, 11 (03)
  • [23] Arabic fake news detection based on deep contextualized embedding models
    Nassif, Ali Bou
    Elnagar, Ashraf
    Elgendy, Omar
    Afadar, Yaman
    NEURAL COMPUTING & APPLICATIONS, 2022, 34 (18): 16019 - 16032
  • [24] Robust Fake News Detection Over Time and Attack
    Horne, Benjamin D.
    Norregaard, Jeppe
    Adali, Sibel
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2020, 11 (01)
  • [25] Linguistic feature based learning model for fake news detection and classification
    Choudhary, Anshika
    Arora, Anuja
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 169
  • [26] An exploration of features to improve the generalisability of fake news detection models
    Hoy, Nathaniel
    Koulouri, Theodora
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 275
  • [27] Fake news stance detection using selective features and FakeNET
    Aljrees, Turki
    Cheng, Xiaochun
    Ahmed, Mian Muhammad
    Umer, Muhammad
    Majeed, Rizwan
    Alnowaiser, Khaled
    Abuzinadah, Nihal
    Ashraf, Imran
    PLOS ONE, 2023, 18 (07)
  • [28] Creating Task-Generic Features for Fake News Detection
    Olivieri, Alex C.
    Shabani, Shahan
    Sokhn, Maria
    Cudre-Mauroux, Philippe
    PROCEEDINGS OF THE 52ND ANNUAL HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES, 2019: 5196 - 5205
  • [29] Potential Features Fusion Network for Multimodal Fake News Detection
    Kou, Feifei
    Wang, Bingwei
    Li, Haisheng
    Zhu, Chuangying
    Shi, Lei
    Zhang, Jiwei
    Qi, Limei
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2025, 21 (03)
  • [30] Twitter Truth: Advanced Multi-Model Embedding for Fake News Detection
    Lahlou, Yasmine
    El Fkihi, Sanaa
    Faizi, Rdouan
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (08): 551 - 560