The PolitiFact-Oslo Corpus: A New Dataset for Fake News Analysis and Detection

Cited by: 4
Authors
Poldvere, Nele [1]
Uddin, Zia [2]
Thomas, Aleena [2]
Affiliations
[1] Univ Oslo, Dept Literature Area Studies & European Languages, N-0315 Oslo, Norway
[2] Sintef Digital, N-0373 Oslo, Norway
Keywords
corpus development; text type; sentiment; part-of-speech; Bi-LSTM; transformers
DOI
10.3390/info14120627
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This study presents a new dataset for fake news analysis and detection, namely, the PolitiFact-Oslo Corpus. The corpus contains samples of both fake and real news in English, collected from the fact-checking website PolitiFact.com. It grew out of a need for a more controlled and effective dataset for fake news analysis and detection model development based on recent events. Three features make it uniquely placed for this: (i) the texts have been individually labelled for veracity by experts, (ii) they are complete texts that strictly correspond to the claims in question, and (iii) they are accompanied by important metadata such as text type (e.g., social media, news and blog). In relation to this, we present a pipeline for collecting quality data from major fact-checking websites, a procedure which can be replicated in future corpus-building efforts. An exploratory analysis based on sentiment and part-of-speech information reveals interesting differences between fake and real news as well as between text types, thus highlighting the importance of adding contextual information to fake news corpora. Since the main application of the PolitiFact-Oslo Corpus is in automatic fake news detection, we critically examine the applicability of the corpus, and of another PolitiFact dataset built on less strict criteria, for various efficient deep learning-based approaches, such as Bidirectional Long Short-Term Memory (Bi-LSTM), LSTM fine-tuned transformers such as Bidirectional Encoder Representations from Transformers (BERT) and RoBERTa, and XLNet.
Pages: 32