Big Data and quality data for fake news and misinformation detection

Cited by: 63
Authors
Asr, Fatemeh Torabi [1]
Taboada, Maite [1]
Affiliations
[1] Simon Fraser Univ, Burnaby, BC, Canada
Source
BIG DATA & SOCIETY | 2019 / Vol. 6 / Issue 01
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Fake news; misinformation; labelled datasets; text classification; machine learning; topic modelling; INFORMATION;
DOI
10.1177/2053951719843310
Chinese Library Classification (CLC) number
C [Social Sciences, General];
Discipline classification code
03 ; 0303 ;
Abstract
Fake news has become an important topic of research in a variety of disciplines including linguistics and computer science. In this paper, we explain how the problem is approached from the perspective of natural language processing, with the goal of building a system to automatically detect misinformation in news. The main challenge in this line of research is collecting quality data, i.e., instances of fake and real news articles on a balanced distribution of topics. We review available datasets and introduce the MisInfoText repository as a contribution of our lab to the community. We make available the full text of the news articles, together with veracity labels previously assigned based on manual assessment of the articles' truth content. We also perform a topic modelling experiment to elaborate on the gaps and sources of imbalance in currently available datasets to guide future efforts. We appeal to the community to collect more data and to make it available for research purposes.
Pages: 14
Related papers
50 in total
  • [1] Big data, algorithmization and new media against misinformation and fake news. Bots to minimize the impact on organizations
    Flores Vivar, Jesus Miguel
    [J]. COMUNICACION Y HOMBRE, 2020, (16) : 101 - 114
  • [2] Fake News Detection Enhancement with Data Imputation
    Kotteti, Chandra Mouli Madhav
    Dong, Xishuang
    Li, Na
    Qian, Lijun
    [J]. 2018 16TH IEEE INT CONF ON DEPENDABLE, AUTONOM AND SECURE COMP, 16TH IEEE INT CONF ON PERVAS INTELLIGENCE AND COMP, 4TH IEEE INT CONF ON BIG DATA INTELLIGENCE AND COMP, 3RD IEEE CYBER SCI AND TECHNOL CONGRESS (DASC/PICOM/DATACOM/CYBERSCITECH), 2018 : 187 - 192
  • [3] Big Data ML-Based Fake News Detection Using Distributed Learning
    Altheneyan, Alaa
    Alhadlaq, Aseel
    [J]. IEEE ACCESS, 2023, 11 : 29447 - 29463
  • [4] Fake News Detection from Data Streams
    Ksieniewicz, Pawel
    Zyblewski, Pawel
    Choras, Michal
    Kozik, Rafal
    Gielczyk, Agata
    Wozniak, Michal
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [5] The Impact of Misinformation and Fake News on the Quality of Academic Research
    Akeriwe, Miriam Linda
    Ayoung, Daniel Azerikatoa
    Abagre, Francis
    Bekoe, Stephen
    [J]. QUALITATIVE & QUANTITATIVE METHODS IN LIBRARIES, 2023, 12 (03) : 455 - 470
  • [6] Fake news, big data, and the opportunities and threats of targeted actions
    Redekop, W. Ken
    [J]. HEALTH POLICY AND TECHNOLOGY, 2018, 7 (02) : 113 - 114
  • [7] Multimodal Data Fusion Framework For Fake News Detection
    Athira, A. B.
    Tiwari, Abhishek
    Kumar, S. D. Madhu
    Chacko, Anu Mary
    [J]. 2022 IEEE 19TH INDIA COUNCIL INTERNATIONAL CONFERENCE, INDICON, 2022
  • [8] A Scoping Review of the Relationship of Big Data Analytics with Context-Based Fake News Detection on Digital Media in Data Age
    Shahzad, Khurram
    Khan, Shakeel Ahmad
    Ahmad, Shakil
    Iqbal, Abid
    [J]. SUSTAINABILITY, 2022, 14 (21)
  • [9] IS FAKE DATA GOOD NEWS?
    Woollaston, Victoria
    [J]. Engineering and Technology, 2021, 16 (08)
  • [10] Real-Time Fake News Detection Using Big Data Analytics and Deep Neural Network
    Babar, Muhammad
    Ahmad, Awais
    Tariq, Muhammad Usman
    Kaleem, Sarah
    [J]. IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024, 11 (04) : 5189 - 5198