Big Data and quality data for fake news and misinformation detection

Cited by: 63
Authors
Asr, Fatemeh Torabi [1]
Taboada, Maite [1]
Affiliations
[1] Simon Fraser Univ, Burnaby, BC, Canada
Source
BIG DATA & SOCIETY | 2019 / Volume 6 / Issue 01
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Fake news; misinformation; labelled datasets; text classification; machine learning; topic modelling; INFORMATION;
DOI
10.1177/2053951719843310
Chinese Library Classification number
C [Social Sciences: General];
Discipline classification code
03 ; 0303 ;
Abstract
Fake news has become an important topic of research in a variety of disciplines including linguistics and computer science. In this paper, we explain how the problem is approached from the perspective of natural language processing, with the goal of building a system to automatically detect misinformation in news. The main challenge in this line of research is collecting quality data, i.e., instances of fake and real news articles on a balanced distribution of topics. We review available datasets and introduce the MisInfoText repository as a contribution of our lab to the community. We make available the full text of the news articles, together with veracity labels previously assigned based on manual assessment of the articles' truth content. We also perform a topic modelling experiment to elaborate on the gaps and sources of imbalance in currently available datasets to guide future efforts. We appeal to the community to collect more data and to make it available for research purposes.
Pages: 14