MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset

Cited by: 29
Authors
Nielsen, Dan S. [1]
McConville, Ryan [1]
Affiliations
[1] Univ Bristol, Dept Engn Math, Bristol, Avon, England
Keywords
dataset; misinformation; graph; twitter; social network; fake news; NEWS;
DOI
10.1145/3477495.3531744
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Misinformation is becoming increasingly prevalent on social media and in news articles. It has become so widespread that we require algorithmic assistance utilising machine learning to detect such content. Training these machine learning models requires datasets of sufficient scale, diversity and quality. However, datasets in the field of automatic misinformation detection are predominantly monolingual, include a limited number of modalities and are not of sufficient scale and quality. Addressing this, we develop a data collection and linking system (MuMiN-trawl) to build a public misinformation graph dataset (MuMiN), containing rich social media data (tweets, replies, users, images, articles, hashtags) spanning 21 million tweets belonging to 26 thousand Twitter threads, each of which has been semantically linked to 13 thousand fact-checked claims across dozens of topics, events and domains, in 41 different languages, spanning more than a decade. The dataset is made available as a heterogeneous graph via a Python package (mumin). We provide baseline results for two node classification tasks related to the veracity of a claim involving social media, and demonstrate that these are challenging tasks, with the highest macro-average F1-score being 62.55% and 61.45% for the two tasks, respectively. The MuMiN ecosystem is available at https://mumin-dataset.github.io/, including the data, documentation, tutorials and leaderboards.
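The abstract reports baseline results as macro-average F1-scores. As a minimal sketch of how that metric is computed (not the authors' evaluation code), the function below averages the per-class F1 over all classes with equal weight. The label names and the toy predictions are assumptions made up for illustration; they are not taken from the dataset or from the mumin package.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-average F1)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels for a binary claim-veracity task (illustrative only).
y_true = ["misinformation"] * 4 + ["factual"] * 2
y_pred = ["misinformation", "misinformation", "misinformation",
          "factual", "factual", "misinformation"]

score = macro_f1(y_true, y_pred)
print(f"macro-average F1: {score:.4f}")
```

Because each class contributes equally regardless of its size, macro-averaging penalises a classifier that ignores the minority class, which matters when the label distribution is imbalanced.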
Pages: 3141-3153
Number of pages: 13