Dataset for multimodal fake news detection and verification tasks

Authors
Bondielli, Alessandro [1]
Dell'Oglio, Pietro [2]
Lenci, Alessandro [3]
Marcelloni, Francesco [2]
Passaro, Lucia [1]
Affiliations
[1] Univ Pisa, Dept Comp Sci, Largo Bruno Pontecorvo 3, I-56127 Pisa, Italy
[2] Univ Pisa, Dept Informat Engn, Largo Lucio Lazzarino 1, I-56122 Pisa, Italy
[3] Univ Pisa, Dept Philol Literature & Linguist, Via S Maria 36, I-56127 Pisa, Italy
Source
DATA IN BRIEF | 2024 / Vol. 54
Keywords
Fake news; Multimodal data; Data collection and annotation; Machine learning; Natural language processing;
DOI
10.1016/j.dib.2024.110440
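Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];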
Discipline classification codes
07; 0710; 09
Abstract
The proliferation of online disinformation and fake news, particularly in the context of breaking news events, demands the development of effective detection mechanisms. While textual content remains the predominant medium for disseminating misleading information, other modalities play an increasingly visible role within online outlets and social media platforms. However, multimodal datasets, which incorporate diverse modalities such as texts and images, are not very common yet, especially in low-resource languages. This study addresses this gap by releasing a dataset tailored for multimodal fake news detection in the Italian language. This dataset was originally employed in a shared task on the Italian language. The dataset is divided into two data subsets, each corresponding to a distinct sub-task. In sub-task 1, the goal is to assess the effectiveness of multimodal fake news detection systems. Sub-task 2 aims to delve into the interplay between text and images, specifically analyzing how these modalities mutually influence the interpretation of content when distinguishing between fake and real news. Both sub-tasks were managed as classification problems. The dataset consists of social media posts and news articles. After collection, the data was labeled via crowdsourcing. Annotators were provided with external knowledge about the topic of the news to be labeled, enhancing their ability to discriminate between fake and real news. The data subsets for sub-task 1 and sub-task 2 consist of 913 and 1350 items, respectively, encompassing newspaper articles and tweets. (c) 2024 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)
Pages: 14