"Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection

Cited by: 490
Author
Wang, William Yang [1]
Affiliation
[1] Univ Calif Santa Barbara, Dept Comp Sci, Santa Barbara, CA 93106 USA
DOI
10.18653/v1/P17-2067
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Automatic fake news detection is a challenging problem in deception detection, and it has tremendous real-world political and social impacts. However, statistical approaches to combating fake news have been dramatically limited by the lack of labeled benchmark datasets. In this paper, we present LIAR: a new, publicly available dataset for fake news detection. We collected 12.8K manually labeled short statements, spanning a decade and various contexts, from POLITIFACT.COM, which provides a detailed analysis report and links to source documents for each case. This dataset can be used for fact-checking research as well. Notably, this new dataset is an order of magnitude larger than the previously largest public fake news datasets of similar type. Empirically, we investigate automatic fake news detection based on surface-level linguistic patterns. We have designed a novel, hybrid convolutional neural network to integrate metadata with text. We show that this hybrid approach can improve on a text-only deep learning model.
Pages: 422-426
Page count: 5
Related papers
50 in total
  • [31] A systemic literature overview of Fake News Challenge (FNC-1) dataset and its use in fake news detection schemes
    Jawad, Zainab A.
    Obaid, Ahmed J.
    [J]. JOURNAL OF DISCRETE MATHEMATICAL SCIENCES & CRYPTOGRAPHY, 2023, 26 (04) : 1197 - 1206
  • [32] FakeNewsIndia: A benchmark dataset of fake news incidents in India, collection methodology and impact assessment in social media
    Dhawan, Apoorva
    Bhalla, Malvika
    Arora, Deeksha
    Kaushal, Rishabh
    Kumaraguru, Ponnurangam
    [J]. COMPUTER COMMUNICATIONS, 2022, 185 : 130 - 141
  • [33] Automatic Ground Truth Dataset Creation for Fake News Detection in Social Media
    Karidi, Danae Pla
    Nakos, Harry
    Stavrakas, Yannis
    [J]. INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2019, PT I, 2019, 11871 : 424 - 436
  • [34] A survey on the multiple classifier for new benchmark dataset of Vietnamese news classification
    Huu-Thanh Duong
    Vinh Truong Hoang
    [J]. 2019 11TH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SMART TECHNOLOGY (KST), 2019, : 23 - 28
  • [35] Inclusive Study of Fake News Detection for COVID-19 with New Dataset using Supervised Learning Algorithms
    Qalaja, Emad K.
    Al-Haija, Qasem Abu
    Tareef, Afaf
    Al-Nabhan, Mohammad M.
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (08) : 1 - 12
  • [36] Ciron: a New Benchmark Dataset for Chinese Irony Detection
    Xiang, Rong
    Gao, Xuefeng
    Long, Yunfei
    Li, Anran
    Chersoni, Emmanuele
    Lu, Qin
    Huang, Chu-Ren
    [J]. PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 5714 - 5720
  • [37] FakeRecogna: A New Brazilian Corpus for Fake News Detection
    Garcia, Gabriel L.
    Afonso, Luis C. S.
    Papa, Joao P.
    [J]. COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2022, 2022, 13208 : 57 - 67
  • [38] Fake News Detection Methods: A Survey and New Perspectives
    Hamida, Zineb Ferhat
    Refoufi, Allaoua
    Drif, Ahlem
    [J]. ADVANCED INTELLIGENT SYSTEMS FOR SUSTAINABLE DEVELOPMENT (AI2SD'2020), VOL 2, 2022, 1418 : 123 - 141
  • [39] Stance Detection in the Context of Fake News—A New Approach †
    Alsmadi, Izzat
    Alazzam, Iyad
    Al-Ramahi, Mohammad
    Zarour, Mohammad
    [J]. Future Internet, 2024, 16 (10):
  • [40] Detection of fake news in a new corpus for the Spanish language
    Posadas-Duran, Juan-Pablo
    Gomez-Adorno, Helena
    Sidorov, Grigori
    Moreno Escobar, Jesus Jaime
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2019, 36 (05) : 4869 - 4876