Is my stance the same as your stance? A cross validation study of stance detection datasets

Cited by: 0
Authors
Ng, Lynnette Hui Xian [1]
Carley, Kathleen M. [1]
Affiliations
[1] Carnegie Mellon Univ, CASOS, Inst Software Res, 5000 Forbes Ave, Pittsburgh, PA 15213 USA
Funding
Andrew Mellon Foundation, USA;
Keywords
Stance detection; Natural language processing; Cross validation; Machine learning; Twitter;
DOI
10.1016/j.ipm.2022.103070
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Stance detection identifies a person's evaluation of a subject, and is a crucial component for many downstream applications. In application, stance detection requires training a machine learning model on one annotated dataset and applying the model to another to predict the stances of text snippets. This cross-dataset model generalization poses three central questions, which we investigate using stance classification models on 7 publicly available English Twitter datasets ranging from 297 to 48,284 instances. (1) Are stance classification models generalizable across datasets? We construct a single-dataset model to train/test dataset-against-dataset, finding that models do not generalize well (avg F1=0.33). (2) Can we improve the generalizability by aggregating datasets? We find that a multi-dataset model built on the aggregation of datasets has improved performance (avg F1=0.69). (3) Given a model built on multiple datasets, how much additional data is required to fine-tune it? We find it challenging to ascertain a minimum number of data points due to the lack of pattern in performance. Investigating possible reasons for the choppy model performance, we find that texts are not easily differentiable by stance, nor are annotations consistent within and across datasets. Our observations emphasize the need for an aggregated dataset as well as consistent labels for the generalizability of models.
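The dataset-against-dataset setup in question (1) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' method: the function names (`macro_f1`, `train_keyword_model`, `cross_dataset_f1`), the toy token-voting classifier, the `favor`/`against` labels, the two tiny datasets, and the choice of macro-averaged F1 (the abstract does not say which F1 average is reported).

```python
from collections import Counter
from itertools import permutations


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def train_keyword_model(examples):
    """Hypothetical stand-in classifier: tokens vote for the labels they co-occurred with."""
    token_label_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        for token in sorted(set(text.lower().split())):
            token_label_counts.setdefault(token, Counter())[label] += 1
    default_label = label_counts.most_common(1)[0][0]

    def predict(text):
        votes = Counter()
        for token in sorted(set(text.lower().split())):
            votes.update(token_label_counts.get(token, {}))
        # Fall back to the most frequent training label when no token is known.
        return votes.most_common(1)[0][0] if votes else default_label

    return predict


def cross_dataset_f1(datasets):
    """Train on each dataset and test on every other one; return {(train, test): F1}."""
    results = {}
    for train_name, test_name in permutations(datasets, 2):
        predict = train_keyword_model(datasets[train_name])
        gold = [label for _, label in datasets[test_name]]
        pred = [predict(text) for text, _ in datasets[test_name]]
        results[(train_name, test_name)] = macro_f1(gold, pred)
    return results


# Hypothetical toy data; real stance datasets contain hundreds to thousands of tweets.
toy_datasets = {
    "A": [("vaccines save lives", "favor"), ("vaccines are harmful", "against")],
    "B": [("this policy will save jobs", "favor"), ("harmful policy", "against")],
}
pair_scores = cross_dataset_f1(toy_datasets)
average_f1 = sum(pair_scores.values()) / len(pair_scores)
print(pair_scores, average_f1)
```

Averaging the off-diagonal scores in `pair_scores` corresponds to a single cross-dataset summary number of the kind the abstract reports; a real study would replace the toy classifier with a trained stance model and the toy data with the annotated datasets.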
Pages: 15
Related papers
50 in total
  • [1] Is my stance the same as your stance? A cross validation study of stance detection datasets
    Ng, Lynnette Hui Xian
    Carley, Kathleen M.
    [J]. Information Processing and Management, 2022, 59 (06):
  • [2] Stance Detection Benchmark: How Robust is Your Stance Detection?
    Benjamin Schiller
    Johannes Daxenberger
    Iryna Gurevych
    [J]. KI - Künstliche Intelligenz, 2021, 35 : 329 - 341
  • [3] Stance Detection Benchmark: How Robust is Your Stance Detection?
    Schiller, Benjamin
    Daxenberger, Johannes
    Gurevych, Iryna
    [J]. KUNSTLICHE INTELLIGENZ, 2021, 35 (3-4): 329 - 341
  • [4] Your stance is exposed! Analysing possible factors for stance detection on social media
    Aldayel, Abeer
    Magdy, Walid
    [J]. Proceedings of the ACM on Human-Computer Interaction, 2019, 3 (CSCW):
  • [5] Stance detection in Arabic with a multi-dialectal cross-domain stance corpus
    Charfi, Anis
    Bessghaier, Mabrouka
    Atalla, Andria
    Akasheh, Raghda
    Al-Emadi, Sara
    Zaghouani, Wajdi
    [J]. SOCIAL NETWORK ANALYSIS AND MINING, 2024, 14 (01)
  • [6] Exploring the impact of training datasets on Turkish stance detection
    Zengin, Muhammed Said
    Yenisey, Berk Utku
    Kutlu, Mucahid
    [J]. TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2023, 31 (07) : 1206 - 1222
  • [7] My stance in philosophy of religion
    Schellenberg, J. L.
    [J]. RELIGIOUS STUDIES, 2013, 49 (02) : 143 - 150
  • [8] Guiding Computational Stance Detection with Expanded Stance Triangle Framework
    Liu, Zhengyuan
    Yap, Yong Keong
    Chieu, Hai Leong
    Chen, Nancy F.
    [J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023: 3987 - 4001
  • [9] Multi-Task Stance Detection with Sentiment and Stance Lexicons
    Li, Yingjie
    Caragea, Cornelia
    [J]. 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019: 6299 - 6305
  • [10] Stance Detection: A Survey
    Kucuk, Dilek
    Can, Fazli
    [J]. ACM COMPUTING SURVEYS, 2020, 53 (01)