Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach

Authors
Yin, Wenpeng [1]
Hay, Jamaal [1]
Roth, Dan [1]
Affiliations
[1] Univ Penn, Dept Comp & Informat Sci, Cognit Computat Grp, Philadelphia, PA 19104 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Zero-shot text classification (0SHOT-TC) is a challenging NLU problem to which the research community has paid little attention. 0SHOT-TC aims to associate an appropriate label with a piece of text, irrespective of the text domain and the aspect (e.g., topic, emotion, event) described by the label. Only a few articles study 0SHOT-TC, all focusing on topical categorization, which, we argue, is just the tip of the iceberg in 0SHOT-TC. In addition, the chaotic experiments in the literature make no uniform comparison, which blurs the progress. This work benchmarks the 0SHOT-TC problem by providing unified datasets, standardized evaluations, and state-of-the-art baselines. Our contributions include: i) The datasets we provide facilitate studying 0SHOT-TC relative to conceptually different and diverse aspects: the "topic" aspect includes "sports" and "politics" as labels; the "emotion" aspect includes "joy" and "anger"; the "situation" aspect includes "medical assistance" and "water shortage". ii) We extend the existing evaluation setup (label-partially-unseen), in which, given a dataset, a model is trained on some labels and tested on all labels, to include a more challenging yet realistic evaluation, label-fully-unseen 0SHOT-TC (Chang et al., 2008), which aims to classify text snippets without seeing any task-specific training data. iii) We unify the 0SHOT-TC of diverse aspects within a textual entailment formulation and study it this way.
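The entailment formulation in contribution iii) can be illustrated with a minimal sketch: each candidate label is rewritten as a natural-language hypothesis, the input text serves as the premise, and the label whose hypothesis scores highest is returned. The hypothesis templates, the function names, and the word-overlap scorer below are illustrative assumptions, not taken from the paper; in practice the scorer would be replaced by the entailment probability of a trained NLI model.

```python
from typing import Callable, Dict, List

# Hypothetical hypothesis templates, one per aspect (assumed for illustration).
TEMPLATES: Dict[str, str] = {
    "topic": "This text is about {label}.",
    "emotion": "This text expresses {label}.",
    "situation": "The people there need {label}.",
}


def build_hypotheses(aspect: str, labels: List[str]) -> Dict[str, str]:
    """Turn each candidate label into a natural-language hypothesis."""
    template = TEMPLATES[aspect]
    return {label: template.format(label=label) for label in labels}


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model (assumption): the fraction of hypothesis
    words that also occur in the premise. A real system would return a
    trained model's entailment probability instead."""
    premise_words = set(premise.lower().replace(".", "").split())
    hypothesis_words = hypothesis.lower().replace(".", "").split()
    hits = sum(1 for word in hypothesis_words if word in premise_words)
    return hits / len(hypothesis_words)


def classify(
    text: str,
    aspect: str,
    labels: List[str],
    scorer: Callable[[str, str], float] = toy_entailment_score,
) -> str:
    """Return the label whose hypothesis the scorer rates highest for the text."""
    hypotheses = build_hypotheses(aspect, labels)
    scores = {label: scorer(text, hyp) for label, hyp in hypotheses.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(classify("The match was a great day for sports fans.",
                   "topic", ["sports", "politics"]))
```

Because the labels enter only through the hypothesis text, the same `classify` call works for labels never seen in training, which is what makes this formulation a natural fit for the label-fully-unseen setting.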
Pages: 3914-3923
Page count: 10