Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach

Cited by: 0

Authors:

Yin, Wenpeng [1]

Hay, Jamaal [1]

Roth, Dan [1]

Affiliations:

[1] Univ Penn, Dept Comp & Informat Sci, Cognit Computat Grp, Philadelphia, PA 19104 USA

Source:

2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE | 2019

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Zero-shot text classification (0SHOT-TC) is a challenging NLU problem to which little attention has been paid by the research community. 0SHOT-TC aims to associate an appropriate label with a piece of text, irrespective of the text domain and the aspect (e.g., topic, emotion, event, etc.) described by the label. There are only a few articles studying 0SHOT-TC, all focusing only on topical categorization which, we argue, is just the tip of the iceberg in 0SHOT-TC. In addition, the chaotic experiments in the literature make no uniform comparison, which blurs the progress. This work benchmarks the 0SHOT-TC problem by providing unified datasets, standardized evaluations, and state-of-the-art baselines. Our contributions include: i) The datasets we provide facilitate studying 0SHOT-TC relative to conceptually different and diverse aspects: the "topic" aspect includes "sports" and "politics" as labels; the "emotion" aspect includes "joy" and "anger"; the "situation" aspect includes "medical assistance" and "water shortage". ii) We extend the existing evaluation setup (label-partially-unseen) - given a dataset, train on some labels, test on all labels - to include a more challenging yet realistic evaluation, label-fully-unseen 0SHOT-TC (Chang et al., 2008), aiming at classifying text snippets without seeing task-specific training data at all. iii) We unify the 0SHOT-TC of diverse aspects within a textual entailment formulation and study it this way.


Pages: 3914-3923

Number of pages: 10
