Unified benchmark for zero-shot Turkish text classification

Cited by: 3

Authors:

Celik, Emrecan [1]

Dalyan, Tugba [1]

Affiliation:

[1] Istanbul Bilgi Univ, Dept Comp Engn, Eski Silahtaraga Elekt Santrali Kazim Karabekir Ca, TR-34060 Istanbul, Turkiye

Source:

INFORMATION PROCESSING & MANAGEMENT | 2023 / Vol. 60 / Issue 03

Keywords:

Text classification; Zero-shot learning; Next sentence prediction; Natural language inference; Masked language modeling; DATASET;

DOI:

10.1016/j.ipm.2023.103298

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Effective learning schemes such as fine-tuning, zero-shot learning, and few-shot learning have been widely used to obtain considerable performance with only a handful of annotated training examples. In this paper, we presented a unified benchmark for zero-shot text classification in Turkish. For this purpose, we evaluated three methods, namely Natural Language Inference (NLI), Next Sentence Prediction (NSP), and our proposed model based on Masked Language Modeling (MLM) and pre-trained word embeddings, on nine Turkish datasets covering three main categories: topic, sentiment, and emotion. We used pre-trained Turkish monolingual and multilingual transformer models: BERT, ConvBERT, DistilBERT, and mBERT. The results showed that ConvBERT with the NLI method yields the best results with 79% and outperforms the previously used multilingual XLM-RoBERTa model by 19.6%. The study contributes to the literature by using transformer models not previously attempted for Turkish and by showing that monolingual models improve zero-shot text classification performance over multilingual models.
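The NLI method named in the abstract is commonly implemented by turning each candidate label into a hypothesis sentence and choosing the label whose hypothesis the model finds most entailed by the input text. The following is a minimal sketch of that general idea, not the authors' implementation. The hypothesis template, the function names, and the word-overlap scorer are all hypothetical stand-ins; in practice the scorer would be the entailment probability from a pre-trained NLI model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical hypothesis template ("This text is about {label}.").
# The abstract does not state the templates used in the paper.
TEMPLATE = "Bu metin {label} ile ilgilidir."


def zero_shot_classify(
    text: str,
    labels: List[str],
    entail_score: Callable[[str, str], float],
    template: str = TEMPLATE,
) -> Tuple[str, Dict[str, float]]:
    """Score one hypothesis per label and return the best label with all scores."""
    scores = {
        label: entail_score(text, template.format(label=label))
        for label in labels
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


def toy_entail_score(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model (assumption): the fraction of hypothesis
    words that also occur in the premise."""
    premise_words = set(premise.lower().split())
    hypothesis_words = hypothesis.lower().rstrip(".").split()
    hits = sum(word in premise_words for word in hypothesis_words)
    return hits / len(hypothesis_words)


if __name__ == "__main__":
    label, scores = zero_shot_classify(
        "takım dün akşam spor salonunda maç kazandı",
        ["spor", "ekonomi", "siyaset"],
        toy_entail_score,
    )
    print(label, scores)
```

Passing the scorer as a parameter keeps the label-to-hypothesis logic separate from the model, so the same loop could be reused with any of the monolingual or multilingual models the abstract lists.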

Pages: 14

50 entries in total

[1] Zero-Shot Turkish Text Classification
Birim, Ahmet
Erden, Mustafa
Arslan, Levent M.
29TH IEEE CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS (SIU 2021), 2021
[2] Retrieval Augmented Zero-Shot Text Classification
Abdullahi, Tassallah
Singh, Ritambhara
Eickhoff, Carsten
PROCEEDINGS OF THE 2024 ACM SIGIR INTERNATIONAL CONFERENCE ON THE THEORY OF INFORMATION RETRIEVAL, ICTIR 2024, 2024: 195-203
[3] Extreme Zero-Shot Learning for Extreme Text Classification
Xiong, Yuanhao
Chang, Wei-Cheng
Hsieh, Cho-Jui
Yu, Hsiang-Fu
Dhillon, Inderjit
NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022: 5455-5468
[4] Learn to Adapt for Generalized Zero-Shot Text Classification
Zhang, Yiwen
Yuan, Caixia
Wang, Xiaojie
Bai, Ziwei
Liu, Yongbin
PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022: 517-527
[5] Generalized Zero-Shot Text Classification for ICD Coding
Song, Congzheng
Zhang, Shanghang
Sadoughi, Najmeh
Xie, Pengtao
Xing, Eric
PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020: 4018-4024
[6] Zero-Shot Information Extraction as a Unified Text-to-Triple Translation
Wang, Chenguang
Liu, Xiao
Chen, Zui
Hong, Haoyun
Tang, Jie
Song, Dawn
2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021: 1225-1238
[7] Issues with Entailment-based Zero-shot Text Classification
Ma, Tingting
Yao, Jin-Ge
Lin, Chin-Yew
Zhao, Tiejun
ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2, 2021: 786-796
[8] ZeroBERTo: Leveraging Zero-Shot Text Classification by Topic Modeling
Alcoforado, Alexandre
Ferraz, Thomas Palmeira
Gerber, Rodrigo
Bustos, Enzo
Oliveira, Andre Seidel
Veloso, Bruno Miguel
Siqueira, Fabio Levy
Reali Costa, Anna Helena
COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2022, 2022, 13208: 125-136
[9] Integrating Semantic Knowledge to Tackle Zero-shot Text Classification
Zhang, Jingqing
Lertvittayakumjorn, Piyawat
Guo, Yike
2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019: 1031-1040
[10] Zero-Shot Text Classification with Semantically Extended Textual Entailment
Liu, Tengfei
Hu, Yongli
Chen, Puman
Sun, Yanfeng
Yin, Baocai
2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023