Unified benchmark for zero-shot Turkish text classification

Cited by: 3
Authors
Çelik, Emrecan [1]
Dalyan, Tugba [1]
Affiliation
[1] Istanbul Bilgi Univ, Dept Comp Engn, Eski Silahtaraga Elekt Santrali Kazim Karabekir Ca, TR-34060 Istanbul, Turkiye
Keywords
Text classification; Zero-shot learning; Next sentence prediction; Natural language inference; Masked language modeling; Dataset
DOI
10.1016/j.ipm.2023.103298
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Effective learning schemes such as fine-tuning, zero-shot learning, and few-shot learning have been widely used to achieve considerable performance with only a handful of annotated training examples. In this paper, we present a unified benchmark to facilitate research on zero-shot text classification in Turkish. For this purpose, we evaluated three methods on nine Turkish datasets spanning three main categories (topic, sentiment, and emotion): Natural Language Inference (NLI), Next Sentence Prediction (NSP), and our proposed model, which is based on Masked Language Modeling (MLM) and pre-trained word embeddings. We used pre-trained Turkish monolingual and multilingual transformer models, namely BERT, ConvBERT, DistilBERT, and mBERT. The results showed that ConvBERT with the NLI method yields the best result, 79%, and outperforms the previously used multilingual XLM-RoBERTa model by 19.6%. The study contributes to the literature by using different transformer models not previously attempted for Turkish, and by showing that monolingual models improve zero-shot text classification performance over multilingual models.
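The NLI and NSP methods named in the abstract both reduce zero-shot classification to scoring each candidate label against the input text and picking the best-scoring label. The MLM-plus-embeddings idea can be read as a nearest-label search in embedding space. The following is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation. The Turkish hypothesis template `TEMPLATE`, the functions `nli_zero_shot`, `cosine`, and `embedding_zero_shot`, and the toy scorer `toy_entail` are all hypothetical. In practice, `entail_logit` would wrap a real NLI model's entailment logit (or, for NSP, a model's "is next sentence" logit), and the vectors passed to `embedding_zero_shot` would come from pre-trained word embeddings.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical Turkish hypothesis template ("This text is related to {}.");
# an assumption for illustration, not the template used in the paper.
TEMPLATE = "Bu metin {} ile ilgilidir."


def nli_zero_shot(
    text: str,
    labels: Sequence[str],
    entail_logit: Callable[[str, str], float],
    template: str = TEMPLATE,
) -> Dict[str, float]:
    """Score each label by the entailment logit of (text, filled-in hypothesis),
    then normalize the logits with a softmax across labels."""
    logits = [entail_logit(text, template.format(label)) for label in labels]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embedding_zero_shot(
    predicted_vec: Sequence[float], label_vecs: Dict[str, List[float]]
) -> str:
    """Return the label whose embedding is most cosine-similar to the embedding
    of the word an MLM predicted at a [MASK] slot (assumed setup)."""
    return max(label_vecs, key=lambda label: cosine(predicted_vec, label_vecs[label]))


def toy_entail(premise: str, hypothesis: str) -> float:
    """Toy stand-in for an NLI model: counts shared lowercase words."""
    def words(s: str) -> set:
        return {w.strip(".,:;!?").lower() for w in s.split()}
    return float(len(words(premise) & words(hypothesis)))


if __name__ == "__main__":
    scores = nli_zero_shot("Spor: derbi maçı 2-1 bitti.", ["spor", "ekonomi"], toy_entail)
    print(max(scores, key=scores.get))  # spor
```

Normalizing with a softmax across labels is one common choice for single-label tasks. A multi-label setting might instead score each label independently against its own threshold.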
Pages: 14