Zero-shot Topic Classification via Automatic Tagging on Chinese Text Datasets

Times cited: 0
Authors
Cai, Xinyi [1]
Tian, Jiao [1]
Yu, Ke [1]
Xiao, Hongwang [2]
Zhang, Kai [1]
Tsai, Pei-Wei [1]
Affiliations
[1] Swinburne Univ Technol, Melbourne, Australia
[2] Beijing Acad Artificial Intelligence BAAI, Beijing, Peoples R China
Keywords
Topic Classification; Data Scarcity; Zero-shot Learning; Transformer-based Structure; Automatic Tagging; Chinese Text Datasets;
DOI
10.1109/ISPA-BDCloud-SocialCom-SustainCom57177.2022.00068
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
The data scarcity problem is often encountered in topic classification for many real-world applications. Zero-shot classification addresses this problem by performing classification without any labelled data for the target classes. However, few studies have addressed zero-shot topic classification of Chinese text. In this paper, we propose an automatic tagging structure for zero-shot topic classification that trains a transformer-based model on labelled data from external corpora. Moreover, we show the effectiveness of fine-tuning on a large dataset for a downstream task in which the training data labels are not aligned with the test data labels in advance. Our experiments show that the proposed approach outperforms the benchmark approaches on two standard Chinese text datasets in the zero-shot setting.
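The zero-shot setting described in the abstract can be illustrated with a toy sketch: each candidate topic is known only through a short label description, and a text is assigned to the topic whose description it most resembles, without any labelled examples of those topics. This is a generic illustration, not the authors' automatic tagging structure. The character-count vectors are a crude stand-in for a transformer encoder, and the function names, topics, and descriptions below are hypothetical.

```python
from collections import Counter
import math


def char_vector(text: str) -> Counter:
    """Count non-whitespace characters; a crude stand-in for a transformer embedding."""
    return Counter(ch for ch in text if not ch.isspace())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def zero_shot_classify(text: str, label_descriptions: dict[str, str]) -> str:
    """Return the label whose description is most similar to the text.

    No labelled examples of these topics are used: each topic is known
    only through its (hypothetical) natural-language description.
    """
    text_vec = char_vector(text)
    scores = {
        label: cosine(text_vec, char_vector(desc))
        for label, desc in label_descriptions.items()
    }
    return max(scores, key=scores.get)


# Hypothetical target topics (sports, finance, technology) with made-up descriptions.
LABELS = {
    "体育": "体育 足球 篮球 比赛 运动员 冠军",
    "财经": "财经 股票 市场 银行 经济 投资",
    "科技": "科技 手机 芯片 人工智能 互联网 软件",
}

if __name__ == "__main__":
    print(zero_shot_classify("国足在昨晚的比赛中击败对手", LABELS))
```

Replacing `char_vector` with sentence embeddings from a pretrained transformer would be closer in spirit to the transformer-based approach the abstract describes; that substitution is not shown here.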
Pages: 482–488
Number of pages: 7