Cross-lingual Distillation for Text Classification

Cited by: 23
Authors
Xu, Ruochen [1]
Yang, Yiming [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Funding
National Science Foundation (USA)
DOI
10.18653/v1/P17-1130
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Cross-lingual text classification (CLTC) is the task of classifying documents written in different languages into the same taxonomy of categories. This paper presents a novel approach to CLTC that builds on model distillation, adapting and extending a framework originally proposed for model compression. Using soft probabilistic predictions for the documents in a label-rich language as the (induced) supervisory labels in a parallel corpus of documents, we successfully train classifiers for new languages in which labeled training data are not available. An adversarial feature adaptation technique is also applied during model training to reduce distribution mismatch. We conducted experiments on two benchmark CLTC datasets, treating English as the source language and German, French, Japanese and Chinese as the unlabeled target languages. The proposed approach achieved performance better than or comparable to that of other state-of-the-art methods.
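The distillation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a linear softmax student, random toy feature vectors standing in for target-language documents, random teacher logits standing in for a source-language classifier, and a softmax temperature taken from the general distillation framework. All function and variable names are hypothetical, and the adversarial feature adaptation is omitted.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Softmax over one list of logits; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, feats):
    """Class probabilities of a linear softmax student for one document."""
    logits = [sum(w * x for w, x in zip(row, feats)) for row in weights]
    return softmax(logits)

def soft_cross_entropy(weights, docs, soft_labels):
    """Mean cross-entropy between the student's predictions and the soft labels."""
    total = 0.0
    for feats, q in zip(docs, soft_labels):
        p = predict(weights, feats)
        total -= sum(qk * math.log(pk + 1e-12) for qk, pk in zip(q, p))
    return total / len(docs)

def distill_student(docs, soft_labels, n_classes, lr=0.1, steps=300):
    """Train the student by batch gradient descent on the soft cross-entropy."""
    dim = len(docs[0])
    weights = [[0.0] * dim for _ in range(n_classes)]
    for _ in range(steps):
        grads = [[0.0] * dim for _ in range(n_classes)]
        for feats, q in zip(docs, soft_labels):
            p = predict(weights, feats)
            for k in range(n_classes):
                for j in range(dim):
                    # Gradient of the soft cross-entropy w.r.t. weights[k][j]
                    grads[k][j] += (p[k] - q[k]) * feats[j] / len(docs)
        for k in range(n_classes):
            for j in range(dim):
                weights[k][j] -= lr * grads[k][j]
    return weights

# Toy "parallel corpus" (assumed data): the teacher scores the source side of
# each document pair, and the student sees only the target-side features.
random.seed(0)
n_docs, dim, n_classes = 8, 5, 3
teacher_logits = [[random.gauss(0, 1) for _ in range(n_classes)] for _ in range(n_docs)]
soft_labels = [softmax(row, temperature=2.0) for row in teacher_logits]
target_docs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_docs)]

initial = [[0.0] * dim for _ in range(n_classes)]
loss_before = soft_cross_entropy(initial, target_docs, soft_labels)
student = distill_student(target_docs, soft_labels, n_classes)
loss_after = soft_cross_entropy(student, target_docs, soft_labels)
```

The student never sees a hard label: the soft labels are its only supervision, which is why a parallel corpus can replace labeled data in the target language.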
Pages: 1415-1425
Page count: 11