Cross-lingual Distillation for Text Classification

Cited by: 23
Authors
Xu, Ruochen [1]
Yang, Yiming [1]
Affiliation
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Funding
National Science Foundation (USA)
DOI
10.18653/v1/P17-1130
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Cross-lingual text classification (CLTC) is the task of classifying documents written in different languages into the same taxonomy of categories. This paper presents a novel approach to CLTC that builds on model distillation, adapting and extending a framework originally proposed for model compression. Using soft probabilistic predictions for the documents in a label-rich language as the (induced) supervisory labels in a parallel corpus of documents, we successfully train classifiers for new languages in which labeled training data are not available. An adversarial feature adaptation technique is also applied during model training to reduce distribution mismatch. We conducted experiments on two benchmark CLTC datasets, treating English as the source language and German, French, Japanese and Chinese as the unlabeled target languages. The proposed approach achieved better or comparable performance relative to other state-of-the-art methods.
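The core idea in the abstract, training a target-language classifier on the soft predictions that a source-language classifier makes for the other side of a parallel corpus, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy logits, and the temperature parameter are all assumptions made for illustration, and the adversarial feature adaptation step is left out.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; the temperature is an assumed knob,
    # borrowed from the usual model-distillation setup.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_cross_entropy(teacher_probs, student_probs, eps=1e-12):
    # Cross-entropy of the student distribution against soft teacher labels.
    return -sum(p * math.log(q + eps)
                for p, q in zip(teacher_probs, student_probs))

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # teacher_logits: hypothetical outputs of a source-language (e.g. English)
    #   classifier on the source side of each parallel document pair.
    # student_logits: hypothetical outputs of the target-language classifier
    #   on the target side of the same pairs.
    losses = []
    for t, s in zip(teacher_logits, student_logits):
        p = softmax(t, temperature)
        q = softmax(s, temperature)
        losses.append(soft_cross_entropy(p, q))
    return sum(losses) / len(losses)

# Toy logits for two parallel document pairs and three categories.
teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
student_good = [[1.8, 0.4, -0.9], [0.0, 1.4, 0.2]]  # close to the teacher
student_bad = [[-1.0, 0.5, 2.0], [1.5, 0.1, 0.3]]   # disagrees with the teacher

loss_good = distillation_loss(teacher, student_good)
loss_bad = distillation_loss(teacher, student_bad)
print(loss_good < loss_bad)
```

Minimizing this loss over the parallel corpus pushes the target-language classifier toward the source classifier's predictions without requiring any labeled target-language documents.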
Pages: 1415-1425
Page count: 11