A Robust Self-Learning Framework for Cross-Lingual Text Classification

Cited by: 0
Authors
Dong, Xin [1]
de Melo, Gerard [1]
Affiliation
[1] Rutgers State Univ, New Brunswick, NJ 08901 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Trained on massive amounts of data, recent pretrained contextual representation models have made significant strides on a number of English NLP tasks. For other languages, however, relevant training data may be lacking, and state-of-the-art deep learning methods are known to be data-hungry. In this paper, we present an elegantly simple, robust self-learning framework for including unlabeled non-English samples in the fine-tuning process of pretrained multilingual representation models. We leverage a multilingual model's own predictions on unlabeled non-English data to obtain additional information that can be used during further fine-tuning. Compared with the original multilingual models and other cross-lingual classification models, we observe significant gains in effectiveness on document and sentiment classification for a range of diverse languages.
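The abstract does not spell out the training procedure. What follows is a minimal sketch of generic confidence-thresholded self-training that shows the idea it describes: fit on labeled English data, predict on unlabeled non-English data, add the model's confident predictions as pseudo-labels, and refit. It is not the authors' implementation and does not reproduce whatever makes their framework "robust". Several parts are illustrative assumptions: the toy nearest-centroid classifier (`fit_centroids`, `predict_proba`) stands in for fine-tuning a pretrained multilingual model, the 2-D vectors stand in for encoder outputs, and the names `self_train`, `threshold=0.9`, and `rounds=3` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, int]


def fit_centroids(examples: List[Example]) -> Dict[int, List[float]]:
    # Toy stand-in for fine-tuning (assumption): store one mean vector per class.
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in examples:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict_proba(centroids: Dict[int, List[float]], x: Vector) -> Dict[int, float]:
    # Softmax over negative squared distances to each class centroid.
    scores = {y: -sum((a - b) ** 2 for a, b in zip(x, c)) for y, c in centroids.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def self_train(labeled: List[Example], unlabeled: List[Vector],
               threshold: float = 0.9, rounds: int = 3):
    # 1) Fit on labeled (e.g. English) data only.
    model = fit_centroids(labeled)
    pseudo: Dict[int, int] = {}  # index in `unlabeled` -> pseudo-label
    for _ in range(rounds):
        newly_added = 0
        # 2) Predict on unlabeled (e.g. non-English) data; keep confident predictions.
        for i, x in enumerate(unlabeled):
            if i in pseudo:
                continue
            label, conf = max(predict_proba(model, x).items(), key=lambda kv: kv[1])
            if conf >= threshold:
                pseudo[i] = label
                newly_added += 1
        if newly_added == 0:
            break  # nothing new passed the threshold, so stop early
        # 3) Refit on labeled data plus pseudo-labeled samples.
        model = fit_centroids(labeled + [(unlabeled[i], y) for i, y in pseudo.items()])
    return model, pseudo


# Hypothetical toy data: 2-D "embeddings" standing in for encoder outputs.
labeled_en = [([0.0, 0.0], 0), ([0.2, 0.1], 0), ([5.0, 5.0], 1), ([4.8, 5.1], 1)]
unlabeled_xx = [[0.3, 0.2], [5.2, 4.9], [2.5, 2.5]]
model, pseudo = self_train(labeled_en, unlabeled_xx, threshold=0.9)
print(pseudo)  # → {0: 0, 1: 1}; the ambiguous point [2.5, 2.5] stays unlabeled
```

The confidence threshold sets the balance between coverage and noise. A lower value adds more pseudo-labeled samples but admits more wrong labels into the next fine-tuning round. A higher value keeps only samples the model is already sure about.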
Pages: 6306-6310
Number of pages: 5
Related papers (50 in total)
  • [1] Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification
    Dong, Xin
    Zhu, Yaxin
    Zhang, Yupeng
    Fu, Zuohui
    Xu, Dongkuan
    Yang, Sen
    de Melo, Gerard
    [J]. PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 1541 - 1544
  • [2] A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings
    Artetxe, Mikel
    Labaka, Gorka
    Agirre, Eneko
    [J]. PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 789 - 798
  • [3] Transductive Representation Learning for Cross-Lingual Text Classification
    Guo, Yuhong
    Xiao, Min
    [J]. 12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2012), 2012, : 888 - 893
  • [4] Cross-lingual Distillation for Text Classification
    Xu, Ruochen
    Yang, Yiming
    [J]. PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1, 2017, : 1415 - 1425
  • [5] Boosting Cross-Lingual Transfer via Self-Learning with Uncertainty Estimation
    Xu, Liyan
    Zhang, Xuchao
    Zhao, Xujiang
    Chen, Haifeng
    Chen, Feng
    Choi, Jinho D.
    [J]. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 6716 - 6723
  • [6] Prompt-based learning framework for zero-shot cross-lingual text classification
    Feng, Kai
    Huang, Lan
    Wang, Kangping
    Wei, Wei
    Zhang, Rui
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
  • [7] Cross-lingual learning for text processing: A survey
    Pikuliak, Matus
    Simko, Marian
    Bielikova, Maria
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2021, 165
  • [8] Heterogeneous Document Embeddings for Cross-Lingual Text Classification
    Moreo, Alejandro
    Pedrotti, Andrea
    Sebastiani, Fabrizio
    [J]. 36TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2021, 2021, : 685 - 688
  • [9] Active Learning for Cross-Lingual Sentiment Classification
    Li, Shoushan
    Wang, Rong
    Liu, Huanhuan
    Huang, Chu-Ren
    [J]. NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2013, 2013, 400 : 236 - 246
  • [10] A Learning to rank framework based on cross-lingual loss function for cross-lingual information retrieval
    Ghanbari, Elham
    Shakery, Azadeh
    [J]. APPLIED INTELLIGENCE, 2022, 52 (03) : 3156 - 3174