Cross-lingual transfer learning: A PARAFAC2 approach

Cited by: 1
Authors
Pantraki, Evangelia [1]
Tsingalis, Ioannis [1]
Kotropoulos, Constantine [1]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki, Greece
Keywords
PARAFAC2; Cross-lingual transfer learning; Cross-lingual document classification; Cross-lingual authorship attribution; Language processing; EMBEDDINGS;
DOI
10.1016/j.patrec.2022.05.008
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The proposed framework addresses the problem of cross-lingual transfer learning resorting to Parallel Factor Analysis 2 (PARAFAC2). To avoid the need for multilingual parallel corpora, a pairwise setting is adopted where a PARAFAC2 model is fitted to documents written in English (source language) and a different target language. Firstly, an unsupervised PARAFAC2 model is fitted to parallel unlabelled corpora pairs to learn the latent relationship between the source and target language. The fitted model is used to create embeddings for a text classification task (document classification or authorship attribution). Subsequently, a logistic regression classifier is fitted to the training source language embeddings and tested on the training target language embeddings. Following the zero-shot setting, no labels are exploited for the target language documents. The proposed framework incorporates a self-learning process by utilizing the predicted labels as pseudo-labels to train a new, pseudo-supervised PARAFAC2 model, which aims to extract latent class-specific information while fusing language-specific information. Thorough evaluation is conducted on cross-lingual document classification and cross-lingual authorship attribution. Remarkably, the proposed framework achieves competitive results when compared to deep learning methods in cross-lingual transfer learning tasks. (C) 2022 Elsevier B.V. All rights reserved.
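For readers unfamiliar with the decomposition named in the abstract, the standard PARAFAC2 model can be written as below. This is a sketch of the general formulation only, not the authors' exact objective. The data layout is an assumption: each X_k is taken to be a term-by-document matrix for language k (k = 1 for English, k = 2 for the target language), with the J documents aligned across the two languages and the vocabulary size I_k free to differ. The symbols R (number of components), Q_k, H, S_k, and V are the conventional PARAFAC2 names and do not appear in the record itself.

```latex
% Standard PARAFAC2 model for a language pair (layout of X_k is an assumption)
\begin{align}
  X_k &\approx U_k S_k V^{\top}, \qquad k \in \{1, 2\}, \\
  U_k &= Q_k H, \qquad Q_k^{\top} Q_k = I_R, \\
  \min_{\{Q_k\},\, H,\, \{S_k\},\, V} \;
      & \sum_{k=1}^{2} \bigl\lVert X_k - Q_k H S_k V^{\top} \bigr\rVert_F^2 ,
\end{align}
% X_k : I_k x J data matrix for language k
% Q_k : I_k x R matrix with orthonormal columns (language-specific)
% H   : R x R matrix shared by both languages
% S_k : R x R diagonal matrix of language-specific weights
% V   : J x R factor matrix shared by both languages
```

The constraint U_k = Q_k H implies that U_k^T U_k = H^T H is the same for both languages, which is what lets the model relate a source and a target language whose vocabularies differ in size.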
Pages: 167-173
Number of pages: 7
Related papers
50 in total
  • [1] Translation Artifacts in Cross-lingual Transfer Learning
    Artetxe, Mikel
    Labaka, Gorka
    Agirre, Eneko
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 7674 - 7684
  • [2] Choosing Transfer Languages for Cross-Lingual Learning
    Lin, Yu-Hsiang
    Chen, Chian-Yu
    Lee, Jean
    Li, Zirui
    Zhang, Yuyan
    Xia, Mengzhou
    Rijhwani, Shruti
    He, Junxian
    Zhang, Zhisong
    Ma, Xuezhe
    Anastasopoulos, Antonios
    Littell, Patrick
    Neubig, Graham
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 3125 - 3135
  • [3] Cross-Lingual Transfer Learning Framework for Program Analysis
    Li, Zhiming
    2021 36TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING ASE 2021, 2021, : 1074 - 1078
  • [4] Cross-Lingual Transfer Learning for Statistical Type Inference
    Li, Zhiming
    Xie, Xiaofei
    Li, Haoliang
    Xu, Zhengzi
    Li, Yi
    Liu, Yang
    PROCEEDINGS OF THE 31ST ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2022, 2022, : 239 - 250
  • [5] On the Role of Parallel Data in Cross-lingual Transfer Learning
    Reid, Machel
    Artetxe, Mikel
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 5999 - 6006
  • [6] CROSS-LINGUAL TRANSFER LEARNING FOR SPOKEN LANGUAGE UNDERSTANDING
    Quynh Ngoc Thi Do
    Gaspers, Judith
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5956 - 5960
  • [7] Cross-Lingual Transfer Learning for Complex Word Identification
    Zaharia, George-Eduard
    Cercel, Dumitru-Clementin
    Dascalu, Mihai
    2020 IEEE 32ND INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2020, : 384 - 390
  • [8] Cross-lingual Continual Learning
    M'hamdi, Meryem
    Ren, Xiang
    May, Jonathan
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 3908 - 3943
  • [9] Cross-lingual Transfer Learning and Multitask Learning for Capturing Multiword Expressions
    Taslimipoor, Shiva
    Rohanian, Omid
    Ha, Le An
    JOINT WORKSHOP ON MULTIWORD EXPRESSIONS AND WORDNET (MWE-WN 2019), 2019, : 155 - 161
  • [10] UniBridge: A Unified Approach to Cross-Lingual Transfer Learning for Low-Resource Languages
    Trinh Pham
    Le, Khoi M.
    Luu Anh Tuan
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 3168 - 3184