Cross-Lingual Transfer Learning for Complex Word Identification

被引:3
|
作者
Zaharia, George-Eduard [1 ]
Cercel, Dumitru-Clementin [1 ]
Dascalu, Mihai [1 ]
机构
[1] Univ Politehn Bucuresti, Comp Sci Dept, Bucharest, Romania
关键词
Complex Word Identification; Transformer; Cross-Lingual Transfer Learning;
D O I
10.1109/ICTAI50040.2020.00067
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Complex Word Identification (CWI) is a task centered on detecting hard-to-understand words, or groups of words, in texts from different areas of expertise. The purpose of CWI is to highlight problematic structures that non-native speakers would usually find difficult to understand. Our approach uses zero-shot, one-shot, and few-shot learning techniques, alongside state-of-the-art solutions for Natural Language Processing (NLP) tasks (i.e., Transformers). Our aim is to provide evidence that the proposed models can learn the characteristics of complex words in a multilingual environment by relying on the CWI shared task 2018 dataset available for four different languages (i.e., English, German, Spanish, and also French). Our approach surpasses state-of-the-art cross-lingual results in terms of macro F1-score on English (0.774), German (0.782), and Spanish (0.734) languages, for the zero-shot learning scenario. At the same time, our model also outperforms the state-of-the-art monolingual result for German (0.795 macro F1-score).
引用
收藏
页码:384 / 390
页数:7
相关论文
共 50 条
  • [1] Unsupervised Cross-lingual Transfer of Word Embedding Spaces
    Xu, Ruochen
    Yang, Yiming
    Otani, Naoki
    Wu, Yuexin
    [J]. 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 2465 - 2474
  • [2] Cross-Lingual Word Embeddings
    Søgaard, Anders
    Vulić, Ivan
    Ruder, Sebastian
    Faruqui, Manaal
    [J]. Synthesis Lectures on Human Language Technologies, 2019, 12 (02): : 1 - 132
  • [3] Cross-Lingual Word Embeddings
    Corro, Caio Filippo
    [J]. TRAITEMENT AUTOMATIQUE DES LANGUES, 2019, 60 (01): : 46 - 48
  • [4] Cross-Lingual Word Embeddings
    Agirre, Eneko
    [J]. COMPUTATIONAL LINGUISTICS, 2020, 46 (01) : 245 - 248
  • [5] Choosing Transfer Languages for Cross-Lingual Learning
    Lin, Yu-Hsiang
    Chen, Chian-Yu
    Lee, Jean
    Li, Zirui
    Zhang, Yuyan
    Xia, Mengzhou
    Rijhwani, Shruti
    He, Junxian
    Zhang, Zhisong
    Ma, Xuezhe
    Anastasopoulos, Antonios
    Littell, Patrick
    Neubig, Graham
    Anastasopoulos, Antonios
    Littell, Patrick
    Neubig, Graham
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 3125 - 3135
  • [6] Translation Artifacts in Cross-lingual Transfer Learning
    Artetxe, Mikel
    Labaka, Gorka
    Agirre, Eneko
    [J]. PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 7674 - 7684
  • [7] Multi-Adversarial Learning for Cross-Lingual Word Embeddings
    Wang, Haozhou
    Henderson, James
    Merlo, Paola
    [J]. 2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 463 - 472
  • [8] Cross-Lingual Transfer for Hindi Discourse Relation Identification
    Dahiya, Anirudh
    Shrivastava, Manish
    Sharma, Dipti Misra
    [J]. TEXT, SPEECH, AND DIALOGUE (TSD 2020), 2020, 12284 : 240 - 247
  • [9] Learning Tibetan-Chinese cross-lingual word embeddings
    Ma, Wei
    Yu, Hongzhi
    Zhao, Kun
    Zhao, Deshun
    [J]. 2019 15TH INTERNATIONAL CONFERENCE ON SEMANTICS, KNOWLEDGE AND GRIDS (SKG 2019), 2019, : 49 - 53
  • [10] Unsupervised cross-lingual word embeddings learning with adversarial training
    Li, Yuling
    Zhang, Yuhong
    Li, Peipei
    Hu, Xuegang
    [J]. 2019 10TH IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE (ICBK 2019), 2019, : 150 - 156