Generalized Funnelling: Ensemble Learning and Heterogeneous Document Embeddings for Cross-Lingual Text Classification

被引:1
|
作者
Moreo, Alejandro [1 ]
Pedrotti, Andrea [1 ]
Sebastiani, Fabrizio [1 ]
机构
[1] CNR, Ist Sci & Tecnol Informaz, Via Giuseppe Moruzzi 1, I-56124 Pisa, Italy
基金
欧盟地平线“2020”;
关键词
Transfer learning; heterogeneous transfer learning; cross-lingual text classification; ensemble learning; word embeddings; REPRESENTATION;
D O I
10.1145/3544104
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Funnelling (FUN) is a recently proposed method for cross-lingual text classification (CLTC) based on a two-tier learning ensemble for heterogeneous transfer learning (HTL). In this ensemble method, 1st-tier classifiers, each working on a different and language-dependent feature space, return a vector of calibrated posterior probabilities (with one dimension for each class) for each document, and the final classification decision is taken by a meta-classifier that uses this vector as its input. The meta-classifier can thus exploit class-class correlations, and this (among other things) gives FUN an edge over CLTC systems in which these correlations cannot be brought to bear. In this article, we describe Generalized FUNnelling (GFUN), a generalization of FUN consisting of an HTL architecture in which 1st-tier components can be arbitrary view-generating FUNctions, i.e., language-dependent FUNctions that each produce a language-independent representation ("view") of the (monolingual) document. We describe an instance of GFUN in which the meta-classifier receives as input a vector of calibrated posterior probabilities (as in FUN) aggregated to other embedded representations that embody other types of correlations, such as word-class correlations (as encoded by Word-Class Embeddings), word-word correlations (as encoded by Multilingual Unsupervised or Supervised Embeddings), and word-context correlations (as encoded by multilingual BERT). We show that this instance of GFUN substantially improves over FUN and over state-of-the-art baselines by reporting experimental results obtained on two large, standard datasets for multilingual multilabel text classification. Our code that implements GFUN is publicly available.
引用
收藏
页数:37
相关论文
共 50 条
  • [1] Heterogeneous Document Embeddings for Cross-Lingual Text Classification
    Moreo, Alejandro
    Pedrotti, Andrea
    Sebastiani, Fabrizio
    36TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2021, 2021, : 685 - 688
  • [2] Funnelling: A New Ensemble Method for Heterogeneous Transfer Learning and Its Application to Cross-Lingual Text Classification
    Esuli, Andrea
    Moreo, Alejandro
    Sebastiani, Fabrizio
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2019, 37 (03)
  • [3] Cross-Lingual Text Classification with Model Translation and Document Translation
    Moh, Teng-Sheng
    Zhang, Zhang
    PROCEEDINGS OF THE 50TH ANNUAL ASSOCIATION FOR COMPUTING MACHINERY SOUTHEAST CONFERENCE, 2012,
  • [4] Transductive Representation Learning for Cross-Lingual Text Classification
    Guo, Yuhong
    Xiao, Min
    12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2012), 2012, : 888 - 893
  • [5] Cross-lingual Text Classification with Heterogeneous Graph Neural Network
    Wang, Ziyun
    Liu, Xuan
    Yang, Peiji
    Liu, Shixing
    Wang, Zhisheng
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2, 2021, : 612 - 620
  • [6] Cross-lingual Short-Text Document Classification for Facebook Comments
    Faqeeh, Mosab
    Abdulla, Nawaf
    Al-Ayyoub, Mahmoud
    Jararweh, Yaser
    Quwaider, Muhannad
    2014 INTERNATIONAL CONFERENCE ON FUTURE INTERNET OF THINGS AND CLOUD (FICLOUD), 2014, : 573 - 578
  • [7] Cross-Lingual Sentiment Classification with Bilingual Document Representation Learning
    Zhou, Xinjie
    Wan, Xianjun
    Xiao, Jianguo
    PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016, : 1403 - 1412
  • [8] Cross-lingual Distillation for Text Classification
    Xu, Ruochen
    Yang, Yiming
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1, 2017, : 1415 - 1425
  • [9] Multilingual and cross-lingual document classification: A meta-learning approach
    van der Heijden, Niels
    Yannakoudakis, Helen
    Mishra, Pushkar
    Shutova, Ekaterina
    16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 1966 - 1976
  • [10] Cross-lingual Transfer for Text Classification with Dictionary-based Heterogeneous Graph
    Chairatanakul, Nuttapong
    Sriwatanasakdi, Noppayut
    Charoenphakdee, Nontawat
    Liu, Xin
    Murata, Tsuyoshi
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 1504 - 1517