Semantic Space Transformations for Cross-Lingual Document Classification

Authors
Martinek, Jiri [1]
Lenc, Ladislav [2]
Kral, Pavel [1, 2]
Affiliations
[1] Univ West Bohemia, Fac Sci Appl, Dept Comp Sci & Engn, Plzen, Czech Republic
[2] Univ West Bohemia, Fac Sci Appl, NTIS, Plzen, Czech Republic
DOI
10.1007/978-3-030-01418-6_60
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cross-lingual document representations can be built by training monolingual semantic spaces and then using bilingual dictionaries with a transformation method to project word vectors into a unified space. The main goal of this paper is to evaluate three promising transformation methods on the cross-lingual document classification task. We also propose, evaluate and compare two cross-lingual document classification approaches. We use a popular convolutional neural network (CNN) and compare its performance with a standard maximum entropy classifier. The proposed methods are evaluated on four languages from the Reuters corpus: English, German, Spanish and Italian. We demonstrate that the results of all transformation methods are close to each other; however, the orthogonal transformation generally gives slightly better results when a CNN with trained embeddings is used. The experimental results also show that the convolutional network achieves better results than the maximum entropy classifier. We further show that the proposed methods are competitive with the state of the art.
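The abstract does not say how the orthogonal transformation is computed. As a minimal sketch, assuming the common SVD-based orthogonal Procrustes solution, the following Python shows how source-language word vectors could be projected into a target-language space from dictionary-aligned pairs. The function name `orthogonal_map` and the random toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def orthogonal_map(X, Y):
    """Return the orthogonal matrix W that minimizes ||X @ W - Y||_F.

    X and Y are (n_pairs, dim) arrays whose rows are word vectors of
    dictionary-aligned word pairs (source and target language).
    This is the standard orthogonal Procrustes solution, assumed here
    for illustration only.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


# Hypothetical toy data: the "target" vectors are random, and the "source"
# vectors are the same points under a hidden orthogonal transformation Q.
rng = np.random.default_rng(0)
dim, n_pairs = 5, 50
Y = rng.normal(size=(n_pairs, dim))               # target-language vectors
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # hidden orthogonal matrix
X = Y @ Q.T                                       # source vectors, so X @ Q == Y

W = orthogonal_map(X, Y)
projected = X @ W  # source vectors mapped into the target space
```

With real embeddings the two spaces are not exact rotations of each other, so `projected` would only approximate `Y`. Because `W` is orthogonal, vector lengths and angles within the source space are preserved by the mapping.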
Pages: 608-616
Page count: 9