Semantic Space Transformations for Cross-Lingual Document Classification

Cited by: 1
Authors
Martinek, Jiri [1]
Lenc, Ladislav [2]
Kral, Pavel [1, 2]
Affiliations
[1] Univ West Bohemia, Fac Sci Appl, Dept Comp Sci & Engn, Plzen, Czech Republic
[2] Univ West Bohemia, Fac Sci Appl, NTIS, Plzen, Czech Republic
DOI
10.1007/978-3-030-01418-6_60
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Cross-lingual document representation can be obtained by training monolingual semantic spaces and then using bilingual dictionaries with a transformation method to project word vectors into a unified space. The main goal of this paper is to evaluate three promising transformation methods on a cross-lingual document classification task. We also propose, evaluate, and compare two cross-lingual document classification approaches. We use a popular convolutional neural network (CNN) and compare its performance with a standard maximum entropy classifier. The proposed methods are evaluated on four languages from the Reuters corpus: English, German, Spanish, and Italian. We demonstrate that the results of all transformation methods are close to each other; however, the orthogonal transformation generally gives slightly better results when a CNN with trained embeddings is used. The experimental results also show that the convolutional network achieves better results than the maximum entropy classifier. We further show that the proposed methods are competitive with the state of the art.
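The projection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the orthogonal transformation is computed, so this example assumes the standard orthogonal Procrustes solution via SVD. The function name `orthogonal_map` and the random toy vectors are hypothetical and stand in for word vectors of bilingual dictionary pairs; they are not taken from the paper.

```python
import numpy as np


def orthogonal_map(X, Y):
    """Return the orthogonal matrix W that minimizes ||X @ W - Y||_F.

    X: (n, d) source-language vectors for n dictionary pairs.
    Y: (n, d) target-language vectors for the same pairs.
    Assumption: standard orthogonal Procrustes solution, W = U @ Vt
    where U, S, Vt is the SVD of X.T @ Y.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


# Toy data (hypothetical): the "target" space is a rotated copy of the source.
rng = np.random.default_rng(0)
dim = 5
X = rng.normal(size=(50, dim))                      # source vectors
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))    # hidden orthogonal map
Y = X @ Q                                           # target vectors

W = orthogonal_map(X, Y)
projected = X @ W  # source vectors expressed in the target space
```

In this toy setting the learned `W` recovers the hidden map `Q`, so `projected` coincides with `Y` up to floating-point error. With real embeddings the fit is only approximate, and any source-language word vector, not just dictionary entries, would be projected with the same `W`.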
Pages: 608-616
Page count: 9
Related papers
50 in total
  • [41] Evaluating Cross-lingual Semantic Annotation for Medical Forms
    Lin, Ying-Chi
    Christen, Victor
    Gross, Anika
    Kirsten, Toralf
    Cardoso, Silvio Domingos
    Pruski, Cedric
    Da Silveira, Marcos
    Rahm, Erhard
    [J]. PROCEEDINGS OF THE 13TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES, VOL 5: HEALTHINF, 2020, : 145 - 155
  • [42] Cross-Lingual Semantic Similarity Measure for Comparable Articles
    Saad, Motaz
    Langlois, David
    Smaili, Kamel
    [J]. ADVANCES IN NATURAL LANGUAGE PROCESSING, 2014, 8686 : 105 - +
  • [43] A cross-lingual secure semantic searching scheme with semantic analysis on ciphertext
    Yang, Wenyuan
    Sun, Boyu
    Ma, Xuewei
    Zhu, Yuesheng
    [J]. ELECTRONICS LETTERS, 2022, 58 (03) : 103 - 105
  • [44] Multilingual seq2seq training with similarity loss for cross-lingual document classification
    Yu, Katherine
    Li, Haoran
    Oguz, Barlas
    [J]. REPRESENTATION LEARNING FOR NLP, 2018, : 175 - 179
  • [45] Cross-Lingual Document Representation and Semantic Similarity Measure: A Fuzzy Set and Rough Set Based Approach
    Huang, Hsun-Hui
    Kuo, Yau-Hwang
    [J]. IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2010, 18 (06) : 1098 - 1111
  • [46] Lexical and Semantic Features for Cross-lingual Text Reuse Classification: an Experiment in English and Latin Paraphrases
    Moritz, Maria
    Steding, David
    [J]. PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 1976 - 1980
  • [47] Why is a document relevant? Understanding the relevance scores in cross-lingual document retrieval
    Novak, Erik
    Bizjak, Luka
    Mladenic, Dunja
    Grobelnik, Marko
    [J]. KNOWLEDGE-BASED SYSTEMS, 2022, 244
  • [48] Cross-Lingual Document Retrieval Using Regularized Wasserstein Distance
    Balikas, Georgios
    Laclau, Charlotte
    Redko, Ievgen
    Amini, Massih-Reza
    [J]. ADVANCES IN INFORMATION RETRIEVAL (ECIR 2018), 2018, 10772 : 398 - 410
  • [49] Learning a Cross-Lingual Semantic Representation of Relations Expressed in Text
    Rettinger, Achim
    Schumilin, Artem
    Thoma, Steffen
    Ell, Basil
    [J]. SEMANTIC WEB: LATEST ADVANCES AND NEW DOMAINS, ESWC 2015, 2015, 9088 : 337 - 352
  • [50] Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing
    Sherborne, Tom
    Hosking, Tom
    Lapata, Mirella
    [J]. TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2023, 11 : 1432 - 1450