Cross-Lingual Document Retrieval Using Regularized Wasserstein Distance

Cited by: 2
Authors
Balikas, Georgios [1]
Laclau, Charlotte [1]
Redko, Ievgen [2]
Amini, Massih-Reza [1]
Affiliations
[1] Univ Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France
[2] Univ Lyon, Univ Claude Bernard Lyon 1, INSA Lyon, F69XXX, UJM St Etienne, CNRS, Inserm, CREATIS UMR 5220, U1206, Lyon, France
DOI
10.1007/978-3-319-76941-7_30
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Many information retrieval algorithms rely on a good distance measure that allows objects of different nature to be compared efficiently. Recently, a promising new metric, the Word Mover's Distance, was proposed to measure the divergence between text passages. In this paper, we demonstrate that this metric can be extended to incorporate term-weighting schemes, and that entropic regularization yields more accurate and computationally efficient matching between documents. We evaluate the benefits of both extensions on the task of cross-lingual document retrieval (CLDR). Our experimental results on eight CLDR problems suggest that the proposed methods achieve remarkable improvements in Mean Reciprocal Rank over several baselines.
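The two extensions named in the abstract, term weighting and entropic regularization, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `sinkhorn_distance` and `document_distance`, the regularization strength `reg`, the iteration count, and the toy 2-D "embeddings" are all assumptions made for illustration. The sketch treats each document as a weighted histogram over its word vectors (assumed to live in a shared cross-lingual embedding space) and approximates the regularized Wasserstein distance with standard Sinkhorn iterations.

```python
import numpy as np


def sinkhorn_distance(a, b, cost, reg=0.1, n_iters=200):
    """Entropy-regularized transport cost between histograms a and b.

    a, b : 1-D arrays of non-negative weights, each summing to 1.
    cost : (len(a), len(b)) matrix of pairwise ground costs.
    reg  : regularization strength (hypothetical default).
    """
    K = np.exp(-cost / reg)            # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):           # alternate scaling of columns and rows
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]  # approximate transport plan
    return float(np.sum(plan * cost)), plan


def document_distance(emb_x, w_x, emb_y, w_y, reg=0.1):
    """Distance between two documents given word vectors and term weights.

    emb_x, emb_y : (n_words, dim) arrays of word embeddings.
    w_x, w_y     : term weights (e.g. term frequencies or tf-idf scores).
    """
    a = w_x / w_x.sum()                # normalize weights into histograms
    b = w_y / w_y.sum()
    diff = emb_x[:, None, :] - emb_y[None, :, :]
    cost = np.linalg.norm(diff, axis=-1)   # Euclidean ground cost
    return sinkhorn_distance(a, b, cost, reg)[0]


# Toy example: document x is compared with itself and with a shifted copy z.
emb_x = np.array([[0.0, 0.0], [1.0, 0.0]])
emb_z = np.array([[0.0, 1.0], [1.0, 1.0]])
weights = np.array([1.0, 1.0])

d_same = document_distance(emb_x, weights, emb_x, weights)
d_diff = document_distance(emb_x, weights, emb_z, weights)
```

In a retrieval setting, one would rank candidate documents in the target language by this distance to the query document. Smaller `reg` values bring the result closer to the unregularized distance, but the kernel can underflow and the iterations can converge slowly, so a log-domain variant may be preferable in practice.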
Pages: 398-410
Page count: 13