A Learning to rank framework based on cross-lingual loss function for cross-lingual information retrieval

Cited by: 4
Authors
Ghanbari, Elham [1, 2]
Shakery, Azadeh [1, 3]
Affiliations
[1] Univ Tehran, Coll Engn, Sch Elect & Comp Engn, Tehran, Iran
[2] Islamic Azad Univ, Dept Comp Engn, Yadegar E Imam Khomeini RAH Shahre Rey Branch, Tehran, Iran
[3] Inst Res Fundamental Sci IPM, Sch Comp Sci, Tehran, Iran
Keywords
Learning to rank (LTR); Cross-Lingual information retrieval (CLIR); Cross-lingual features;
DOI
10.1007/s10489-021-02592-z
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Learning to Rank (LTR) techniques use machine learning to rank documents. In this paper, we propose a new LTR-based framework for cross-language information retrieval (CLIR). The core idea is to use the training queries in the target language together with those in the source language, rather than the source-language queries alone, both to extract features and to construct the ranking model. The framework has two main components. The first extracts monolingual and cross-lingual features from the queries and the documents. To extract the cross-lingual features, we introduce a general approach based on translation probabilities: translation knowledge, built by combining a probabilistic dictionary extracted from translation resources with the translation knowledge available in the target-language queries, is used to bridge the gap between the documents and the queries. The second component takes an LTR algorithm and the extracted features as input and trains a ranking model that optimizes the proposed loss function. This loss function can be used with any listwise LTR algorithm to construct a ranking model for CLIR. To this end, the loss function of the LTR algorithm is calculated for both the training data in the target language and the training data in the source language. The new loss function is a linear interpolation of two terms: the harmonic mean of the two loss functions (monolingual and cross-lingual) and the ratio of these two loss functions. The output of the framework is a cross-lingual ranking model built to minimize the proposed loss function. Experimental results show that the proposed framework outperforms baseline information retrieval methods and other LTR ranking models in terms of Mean Average Precision (MAP). The findings also indicate that the use of cross-lingual features considerably increases the effectiveness of the framework in terms of MAP and Normalized Discounted Cumulative Gain (NDCG).
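The combined loss described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so several details here are assumptions: the function name `combined_loss`, the interpolation weight `alpha`, the direction of the ratio (cross-lingual over monolingual), and the small constant `eps` that guards against division by zero are all hypothetical and may differ from the paper's definition.

```python
def combined_loss(loss_mono: float, loss_cross: float,
                  alpha: float = 0.5, eps: float = 1e-12) -> float:
    """Interpolate the harmonic mean and the ratio of two listwise losses.

    loss_mono  -- loss of the LTR algorithm on the monolingual training data
    loss_cross -- loss of the LTR algorithm on the cross-lingual training data
    alpha      -- interpolation weight in [0, 1] (assumed, not from the paper)
    eps        -- numerical guard against division by zero (assumed)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Harmonic mean of the two losses: small only when both losses are small.
    harmonic = 2.0 * loss_mono * loss_cross / (loss_mono + loss_cross + eps)
    # Ratio of the two losses; the direction chosen here is an assumption.
    ratio = loss_cross / (loss_mono + eps)
    # Linear interpolation of the two terms.
    return alpha * harmonic + (1.0 - alpha) * ratio


if __name__ == "__main__":
    # Hypothetical loss values, for illustration only.
    print(combined_loss(loss_mono=0.8, loss_cross=1.2, alpha=0.7))
```

In a real training loop the two inputs would be the losses of the chosen listwise LTR algorithm computed on the two training sets, and `alpha` would need to be tuned on held-out data.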
Pages: 3156-3174
Number of pages: 19