Deep Multilabel Multilingual Document Learning for Cross-Lingual Document Retrieval

Authors
Feng, Kai [1]
Huang, Lan [1,3]
Xu, Hao [1,3]
Wang, Kangping [1,3]
Wei, Wei [2]
Zhang, Rui [1,3]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[2] Changchun Univ Finance & Econ, Sch Int Econ & Trade, Changchun 130012, Peoples R China
[3] Jilin Univ, Minist Educ, Key Lab Symbol Comp & Knowledge Engn, Changchun 130012, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
cross-lingual document retrieval; cross-lingual features; cross-lingual document representation;
DOI
10.3390/e24070943
Chinese Library Classification
O4 [Physics]
Subject classification code
0702
Abstract
Cross-lingual document retrieval, which takes a query in one language and retrieves relevant documents in another, has attracted strong research interest in recent decades. Most studies on this task start with cross-lingual comparisons at the word level and then represent documents via word embeddings, which captures too little structural information. In this work, documents are compared across languages at the document level through a shared cross-lingual semantic space. Our method, MDL (deep multilabel multilingual document learning), uses a six-layer fully connected network to project cross-lingual documents into this shared semantic space. Once the documents are transformed into embeddings in the semantic space, the semantic distances between them can be calculated. The supervision signals are extracted automatically from the data and then used to construct the semantic space via a linear classifier. This avoids the ambiguity of manual labels and yields multilabel supervision signals instead of a single label. The multilabel supervision signals enrich the representation of the semantic space, which improves the discriminative ability of the embeddings. MDL is easy to extend to other fields since it does not depend on specific data. Furthermore, MDL is more efficient than models that train all languages jointly, since each language is trained individually. Experiments on Wikipedia data showed that the proposed method outperforms state-of-the-art cross-lingual document retrieval methods.
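The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer widths, the ReLU activation, the sigmoid cross-entropy loss, cosine distance as the semantic distance, and every function name here are assumptions chosen for illustration, and the weights are random rather than trained. It shows one six-layer fully connected encoder per language projecting into a space of the same dimension, plus ranking by distance in that space.

```python
import math
import random


def make_encoder(sizes, seed):
    """Build random weight matrices for a fully connected network.

    `sizes` lists the input width followed by each layer's output width,
    so seven entries give six layers. Weights are untrained placeholders.
    """
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
         for _ in range(n_out)]
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def embed(encoder, features):
    """Project one document's feature vector into the shared space."""
    x = list(features)
    for depth, weights in enumerate(encoder):
        x = [sum(w * v for w, v in zip(row, x)) for row in weights]
        if depth < len(encoder) - 1:
            x = [max(0.0, v) for v in x]  # ReLU on hidden layers (assumed)
    return x


def multilabel_loss(logits, targets):
    """Mean sigmoid cross-entropy over independent labels (assumed loss)."""
    total = 0.0
    for z, t in zip(logits, targets):
        # Numerically stable form of -[t*log(s) + (1-t)*log(1-s)], s = sigmoid(z)
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def cosine_distance(a, b):
    """Cosine distance; returns 1.0 when either vector has zero norm."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return 1.0 if norm == 0.0 else 1.0 - dot / norm


def rank(query_vec, doc_vecs):
    """Return document indices ordered from nearest to farthest."""
    return sorted(range(len(doc_vecs)),
                  key=lambda i: cosine_distance(query_vec, doc_vecs[i]))


# One encoder per language (hypothetical input widths 40 and 30);
# only the 8-dimensional output space is shared.
english_encoder = make_encoder([40, 32, 32, 24, 24, 16, 8], seed=0)
german_encoder = make_encoder([30, 32, 32, 24, 24, 16, 8], seed=1)
```

In this sketch, a query embedded with `english_encoder` would be passed to `rank` together with documents embedded by `german_encoder`. Keeping a separate encoder per language mirrors the abstract's point that each language is trained individually; `multilabel_loss` stands in for whatever objective the linear classifier actually uses, which the abstract does not specify.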
Pages: 17