Deep Multilabel Multilingual Document Learning for Cross-Lingual Document Retrieval

Citations: 0
Authors
Feng, Kai [1 ]
Huang, Lan [1 ,3 ]
Xu, Hao [1 ,3 ]
Wang, Kangping [1 ,3 ]
Wei, Wei [2 ]
Zhang, Rui [1 ,3 ]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[2] Changchun Univ Finance & Econ, Sch Int Econ & Trade, Changchun 130012, Peoples R China
[3] Jilin Univ, Minist Educ, Key Lab Symbol Comp & Knowledge Engn, Changchun 130012, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
cross-lingual document retrieval; cross-lingual features; cross-lingual document representation;
DOI
10.3390/e24070943
Chinese Library Classification
O4 [Physics];
Discipline classification code
0702;
Abstract
Cross-lingual document retrieval, which aims to use a query in one language to retrieve relevant documents in another, has attracted strong research interest in recent decades. Most studies on this task start with cross-lingual comparisons at the word level and then represent documents via word embeddings, which captures insufficient structural information. In this work, cross-lingual comparison is performed at the document level through a cross-lingual semantic space. Our method, MDL (deep multilabel multilingual document learning), uses a six-layer fully connected network to project cross-lingual documents into a shared semantic space. Once the documents are transformed into embeddings in this space, the semantic distances between them can be calculated. The supervision signals are automatically extracted from the data and then used to construct the semantic space via a linear classifier. This avoids the ambiguity of manual labels and yields multilabel supervision signals instead of a single label. The multilabel supervision signals enrich the representation of the semantic space, which improves the discriminative ability of the embeddings. MDL is easy to extend to other fields since it does not depend on specific data. Furthermore, MDL is more efficient than models that train all languages jointly, since each language is trained individually. Experiments on Wikipedia data showed that the proposed method outperforms the state-of-the-art cross-lingual document retrieval methods.
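The pipeline described in the abstract (a six-layer fully connected encoder, a linear classifier trained on multilabel signals, and distance-based retrieval in the shared space) can be illustrated with a minimal, untrained sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the layer sizes, the tanh activation, the binary cross-entropy loss, the cosine distance, and all function names. Only the forward pass is shown; training is omitted.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, activation=None):
    """Apply one fully connected layer, optionally followed by an activation."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [activation(o) for o in out] if activation else out


def make_encoder(sizes, rng):
    """Build len(sizes) - 1 layers; seven sizes give a six-layer network."""
    return [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def embed(x, encoder):
    """Project a document feature vector into the semantic space."""
    for layer in encoder:
        x = dense(x, layer, math.tanh)  # tanh is an assumed activation
    return x


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def multilabel_loss(embedding, classifier, labels):
    """Mean binary cross-entropy over independent labels (assumed loss)."""
    probs = dense(embedding, classifier, sigmoid)
    eps = 1e-12
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for y, p in zip(labels, probs)) / len(labels)


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    norm = math.sqrt(sum(v * v for v in a)) * math.sqrt(sum(v * v for v in b))
    if norm == 0.0:
        return 1.0
    return 1.0 - sum(p * q for p, q in zip(a, b)) / norm


def retrieve(query_emb, doc_embs):
    """Return document indices sorted from nearest to farthest."""
    return sorted(range(len(doc_embs)),
                  key=lambda i: cosine_distance(query_emb, doc_embs[i]))
```

In this sketch each language would get its own encoder from `make_encoder`, reflecting the abstract's statement that languages are trained individually, while a single linear layer from `make_layer` plays the role of the classifier that defines the label space. A query embedded with one language's encoder is then ranked against documents embedded with another's via `retrieve`.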
Pages: 17
Related papers
50 in total
  • [1] Multilingual and cross-lingual document classification: A meta-learning approach
    van der Heijden, Niels
    Yannakoudakis, Helen
    Mishra, Pushkar
    Shutova, Ekaterina
    16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 1966 - 1976
  • [2] Why is a document relevant? Understanding the relevance scores in cross-lingual document retrieval
    Novak, Erik
    Bizjak, Luka
    Mladenic, Dunja
    Grobelnik, Marko
    KNOWLEDGE-BASED SYSTEMS, 2022, 244
  • [3] Cross-Lingual Document Similarity
    Muhic, Andrej
    Rupnik, Jan
    Skraba, Primoz
    PROCEEDINGS OF THE ITI 2012 34TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY INTERFACES (ITI), 2012, : 387 - 392
  • [4] Cross-lingual document clustering
    Wu, Ke
    Lu, Bao-Liang
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2007, 4426 : 956 - +
  • [5] Cross-Lingual Document Retrieval Using Regularized Wasserstein Distance
    Balikas, Georgios
    Laclau, Charlotte
    Redko, Ievgen
    Amini, Massih-Reza
    ADVANCES IN INFORMATION RETRIEVAL (ECIR 2018), 2018, 10772 : 398 - 410
  • [6] Improving Low-Resource Cross-lingual Document Retrieval by Reranking with Deep Bilingual Representations
    Zhang, Rui
    Westerfield, Caitlin
    Shim, Sungrok
    Bingham, Garrett
    Fabbri, Alexander
    Hu, William
    Verma, Neha
    Radev, Dragomir
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 3173 - 3179
  • [7] Morpheme-based, cross-lingual indexing for medical document retrieval
    Schulz, S
    Hahn, U
    INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2000, 58 : 87 - 99
  • [8] On cross-lingual retrieval with multilingual text encoders
    Litschko, Robert
    Vulic, Ivan
    Ponzetto, Simone Paolo
    Glavas, Goran
    INFORMATION RETRIEVAL JOURNAL, 2022, 25 (02): : 149 - 183
  • [9] Cross-Lingual Sentiment Classification with Bilingual Document Representation Learning
    Zhou, Xinjie
    Wan, Xiaojun
    Xiao, Jianguo
    PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016, : 1403 - 1412