Deep Multilabel Multilingual Document Learning for Cross-Lingual Document Retrieval

Cited by: 0
Authors
Feng, Kai [1]
Huang, Lan [1, 3]
Xu, Hao [1, 3]
Wang, Kangping [1, 3]
Wei, Wei [2]
Zhang, Rui [1, 3]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[2] Changchun Univ Finance & Econ, Sch Int Econ & Trade, Changchun 130012, Peoples R China
[3] Jilin Univ, Minist Educ, Key Lab Symbol Comp & Knowledge Engn, Changchun 130012, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
cross-lingual document retrieval; cross-lingual features; cross-lingual document representation
DOI
10.3390/e24070943
Chinese Library Classification
O4 [Physics]
Discipline classification code
0702
Abstract
Cross-lingual document retrieval, which takes a query in one language and retrieves relevant documents in another, has attracted strong research interest in recent decades. Most studies on this task start with cross-lingual comparisons at the word level and then represent documents via word embeddings, which captures too little structural information. In this work, cross-lingual comparison is performed at the document level through a cross-lingual semantic space. Our method, MDL (deep multilabel multilingual document learning), uses a six-layer fully connected network to project cross-lingual documents into a shared semantic space. Once the documents are transformed into embeddings in this space, the semantic distances between them can be calculated. The supervision signals are extracted automatically from the data and then used to construct the semantic space via a linear classifier. This avoids the ambiguity of manual labels and yields multilabel supervision signals instead of a single label. The multilabel supervision signals enrich the representation of the semantic space, which improves the discriminative ability of the embeddings. MDL is easy to extend to other fields since it does not depend on specific data. Furthermore, MDL is more efficient than models that train all languages jointly, since each language is trained individually. Experiments on Wikipedia data showed that the proposed method outperforms state-of-the-art cross-lingual document retrieval methods.
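The pipeline the abstract describes (one six-layer fully connected network per language, projection into a shared semantic space, then distance-based retrieval) can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, the ReLU activation, the random untrained weights, the use of cosine distance, and all function names are assumptions made for illustration, and the training step with multilabel supervision and a linear classifier is omitted.

```python
import math
import random


def make_mlp(sizes, seed):
    """Build a fully connected network as a list of (weights, biases) layers."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, x):
    """Project a document feature vector into the shared semantic space."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:  # ReLU on hidden layers only (assumed activation)
            x = [max(0.0, v) for v in x]
    return x


def cosine_distance(a, b):
    """Distance in the shared space; cosine is an assumed choice of metric."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return 1.0 - dot / norm if norm else 1.0


def retrieve(query_emb, doc_embs):
    """Return candidate document indices sorted from nearest to farthest."""
    return sorted(range(len(doc_embs)),
                  key=lambda i: cosine_distance(query_emb, doc_embs[i]))


# Hypothetical sizes: 8 input features, 4-dimensional shared space,
# six weight layers per network, one network per language.
SIZES = [8, 16, 16, 16, 16, 16, 4]
english_net = make_mlp(SIZES, seed=0)
german_net = make_mlp(SIZES, seed=1)
```

Under these assumptions, a query document would be passed through its own language's network with `forward`, candidate documents through theirs, and `retrieve` would rank the candidates by distance in the shared space. With untrained weights the ranking carries no meaning; the sketch only shows how the pieces fit together.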
Pages: 17