Adversarial Domain Adaptation for Cross-lingual Information Retrieval with Multilingual BERT

Cited by: 6
Authors
Wang, Runchuan [1, 2]
Zhang, Zhao [3, 7]
Zhuang, Fuzhen [4, 5]
Gao, Dehong [6]
Wei, Yi [6]
He, Qing [1, 2]
Affiliations
[1] Chinese Acad Sci, Key Lab Intelligent Informat Proc Chinese Acad S, Inst Comp Technol, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
[4] Beihang Univ, Inst Artificial Intelligence, Beijing, Peoples R China
[5] Beihang Univ, SKLSDE, Sch Comp Sci, Beijing 100191, Peoples R China
[6] Alibaba Grp, Hangzhou, Peoples R China
[7] Zhejiang Lab, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-lingual Information Retrieval; BERT; Adversarial Networks; Domain Adaptation;
DOI
10.1145/3459637.3482050
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Transformer-based language models (e.g., BERT, RoBERTa, and GPT) have shown remarkable performance on many natural language processing tasks, and their multilingual variants make it easier to handle cross-lingual tasks without a machine translation system. In this paper, we apply multilingual BERT to the cross-lingual information retrieval (CLIR) task, using a triplet loss to learn the relevance between queries and documents written in different languages. Moreover, we align the token embeddings of different languages via adversarial networks to help the language model learn cross-lingual sentence representations. We achieve state-of-the-art results on the recently published CLIR dataset CLIRMatrix. Furthermore, we show that the adversarial multilingual BERT also achieves competitive results in the zero-shot setting for some languages, where CLIR training data in the target language is lacking.
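The triplet objective named in the abstract can be illustrated with a minimal sketch. The abstract does not state the distance function, the margin, or how embeddings are pooled, so the Euclidean distance, the margin of 1.0, and the toy two-dimensional vectors below are all assumptions made for illustration, not the authors' configuration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(query, pos_doc, neg_doc, margin=1.0):
    """Hinge-style triplet loss: push the relevant document at least
    `margin` closer to the query than the irrelevant one.
    The margin value is an assumed hyperparameter."""
    return max(0.0, euclidean(query, pos_doc) - euclidean(query, neg_doc) + margin)


# Toy embeddings (hypothetical); in practice these would come from an encoder
# such as multilingual BERT applied to a query and documents in other languages.
query = [1.0, 0.0]
relevant = [1.0, 0.0]
irrelevant = [0.0, 0.0]

loss_easy = triplet_loss(query, relevant, irrelevant)  # ranking already correct
loss_hard = triplet_loss(query, irrelevant, relevant)  # ranking inverted
```

The loss is zero when the relevant document is already at least one margin closer to the query than the irrelevant one, and grows when the ranking is inverted, which is what drives the encoder to separate relevant from irrelevant documents.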
Pages: 3498-3502
Page count: 5