Knowledge Transfer across Multilingual Corpora via Latent Topics

Times cited: 0
Authors
De Smet, Wim [1]
Tang, Jie [2]
Moens, Marie-Francine [1]
Affiliations
[1] Katholieke Univ Leuven, Louvain, Belgium
[2] Tsinghua Univ, Beijing, Peoples R China
Keywords
Cross-lingual knowledge transfer; Latent topic models; Text categorization
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper explores how latent topics can bridge content written in two different languages. Specifically, we propose a unified probabilistic model that simultaneously learns latent topics from bilingual corpora discussing comparable content, and we use these topics as features in a cross-lingual, dictionary-less text categorization task. Experimental results on multilingual Wikipedia data show that the proposed topic model effectively discovers topic information in the bilingual corpora and that the learned topics successfully transfer classification knowledge to other languages for which no labeled training data are available.
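The abstract describes two steps: learning topics jointly from comparable bilingual document pairs, then using each document's topic proportions as language-independent features for a classifier trained on labeled data in one language only. The sketch below illustrates that pipeline under stated assumptions. It uses a bilingual LDA-style model in which both sides of a document pair share one topic distribution while each language keeps its own topic-word distributions, fitted with collapsed Gibbs sampling; this is not necessarily the exact model or inference procedure of the paper. The nearest-centroid classifier, the function names (`train_bilingual_lda`, `infer_theta`, `train_centroids`, `predict`), the hyperparameters, and the toy word ids are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def train_bilingual_lda(pairs, vocab_sizes, n_topics, iters=200,
                        alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for a bilingual LDA-style model (sketch only).

    pairs: list of (source_tokens, target_tokens), tokens are integer word ids.
    Both sides of a pair share one document-topic count vector n_dk, while each
    language keeps its own topic-word counts over its own vocabulary.
    Returns phi[lang][topic][word], the estimated topic-word distributions.
    """
    rng = random.Random(seed)
    K = n_topics
    n_dk = [[0] * K for _ in pairs]
    n_kw = [[[0] * V for _ in range(K)] for V in vocab_sizes]
    n_k = [[0] * K for _ in vocab_sizes]
    z = []
    # Random initial topic for every token on both sides of every pair.
    for d, pair in enumerate(pairs):
        z_d = []
        for lang, doc in enumerate(pair):
            z_dl = [rng.randrange(K) for _ in doc]
            for w, k in zip(doc, z_dl):
                n_dk[d][k] += 1
                n_kw[lang][k][w] += 1
                n_k[lang][k] += 1
            z_d.append(z_dl)
        z.append(z_d)
    for _ in range(iters):
        for d, pair in enumerate(pairs):
            for lang, doc in enumerate(pair):
                V = vocab_sizes[lang]
                for i, w in enumerate(doc):
                    k = z[d][lang][i]
                    # Remove the token's current assignment from all counts.
                    n_dk[d][k] -= 1
                    n_kw[lang][k][w] -= 1
                    n_k[lang][k] -= 1
                    # p(z=t | rest) ∝ (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)
                    weights = [(n_dk[d][t] + alpha) * (n_kw[lang][t][w] + beta)
                               / (n_k[lang][t] + V * beta) for t in range(K)]
                    k = rng.choices(range(K), weights=weights)[0]
                    z[d][lang][i] = k
                    n_dk[d][k] += 1
                    n_kw[lang][k][w] += 1
                    n_k[lang][k] += 1
    return [[[(n_kw[l][k][w] + beta) / (n_k[l][k] + V * beta)
              for w in range(V)] for k in range(K)]
            for l, V in enumerate(vocab_sizes)]


def infer_theta(doc, phi_lang, iters=50, alpha=0.1, seed=0):
    """Fold one monolingual document into the model (phi fixed) and return
    its topic proportions, used here as classification features."""
    rng = random.Random(seed)
    K = len(phi_lang)
    z = [rng.randrange(K) for _ in doc]
    n_k = Counter(z)
    for _ in range(iters):
        for i, w in enumerate(doc):
            n_k[z[i]] -= 1
            weights = [(n_k[t] + alpha) * phi_lang[t][w] for t in range(K)]
            z[i] = rng.choices(range(K), weights=weights)[0]
            n_k[z[i]] += 1
    total = len(doc) + K * alpha
    return [(n_k[t] + alpha) / total for t in range(K)]


def train_centroids(features, labels):
    """Mean topic-feature vector per class label."""
    sums, counts = {}, Counter(labels)
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Label of the centroid closest to x in squared Euclidean distance."""
    return min(centroids, key=lambda y: sum((a - b) ** 2
                                            for a, b in zip(centroids[y], x)))


if __name__ == "__main__":
    # Hypothetical toy data: ids 0-2 are "sports" words and ids 3-5 are
    # "music" words, in separate 6-word source and target vocabularies.
    rng = random.Random(1)

    def sample_doc(words):
        return [rng.choice(words) for _ in range(8)]

    SPORT, MUSIC = [0, 1, 2], [3, 4, 5]
    pairs = ([(sample_doc(SPORT), sample_doc(SPORT)) for _ in range(10)]
             + [(sample_doc(MUSIC), sample_doc(MUSIC)) for _ in range(10)])
    phi_src, phi_tgt = train_bilingual_lda(pairs, vocab_sizes=[6, 6], n_topics=2)
    # Labels exist only for source-language documents...
    src_docs = [sample_doc(SPORT) for _ in range(5)] + [sample_doc(MUSIC) for _ in range(5)]
    src_labels = ["sports"] * 5 + ["music"] * 5
    centroids = train_centroids([infer_theta(d, phi_src) for d in src_docs], src_labels)
    # ...and the classifier is applied to target-language documents through
    # the shared topic space, without any bilingual dictionary.
    print(predict(centroids, infer_theta(sample_doc(SPORT), phi_tgt)))
    print(predict(centroids, infer_theta(sample_doc(MUSIC), phi_tgt)))
```

The key design choice in this sketch is that the document-topic counts `n_dk` pool tokens from both languages of a pair, which is what pushes topic `k` to mean the same thing in each vocabulary. On very small toy corpora a Gibbs sampler can still settle in a poor local mode, so outcomes may vary with the seed; any other classifier could replace the nearest-centroid rule.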
Pages: 549-560
Number of pages: 12
Related papers (50 in total)
  • [1] Extracting Multilingual Topics from Unaligned Comparable Corpora
    Jagarlamudi, Jagadeesh
    Daume, Hal, III
    [J]. ADVANCES IN INFORMATION RETRIEVAL, PROCEEDINGS, 2010, 5993 : 444 - 456
  • [2] Digesting Multilingual Reader Comments via Latent Discussion Topics with Commonality and Specificity
    Shi, Bei
    Lam, Wai
    Bing, Lidong
    Xu, Yinqing
    [J]. CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2016, : 2293 - 2298
  • [3] Multilingual Knowledge Graph Completion via Ensemble Knowledge Transfer
    Chen, Xuelu
    Chen, Muhao
    Fan, Changjun
    Uppunda, Ankith
    Sun, Yizhou
    Zaniolo, Carlo
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020
  • [4] Inducing multilingual POS taggers and NP bracketers via robust projection across aligned corpora
    Yarowsky, D
    Ngai, G
    [J]. 2ND MEETING OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE, 2001, : 200 - 207
  • [5] SWEAT: Scoring Polarization of Topics across Different Corpora
    Bianchi, Federico
    Marelli, Marco
    Nicoli, Paolo
    Palmonari, Matteo
    [J]. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 10065 - 10072
  • [6] Will This Idea Spread Beyond Academia? Understanding Knowledge Transfer of Scientific Concepts across Text Corpora
    Cao, Hancheng
    Cheng, Mengjie
    Cen, Zhepeng
    McFarland, Daniel A.
    Ren, Xiang
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 1746 - 1757
  • [7] A Multilingual Topic Model for Learning Weighted Topic Links Across Corpora with Low Comparability
    Yang, Weiwei
    Boyd-Graber, Jordan
    Resnik, Philip
    [J]. 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 1243 - 1248
  • [8] Improving Video Retrieval Using Multilingual Knowledge Transfer
    Madasu, Avinash
    Aflalo, Estelle
    Stan, Gabriela Ben Melech
    Tseng, Shao-Yen
    Bertasius, Gedas
    Lal, Vasudev
    [J]. ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT I, 2023, 13980 : 669 - 684
  • [9] Transfer of the pedagogical transformation competence across chemistry topics
    Mavhunga, Elizabeth
    [J]. CHEMISTRY EDUCATION RESEARCH AND PRACTICE, 2016, 17 (04) : 1081 - 1097
  • [10] On Discovering the Number of Document Topics via Conceptual Latent Space
    Nghia Duong-Trung
    Schmidt-Thieme, Lars
    [J]. CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017, : 2051 - 2054