Cross-Lingual Latent Topic Extraction

Cited by: 0
Authors
Zhang, Duo [1]
Mei, Qiaozhu [2]
Zhai, ChengXiang [1]
Affiliations
[1] Univ Illinois, Champaign, IL 61801 USA
[2] Univ Michigan, Ann Arbor, MI 48109 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Probabilistic latent topic models have recently enjoyed much success in extracting and analyzing latent topics in text in an unsupervised way. One common deficiency of existing topic models, though, is that they do not work well for extracting cross-lingual latent topics, simply because words in different languages generally do not co-occur with each other. In this paper, we propose a way to incorporate a bilingual dictionary into a probabilistic topic model so that topic models can be applied to extract shared latent topics from text data in different languages. Specifically, we propose a new topic model called Probabilistic Cross-Lingual Latent Semantic Analysis (PCLSA), which extends the Probabilistic Latent Semantic Analysis (PLSA) model by regularizing its likelihood function with soft constraints defined based on a bilingual dictionary. Both qualitative and quantitative experimental results show that the PCLSA model can effectively extract cross-lingual latent topics from multilingual text data.
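The idea of regularizing a PLSA likelihood with dictionary-based soft constraints can be illustrated with a small sketch. This record does not give the exact form of the constraints, so the squared-difference penalty, the trade-off weight `lam`, and all function names below are assumptions chosen for illustration only, not the formulation from the paper.

```python
import math


def plsa_log_likelihood(counts, doc_topic, topic_word):
    """PLSA log-likelihood: sum_d sum_w c(w, d) * log sum_k p(k|d) * p(w|k).

    counts[d] maps word -> count, doc_topic[d][k] is p(k|d),
    topic_word[k] maps word -> p(w|k).
    """
    total = 0.0
    for d, doc in enumerate(counts):
        for word, count in doc.items():
            p = sum(doc_topic[d][k] * topic_word[k].get(word, 0.0)
                    for k in range(len(topic_word)))
            if p <= 0.0:
                return float("-inf")  # an observed word has zero probability
            total += count * math.log(p)
    return total


def dictionary_penalty(topic_word, dictionary):
    """Illustrative soft constraint (an assumption, not the paper's form):
    translation pairs should have similar probability within every topic."""
    penalty = 0.0
    for word_a, word_b, weight in dictionary:
        for dist in topic_word:
            diff = dist.get(word_a, 0.0) - dist.get(word_b, 0.0)
            penalty += weight * diff * diff
    return 0.5 * penalty


def regularized_objective(counts, doc_topic, topic_word, dictionary, lam=0.5):
    """Trade likelihood against the dictionary penalty; lam is hypothetical."""
    return ((1.0 - lam) * plsa_log_likelihood(counts, doc_topic, topic_word)
            - lam * dictionary_penalty(topic_word, dictionary))


# Toy bilingual dictionary: (English word, Chinese word, weight).
dictionary = [("economy", "经济", 1.0), ("sports", "体育", 1.0)]

# Topics that group each word with its translation...
aligned = [{"economy": 0.5, "经济": 0.5}, {"sports": 0.5, "体育": 0.5}]
# ...versus topics that split translation pairs apart.
misaligned = [{"economy": 0.5, "体育": 0.5}, {"sports": 0.5, "经济": 0.5}]

aligned_penalty = dictionary_penalty(aligned, dictionary)
misaligned_penalty = dictionary_penalty(misaligned, dictionary)
```

In this toy setup the penalty is smaller for the topics that keep translation pairs together, which is the behavior a dictionary-based soft constraint is meant to encourage when it is subtracted from the likelihood.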
Pages: 1128-1137
Page count: 10
Related papers
50 in total
  • [1] Cross-lingual embeddings with auxiliary topic models
    Zhou, Dong
    Peng, Xiaoya
    Li, Lin
    Han, Jun-mei
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2022, 190
  • [2] Multi-lingual and Cross-lingual timeline extraction
    Laparra, Egoitz
    Agerri, Rodrigo
    Aldabe, Itziar
    Rigau, German
    [J]. KNOWLEDGE-BASED SYSTEMS, 2017, 133 : 77 - 89
  • [3] Incorporating Word Embedding into Cross-lingual Topic Modeling
    Chang, Chia-Hsuan
    Hwang, San-Yih
    Xui, Tou-Hsiang
    [J]. 2018 IEEE INTERNATIONAL CONGRESS ON BIG DATA (IEEE BIGDATA CONGRESS), 2018, : 17 - 24
  • [4] CROSS-LINGUAL TOPIC PREDICTION FOR SPEECH USING TRANSLATIONS
    Bansal, Sameer
    Kamper, Herman
    Lopez, Adam
    Goldwater, Sharon
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 8164 - 8168
  • [5] Cross-lingual latent semantic analysis for language modeling
    Kim, W
    Khudanpur, S
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 257 - 260
  • [6] Reproducible Extraction of Cross-lingual Topics (rectr)
    Chan, Chung-Hong
    Zeng, Jing
    Wessler, Hartmut
    Jungblut, Marc
    Welbers, Kasper
    Bajjalieh, Joseph W.
    van Atteveldt, Wouter
    Althaus, Scott L.
    [J]. COMMUNICATION METHODS AND MEASURES, 2020, 14 (04) : 285 - 305
  • [7] Cross-Lingual Sentence Extraction for Information Distillation
    Singla, Adish Kumar
    Hakkani-Tuer, Dilek
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2707 - 2710
  • [8] Cross-Lingual Information to the Rescue in Keyword Extraction
    Huang, Chung-Chi
    Eskenazi, Maxine
    Carbonell, Jaime
    Ku, Lun-Wei
    Yang, Ping-Che
    [J]. PROCEEDINGS OF 52ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: SYSTEM DEMONSTRATIONS, 2014, : 1 - 6
  • [9] CLTC: A Chinese-English Cross-lingual Topic Corpus
    Xia, Yunqing
    Tang, Guoyu
    Jin, Peng
    Yang, Xia
    [J]. LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 532 - 537
  • [10] Cross-Lingual Taxonomy Alignment with Bilingual Biterm Topic Model
    Wu, Tianxing
    Qi, Guilin
    Wang, Haofen
    Xu, Kang
    Cui, Xuan
    [J]. THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 287 - 293