Statistical word sense aware topic models

Cited by: 5
Authors
Tang, Guoyu [1]
Xia, Yunqing [1]
Sun, Jun [2]
Zhang, Min [3]
Zheng, Thomas Fang [1]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, TNList, Beijing 100084, Peoples R China
[2] ASTAR, Inst Infocomm Res, Singapore, Singapore
[3] Soochow Univ, Suzhou, Peoples R China
Keywords
Topic modeling; Word sense induction; Document representation; Document clustering
DOI
10.1007/s00500-014-1372-z
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Latent Dirichlet allocation (LDA) has proven effective at modeling the semantic relations between surface words. This semantic information in a document collection is useful for estimating the topic distribution of a document. In general, a surface word may contribute significantly to several topics in a document collection. LDA measures the contribution of a surface word to each topic and treats a surface word as identical across all documents. However, a surface word may carry different signatures in different contexts; that is, polysemous words can be used with different senses in different contexts. Intuitively, disambiguating word senses should enhance the discriminative capability of topic models. In this work, we propose a joint model that induces document topics and word senses simultaneously. Instead of relying on pre-defined word sense resources, we capture word sense information via a latent variable and induce it directly from the corpora in a fully unsupervised manner. Experimental results show that the proposed joint model significantly outperforms the baselines in document clustering and also improves word sense induction over a stand-alone non-parametric model.
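The idea of placing a latent sense variable between topics and surface words can be illustrated with a toy sketch. The abstract does not specify the model's dependency structure, so the topic → sense → word chain below is an assumption made only for illustration, and every name and probability (`TOPIC_SENSE`, `SENSE_WORD`, `sense_posterior`, the "bank" example) is hypothetical rather than taken from the paper.

```python
import random

# Hypothetical toy parameters, NOT from the paper.
# P(sense | topic): each topic prefers different senses.
TOPIC_SENSE = {
    "finance": {"bank/institution": 0.9, "bank/riverside": 0.1},
    "nature": {"bank/institution": 0.1, "bank/riverside": 0.9},
}
# P(word | sense): the surface word "bank" is shared by both senses.
SENSE_WORD = {
    "bank/institution": {"bank": 0.5, "loan": 0.3, "money": 0.2},
    "bank/riverside": {"bank": 0.5, "river": 0.3, "water": 0.2},
}


def draw(dist, rng):
    """Sample one key from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def generate_document(topic_mix, length, rng):
    """Generate (word, sense, topic) triples under the assumed chain
    topic -> sense -> word. In a real corpus only the word is observed;
    sense and topic are latent."""
    tokens = []
    for _ in range(length):
        topic = draw(topic_mix, rng)
        sense = draw(TOPIC_SENSE[topic], rng)
        word = draw(SENSE_WORD[sense], rng)
        tokens.append((word, sense, topic))
    return tokens


def sense_posterior(word, topic):
    """P(sense | word, topic), proportional to P(sense | topic) * P(word | sense)."""
    scores = {
        sense: p * SENSE_WORD[sense].get(word, 0.0)
        for sense, p in TOPIC_SENSE[topic].items()
    }
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError(f"word {word!r} has no sense under topic {topic!r}")
    return {sense: score / total for sense, score in scores.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    print(generate_document({"finance": 0.5, "nature": 0.5}, 5, rng))
    print(sense_posterior("bank", "finance"))
    print(sense_posterior("bank", "nature"))
```

In this toy setting the same surface word "bank" receives a different sense distribution depending on the topic of its context, which is the kind of distinction that plain LDA, treating a surface word as identical everywhere, cannot express. Actual unsupervised induction would have to infer both latent variables from the observed words alone, for example by sampling or variational inference, which this sketch does not attempt.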
Pages: 13-27
Page count: 15