Statistical word sense aware topic models

Cited by: 5
Authors
Tang, Guoyu [1]
Xia, Yunqing [1]
Sun, Jun [2]
Zhang, Min [3]
Zheng, Thomas Fang [1]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, TNList, Beijing 100084, Peoples R China
[2] ASTAR, Inst Infocomm Res, Singapore, Singapore
[3] Soochow Univ, Suzhou, Peoples R China
Keywords
Topic modeling; Word sense induction; Document representation; Document clustering;
DOI
10.1007/s00500-014-1372-z
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Latent Dirichlet allocation (LDA) has proved effective in modeling the semantic relations between surface words. This semantic information in the document collection is useful for measuring the topic distribution of a document. In general, a surface word may contribute significantly to several topics in a document collection. LDA measures the contribution of a surface word to each topic and treats a surface word as identical across all documents. However, a surface word may present different signatures in different contexts, i.e., polysemous words can be used with different senses in different contexts. Intuitively, disambiguating word senses for topic models can enhance their discriminative capabilities. In this work, we propose a joint model that induces document topics and word senses simultaneously. Instead of using pre-defined word sense resources, we capture word sense information via a latent variable and induce it directly from the corpora in a fully unsupervised manner. Experimental results show that the proposed joint model significantly outperforms the baselines in document clustering and also improves word sense induction over a stand-alone non-parametric model.
Pages: 13-27
Page count: 15