Statistical word sense aware topic models

Cited by: 5
Authors
Tang, Guoyu [1 ]
Xia, Yunqing [1 ]
Sun, Jun [2 ]
Zhang, Min [3 ]
Zheng, Thomas Fang [1 ]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, TNList, Beijing 100084, Peoples R China
[2] ASTAR, Inst Infocomm Res, Singapore, Singapore
[3] Soochow Univ, Suzhou, Peoples R China
Keywords
Topic modeling; Word sense induction; Document representation; Document clustering
DOI
10.1007/s00500-014-1372-z
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
LDA has proven effective at modeling the semantic relations between surface words, and this semantic information is useful for estimating the topic distribution of a document. In general, a surface word may contribute significantly to several topics in a document collection. LDA measures the contribution of a surface word to each topic but treats the surface word as identical across all documents. However, a surface word may carry different meanings in different contexts; that is, polysemous words are used with different senses depending on context. Intuitively, disambiguating word senses should enhance the discriminative capability of topic models. In this work, we propose a joint model that induces document topics and word senses simultaneously. Instead of relying on pre-defined word sense resources, we capture word sense information with a latent variable and induce senses directly from the corpora in a fully unsupervised manner. Experimental results show that the proposed joint model significantly outperforms the baselines on document clustering and also improves word sense induction compared with a stand-alone non-parametric model.
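To make the generative idea in the abstract concrete, the toy sketch below shows one plausible way a "sense-aware" topic model can be structured: topics emit latent senses, and senses emit surface words, so the same surface word may be generated through different senses in different documents. This is only an illustrative simplification, not the model proposed in the paper (which induces senses non-parametrically and infers topics and senses jointly); the sizes, hyperparameters, and variable names (K, S, V, alpha, beta, gamma, theta, phi, psi) are all assumptions made for the example.

```python
# Toy sketch of a sense-aware generative process (illustrative only; NOT the
# paper's exact model). Topics are distributions over latent senses, and
# senses are distributions over surface words.
import numpy as np

rng = np.random.default_rng(0)

K, S, V = 4, 10, 50                  # hypothetical counts: topics, latent senses, surface words
alpha, beta, gamma = 0.1, 0.1, 0.01  # hypothetical symmetric Dirichlet hyperparameters

phi = rng.dirichlet(np.full(S, beta), size=K)    # topic -> sense distributions
psi = rng.dirichlet(np.full(V, gamma), size=S)   # sense -> surface-word distributions


def generate_document(n_tokens=20):
    """Sample one toy document: topic -> sense -> surface word per token."""
    theta = rng.dirichlet(np.full(K, alpha))     # document-specific topic mixture
    tokens = []
    for _ in range(n_tokens):
        z = rng.choice(K, p=theta)               # latent topic for this token
        s = rng.choice(S, p=phi[z])              # latent sense, induced rather than taken from a lexicon
        w = rng.choice(V, p=psi[s])              # observed surface word
        tokens.append((z, s, w))
    return tokens


print(generate_document(5))
```

Inference in such a model would recover the latent topic and sense assignments from the observed surface words alone, which is what allows a polysemous surface word to contribute to different topics through different senses.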
Pages: 13-27 (15 pages)
Related papers (50 in total)
  • [1] Tang, Guoyu; Xia, Yunqing; Sun, Jun; Zhang, Min; Zheng, Thomas Fang. Statistical word sense aware topic models. Soft Computing, 2015, 19: 13-27.
  • [2] Tang, Guoyu; Xia, Yunqing; Sun, Jun; Zhang, Min; Zheng, Thomas Fang. Topic Models Incorporating Statistical Word Senses. Computational Linguistics and Intelligent Text Processing (CICLing 2014), Pt I, 2014, 8403: 151-162.
  • [3] Chaplot, Devendra Singh; Salakhutdinov, Ruslan. Knowledge-Based Word Sense Disambiguation Using Topic Models. Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018), 2018: 5062-5069.
  • [4] Jin, Peng; Chen, Xingyuan. A Word Sense Probabilistic Topic Model. 9th International Conference on Computational Intelligence and Security (CIS 2013), 2013: 401-404.
  • [5] Knopp, Johannes; Voelker, Johanna; Ponzetto, Simone Paolo. Topic Modeling for Word Sense Induction. Language Processing and Knowledge in the Web, 2013, 8105: 97-103.
  • [6] Li, Linlin; Roth, Benjamin; Sporleder, Caroline. Topic Models for Word Sense Disambiguation and Token-based Idiom Detection. 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010), 2010: 1138-1147.
  • [7] Li, Wenbo; Suzuki, Einoshin. Hybrid Context-Aware Word Sense Disambiguation in Topic Modeling based Document Representation. 20th IEEE International Conference on Data Mining (ICDM 2020), 2020: 332-341.
  • [8] Remus, Steffen; Biemann, Chris. Retrofitting Word Representations for Unsupervised Sense Aware Word Similarities. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018: 1035-1041.
  • [9] Izquierdo, Ruben; Postma, Marten; Vossen, Piek. Topic Modeling and Word Sense Disambiguation on the Ancora corpus. Procesamiento del Lenguaje Natural, 2015, (55): 15-22.
  • [10] Hoang, Thanh Tung; Nguyen, Phuong Thai. Word Sense Induction using Correlated Topic Model. 2012 International Conference on Asian Language Processing (IALP 2012), 2012: 41-44.