A Guided Topic-Noise Model for Short Texts

Cited by: 1
Authors
Churchill, Robert [1]
Singh, Lisa [1]
Ryan, Rebecca [1]
Davis-Kean, Pamela [2]
Affiliations
[1] Georgetown Univ, Washington, DC 20007 USA
[2] Univ Michigan, Ann Arbor, MI 48109 USA
Funding
National Science Foundation (USA);
Keywords
topic modeling; social media; semi-supervised topic model; guided topic model; topic-noise model; seed topics;
DOI
10.1145/3485447.3512007
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Researchers using social media data want to understand the discussions occurring in and about their respective fields. These domain experts often turn to topic models to help them see the entire landscape of the conversation, but unsupervised topic models often produce topic sets that miss topics experts expect or want to see. To solve this problem, we propose the Guided Topic-Noise Model (GTM), a semi-supervised topic model designed with large domain-specific social media data sets in mind. The input to GTM is a set of topics that are of interest to the user and a small number of words or phrases that belong to those topics. These seed topics are used to guide the topic generation process, and can be augmented interactively, expanding the seed word list as the model provides new relevant words for different topics. GTM uses a novel initialization and a new sampling algorithm called Generalized Polya Urn (GPU) seed word sampling to produce a topic set that includes expanded seed topics, as well as new unsupervised topics. We demonstrate the robustness of GTM on open-ended responses from a public opinion survey and four domain-specific Twitter data sets.
Pages: 2870-2878
Page count: 9
Related papers
50 in total
  • [1] Dynamic Topic-Noise Models for Social Media
    Churchill, Rob
    Singh, Lisa
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2022, PT II, 2022, 13281 : 429 - 443
  • [2] Topic-Noise Models: Modeling Topic and Noise Distributions in Social Media Post Collections
    Churchill, Rob
    Singh, Lisa
    [J]. 2021 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2021), 2021, : 71 - 80
  • [3] Copula Guided Neural Topic Modelling for Short Texts
    Lin, Lihui
    Jiang, Hongyu
    Rao, Yanghui
    [J]. PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 1773 - 1776
  • [4] Sparse Biterm Topic Model for Short Texts
    Zhu, Bingshan
    Cai, Yi
    Zhang, Huakui
    [J]. WEB AND BIG DATA, APWEB-WAIM 2021, PT I, 2021, 12858 : 227 - 241
  • [5] Enhanced Contextual Neural Topic Model for Short Texts
    Liu, Gang
    Wang, Tongli
    Tang, Hongwei
    Zhan, Kai
    Yang, Wenli
    [J]. Computer Engineering and Applications, 2024, 60 (01) : 154 - 164
  • [6] Topic segmentation for short texts
    Chang, TH
    Lee, CH
    [J]. PACLIC 17: LANGUAGE, INFORMATION AND COMPUTATION, PROCEEDINGS, 2003, : 159 - 165
  • [7] A Multilevel Clustering Model for Coherent Topic Discovery in Short Texts
    Maithya, Emmanuel Muthoka
    Nderu, Lawrence
    Njagi, Dennis
    [J]. 2022 IST-AFRICA CONFERENCE, 2022,
  • [8] Robust Word-Network Topic Model for Short Texts
    Wang, Fei
    Liu, Rui
    Zuo, Yuan
    Zhang, Hui
    Zhang, He
    Wu, Junjie
    [J]. 2016 IEEE 28TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2016), 2016, : 852 - 856
  • [9] A New Sentiment and Topic Model for Short Texts on Social Media
    Xu, Kang
    Huang, Junheng
    Qi, Guilin
    [J]. SEMANTIC TECHNOLOGY, JIST 2017, 2017, 10675 : 183 - 198
  • [10] Attention-based Autoencoder Topic Model for Short Texts
    Tian, Tian
    Fang, Zheng
    [J]. 10TH INTERNATIONAL CONFERENCE ON AMBIENT SYSTEMS, NETWORKS AND TECHNOLOGIES (ANT 2019) / THE 2ND INTERNATIONAL CONFERENCE ON EMERGING DATA AND INDUSTRY 4.0 (EDI40 2019) / AFFILIATED WORKSHOPS, 2019, 151 : 1134 - 1139