Document-based topic coherence measures for news media text

Cited by: 26
Authors
Korencic, Damir [1, 2]
Ristov, Strahil [1]
Snajder, Jan [2]
Affiliations
[1] Rudjer Boskovic Inst, Dept Elect, Bijenicka Cesta 54, Zagreb 10000, Croatia
[2] Univ Zagreb, Fac Elect Engn & Comp, Unska 3, Zagreb 10000, Croatia
Keywords
Topic models; Topic coherence; Topic model evaluation; Text analysis; News text; Exploratory analysis; MODEL;
DOI
10.1016/j.eswa.2018.07.063
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
There is a rising need for automated analysis of news text, and topic models have proven to be useful tools for this task. However, as the quality of the topics induced by topic models greatly varies, much research effort has been devoted to their automated evaluation. Recent research has focused on topic coherence as a measure of a topic's quality. Existing topic coherence measures work by considering the semantic similarity of topic words. This makes them unfit to detect the coherence of transient topics with semantically unrelated topic words, which abound in news media texts. In this paper, we introduce the notion of document-based topic coherence and propose novel topic coherence measures that estimate topic coherence based on topic documents rather than topic words. We evaluate the proposed measures on two datasets containing topics manually labeled for document-based coherence, on which the proposed measures outperform a strong baseline as well as word-based coherence measures. We also demonstrate the usefulness of document-based coherence measures for automated topic discovery from news media texts. (C) 2018 Elsevier Ltd. All rights reserved.
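To make the distinction in the abstract concrete, the following is a minimal sketch of what scoring a topic by its documents, rather than by its words, could look like. It is not the measure proposed in the paper: the bag-of-words vectorizer, the use of mean pairwise cosine similarity, and all function names (`bag_of_words`, `cosine`, `document_based_coherence`) are assumptions chosen purely for illustration, and the example headlines are invented.

```python
import math
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Lowercased, whitespace-tokenized term counts (a deliberately crude vectorizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def document_based_coherence(top_documents):
    """Illustrative score: mean pairwise cosine similarity of a topic's top documents.

    Assumption: a topic is represented by the documents most strongly
    associated with it; how those documents are selected is left open here.
    """
    vectors = [bag_of_words(doc) for doc in top_documents]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Invented headlines: one topic whose documents cover a single event,
# and one whose documents are unrelated to each other.
coherent_topic_docs = [
    "volcano eruption grounds flights across europe",
    "flights cancelled as volcano ash spreads across europe",
    "airlines resume flights after volcano eruption",
]
mixed_topic_docs = [
    "volcano eruption grounds flights across europe",
    "central bank raises interest rates again",
    "football club wins national championship",
]

coherent_score = document_based_coherence(coherent_topic_docs)
mixed_score = document_based_coherence(mixed_topic_docs)
```

In this toy setup the single-event topic scores higher than the mixed one because its documents share vocabulary, even though a word-level similarity resource might see little semantic relation between terms such as "volcano" and "flights".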
Pages: 357-373
Number of pages: 17