Probabilistic Topic Modeling for Comparative Analysis of Document Collections

Cited by: 18
Authors
Hua, Ting [1, 3]
Lu, Chang-Tien [1, 4]
Choo, Jaegul [2, 5]
Reddy, Chandan K. [1, 6]
Affiliations
[1] Virginia Tech, Blacksburg, VA 24061 USA
[2] Korea Univ, Seoul, South Korea
[3] POB 6571, Falls Church, VA 22040 USA
[4] 7054 Haycock Rd,Room 312, Falls Church, VA 22043 USA
[5] Anam Dong 5 Ga, Seoul 136713, South Korea
[6] 900 N Glebe Rd, Arlington, VA 22203 USA
Funding
U.S. National Science Foundation; National Research Foundation, Singapore
Keywords
Probabilistic topic modeling; text mining; CLASSIFICATION; ALGORITHMS; REGRESSION
DOI
10.1145/3369873
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Probabilistic topic models, which can discover hidden patterns in documents, have been extensively studied. However, rather than learning from a single document collection, numerous real-world applications demand a comprehensive understanding of the relationships among various document sets. To address such needs, this article proposes a new model that can identify the common and discriminative aspects of multiple datasets. Specifically, our proposed method is a Bayesian approach that represents each document as a combination of common topics (shared across all document sets) and distinctive topics (distributions over words that are exclusive to a particular dataset). Through extensive experiments, we demonstrate the effectiveness of our method compared with state-of-the-art models. The proposed model can be useful for "comparative thinking" analysis in real-world document collections.
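The abstract's central idea, that each document mixes topics shared by all collections with topics exclusive to its own collection, can be illustrated with a toy generative sketch. This is not the authors' model: the Dirichlet priors, the hyperparameter values, and every name below (`dirichlet`, `generate_corpus`, `k_common`, `k_distinct`) are assumptions made only for illustration.

```python
import random


def dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(num_collections=2, docs_per_collection=3, doc_len=20,
                    vocab_size=30, k_common=2, k_distinct=2, seed=0):
    """Toy generative process (assumed, not taken from the paper)."""
    rng = random.Random(seed)
    # Common topics: word distributions shared by every collection.
    common = [dirichlet([0.5] * vocab_size, rng) for _ in range(k_common)]
    corpus = []
    for c in range(num_collections):
        # Distinctive topics: word distributions owned by collection c only.
        distinct = [dirichlet([0.5] * vocab_size, rng)
                    for _ in range(k_distinct)]
        topics = common + distinct
        for _ in range(docs_per_collection):
            # Per-document proportions over common + own distinctive topics.
            theta = dirichlet([0.5] * len(topics), rng)
            words, labels = [], []
            for _ in range(doc_len):
                z = rng.choices(range(len(topics)), weights=theta)[0]
                w = rng.choices(range(vocab_size), weights=topics[z])[0]
                words.append(w)
                labels.append("common" if z < k_common else "distinct")
            corpus.append({"collection": c, "words": words, "labels": labels})
    return corpus


if __name__ == "__main__":
    for doc in generate_corpus():
        share = doc["labels"].count("common") / len(doc["labels"])
        print(doc["collection"], f"common-topic share: {share:.2f}")
```

In this sketch a document from one collection can never draw a word from another collection's distinctive topics, which is the property that makes those topics "exclusive"; inference (recovering the topics from observed words) is deliberately left out.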
Pages: 27