Context-Aware Latent Dirichlet Allocation for Topic Segmentation

Cited by: 6
Authors
Li, Wenbo [1, 2]
Matsukawa, Tetsu [1, 2]
Saigo, Hiroto [1, 2]
Suzuki, Einoshin [1, 2, 3]
Affiliations
[1] Kyushu Univ, Grad Sch, Fukuoka, Japan
[2] Kyushu Univ, Fac Informat Sci & Elect Engn, Fukuoka, Japan
[3] Kyushu Univ, Grad Sch Syst Life Sci, Fukuoka, Japan
Funding
Japan Society for the Promotion of Science;
Keywords
DISCOVERY; MODEL;
DOI
10.1007/978-3-030-47426-3_37
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
We propose a new generative model for topic segmentation based on Latent Dirichlet Allocation. The task is to divide a document into a sequence of topically coherent segments, while preserving long topic change-points (coherency) and keeping short topic segments from getting merged (saliency). Most of the existing models either fuse topic segments by keywords or focus on modeling word co-occurrence patterns without merging. They can hardly achieve both coherency and saliency since many words have high uncertainties in topic assignments due to their polysemous nature. To solve this problem, we introduce topic-specific co-occurrence of word pairs within contexts in modeling, to generate more coherent segments and alleviate the influence of irrelevant words on topic assignment. We also design an optimization algorithm to eliminate redundant items in the generated topic segments. Experimental results show that our proposal produces significant improvements in both topic coherence and topic segmentation.
Pages: 475-486
Number of pages: 12
Related papers
50 in total
  • [1] Topic Selection in Latent Dirichlet Allocation
    Wang, Biao
    Liu, Zelong
    Li, Maozhen
    Liu, Yang
    Qi, Man
    2014 11TH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD), 2014, : 756 - 760
  • [2] Topic-Based User Segmentation for Online Advertising with Latent Dirichlet Allocation
    Tu, Songgao
    Lu, Chaojun
    ADVANCED DATA MINING AND APPLICATIONS (ADMA 2010), PT II, 2010, 6441 : 259 - 269
  • [3] Learning Context-aware Latent Representations for Context-aware Collaborative Filtering
    Liu, Xin
    Wu, Wei
    SIGIR 2015: PROCEEDINGS OF THE 38TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2015, : 887 - 890
  • [4] Towards context-aware collaborative filtering by learning context-aware latent representations
    Liu, Xin
    Zhang, Jiyong
    Yan, Chenggang
    KNOWLEDGE-BASED SYSTEMS, 2020, 199
  • [5] Topic Modeling Using Latent Dirichlet allocation: A Survey
    Chauhan, Uttam
    Shah, Apurva
    ACM COMPUTING SURVEYS, 2021, 54 (07)
  • [6] A Hybrid Latent Dirichlet Allocation Approach for Topic Classification
    Hsu, Chi-I
    Chiu, Chaochang
    2017 IEEE INTERNATIONAL CONFERENCE ON INNOVATIONS IN INTELLIGENT SYSTEMS AND APPLICATIONS (INISTA), 2017, : 312 - 315
  • [7] Semantic latent dirichlet allocation for automatic topic extraction
    Bhutada, Sunil
    Balaram, V. V. S. S. S.
    Bulusu, Vishnu Vardhan
    JOURNAL OF INFORMATION & OPTIMIZATION SCIENCES, 2016, 37 (03): : 449 - 469
  • [8] Using Latent Dirichlet Allocation for Topic Modelling in Twitter
    Ostrowski, David Alfred
    2015 IEEE 9TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2015, : 493 - 497
  • [9] Topic Model Allocation of Conversational Dialogue Records by Latent Dirichlet Allocation
    Yeh, Jui-Feng
    Lee, Chen-Hsien
    Tan, Yi-Shiuan
    Yu, Liang-Chih
    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2014,
  • [10] Hierarchical Latent Context Representation for Context-Aware Recommendations
    Unger, Moshe
    Tuzhilin, Alexander
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (07) : 3322 - 3334