Context-Aware Latent Dirichlet Allocation for Topic Segmentation

Cited by: 6
Authors
Li, Wenbo [1, 2]
Matsukawa, Tetsu [1, 2]
Saigo, Hiroto [1, 2]
Suzuki, Einoshin [1, 2, 3]
Affiliations
[1] Kyushu Univ, Grad Sch, Fukuoka, Japan
[2] Kyushu Univ, Fac Informat Sci & Elect Engn, Fukuoka, Japan
[3] Kyushu Univ, Grad Sch Syst Life Sci, Fukuoka, Japan
Funding
Japan Society for the Promotion of Science
Keywords
DISCOVERY; MODEL;
DOI
10.1007/978-3-030-47426-3_37
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a new generative model for topic segmentation based on Latent Dirichlet Allocation. The task is to divide a document into a sequence of topically coherent segments, while preserving long topic change-points (coherency) and keeping short topic segments from getting merged (saliency). Most of the existing models either fuse topic segments by keywords or focus on modeling word co-occurrence patterns without merging. They can hardly achieve both coherency and saliency since many words have high uncertainties in topic assignments due to their polysemous nature. To solve this problem, we introduce topic-specific co-occurrence of word pairs within contexts in modeling, to generate more coherent segments and alleviate the influence of irrelevant words on topic assignment. We also design an optimization algorithm to eliminate redundant items in the generated topic segments. Experimental results show that our proposal produces significant improvements in both topic coherence and topic segmentation.
Pages: 475-486
Page count: 12