Labeled Phrase Latent Dirichlet Allocation and its online learning algorithm

Cited by: 7
Authors
Tang, Yi-Kun [1 ,2 ]
Mao, Xian-Ling [1 ]
Huang, Heyan [1 ]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing Engn Res Ctr High Volume Language Informa, Beijing 100081, Peoples R China
[2] Minjiang Univ, Fujian Prov Key Lab Informat Proc & Intelligent C, Fuzhou 350121, Fujian, Peoples R China
Funding
US National Science Foundation;
Keywords
Topic model; Labeled Phrase LDA; Batch Labeled Phrase LDA; Online Labeled Phrase LDA; TOPIC MODELS;
DOI
10.1007/s10618-018-0555-0
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Internet holds a large amount of user-labeled text data, such as web pages with categories, papers with keywords, and tweets with hashtags. In recent years, supervised topic models, such as Labeled Latent Dirichlet Allocation, have been widely used to discover the abstract topics in labeled text corpora. However, because these models rely on the bag-of-words assumption, none of them takes word order into account, which loses a great deal of semantic information. In this paper, in order to jointly model semantic label information and word order, we propose a novel topic model, called Labeled Phrase Latent Dirichlet Allocation (LPLDA), which regards each document as a mixture of phrases and partly considers word order. To estimate the parameters of the proposed LPLDA model, we develop a batch inference algorithm based on Gibbs sampling. Moreover, to speed up LPLDA on large-scale stream data, we further propose an online inference algorithm for LPLDA. Extensive experiments compare LPLDA with four state-of-the-art baselines. The results show that (1) batch LPLDA significantly outperforms the baselines in terms of case study, perplexity, and scalability, and on the third-party task in most cases; and (2) the online algorithm for LPLDA is clearly more efficient than the batch method while still producing good results.
Pages: 885-912
Number of pages: 28