Labeled Phrase Latent Dirichlet Allocation and its online learning algorithm

Cited by: 7
Authors
Tang, Yi-Kun [1, 2]
Mao, Xian-Ling [1 ]
Huang, Heyan [1 ]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing Engn Res Ctr High Volume Language Informa, Beijing 100081, Peoples R China
[2] Minjiang Univ, Fujian Prov Key Lab Informat Proc & Intelligent C, Fuzhou 350121, Fujian, Peoples R China
Funding
National Science Foundation (USA);
Keywords
Topic model; Labeled Phrase LDA; Batch Labeled Phrase LDA; Online Labeled Phrase LDA; TOPIC MODELS;
DOI
10.1007/s10618-018-0555-0
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
The Internet holds a large volume of user-labeled text data, such as web pages with categories, papers with keywords, and tweets with hashtags. In recent years, supervised topic models such as Labeled Latent Dirichlet Allocation have been widely used to discover the abstract topics in labeled text corpora. However, these topic models rely on the bag-of-words assumption and therefore ignore word order, which loses a considerable amount of semantic information. In this paper, to jointly model semantic label information and word order, we propose a novel topic model, Labeled Phrase Latent Dirichlet Allocation (LPLDA), which regards each document as a mixture of phrases and thus partly accounts for word order. To estimate the parameters of the proposed LPLDA model, we develop a batch inference algorithm based on Gibbs sampling. Moreover, to speed up LPLDA on large-scale stream data, we further propose an online inference algorithm for LPLDA. Extensive experiments compared LPLDA with four state-of-the-art baselines. The results show that (1) batch LPLDA significantly outperforms the baselines in terms of case study, perplexity, and scalability, and in most cases on the third-party task; and (2) the online algorithm for LPLDA is clearly more efficient than the batch method while still producing good results.
Pages: 885-912
Number of pages: 28