Labeled Phrase Latent Dirichlet Allocation and its online learning algorithm

Cited by: 7
Authors
Tang, Yi-Kun [1 ,2 ]
Mao, Xian-Ling [1 ]
Huang, Heyan [1 ]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing Engn Res Ctr High Volume Language Informa, Beijing 100081, Peoples R China
[2] Minjiang Univ, Fujian Prov Key Lab Informat Proc & Intelligent C, Fuzhou 350121, Fujian, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Topic model; Labeled Phrase LDA; Batch Labeled Phrase LDA; Online Labeled Phrase LDA; TOPIC MODELS;
DOI
10.1007/s10618-018-0555-0
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
There is a wealth of user-annotated text data on the Internet, such as web pages with categories, papers with corresponding keywords, and tweets with hashtags. In recent years, supervised topic models such as Labeled Latent Dirichlet Allocation have been widely used to discover the abstract topics in labeled text corpora. However, under the bag-of-words assumption none of these models takes word order into account, which discards a great deal of semantic information. In this paper, to jointly model label information and word order, we propose a novel topic model, called Labeled Phrase Latent Dirichlet Allocation (LPLDA), which regards each document as a mixture of phrases and thus partly preserves word order. To estimate the parameters of the proposed LPLDA model, we develop a batch inference algorithm based on the Gibbs sampling technique. Moreover, to accelerate LPLDA on large-scale stream data, we further propose an online inference algorithm for LPLDA. Extensive experiments compared LPLDA with four state-of-the-art baselines. The results show that (1) batch LPLDA significantly outperforms the baselines in terms of case study, perplexity, and scalability, and on a third-party task in most cases; (2) the online algorithm for LPLDA is markedly more efficient than the batch method while still producing good results.
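The abstract describes two ideas: topics drawn per phrase (all words in a phrase share one topic) and topic assignments restricted to each document's observed labels, inferred by collapsed Gibbs sampling. The paper's actual derivation is not reproduced in this record, so the following is only a toy sketch of that style of sampler; the function name `gibbs_lplda`, the data layout, and the simplified phrase likelihood (repeated words within a phrase are not specially handled) are my own assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

def gibbs_lplda(docs, labels, n_topics, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Toy collapsed Gibbs sampler in the spirit of Labeled Phrase LDA.

    docs   : list of documents, each a list of phrases (a phrase = tuple of words)
    labels : list of label sets; document d may only use topics in labels[d]
    Every word inside a phrase shares the phrase's single topic assignment.
    """
    rng = random.Random(seed)
    vocab = {w for doc in docs for ph in doc for w in ph}
    V = len(vocab)
    n_dk = [defaultdict(int) for _ in docs]             # phrase-topic counts per doc
    n_kw = [defaultdict(int) for _ in range(n_topics)]  # word counts per topic
    n_k = [0] * n_topics                                # total words per topic
    z = []                                              # topic assignment per phrase

    # random initialization, restricted to each document's label set
    for d, doc in enumerate(docs):
        z_d = []
        for ph in doc:
            k = rng.choice(sorted(labels[d]))
            z_d.append(k)
            n_dk[d][k] += 1
            for w in ph:
                n_kw[k][w] += 1
            n_k[k] += len(ph)
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, ph in enumerate(doc):
                # remove the phrase's current assignment from the counts
                k = z[d][i]
                n_dk[d][k] -= 1
                for w in ph:
                    n_kw[k][w] -= 1
                n_k[k] -= len(ph)
                # resample, restricted to the document's labels; the phrase
                # likelihood is a product of per-word topic-word terms
                cand = sorted(labels[d])
                weights = []
                for t in cand:
                    p = n_dk[d][t] + alpha
                    for j, w in enumerate(ph):
                        p *= (n_kw[t][w] + beta) / (n_k[t] + V * beta + j)
                    weights.append(p)
                k = rng.choices(cand, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                for w in ph:
                    n_kw[k][w] += 1
                n_k[k] += len(ph)
    return z, n_kw
```

A streaming variant, as the abstract suggests, would process documents in mini-batches and merge their sufficient statistics into the global `n_kw`/`n_k` counts instead of sweeping the full corpus each iteration.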
Pages: 885-912 (28 pages)