Labeled Phrase Latent Dirichlet Allocation and its online learning algorithm

Cited by: 7
Authors
Tang, Yi-Kun [1 ,2 ]
Mao, Xian-Ling [1 ]
Huang, Heyan [1 ]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing Engn Res Ctr High Volume Language Informa, Beijing 100081, Peoples R China
[2] Minjiang Univ, Fujian Prov Key Lab Informat Proc & Intelligent C, Fuzhou 350121, Fujian, Peoples R China
Funding
US National Science Foundation;
Keywords
Topic model; Labeled Phrase LDA; Batch Labeled Phrase LDA; Online Labeled Phrase LDA
DOI
10.1007/s10618-018-0555-0
CLC (Chinese Library Classification) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
There is a vast amount of user-labeled text data on the Internet, such as web pages with categories, papers with corresponding keywords, and tweets with hashtags. In recent years, supervised topic models such as Labeled Latent Dirichlet Allocation have been widely used to discover the abstract topics in labeled text corpora. However, because of the bag-of-words assumption, none of these topic models take word order into account, which discards a great deal of semantic information. In this paper, in order to jointly model label information and word order, we propose a novel topic model, called Labeled Phrase Latent Dirichlet Allocation (LPLDA), which regards each document as a mixture of phrases and thereby partially preserves word order. To estimate the parameters of the proposed LPLDA model, we develop a batch inference algorithm based on Gibbs sampling. Moreover, to accelerate LPLDA's processing of large-scale streaming data, we further propose an online inference algorithm for LPLDA. Extensive experiments compared LPLDA with four state-of-the-art baselines. The results show that (1) batch LPLDA significantly outperforms the baselines in terms of case study, perplexity and scalability, and on a third-party task in most cases; and (2) the online algorithm for LPLDA is considerably more efficient than the batch method while still producing good results.
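As a rough sketch of the kind of inference the abstract describes, the following shows one collapsed Gibbs sweep for a Labeled-LDA-style model in which a document's phrases may only be assigned topics from that document's label set. It is an assumption-laden illustration, not the LPLDA sampler itself: all identifiers (gibbs_pass, docs, doc_labels, and so on) are hypothetical, and each phrase is treated as a single vocabulary symbol, whereas LPLDA scores the words inside a phrase jointly.

```python
# Minimal sketch (not the authors' implementation) of a collapsed Gibbs
# sweep for a Labeled-LDA-style model over phrases. Names are hypothetical;
# each phrase is simplified to a single vocabulary symbol.
import numpy as np

def gibbs_pass(docs, doc_labels, z, n_dk, n_kv, n_k, alpha, beta, V, rng):
    """One sweep over all phrase positions, updating assignments in place."""
    for d, phrases in enumerate(docs):
        allowed = doc_labels[d]                # Labeled-LDA constraint
        for i, v in enumerate(phrases):
            k_old = z[d][i]
            # remove the current assignment from the count matrices
            n_dk[d, k_old] -= 1; n_kv[k_old, v] -= 1; n_k[k_old] -= 1
            # unnormalized conditional, restricted to the document's labels
            p = (n_dk[d, allowed] + alpha) * (n_kv[allowed, v] + beta) \
                / (n_k[allowed] + V * beta)
            k_new = allowed[rng.choice(len(allowed), p=p / p.sum())]
            # record the new assignment
            z[d][i] = k_new
            n_dk[d, k_new] += 1; n_kv[k_new, v] += 1; n_k[k_new] += 1

# toy usage: 2 documents over a 5-phrase vocabulary, 3 topics
rng = np.random.default_rng(0)
docs = [[0, 1, 2], [2, 3, 4]]                  # phrase ids per document
doc_labels = [[0, 1], [1, 2]]                  # permitted topics per document
K, V, alpha, beta = 3, 5, 0.1, 0.01
z = [[int(rng.choice(doc_labels[d])) for _ in doc] for d, doc in enumerate(docs)]
n_dk = np.zeros((len(docs), K)); n_kv = np.zeros((K, V)); n_k = np.zeros(K)
for d, doc in enumerate(docs):
    for i, v in enumerate(doc):
        n_dk[d, z[d][i]] += 1; n_kv[z[d][i], v] += 1; n_k[z[d][i]] += 1
for _ in range(100):
    gibbs_pass(docs, doc_labels, z, n_dk, n_kv, n_k, alpha, beta, V, rng)
```

An online variant in the spirit of the paper's second contribution would process the stream in mini-batches, folding each batch's sufficient statistics (the n_kv and n_k counts) into the global model rather than sweeping the full corpus on every update.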
Pages: 885-912
Number of pages: 28
Related papers (50 in total)
  • [41] Distributed Latent Dirichlet Allocation on Streams
    Guo, Yunyan
    Li, Jianzhong
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2022, 16 (01)
  • [42] Parallel Latent Dirichlet Allocation on GPUs
    Moon, Gordon E.
    Nisa, Israt
    Sukumaran-Rajam, Aravind
    Bandyopadhyay, Bortik
    Parthasarathy, Srinivasan
    Sadayappan, P.
    COMPUTATIONAL SCIENCE - ICCS 2018, PT II, 2018, 10861 : 259 - 272
  • [43] Selecting Priors for Latent Dirichlet Allocation
    Syed, Shaheen
    Spruit, Marco
    2018 IEEE 12TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2018, : 194 - 202
  • [44] Crowd labeling latent Dirichlet allocation
    Pion-Tonachini, Luca
    Makeig, Scott
    Kreutz-Delgado, Ken
    KNOWLEDGE AND INFORMATION SYSTEMS, 2017, 53 (03) : 749 - 765
  • [45] Latent IBP Compound Dirichlet Allocation
    Archambeau, Cedric
    Lakshminarayanan, Balaji
    Bouchard, Guillaume
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2015, 37 (02) : 321 - 333
  • [46] Slow mixing for Latent Dirichlet Allocation
    Jonasson, Johan
    STATISTICS & PROBABILITY LETTERS, 2017, 129 : 96 - 100
  • [47] Inference in Supervised Latent Dirichlet Allocation
    Lakshminarayanan, Balaji
    Raich, Raviv
    2011 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2011
  • [48] Bibliometric Analysis of Latent Dirichlet Allocation
    Garg, Mohit
    Rangra, Priya
    DESIDOC JOURNAL OF LIBRARY & INFORMATION TECHNOLOGY, 2022, 42 (02): : 105 - 113
  • [49] Topic Selection in Latent Dirichlet Allocation
    Wang, Biao
    Liu, Zelong
    Li, Maozhen
    Liu, Yang
    Qi, Man
    2014 11TH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD), 2014, : 756 - 760