Semi-supervised Document Clustering Based on Latent Dirichlet Allocation (LDA)

Cited by: 2
Authors
秦永彬 [1]
李解 [1]
黄瑞章 [1]
李晶 [1]
Affiliation
[1] College of Computer Science and Technology, Guizhou University
Funding
National Research Foundation, Singapore
Keywords
latent Dirichlet allocation (LDA); semi-supervised learning; document clustering
DOI
10.19884/j.1672-5220.2016.05.001
Chinese Library Classification (CLC) number
TP391.1 [Text information processing]
Discipline classification codes
081203; 0835
Abstract
To discover personalized document structure that reflects user preferences, the preferences are captured by a limited number of instance-level constraints, given as interested and uninterested key terms. A semi-supervised document clustering approach based on the latent Dirichlet allocation (LDA) model, namely pLDA, is developed and guided by the user-provided key terms. A generalized Polya urn (GPU) model is proposed to integrate the user preferences into the document clustering process, and a Gibbs sampler is investigated to infer the structure of the document collection. Experiments on real datasets are conducted to explore the performance of pLDA, and the results demonstrate that the pLDA approach is effective.
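The abstract does not give pLDA's sampling equations, so the following is only a rough illustration of how a generalized Polya urn can be combined with a collapsed Gibbs sampler for LDA. Whenever a word is assigned to a topic, the topic's counts for the other key terms in the same user-supplied group are also increased by a fractional `promotion` amount, which nudges related interested terms into the same topic. Each document is then clustered by its most frequent topic. The function name `gpu_lda_cluster`, the `key_groups` input format, the `promotion` weight, and the choice to model only *interested* terms (uninterested terms are ignored here) are hypothetical assumptions, not the authors' method.

```python
import random
from collections import defaultdict


def gpu_lda_cluster(docs, num_topics, key_groups, alpha=0.1, beta=0.01,
                    promotion=0.3, iters=200, seed=0):
    """Hypothetical sketch: LDA Gibbs sampling with a GPU-style promotion of key terms.

    docs: list of documents, each a list of word strings.
    key_groups: list of sets of interested key terms assumed to belong together
        (an assumed input format, not taken from the paper).
    Returns one cluster index per document (argmax of its topic counts).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)

    # Urn promotion table: placing word w in a topic adds 1.0 for w itself
    # and `promotion` for every other key term in the same group.
    promote = {w: {w: 1.0} for w in vocab}
    for group in key_groups:
        for w in group:
            if w not in promote:
                continue
            for w2 in group:
                if w2 != w and w2 in promote:
                    promote[w][w2] = promotion

    n_dk = [[0] * num_topics for _ in docs]                 # document-topic counts
    n_kw = [defaultdict(float) for _ in range(num_topics)]  # topic-word pseudo-counts
    n_k = [0.0] * num_topics                                # total pseudo-count per topic

    def update(k, w, sign):
        # Add (sign=+1) or remove (sign=-1) word w and its promoted partners in topic k.
        for w2, amount in promote[w].items():
            n_kw[k][w2] += sign * amount
            n_k[k] += sign * amount

    # Random initial topic assignments.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            update(k, w, +1)
        z.append(z_d)

    topics = range(num_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                n_dk[d][k] -= 1
                update(k, w, -1)
                # Standard LDA conditional evaluated on the promoted counts
                # (approximate, because the urn breaks exchangeability).
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + V * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                update(k, w, +1)

    # Cluster each document by its dominant topic.
    return [max(topics, key=lambda t: n_dk[d][t]) for d in range(len(docs))]
```

For example, `gpu_lda_cluster(docs, 2, [{"apple", "banana", "juice"}])` returns one cluster index per document. Keeping the promotion weight below 1.0 lets the user's key terms bias the topics without overwhelming the word statistics observed in the documents.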
Pages: 685-688
Number of pages: 4