Semi-supervised Document Clustering Based on Latent Dirichlet Allocation (LDA)

Authors
秦永彬 [1]
李解 [1]
黄瑞章 [1]
李晶 [1]
Affiliations
[1] College of Computer Science and Technology, Guizhou University
Funding
National Research Foundation, Singapore
Keywords
latent Dirichlet allocation (LDA); semi-supervised learning; document clustering
DOI
10.19884/j.1672-5220.2016.05.001
Chinese Library Classification number
TP391.1 [Text information processing]
Discipline classification codes
081203; 0835
Abstract
To discover a personalized document structure that takes user preferences into account, the preferences are captured by a limited number of instance-level constraints, given as interested and uninterested key terms. A semi-supervised document clustering approach based on the latent Dirichlet allocation (LDA) model, named pLDA, is developed and guided by the user-provided key terms. A generalized Polya urn (GPU) model is proposed to integrate the user preferences into the document clustering process. A Gibbs sampler is investigated to infer the structure of the document collection. Experiments on real datasets were conducted to explore the performance of pLDA. The results demonstrate that the pLDA approach is effective.
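The abstract does not give the sampling equations of pLDA, so the following is only a minimal sketch of the general idea it names: a collapsed Gibbs sampler for LDA in which a generalized Polya urn promotes user-supplied key terms together. The function name `gibbs_lda_gpu`, the `promote` weight, and the rule that drawing one key term adds a fractional count for every other key term are illustrative assumptions, not the authors' method; uninterested key terms are not modeled here.

```python
import random


def gibbs_lda_gpu(docs, vocab_size, num_topics, key_terms, promote=0.3,
                  alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA with a simple generalized Polya urn.

    Assumption (not from the paper): when a key term is assigned to a topic,
    every other key term also gains a fractional count `promote` there.
    Documents are lists of integer word ids in range(vocab_size).
    """
    rng = random.Random(seed)
    key_terms = set(key_terms)

    def urn(word):
        # Pseudo-counts added to a topic when `word` is drawn.
        weights = {word: 1.0}
        if word in key_terms:
            for other in key_terms:
                if other != word:
                    weights[other] = promote
        return weights

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0.0] * vocab_size for _ in range(num_topics)]
    topic_total = [0.0] * num_topics
    assign = []

    def update(d, w, k, sign):
        # Add (sign=+1) or remove (sign=-1) one assignment and its urn counts.
        doc_topic[d][k] += sign
        for v, amount in urn(w).items():
            topic_word[k][v] += sign * amount
            topic_total[k] += sign * amount

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            update(d, w, k, +1)
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, assign[d][i], -1)
                # Unnormalized conditional p(z = k | everything else).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                update(d, w, k, +1)

    # Cluster label of each document: its most frequent topic.
    return [max(range(num_topics), key=lambda k: row[k]) for row in doc_topic]
```

For example, `gibbs_lda_gpu([[0, 1, 0, 1], [2, 3, 2, 3]], vocab_size=4, num_topics=2, key_terms=[0, 1])` returns one topic label per document. Removing urn counts on resampling, as done here, is a common approximation for generalized Polya urn samplers rather than an exact conditional.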
Pages: 685-688
Number of pages: 4