Document clustering using locality preserving indexing

Cited by: 542
Authors
Cai, D
He, XF
Han, JW
Affiliations
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[2] Univ Chicago, Dept Comp Sci, Chicago, IL 60637 USA
Keywords
document clustering; locality preserving indexing; dimensionality reduction; semantics
DOI
10.1109/TKDE.2005.198
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a novel document clustering method which aims to cluster the documents into different semantic classes. The document space is generally of high dimensionality and clustering in such a high dimensional space is often infeasible due to the curse of dimensionality. By using Locality Preserving Indexing (LPI), the documents can be projected into a lower-dimensional semantic space in which the documents related to the same semantics are close to each other. Different from previous document clustering methods based on Latent Semantic Indexing (LSI) or Nonnegative Matrix Factorization (NMF), our method tries to discover both the geometric and discriminating structures of the document space. Theoretical analysis of our method shows that LPI is an unsupervised approximation of the supervised Linear Discriminant Analysis (LDA) method, which gives the intuitive motivation of our method. Extensive experimental evaluations are performed on the Reuters-21578 and TDT2 data sets.
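The projection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `lpi_embed`, its parameters, the cosine-weighted k-nearest-neighbor graph, the SVD pre-projection, and the small regularization term are all assumptions chosen for illustration. The sketch builds a neighbor graph over documents and solves the generalized eigenproblem Z^T L Z a = λ Z^T D Z a, keeping the eigenvectors with the smallest eigenvalues.

```python
import numpy as np

def lpi_embed(X, n_components=2, n_neighbors=5, reg=1e-6):
    """Hypothetical LPI-style embedding.

    X is an (n_docs, n_terms) nonnegative term-weight matrix.
    Returns an (n_docs, n_components) low-dimensional embedding.
    """
    # Unit-normalize rows so dot products are cosine similarities.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)

    # Cosine similarity between documents, ignoring self-similarity.
    S = Xn @ Xn.T
    np.fill_diagonal(S, 0.0)
    n = S.shape[0]

    # Symmetric k-nearest-neighbor graph weighted by cosine similarity.
    idx = np.argsort(-S, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    W = np.zeros_like(S)
    W[rows, idx.ravel()] = S[rows, idx.ravel()]
    W = np.maximum(W, W.T)

    D = np.diag(W.sum(axis=1))  # degree matrix
    L = D - W                   # graph Laplacian

    # SVD pre-projection onto the nonzero singular subspace (assumed step),
    # so that Z^T D Z is nonsingular.
    U, s, _ = np.linalg.svd(Xn, full_matrices=False)
    keep = s > 1e-10 * s.max()
    Z = U[:, keep] * s[keep]

    A = Z.T @ L @ Z
    B = Z.T @ D @ Z + reg * np.eye(Z.shape[1])

    # Reduce A a = lam * B a to a standard symmetric problem via B = C C^T.
    C = np.linalg.cholesky(B)
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T
    M = (M + M.T) / 2.0
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order

    # Map the eigenvectors with the smallest eigenvalues back: a = C^{-T} y.
    proj = C_inv.T @ vecs[:, :n_components]
    return Z @ proj
```

The resulting low-dimensional rows could then be passed to any standard clustering routine such as k-means; that step is left out here to keep the sketch short.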
Pages: 1624-1637
Number of pages: 14