Document clustering using locality preserving indexing

Cited by: 542
Authors
Cai, D
He, XF
Han, JW
Affiliations
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[2] Univ Chicago, Dept Comp Sci, Chicago, IL 60637 USA
Keywords
document clustering; locality preserving indexing; dimensionality reduction; semantics;
DOI
10.1109/TKDE.2005.198
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a novel document clustering method which aims to cluster the documents into different semantic classes. The document space is generally of high dimensionality and clustering in such a high dimensional space is often infeasible due to the curse of dimensionality. By using Locality Preserving Indexing (LPI), the documents can be projected into a lower-dimensional semantic space in which the documents related to the same semantics are close to each other. Different from previous document clustering methods based on Latent Semantic Indexing (LSI) or Nonnegative Matrix Factorization (NMF), our method tries to discover both the geometric and discriminating structures of the document space. Theoretical analysis of our method shows that LPI is an unsupervised approximation of the supervised Linear Discriminant Analysis (LDA) method, which gives the intuitive motivation of our method. Extensive experimental evaluations are performed on the Reuters-21578 and TDT2 data sets.
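For orientation, the projection step mentioned in the abstract can be sketched with the standard locality preserving projection objective, on which LPI is generally understood to be based. The notation here is an assumption introduced for illustration and does not come from this record: X = [x_1, ..., x_n] is the term-document matrix, W is the weight matrix of a nearest-neighbor graph over the documents, D is the diagonal matrix with D_ii = sum_j W_ij, and L = D - W is the graph Laplacian.

```latex
% Assumed notation (not from this record): X, W, D, L as described above;
% a is a projection vector mapping document x_i to the scalar a^T x_i.
\begin{aligned}
  &\min_{a}\ \sum_{i,j} \left( a^{\top} x_i - a^{\top} x_j \right)^{2} W_{ij}
   \;=\; 2\, a^{\top} X L X^{\top} a
   \quad \text{subject to} \quad a^{\top} X D X^{\top} a = 1, \\[4pt]
  % The constrained minimizers satisfy a generalized eigenproblem:
  &X L X^{\top} a \;=\; \lambda\, X D X^{\top} a .
\end{aligned}
```

Under these assumptions, the eigenvectors with the smallest eigenvalues give the projection directions, so documents that are neighbors in the graph stay close in the low-dimensional space. The clustering itself would then run in that reduced space.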
Pages: 1624-1637
Page count: 14
Related papers
50 in total
  • [31] SUBJECT INDEXING AND CITATION INDEXING .1. CLUSTERING STRUCTURE IN THE CYSTIC-FIBROSIS DOCUMENT COLLECTION
    SHAW, WM
    INFORMATION PROCESSING & MANAGEMENT, 1990, 26 (06) : 693 - 703
  • [32] Image retrieval using locality preserving projections
    Putchanuthala, Ramesh Babu
    Reddy, E. Sreenivasa
    JOURNAL OF ENGINEERING-JOE, 2020, 2020 (10) : 889 - 892
  • [33] XML Document Clustering Using Structure-Preserving Flat Representation of XML Content and Structure
    Hadzic, Fedja
    Hecker, Michael
    Tagarelli, Andrea
    ADVANCED DATA MINING AND APPLICATIONS, PT II, 2011, 7121 : 403 - +
  • [34] Locality-Preserving Clustering and Discovery of Wide-Area Grid Resources
    Shen, Haiying
    Hwang, Kai
    2009 29TH IEEE INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS, 2009, : 518 - +
  • [35] Document Clustering Using Gravitational Ensemble Clustering
    Sadeghian, Armindokht Hashempour
    Nezamabadi-pour, Hossein
    2015 INTERNATIONAL SYMPOSIUM ON ARTIFICIAL INTELLIGENCE AND SIGNAL PROCESSING (AISP), 2015, : 240 - 245
  • [36] Locality-Preserving L1-Graph and Its Application in Clustering
    Han, Shuchu
    Huang, Hao
    Qin, Hong
    Yu, Dantong
    30TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, VOLS I AND II, 2015, : 813 - 818
  • [37] A privacy-preserving recommendation method with clustering and locality-sensitive hashing
    Zhang, Hanrui
    Li, Qianmu
    Xu, Jiangmin
    Meng, Shunmei
    Hou, Jun
    COMPUTATIONAL INTELLIGENCE, 2023, 39 (01) : 121 - 144
  • [38] Using clustering for document reconstruction
    Ukovich, Anna
    Zacchigna, Alessandra
    Ramponi, Giovanni
    Schoier, Gabriella
    IMAGE PROCESSING: ALGORITHMS AND SYSTEMS, NEURAL NETWORKS, AND MACHINE LEARNING, 2006, 6064
  • [39] Content-Based Image Indexing by Data Clustering and Inverse Document Frequency
    Grycuk, Rafal
    Gabryel, Marcin
    Korytkowski, Marcin
    Scherer, Rafal
    BEYOND DATABASES, ARCHITECTURES AND STRUCTURES, BDAS 2014, 2014, 424 : 374 - 383
  • [40] Face recognition using illuminant locality preserving projections
    刘朋樟
    沈庭芝
    林健文
    Journal of Beijing Institute of Technology, 2011, 20 (01) : 111 - 116