Decomposition of Term-Document Matrix Representation for Clustering Analysis

Authors
Yang, Jianxiong [1]
Watada, Junzo [1]
Affiliations
[1] Waseda Univ, Grad Sch Informat Prod & Syst, Kitakyushu, Fukuoka, Japan
Keywords
Fuzzy clustering; data mining; LSI; SVD
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Latent Semantic Indexing (LSI) is an information retrieval technique that uses a low-rank singular value decomposition (SVD) of the term-document matrix. The aim of the method is to reduce the dimension of the matrix by finding patterns of co-occurring terms in the document collection. The method is used to calculate term-document weights in the vector space model (VSM) for document clustering with a fuzzy clustering algorithm. LSI attempts to exploit the underlying semantic structure of word usage in documents. During the query-matching phase of LSI, a user's query is first projected into the term-document space and then compared with all terms and documents represented in the vector space. Using some similarity measure, the nearest (most relevant) terms and documents are identified and returned to the user. The current LSI query-matching method requires computing the similarity measure between the query and every term and document in the vector space. In this paper, the Maximal Tree Algorithm is used within a recent LSI implementation to reduce the computational time and complexity of query matching. The Maximal Tree data structure stores the term and document vectors in such a way that only those terms and documents most likely to qualify as the nearest neighbors of the query are examined and retrieved. In short, this novel algorithm is suited to improving the accuracy of data miners.
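The baseline query-matching step described in the abstract can be illustrated with a minimal sketch. It assumes NumPy, a toy term-document matrix of raw counts, the standard LSI fold-in formula q_hat = q^T U_k S_k^{-1}, and cosine similarity as the similarity measure; the function names (lsi_fit, project_query, cosine_rank) and the data are hypothetical and not taken from the paper. The sketch compares the query against every document, which is the exhaustive search the Maximal Tree Algorithm is meant to avoid; the Maximal Tree itself is not implemented here.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents).
# Values are made-up raw counts used only for illustration.
terms = ["fuzzy", "cluster", "matrix", "query", "retrieval"]
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)


def lsi_fit(A, k):
    """Rank-k truncated SVD: A is approximated by U_k diag(s_k) V_k^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Rows of V_k are the k-dimensional document vectors.
    return U[:, :k], s[:k], Vt[:k, :].T


def project_query(q, U_k, s_k):
    """Fold a term-space query into the latent space: q^T U_k S_k^{-1}."""
    return (q @ U_k) / s_k


def cosine_rank(q_hat, doc_vecs):
    """Brute-force cosine similarity of the query against every document."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_hat)
    sims = (doc_vecs @ q_hat) / np.where(norms == 0, 1.0, norms)
    order = np.argsort(-sims)  # document indices, most similar first
    return order, sims


U_k, s_k, doc_vecs = lsi_fit(A, k=2)

# Query containing the terms "query" and "retrieval".
q = np.array([0, 0, 0, 1, 1], dtype=float)
q_hat = project_query(q, U_k, s_k)
order, sims = cosine_rank(q_hat, doc_vecs)
print("documents ranked by similarity:", order.tolist())
```

With this fold-in, projecting a column of A reproduces the corresponding row of V_k, so queries and documents live in the same k-dimensional space. The cost of cosine_rank grows linearly with the number of stored vectors, which is the motivation the abstract gives for a structure that examines only likely nearest neighbors.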
Pages: 976-983
Page count: 8