Decomposition of Term-Document Matrix Representation for Clustering Analysis

Cited by: 0
Authors
Yang, Jianxiong [1 ]
Watada, Junzo [1 ]
Affiliations
[1] Waseda Univ, Grad Sch Informat Prod & Syst, Kitakyushu, Fukuoka, Japan
Keywords
Fuzzy clustering; data mining; LSI; SVD
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Latent Semantic Indexing (LSI) is an information retrieval technique that uses a low-rank singular value decomposition (SVD) of the term-document matrix. The aim of the method is to reduce the dimension of the matrix by finding patterns of co-occurring terms in the document collection. The method is used to calculate term-document weights in the vector space model (VSM) for document clustering with a fuzzy clustering algorithm. LSI attempts to exploit the underlying semantic structure of word usage in documents. During the query-matching phase of LSI, a user's query is first projected into the term-document space and then compared with all terms and documents represented in the vector space. Using some similarity measure, the nearest (most relevant) terms and documents are identified and returned to the user. The current LSI query-matching method requires computing the similarity measure between the query and every term and document in the vector space. In this paper, the Maximal Tree Algorithm is used within a recent LSI implementation to reduce the computational time and complexity of query matching. The Maximal Tree data structure stores the term and document vectors in such a way that only those terms and documents most likely to qualify as nearest neighbors of the query are examined and retrieved. In short, this novel algorithm is suitable for improving the accuracy of data miners.
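The query-matching step described in the abstract can be illustrated with a minimal sketch. It is not the paper's Maximal Tree Algorithm; it shows only the exhaustive baseline that the abstract says compares the query against every document. It assumes NumPy, a small made-up term-document matrix, cosine similarity as the similarity measure, and the standard fold-in projection q^T U_k S_k^{-1}. The function names lsi_fit, fold_in_query, and cosine_rank are hypothetical and do not come from the paper.

```python
import numpy as np


def lsi_fit(A, k):
    """Rank-k truncated SVD of a term-document matrix A (terms x documents)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Term vectors (terms x k), singular values (k,), document vectors (docs x k).
    return U[:, :k], s[:k], Vt[:k, :].T


def fold_in_query(q, U_k, s_k):
    """Project a term-space query vector into the k-dimensional latent space."""
    return (q @ U_k) / s_k


def cosine_rank(q_hat, doc_vecs):
    """Compare the query with every document (the exhaustive baseline)."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_hat)
    sims = (doc_vecs @ q_hat) / np.where(norms == 0, 1.0, norms)
    return np.argsort(-sims), sims


# Hypothetical toy data: 5 terms (rows) x 4 documents (columns).
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 0.0, 1.0],
])
U_k, s_k, D_k = lsi_fit(A, k=2)
query = np.array([1.0, 1.0, 0.0, 0.0, 0.0])  # uses only the first two terms
order, sims = cosine_rank(fold_in_query(query, U_k, s_k), D_k)
print("documents ranked by similarity:", order)
```

Each query in this sketch costs one similarity computation per document, which is the cost the abstract proposes to reduce by examining only likely nearest neighbors.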
Pages: 976 - 983
Page count: 8
Related papers
50 in total
  • [21] Sparsest factor analysis for clustering variables: a matrix decomposition approach
    Adachi, Kohei
    Trendafilov, Nickolay T.
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2018, 12 (03) : 559 - 585
  • [22] Sparsest factor analysis for clustering variables: a matrix decomposition approach
    Kohei Adachi
    Nickolay T. Trendafilov
    Advances in Data Analysis and Classification, 2018, 12 : 559 - 585
  • [23] On the Chinese document clustering based on dynamical term clustering
    Tseng, CM
    Tsai, KH
    Hsu, CC
    Chang, HC
    INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS, 2005, 3689 : 534 - 539
  • [24] Document representation and its application to page decomposition
    Jain, AK
    Yu, B
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1998, 20 (03) : 294 - 308
  • [25] Document clustering using nonnegative matrix factorization
    Shahnaz, F
    Berry, MW
    Pauca, VP
    Plemmons, RJ
    INFORMATION PROCESSING & MANAGEMENT, 2006, 42 (02) : 373 - 386
  • [26] Genetic Algorithm and Confusion Matrix for Document Clustering
    Santra, A.K.
    Christy, C. Josephine
    International Journal of Computer Science Issues, 2012, 9 (1 1-2): : 322 - 328
  • [27] Nonnegative Matrix Factorization for Document Clustering: A Survey
    Hosseini-Asl, Ehsan
    Zurada, Jacek M.
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2014, PT II, 2014, 8468 : 726 - 737
  • [28] Learning the kernel matrix for XML document clustering
    Yang, JW
    Cheung, WK
    Chen, X
    2005 IEEE International Conference on e-Technology, e-Commerce and e-Service, Proceedings, 2005, : 353 - 358
  • [29] Relationship Matrix Nonnegative Decomposition for Clustering
    Pan, Ji-Yuan
    Zhang, Jiang-She
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2011, 2011
  • [30] Using a Matrix Decomposition for Clustering Data
    Abdulla, Hussain Dahwa
    Polovincak, Martin
    Snasel, Vaclav
    2009 INTERNATIONAL CONFERENCE ON COMPUTATIONAL ASPECTS OF SOCIAL NETWORKS, PROCEEDINGS, 2009, : 18 - 23