Decomposition of Term-Document Matrix Representation for Clustering Analysis

Cited by: 0
Authors
Yang, Jianxiong [1]
Watada, Junzo [1]
Affiliations
[1] Waseda Univ, Grad Sch Informat Prod & Syst, Kitakyushu, Fukuoka, Japan
Keywords
Fuzzy clustering; data mining; LSI; SVD
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Latent Semantic Indexing (LSI) is an information retrieval technique that uses a low-rank singular value decomposition (SVD) of the term-document matrix. The aim of the method is to reduce the matrix dimension by finding patterns of co-occurring terms in the document collection. The method is used to calculate term-document weights in the vector space model (VSM) for document clustering with a fuzzy clustering algorithm. LSI is an attempt to exploit the underlying semantic structure of word usage in documents. During the query-matching phase of LSI, a user's query is first projected into the term-document space and then compared to all terms and documents represented in the vector space. Using some similarity measure, the nearest (most relevant) terms and documents are identified and returned to the user. The current LSI query-matching method requires computing the similarity measure between the query and every term and document in the vector space. In this paper, the Maximal Tree Algorithm is used within a recent LSI implementation to reduce the computational time and complexity of query matching. The Maximal Tree data structure stores the term and document vectors in such a way that only those terms and documents most likely to qualify as the nearest neighbors of the query are examined and retrieved. In short, this novel algorithm is suitable for improving the accuracy of data miners.
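The query-matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy matrix, the rank `k = 2`, the choice of cosine similarity, and the helper names `lsi_fit`, `fold_in`, and `cosine_scores` are all assumptions made for illustration. The sketch shows only the exhaustive baseline, in which the query is compared against every document vector; it does not implement the Maximal Tree Algorithm or the fuzzy clustering step.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents).
# The terms and counts are invented for illustration only.
terms = ["cluster", "fuzzy", "matrix", "query", "svd"]
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 2],
], dtype=float)


def lsi_fit(A, k):
    """Rank-k truncated SVD: A is approximated by U_k diag(s_k) V_k^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Row j of V_k is document j in the k-dimensional latent space.
    return U[:, :k], s[:k], Vt[:k, :].T


def fold_in(q, U_k, s_k):
    """Project a term-space query into the latent space: q^T U_k diag(1/s_k)."""
    return (q @ U_k) / s_k


def cosine_scores(q_hat, docs):
    """Cosine similarity between the projected query and every document row."""
    num = docs @ q_hat
    den = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat)
    # Guard against division by zero for all-zero vectors.
    return num / np.where(den == 0, 1.0, den)


U_k, s_k, V_k = lsi_fit(A, k=2)

# Query containing the terms "matrix" and "svd".
q = np.array([0, 0, 1, 0, 1], dtype=float)

# Exhaustive matching: one similarity computation per document.
scores = cosine_scores(fold_in(q, U_k, s_k), V_k)
ranking = np.argsort(-scores)  # document indices, most similar first
```

In this toy collection the two documents that share vocabulary with the query rank first. Every query costs one similarity computation per stored vector, which is the cost the abstract proposes to reduce by examining only likely nearest neighbors.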
Pages: 976-983
Page count: 8
Related papers
50 in total
  • [1] Decomposition of a Term-Document Matrix Representation for Faithful Customer Analysis
    Yang, Jianxiong
    Watada, Junzo
    INTELLIGENT DECISION TECHNOLOGIES, 2013, 255 : 168 - 177
  • [2] High Performance in Minimizing of Term-Document Matrix Representation for Document Clustering
    Muflikhah, L.
    Baharudin, B.
    2009 CONFERENCE ON INNOVATIVE TECHNOLOGIES IN INTELLIGENT SYSTEMS AND INDUSTRIAL APPLICATIONS, 2009, : 225 - 229
  • [3] Web Document Clustering based on a New Niching Memetic Algorithm, Term-Document Matrix and Bayesian Information Criterion
    Cobos, Carlos
    Montealegre, Claudia
    Mejia, Maria-Fernanda
    Mendoza, Martha
    Leon, Elizabeth
    2010 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2010,
  • [4] Efficient Top-k Document Retrieval Using a Term-Document Binary Matrix
    Fujita, Etsuro
    Oyama, Keizo
    INFORMATION RETRIEVAL TECHNOLOGY, 2011, 7097 : 293 - 302
  • [5] Compression Experiments on Term-Document Index
    Sorkun, Murat Cihan
    Ozbey, Can
    2017 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK), 2017, : 435 - 439
  • [6] Information Retrieval Using the Reduced Row Echelon Form of a Term-Document Matrix
    Parali, Ufuk
    Zontul, Metin
    Ertugrul, Duygu Celik
    JOURNAL OF INTERNET TECHNOLOGY, 2019, 20 (04): : 1037 - 1046
  • [7] Sentence level matrix representation for document spectral clustering
    Mijangos, Victor
    Sierra, Gerardo
    Montes, Azucena
    PATTERN RECOGNITION LETTERS, 2017, 85 : 29 - 34
  • [8] On the automatic classification of accounting concepts: Preliminary results of the statistical analysis of term-document frequencies
    Gangolly, J.
    Wu, Y.-F.
    New Review of Applied Expert Systems and Emerging Technologies, 2000, 6 : 81 - 88
  • [9] Efficient Top-k Document Retrieval for Long Queries Using Term-Document Binary Matrix - Pursuit of Enhanced Informational Search on the Web
    Fujita, Etsuro
    Oyama, Keizo
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2013, E96D (05): : 1016 - 1028
  • [10] CLSI: A Flexible Approximation Scheme from Clustered Term-Document Matrices
    Zeimpekis, Dimitrios
    Gallopoulos, Efstratios
    PROCEEDINGS OF THE FIFTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2005, : 631 - 635