Supervised latent semantic indexing for document categorization

Cited by: 20
Authors
Sun, JT [1]
Chen, Z [1]
Zeng, HJ [1]
Lu, YC [1]
Shi, CY [1]
Ma, WY [1]
Affiliation
[1] Tsinghua Univ, Dept Comp Sci, Beijing 100084, Peoples R China
DOI
10.1109/ICDM.2004.10004
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Latent Semantic Indexing (LSI) is a successful technology in information retrieval (IR) that attempts to uncover the latent semantics implied by a query or a document by representing them in a dimension-reduced space. However, LSI is not optimal for document categorization tasks, because it aims to find the most representative features for document representation rather than the most discriminative ones. In this paper, we propose Supervised LSI (SLSI), which iteratively selects the most discriminative basis vectors using the training data. The extracted vectors are then used to project the documents into a reduced-dimensional space for better classification. Experimental evaluations show that the SLSI approach leads to dramatic dimension reduction while achieving good classification results.
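For context on the baseline the abstract contrasts against, here is a minimal sketch of standard (unsupervised) LSI, assuming a small dense NumPy term-by-document matrix. The function names `lsi_fit` and `lsi_project` and the toy matrix are hypothetical illustrations. This is not the SLSI algorithm, whose iterative selection procedure is not described in this record. It shows how plain LSI keeps the basis vectors with the largest singular values (the "most representative" directions) and projects documents onto them.

```python
import numpy as np


def lsi_fit(term_doc: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Truncated SVD of a term-by-document matrix A (terms x documents).

    Keeps the k left singular vectors with the largest singular values.
    These directions best reconstruct A in the least-squares sense, i.e.
    they are chosen for representation, without looking at class labels.
    """
    if not 1 <= k <= min(term_doc.shape):
        raise ValueError("k must be between 1 and min(n_terms, n_docs)")
    # numpy returns singular values in descending order
    U, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k]


def lsi_project(docs: np.ndarray, U_k: np.ndarray) -> np.ndarray:
    """Map document columns (terms x n) to k-dim latent vectors via U_k^T d."""
    return U_k.T @ docs


# Hypothetical toy data: 4 terms x 4 documents with two disjoint "topics"
# (terms 0-1 occur only in docs 0-1, terms 2-3 occur only in docs 2-3).
A = np.array(
    [
        [1.0, 1.0, 0.0, 0.0],
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0],
        [0.0, 0.0, 1.0, 0.0],
    ]
)
U_k, s_k = lsi_fit(A, k=2)
Z = lsi_project(A, U_k)  # 2 x 4: one latent vector per document
print(np.round(s_k, 3))
```

Plain LSI ranks directions by singular value alone, so the selection ignores class labels. According to the abstract, SLSI instead uses the labeled training data to iteratively choose the most discriminative basis vectors before projecting; the exact selection criterion is not given in this record, so it is not sketched here.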
Pages: 535-538
Number of pages: 4