Transductive learning for short-text classification problems using latent semantic indexing

Cited by: 23
Authors
Zelikovitz, S [1]
Marquez, F [1]
Affiliations
[1] CUNY Coll Staten Isl, Dept Comp Sci, Staten Isl, NY 10314 USA
Keywords
text classification; LSI; transduction
DOI
10.1142/S0218001405003971
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents work that uses Transductive Latent Semantic Indexing (LSI) for text classification. In addition to relying on labeled training data, we improve classification accuracy by incorporating the set of test examples in the classification process. Rather than performing LSI's singular value decomposition (SVD) process solely on the training data, we instead use an expanded term-by-document matrix that includes both the labeled data and any available test examples. We report the performance of LSI on data sets both with and without the inclusion of the test examples, and we show that tailoring the SVD process to the test examples can be even more useful than adding more training data. This method can be especially useful for countering the possible inclusion of unrelated data in the original corpus and for compensating for limited amounts of data. Additionally, we evaluate the vocabulary of the training and test sets and present the results of a series of experiments to illustrate how the test set is used in an advantageous way.
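The core idea of the abstract, running the SVD on a term-by-document matrix whose columns include the test examples as well as the labeled ones, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the raw-count term weighting, the choice of k, and the cosine nearest-neighbor decision rule are all assumptions made for illustration, and the toy documents are invented.

```python
import numpy as np

def term_document_matrix(docs, vocab):
    """Rows are terms, columns are documents; entries are raw counts (an assumption)."""
    index = {term: i for i, term in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.lower().split():
            if tok in index:
                X[index[tok], j] += 1.0
    return X

def lsi_doc_vectors(X, k):
    """Rank-k SVD of X; returns one k-dimensional row per document."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(s))
    return Vt[:k].T * s[:k]

def transductive_lsi_predict(train_docs, train_labels, test_docs, k=2):
    """Hypothetical helper: SVD over train + test columns, then 1-NN by cosine."""
    all_docs = train_docs + test_docs
    vocab = sorted({tok for d in all_docs for tok in d.lower().split()})
    # Expanded matrix: labeled documents first, then the unlabeled test documents.
    X = term_document_matrix(all_docs, vocab)
    D = lsi_doc_vectors(X, k)
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    D = D / np.where(norms == 0, 1.0, norms)  # unit-length rows for cosine similarity
    n = len(train_docs)
    sims = D[n:] @ D[:n].T  # rows: test documents, columns: training documents
    # Assumed decision rule: label of the most similar training document.
    return [train_labels[i] for i in sims.argmax(axis=1)]

train_docs = ["stock market shares", "football match goal"]
train_labels = ["finance", "sports"]
test_docs = ["market shares trading", "goal match referee"]
print(transductive_lsi_predict(train_docs, train_labels, test_docs, k=2))
```

Note that the test documents contribute only their text, never their labels, so terms such as "trading" that appear only in the test set still shape the reduced space. An inductive baseline would build `X` from `train_docs` alone and fold the test documents into that space afterwards.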
Pages: 143 - 163
Page count: 21
Related papers
50 in total
  • [1] Improving text classification using local latent semantic indexing
    Liu, T
    Chen, H
    Zhang, BY
    Ma, WY
    Wu, GY
    [J]. FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2004, : 162 - 169
  • [2] Language independent semantic kernels for short-text classification
    Kim, Kwanho
    Chung, Beom-suk
    Choi, Yerim
    Lee, Seungjun
    Jung, Jae-Yoon
    Park, Jonghun
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2014, 41 (02) : 735 - 743
  • [3] THE APPLICATION OF LATENT SEMANTIC INDEXING AND ONTOLOGY IN TEXT CLASSIFICATION
    Yang, Xi-Quan
    Sun, Na
    Sun, Tie-Li
    Cao, Xue-Ya
    Zheng, Xiao-Juan
    [J]. INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2009, 5 (12A) : 4491 - 4499
  • [4] Classification of Machine Learning Engines using Latent Semantic Indexing
    Yusof, Yuhanis
    Alhersh, Taha
    Mahmuddin, Massudi
    Din, Aniza Mohamed
    [J]. PROCEEDINGS OF KNOWLEDGE MANAGEMENT INTERNATIONAL CONFERENCE (KMICE) 2012, 2012, : 482 - 486
  • [5] A neuro-SVM model for text classification using latent semantic indexing
    Mitra, V
    Wang, CJ
    Banerjee, S
    [J]. PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), VOLS 1-5, 2005, : 564 - 569
  • [6] Review of short-text classification
    Alsmadi, Issa
    Gan, Keng Hoon
    [J]. INTERNATIONAL JOURNAL OF WEB INFORMATION SYSTEMS, 2019, 15 (02) : 155 - 182
  • [7] Sprinkled Latent Semantic Indexing for Text Classification with Background Knowledge
    Yang, Haiqin
    King, Irwin
    [J]. ADVANCES IN NEURO-INFORMATION PROCESSING, PT II, 2009, 5507 : 53 - 60
  • [8] Text segmentation by latent semantic indexing
    Ishioka, T
    [J]. NEW DEVELOPMENTS IN PSYCHOMETRICS, 2003, : 689 - 696
  • [9] Evaluating the utility of statistical phrases and latent semantic indexing for text classification
    Wu, HW
    Gunopulos, D
    [J]. 2002 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2002, : 713 - 716
  • [10] Semantic concept space based progressive transductive learning for text classification
    Zhang, Xiaobin
    Yin, Yingshun
    Gao, Lili
    Zheng, Jing
    Niu, Yanzhan
    [J]. RECENT ADVANCE OF CHINESE COMPUTING TECHNOLOGIES, 2007, : 324 - 328