Transductive learning for short-text classification problems using latent semantic indexing

Cited by: 23
Authors
Zelikovitz, S [1]
Marquez, F [1]
Affiliation
[1] CUNY Coll Staten Isl, Dept Comp Sci, Staten Isl, NY 10314 USA
Keywords
text classification; LSI; transduction
DOI
10.1142/S0218001405003971
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents work that uses Transductive Latent Semantic Indexing (LSI) for text classification. In addition to relying on labeled training data, we improve classification accuracy by incorporating the set of test examples into the classification process. Rather than performing LSI's singular value decomposition (SVD) solely on the training data, we use an expanded term-by-document matrix that includes both the labeled training data and any available test examples. We report the performance of LSI on data sets with and without the inclusion of the test examples, and we show that tailoring the SVD to the test examples can be even more useful than adding more training data. This method can be especially useful for counteracting unrelated data that may be present in the original corpus and for compensating for limited amounts of data. Additionally, we evaluate the vocabulary of the training and test sets and present the results of a series of experiments that illustrate how the test set is used to advantage.
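The expanded-matrix idea in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy vocabulary and documents, the use of raw term counts (no term weighting), the rank k = 2, and the 1-nearest-neighbour cosine classifier are all hypothetical choices made for this example. The only difference between the two modes is whether the unlabeled test columns are appended to the term-by-document matrix before the SVD.

```python
import numpy as np

# Hypothetical toy vocabulary; "striker" appears only in the test documents.
TERMS = ["ball", "goal", "team", "vote", "senate", "bill", "striker"]

# Term-by-document matrices: rows follow TERMS, columns are documents.
# Raw counts are used for simplicity (an assumption; no term weighting).
train = np.array([
    [2, 1, 0, 0],  # ball
    [1, 2, 0, 0],  # goal
    [1, 1, 0, 0],  # team
    [0, 0, 2, 1],  # vote
    [0, 0, 1, 2],  # senate
    [0, 0, 1, 1],  # bill
    [0, 0, 0, 0],  # striker (never seen in training)
], dtype=float)
labels = ["sports", "sports", "politics", "politics"]

test = np.array([
    [1, 0],  # ball
    [1, 0],  # goal
    [0, 0],  # team
    [0, 1],  # vote
    [0, 1],  # senate
    [0, 2],  # bill
    [1, 0],  # striker
], dtype=float)


def build_basis(train, test, k, transductive):
    """Return U_k, the top-k left singular vectors (terms x k).

    Inductive LSI runs the SVD on the training columns only; transductive
    LSI runs it on the expanded matrix [train | test].
    """
    matrix = np.hstack([train, test]) if transductive else train
    u, _, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k]


def project(docs, u_k):
    """Map term-count columns into the k-dimensional latent space (U_k^T d)."""
    return u_k.T @ docs


def cosine(a, b):
    """Cosine similarity between every column of a and every column of b."""
    a = a / np.maximum(np.linalg.norm(a, axis=0, keepdims=True), 1e-12)
    b = b / np.maximum(np.linalg.norm(b, axis=0, keepdims=True), 1e-12)
    return a.T @ b


def classify(train, labels, test, u_k):
    """Label each test document with the class of its most similar training document."""
    sims = cosine(project(test, u_k), project(train, u_k))  # shape: (n_test, n_train)
    return [labels[j] for j in np.argmax(sims, axis=1)]


if __name__ == "__main__":
    striker = TERMS.index("striker")
    for transductive in (False, True):
        u_k = build_basis(train, test, k=2, transductive=transductive)
        print("transductive" if transductive else "inductive",
              classify(train, labels, test, u_k),
              "| weight of 'striker' in U_k:",
              round(float(np.linalg.norm(u_k[striker])), 3))
```

Both bases label these two toy test documents the same way; what changes is the latent space itself. Because "striker" never occurs in the training columns, its row in the training-only basis is zero (up to rounding), so the term is ignored when test documents are projected. The transductive basis gives it nonzero weight through its co-occurrence with "ball" and "goal" in the test set. This mirrors, on a toy scale, the vocabulary effect the abstract mentions; it does not reproduce the paper's experiments.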
Pages: 143–163
Page count: 21
Related papers (50 in total)
  • [41] Genetic algorithm for text clustering based on latent semantic indexing
    Song, Wei
    Park, Soon Cheol
    [J]. COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2009, 57 (11-12) : 1901 - 1907
  • [42] SyMSS: A syntax-based measure for short-text semantic similarity
    Oliva, Jesus
    Ignacio Serrano, Jose
    Dolores del Castillo, Maria
    Iglesias, Angel
    [J]. DATA & KNOWLEDGE ENGINEERING, 2011, 70 (04) : 390 - 405
  • [43] Short-Text Representation using Diffusion Wavelets
    Jain, Vidit
    Mahadeokar, Jay
    [J]. WWW'14 COMPANION: PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON WORLD WIDE WEB, 2014, : 301 - 302
  • [44] Short-Text Semantic Similarity (STSS): Techniques, Challenges and Future Perspectives
    Amur, Zaira Hassan
    Hooi, Yew Kwang
    Bhanbhro, Hina
    Dahri, Kamran
    Soomro, Gul Muhammad
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (06):
  • [45] Exploiting Global Semantic Similarity Biterms for Short-text Topic Discovery
    Lu, Heng-yang
    Ge, Gao-jian
    Li, Yun
    Wang, Chong-jun
    Xie, Jun-yuan
    [J]. 2018 IEEE 30TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2018, : 975 - 982
  • [46] Term weighting scheme for short-text classification: Twitter corpuses
    Issa Alsmadi
    Gan Keng Hoon
    [J]. Neural Computing and Applications, 2019, 31 : 3819 - 3831
  • [47] Lost in Transduction: Transductive Transfer Learning in Text Classification
    Moreo, Alejandro
    Esuli, Andrea
    Sebastiani, Fabrizio
    [J]. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2022, 16 (01)
  • [48] TextGCL: Graph Contrastive Learning for Transductive Text Classification
    Zhao, Yawei
    Song, Xiaoyang
    [J]. 2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [49] A Method of Agent and Patient Relation Acquisition for Short-Text Classification
    Fan, Xinghua
    Wei, Dingbang
    [J]. ADVANCED RESEARCH ON COMPUTER SCIENCE AND INFORMATION ENGINEERING, 2011, 153 : 27 - 33
  • [50] Classification of Web Resident Sensor Resources using Latent Semantic Indexing and Ontologies
    Majavu, Wabo
    van Zyl, Terence
    Marwala, Tshilidzi
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), VOLS 1-6, 2008, : 518 - +