Local and Global Latent Semantic Analysis for Text Categorization

Author
Ghanem, Khadoudja [1]
Affiliation
[1] Univ Constantine 2, Dept Comp Sci, MISC Lab, Constantine, Algeria
Keywords
Class Representative Term Vector; Clustering; Latent Semantic Analysis; Text Classification
DOI
10.4018/IJIRR.2014070101
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this paper, the authors propose a semantic approach to document categorization. The idea is to build a semantic index (a representative term vector) for each category by performing a local Latent Semantic Analysis (LSA) followed by a clustering process. LSA is then applied a second time (global LSA) to a term-class matrix in order to retrieve the class most similar to the query (the document to classify), in the same way that LSA is used in Information Retrieval to retrieve the documents most similar to a query. The proposed system is evaluated on the popular 20 Newsgroups corpus. The results show that the method is effective compared with the classic KNN and SVM classifiers and with methods presented in the literature: it achieves high precision and recall, and classification accuracy is significantly improved.
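
The global LSA step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no details of the local LSA and clustering stage, so each class column here is simply the summed term counts of that class's documents, used as a stand-in for the representative term vector. The toy vocabulary, the two classes, the rank `k`, and the helper names `term_counts`, `fit_global_lsa`, and `classify` are all assumptions made for illustration. The query is folded into the latent space with the standard LSI projection and compared with each class by cosine similarity.

```python
import numpy as np

def term_counts(tokens, vocab):
    """Return a term-frequency vector over a fixed vocabulary."""
    index = {t: i for i, t in enumerate(vocab)}
    v = np.zeros(len(vocab))
    for tok in tokens:
        if tok in index:
            v[index[tok]] += 1.0
    return v

def fit_global_lsa(term_class, k):
    """Truncated SVD of the term-by-class matrix: A ~ U_k S_k V_k^T."""
    U, s, Vt = np.linalg.svd(term_class, full_matrices=False)
    # Rows of V_k are the class coordinates in the latent space.
    return U[:, :k], s[:k], Vt[:k, :].T

def classify(query_vec, U_k, s_k, V_k, labels):
    """Fold the query into the latent space and return the closest class."""
    q_hat = (query_vec @ U_k) / s_k  # q^T U_k S_k^{-1}
    norms = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat) + 1e-12
    sims = (V_k @ q_hat) / norms     # cosine similarity to each class
    return labels[int(np.argmax(sims))], sims

# Hypothetical toy corpus (assumption: not the 20 Newsgroups data).
vocab = ["engine", "car", "wheel", "orbit", "rocket", "nasa", "speed"]
docs_by_class = {
    "autos": [["car", "engine", "wheel"], ["car", "wheel", "speed"]],
    "space": [["rocket", "orbit", "nasa"], ["nasa", "orbit", "speed"]],
}
labels = list(docs_by_class)

# Stand-in for the class representative term vectors: summed term counts.
term_class = np.column_stack([
    sum(term_counts(doc, vocab) for doc in docs_by_class[c]) for c in labels
])

U_k, s_k, V_k = fit_global_lsa(term_class, k=2)
label, sims = classify(term_counts(["rocket", "nasa"], vocab), U_k, s_k, V_k, labels)
print(label)
```

With only two toy classes, `k=2` performs no real dimensionality reduction. On a corpus with more categories, `k` would be chosen below the number of classes so that the latent space merges correlated terms.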
Pages: 1-13
Page count: 13
Related papers
  • [1] An Application of Latent Semantic Analysis for Text Categorization
    Kou, G.
    Peng, Y.
    [J]. INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL, 2015, 10 (03) : 357 - 369
  • [2] Web text categorization based on latent semantic analysis
    Wang Jianfeng
    Yuan Jinsha
    [J]. ICCSE'2006: PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION: ADVANCED COMPUTER TECHNOLOGY, NEW EDUCATION, 2006, : 826 - 828
  • [3] Local Latent Semantic Analysis Based on Support Vector Machine for Imbalanced Text Categorization
    Wan, Yuan
    Tong, Hengqing
    Deng, Yanfang
    [J]. 2010 THE 3RD INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND INDUSTRIAL APPLICATION (PACIIA2010), VOL III, 2010, : 168 - 171
  • [4] Local Latent Semantic Analysis Based on Support Vector Machine for Imbalanced Text Categorization
    Wan, Yuan
    Tong, Hengqing
    Deng, Yanfang
    [J]. APPLIED INFORMATICS AND COMMUNICATION, PT III, 2011, 226 : 321 - 329
  • [5] Robust discriminant analysis of latent semantic feature for text categorization
    Hu, Jiani
    Deng, Weihong
    Guo, Jun
    [J]. FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2006, 4223 : 400 - 409
  • [6] Latent semantic analysis for text categorization using neural network
    Yu, Bo
    Xu, Zong-ben
    Li, Cheng-hua
    [J]. KNOWLEDGE-BASED SYSTEMS, 2008, 21 (08) : 900 - 904
  • [7] Latent semantic analysis approaches to categorization
    Laham, D
    [J]. PROCEEDINGS OF THE NINETEENTH ANNUAL CONFERENCE OF THE COGNITIVE SCIENCE SOCIETY, 1997, : 979 - 979
  • [8] A Latent Semantic Analysis-based Approach to Geographic Feature Categorization from Text
    Huang, Yuxia
    [J]. FIFTH IEEE INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC 2011), 2011, : 87 - 94
  • [9] A novel multilingual text categorization system using latent semantic indexing
    Lee, Chung-Hong
    Yang, Hsin-Chang
    Ma, Sheng-Min
    [J]. ICICIC 2006: FIRST INTERNATIONAL CONFERENCE ON INNOVATIVE COMPUTING, INFORMATION AND CONTROL, VOL 2, PROCEEDINGS, 2006, : 503 - +
  • [10] Text categorization based on combination of modified back propagation neural network and latent semantic analysis
    Wei Wang
    Bo Yu
    [J]. Neural Computing and Applications, 2009, 18 : 875 - 881