Robust discriminant analysis of latent semantic feature for text categorization

Authors
Hu, Jiani [1 ]
Deng, Weihong [1 ]
Guo, Jun [1 ]
Affiliation
[1] Beijing Univ Posts & Telecommun, Beijing 100876, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes a Discriminative Semantic Feature (DSF) method for text categorization based on the vector space model. The DSF method has two stages: it first reduces the dimension of the document vector space by Latent Semantic Indexing (LSI), and then applies a Robust linear Discriminant analysis Model (RDM), which improves classical LDA with an energy-adaptive regularization criterion, to extract semantic features with enhanced discrimination power. As a result, the DSF method can not only uncover latent semantic structure but also capture discriminative features. Comparative experiments are also performed on various state-of-the-art dimension reduction schemes, including our DSF, LSI, orthogonal centroid, two-stage LSI+LDA, LDA/QR, and LDA/GSVD. Experiments on the Reuters-21578 text collection show that the proposed method performs better than the other algorithms.
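The two-stage structure described in the abstract (LSI projection followed by a regularized LDA) can be illustrated with a minimal NumPy sketch. This is not the authors' RDM: the abstract does not state the energy-adaptive regularization criterion, so the sketch substitutes a plain ridge term scaled by the mean eigenvalue of the within-class scatter. The function names `lsi_fit` and `regularized_lda_fit`, the parameter `alpha`, and the toy document-term matrix are all assumptions made for illustration.

```python
import numpy as np

def lsi_fit(X, k):
    """Rank-k LSI. X is (n_docs, n_terms); returns an (n_terms, k) projection."""
    # The top-k right singular vectors span the latent semantic space.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T

def regularized_lda_fit(Z, y, alpha=1e-2):
    """LDA on LSI features Z with a ridge term on the within-class scatter.

    The ridge term is an assumed stand-in, not the paper's
    energy-adaptive criterion.
    """
    classes = np.unique(y)
    d = Z.shape[1]
    mean_all = Z.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        Sw += (Zc - mc).T @ (Zc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Zc) * (diff @ diff.T)
    # Assumed regularizer: ridge scaled by the mean eigenvalue of Sw.
    Sw_reg = Sw + (alpha * np.trace(Sw) / d + 1e-12) * np.eye(d)
    # Whiten with the Cholesky factor so the eigenproblem is symmetric.
    L = np.linalg.cholesky(Sw_reg)
    A = np.linalg.solve(L, np.linalg.solve(L, Sb).T)  # L^{-1} Sb L^{-T}
    A = (A + A.T) / 2.0
    evals, evecs = np.linalg.eigh(A)  # eigenvalues in ascending order
    top = evecs[:, ::-1][:, : len(classes) - 1]
    # Map the whitened directions back to the LSI feature space.
    return np.linalg.solve(L.T, top)

# Toy document-term counts (made up): 6 documents, 6 terms, 2 classes.
X = np.array([
    [2, 1, 0, 0, 1, 0],
    [1, 2, 1, 0, 0, 0],
    [2, 2, 0, 1, 0, 0],
    [0, 0, 1, 2, 0, 2],
    [0, 1, 0, 1, 2, 2],
    [0, 0, 2, 2, 1, 1],
], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

P = lsi_fit(X, k=3)            # stage 1: latent semantic projection
Z = X @ P                      # documents in the 3-dimensional LSI space
W = regularized_lda_fit(Z, y)  # stage 2: discriminant directions
F = Z @ W                      # final features, at most (n_classes - 1) per document
```

With two classes the second stage yields a single discriminant feature per document. A classifier such as nearest centroid could then be applied to `F`, though the abstract does not say which classifier the authors used.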
Pages: 400-409
Page count: 10