A method of dimensionality reduction by selection of components in principal component analysis for text classification

Cited by: 5
Authors
Zhang, Yangwu [1, 2]
Li, Guohe [1, 3]
Zong, Heng [2]
Affiliations
[1] China Univ Petr, Coll Geophys & Informat Engn, Beijing, Peoples R China
[2] China Univ Polit Sci & Law, Dept Sci & Technol Teaching, Beijing, Peoples R China
[3] China Univ Petr, Beijing Key Lab Data Min Petr Data, Beijing, Peoples R China
Keywords
Principal components analysis; Dimensionality reduction; Text classification
DOI
10.2298/FIL1805499Z
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Dimensionality reduction, which covers both feature extraction and feature selection, is a key step in text classification. In this paper, we propose a hybrid method of dimensionality reduction that combines principal component analysis (PCA) with a subsequent selection of components. PCA is a feature extraction method. Not all of its components contribute to classification, because the PCA objective is not a form of discriminant analysis (see, e.g., Jolliffe, 2002). We therefore present a component selection function that returns the components useful for classification, using performance indicators measured on different subsets of the components. Compared with traditional feature selection methods, SVM classifiers trained on the selected components show improved classification performance and reduced computational overhead.
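The idea in the abstract, extracting components with PCA and then keeping only those that help a classifier, can be illustrated with a minimal sketch. This is not the authors' selection function, which the abstract does not specify. It assumes a greedy forward search scored by validation accuracy, swaps the SVM for a nearest-centroid classifier to stay dependency-light (NumPy only), and uses synthetic data in place of text features. All function names (`pca_fit`, `select_components`, `run_demo`, and so on) are hypothetical.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return (mean, components); components has shape (n_components, n_features)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    """Project X onto the given principal directions."""
    return (X - mean) @ components.T


def centroid_accuracy(Z_train, y_train, Z_val, y_val):
    """Validation accuracy of a nearest-centroid classifier (stand-in for an SVM)."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((Z_val[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == y_val).mean())


def select_components(Z_train, y_train, Z_val, y_val):
    """Greedy forward selection: add the component that most improves accuracy,
    and stop as soon as no remaining component gives a strict improvement."""
    remaining = list(range(Z_train.shape[1]))
    selected, best = [], 0.0
    while remaining:
        scores = [
            centroid_accuracy(Z_train[:, selected + [j]], y_train,
                              Z_val[:, selected + [j]], y_val)
            for j in remaining
        ]
        k = int(np.argmax(scores))
        if scores[k] <= best:
            break
        best = scores[k]
        selected.append(remaining.pop(k))
    return selected, best


def run_demo(seed=0):
    """Synthetic example: the class signal lives in the lowest-variance feature,
    so the leading principal components carry no label information."""
    rng = np.random.default_rng(seed)
    n = 300
    y = rng.integers(0, 2, size=n)
    X = np.column_stack([
        rng.normal(0.0, 10.0, n),               # high-variance noise
        (2 * y - 1) + rng.normal(0.0, 0.2, n),  # low-variance class signal
        rng.normal(0.0, 3.0, n),                # medium-variance noise
    ])
    X_train, y_train, X_val, y_val = X[:200], y[:200], X[200:], y[200:]
    mean, comps = pca_fit(X_train, n_components=3)
    Z_train = pca_transform(X_train, mean, comps)
    Z_val = pca_transform(X_val, mean, comps)
    return select_components(Z_train, y_train, Z_val, y_val)


if __name__ == "__main__":
    selected, best = run_demo()
    print("selected components:", selected, "validation accuracy:", best)
```

In this toy setting a variance-based cutoff would keep the noisy leading components and discard the informative one, whereas a performance-based selection picks the informative component first. Substituting an SVM and a cross-validated score for `centroid_accuracy` would bring the sketch closer to the setup the abstract describes.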
Pages: 1499-1506
Number of pages: 8