Fast principal component analysis of large data sets

Cited by: 45
Authors
Vogt, F [1]
Tacke, M [1]
Affiliation
[1] Res Inst Optron & Pattern Recognit, D-76275 Ettlingen, Germany
Keywords
PCA; SVD; wavelet transformation
DOI
10.1016/S0169-7439(01)00130-7
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Principal component analysis (PCA) and principal component regression (PCR) are widespread algorithms for calibrating spectrometers and evaluating unknown measurement spectra. In many measurement tasks, the amount of calibration data is increasing because of new devices such as hyperspectral imagers. The core of PCA is the singular value decomposition (SVD) of the matrix containing the calibration spectra. The SVD of large calibration sets is computationally very expensive and often becomes impractical because of excessive calculation times. With hyperspectral imaging as the application in mind, an algorithm is proposed that compresses the calibration spectra by a wavelet transformation before the SVD is performed. Considering only the relevant wavelet coefficients can accelerate the SVD. After the relevant principal components (PCs) have been determined from this shrunken calibration matrix in the wavelet domain, they are expanded again by inserting zeros at the positions of the discarded coefficients. Denoised PCs are then obtained by the inverse wavelet transform into the wavelength domain. An additional increase in computation speed is described for "landscape" matrices, obtained by transposing the matrix before performing the SVD. In the Results section, both PCA approaches are demonstrated to yield comparable PCs, using synthetically generated spectra as well as experimental FTIR data. With this algorithm, the PCA of the discussed examples could be accelerated by up to a factor of 52. Additionally, concentrations of synthetic spectra are evaluated by means of the PCs obtained by the different PCA algorithms. Both PC sets, the conventional one and the one based on the new technique, result in equivalent concentration values. (C) 2001 Elsevier Science B.V. All rights reserved.
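
The pipeline described in the abstract (wavelet transform, selection of relevant coefficients, SVD of the shrunken matrix, zero insertion, inverse transform) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. Several choices are assumptions the abstract does not specify: the Haar wavelet, the rule of keeping the `n_keep` coefficients with the largest energy across all spectra, spectra stored in rows without mean centering, and a channel count that is a power of two. The function names are hypothetical. `pcs_via_transpose` only illustrates the identity behind the "landscape" trick; whether transposing actually saves time depends on the SVD routine used.

```python
import numpy as np


def haar_forward(X):
    """Orthonormal multi-level Haar transform along the last axis.

    Assumes the number of wavelength channels is a power of two.
    """
    out = np.array(X, dtype=float)  # np.array copies, so X is left untouched
    n = out.shape[-1]
    while n > 1:
        half = n // 2
        approx = (out[..., 0:n:2] + out[..., 1:n:2]) / np.sqrt(2.0)
        detail = (out[..., 0:n:2] - out[..., 1:n:2]) / np.sqrt(2.0)
        out[..., :half] = approx
        out[..., half:n] = detail
        n = half
    return out


def haar_inverse(W):
    """Inverse of haar_forward (same coefficient layout)."""
    out = np.array(W, dtype=float)
    total = out.shape[-1]
    n = 1
    while n < total:
        approx = out[..., :n].copy()
        detail = out[..., n:2 * n].copy()
        out[..., 0:2 * n:2] = (approx + detail) / np.sqrt(2.0)
        out[..., 1:2 * n:2] = (approx - detail) / np.sqrt(2.0)
        n *= 2
    return out


def wavelet_pca(X, n_keep, n_pcs):
    """PCs of X (spectra in rows) computed from a compressed wavelet matrix."""
    W = haar_forward(X)
    # Assumed relevance criterion: total energy of each coefficient column.
    energy = np.sum(W ** 2, axis=0)
    keep = np.sort(np.argsort(energy)[-n_keep:])
    # SVD of the shrunken calibration matrix in the wavelet domain.
    _, s, Vt = np.linalg.svd(W[:, keep], full_matrices=False)
    # Expand the PCs again: zeros at the positions of discarded coefficients.
    pcs_wavelet = np.zeros((n_pcs, W.shape[1]))
    pcs_wavelet[:, keep] = Vt[:n_pcs]
    # Back to the wavelength domain.
    return haar_inverse(pcs_wavelet), s[:n_pcs]


def pcs_via_transpose(X, n_pcs):
    """PCs of X taken from the left singular vectors of X transposed."""
    U_t, s, _ = np.linalg.svd(X.T, full_matrices=False)
    return U_t[:, :n_pcs].T, s[:n_pcs]


# Synthetic calibration set: two Gaussian bands with random concentrations.
rng = np.random.default_rng(0)
wl = np.arange(256)
pure = np.stack([np.exp(-0.5 * ((wl - 80) / 12.0) ** 2),
                 np.exp(-0.5 * ((wl - 170) / 12.0) ** 2)])
conc = rng.uniform(0.1, 1.0, size=(40, 2))
X = conc @ pure + 0.01 * rng.standard_normal((40, 256))

pcs_ref = np.linalg.svd(X, full_matrices=False)[2][:2]   # conventional PCA
pcs_fast, _ = wavelet_pca(X, n_keep=64, n_pcs=2)          # compressed PCA
# |cosine| between corresponding PCs (the sign of a PC is arbitrary).
print(np.abs(np.sum(pcs_fast * pcs_ref, axis=1)))
```

Because the Haar transform used here is orthonormal, keeping all coefficients reproduces the conventional PCs up to sign. Discarding coefficients trades a small deviation for a smaller matrix handed to the SVD.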
Pages: 1-18
Page count: 18
Related papers
  • [1] Fast principal component analysis of large data sets based on information extraction
    Vogt, F
    Tacke, M
    [J]. JOURNAL OF CHEMOMETRICS, 2002, 16 (11) : 562 - 575
  • [2] AN ALGORITHM FOR THE PRINCIPAL COMPONENT ANALYSIS OF LARGE DATA SETS
    Halko, Nathan
    Martinsson, Per-Gunnar
    Shkolnisky, Yoel
    Tygert, Mark
    [J]. SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2011, 33 (05): : 2580 - 2594
  • [3] FUNCTIONAL CONNECTIVITY - THE PRINCIPAL-COMPONENT ANALYSIS OF LARGE (PET) DATA SETS
    FRISTON, KJ
    FRITH, CD
    LIDDLE, PF
    FRACKOWIAK, RSJ
    [J]. JOURNAL OF CEREBRAL BLOOD FLOW AND METABOLISM, 1993, 13 (01): : 5 - 14
  • [4] Fuzzy Clustering of Large-Scale Data Sets Using Principal Component Analysis
    Arfaoui, Olfa
    Sassi Hidri, Minyar
    [J]. IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ 2011), 2011, : 683 - 690
  • [5] Principal component analysis for distributed data sets with updating
    Bai, ZJ
    Chan, RH
    Luk, FT
    [J]. ADVANCED PARALLEL PROCESSING TECHNOLOGIES, PROCEEDINGS, 2005, 3756 : 471 - 483
  • [6] Fast Principal Component Analysis of Large-Scale Genome-Wide Data
    Abraham, Gad
    Inouye, Michael
    [J]. PLOS ONE, 2014, 9 (04):
  • [7] Independent Principal Component Analysis for biologically meaningful dimension reduction of large biological data sets
    Fangzhou Yao
    Jeff Coquery
    Kim-Anh Lê Cao
    [J]. BMC Bioinformatics, 13
  • [8] Application of the OpenCL API for implementation of the NIPALS algorithm for principal component analysis of large data sets
    Bowden, Joshua C.
    [J]. Proceedings - 6th IEEE International Conference on e-Science Workshops, e-ScienceW 2010, 2010, : 25 - 30
  • [9] Independent Principal Component Analysis for biologically meaningful dimension reduction of large biological data sets
    Yao, Fangzhou
    Coquery, Jeff
    Le Cao, Kim-Anh
    [J]. BMC BIOINFORMATICS, 2012, 13
  • [10] A Study of Effectiveness of Principal Component Analysis on Different Data Sets
    Krishnan, Mukti
    Dutta, Dipankar
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (ICCIC), 2017, : 243 - 248