Variable Selection Methods in Spectral Data Analysis

Times cited: 2
Authors
Li Yan-kun [1]
Dong Ru-nan [1]
Zhang Jin [2]
Huang Ke-nan [3]
Mao Zhi-yi [4]
Affiliations
[1] North China Elect Power Univ, Dept Environm Sci & Engn, Hebei Key Lab Power Plant Flue Gas Multipollutant, Baoding 071003, Peoples R China
[2] Guizhou Med Univ, Sch Food Sci, Guiyang 550025, Peoples R China
[3] 82nd Army Grp Hosp Chinese Peoples Liberat Army, Baoding 071000, Peoples R China
[4] Tianjin Bldg Mat Sci Res Acad, Tianjin 300110, Peoples R China
Keywords
Variable selection; Spectral data; Characteristic variable; Redundant information; PARTIAL LEAST-SQUARES; SUCCESSIVE PROJECTIONS ALGORITHM; NEAR-INFRARED SPECTROSCOPY; WAVELENGTH INTERVAL SELECTION; REGRESSION; ELIMINATION; MODELS; ACID; PLS; IDENTIFICATION
DOI
10.3964/j.issn.1000-0593(2021)11-3331-08
Chinese Library Classification (CLC) number
O433 [Spectroscopy]
Discipline classification code
0703; 070302
Abstract
Extracting useful information from massive, high-dimensional data is a major challenge in current data analysis and an active research topic. Variable selection techniques can extract informative feature variables from large and complex measurement data, thereby simplifying multivariate models and even improving their predictive performance. In spectral analysis, measurement data inevitably contain interfering and irrelevant variables, and the variables are often multicollinear; both effects reduce the robustness and predictive ability of the model. Consequently, variable (wavelength) selection methods have advanced considerably in both the research and the application of spectral analysis. Drawing on the relevant literature and the authors' own research experience, this paper reviews the origins, characteristics, development, classification, comparison, and applications over the past five years of variable selection methods, covering not only near-infrared spectra but also mid-infrared, Raman, and other spectra. Two factors are critical: the parameters used as criteria or thresholds for evaluating variable importance, and the strategies (search paths) used to select variables. Each method has its own advantages and limitations, so in practice the appropriate method must be chosen according to the characteristics of both the method and the object under study. Key contents: (1) a comparison of wavelength selection and wavelength interval selection methods; (2) a summary of variable selection methods based on PLS model parameters; (3) a classification and overview of variable selection methods according to their strategies for searching and selecting variables. Finally, we discuss the problems that variable selection methods encounter in real systems (such as overfitting and instability) and the corresponding solutions.
Meanwhile, we look ahead to the research trends, development prospects, and application directions of variable selection methods. In particular, new criteria for evaluating variable importance and new variable selection strategies still require further research. We expect this paper to help advance subsequent research on, and applications of, variable selection technology.
Pages: 3331-3338
Number of pages: 8