Variable Selection Methods in Spectral Data Analysis

Cited by: 2
Authors
Li Yan-kun [1]
Dong Ru-nan [1]
Zhang Jin [2]
Huang Ke-nan [3]
Mao Zhi-yi [4]
Affiliations
[1] North China Elect Power Univ, Dept Environm Sci & Engn, Hebei Key Lab Power Plant Flue Gas Multipollutant, Baoding 071003, Peoples R China
[2] Guizhou Med Univ, Sch Food Sci, Guiyang 550025, Peoples R China
[3] 82nd Army Grp Hosp Chinese Peoples Liberat Army, Baoding 071000, Peoples R China
[4] Tianjin Bldg Mat Sci Res Acad, Tianjin 300110, Peoples R China
Keywords
Variable selection; Spectral data; Characteristic variable; Redundant information; PARTIAL LEAST-SQUARES; SUCCESSIVE PROJECTIONS ALGORITHM; NEAR-INFRARED SPECTROSCOPY; WAVELENGTH INTERVAL SELECTION; REGRESSION; ELIMINATION; MODELS; ACID; PLS; IDENTIFICATION;
DOI
10.3964/j.issn.1000-0593(2021)11-3331-08
Chinese Library Classification (CLC) number
O433 [Spectroscopy]
Subject classification code
0703; 070302
Abstract
Extracting useful information from massive or high-dimensional data is a major challenge in data analysis and an active research topic. Variable selection extracts informative feature variables from large and complex measurement data, which simplifies the multivariate model and can even improve its prediction performance. In spectral analysis, the measured data inevitably contain interfering and irrelevant variables, as well as multicollinearity among variables, all of which affect the robustness and prediction ability of the model. Variable (wavelength) selection methods have therefore progressed greatly in both research and application in spectral analysis. Based on the related literature and the authors' research experience, this paper summarizes the proposal, characteristics, development, categories, comparison, and applications over the past five years of variable selection methods, not only for near-infrared spectra but also for mid-infrared, Raman, and other spectra. Two aspects of a method are vital: the parameters used as criteria or thresholds for evaluating the importance of variables, and the strategies or paths for selecting variables. Moreover, each method has its advantages and limitations, so in practice the method must be chosen according to the characteristics of both the method and the object under study. The key contents are: (1) a comparison of wavelength selection and wavelength interval selection methods; (2) a summary of the variable selection methods based on PLS model parameters; (3) a classification and overview of variable selection methods according to their strategies for searching and selecting variables. Finally, we discuss the problems that variable selection methods encounter in real systems (such as overfitting and instability) and the corresponding solutions.
We also look ahead to the research trends, development prospects, and application directions of variable selection methods. Among these, new criteria for evaluating variable importance and new variable selection strategies still require further research. We hope this paper will help promote follow-up research on and applications of variable selection technology.
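To illustrate how a PLS model parameter combined with a threshold can act as a selection criterion, here is a minimal sketch in plain Python. It is not a method taken from the paper: the function names, the toy data, and the 0.5 cutoff are all assumptions made for illustration. Only the first PLS1 weight vector (the normalized covariance between each centered variable and the centered response) is used as the importance measure; a single weight vector is a deliberate simplification.

```python
import math


def first_pls_weights(X, y):
    """First PLS1 weight vector w = X^T y / ||X^T y|| computed on centered data."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Covariance-like score between each centered variable and the centered response.
    w = [
        sum((X[i][j] - x_mean[j]) * (y[i] - y_mean) for i in range(n))
        for j in range(p)
    ]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        raise ValueError("no variable covaries with y")
    return [v / norm for v in w]


def select_variables(X, y, threshold):
    """Keep the indices whose absolute weight reaches the (illustrative) threshold."""
    w = first_pls_weights(X, y)
    return [j for j, v in enumerate(w) if abs(v) >= threshold]


# Toy data (assumed, not from the paper): 4 samples x 4 variables.
# Variables 0 and 2 drive y, variable 1 is unrelated, variable 3 is a constant offset.
X = [
    [11.0, 11.0, 11.0, 10.0],
    [11.0, 9.0, 9.0, 10.0],
    [9.0, 11.0, 9.0, 10.0],
    [9.0, 9.0, 11.0, 10.0],
]
y = [5.5, 8.5, 4.5, 1.5]

weights = first_pls_weights(X, y)
selected = select_variables(X, y, threshold=0.5)
print(weights)
print(selected)
```

The choice of threshold directly controls how many variables survive, which is one reason the selection criterion and its cutoff matter as much as the search strategy.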
Pages: 3331-3338
Number of pages: 8