A Variable Selection Method of the Significance Multivariate Correlation Competitive Population Analysis for Near-Infrared Spectroscopy in Chemical Modeling

Cited by: 8
Authors
Wang, Yuxi [1]
Jia, Zhenhong [1]
Yang, Jie [2]
Affiliations
[1] Xinjiang Univ, Coll Informat Sci & Engn, Urumqi 830046, Peoples R China
[2] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Shanghai 200240, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Spectrochemical analysis; variable selection; the significant multivariate correlation; weighted bootstrap sampling; model population analysis; Monte Carlo sampling; analytical techniques; partial least squares method; PARTIAL LEAST-SQUARES; REGRESSION; SHRINKAGE; CALIBRATION; PROJECTION; STRATEGY; SPACE; OPTIMIZATION; PERSPECTIVE; WAVELENGTHS;
DOI
10.1109/ACCESS.2019.2954115
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The high dimensionality of spectral datasets makes it difficult to select the optimal subset of variables. This paper presents a new method for variable selection called the significant multivariate competitive population analysis (SMCPA), which combines ideas of significant multivariate correlation (SMC) and model population analysis, and employs weighted bootstrap sampling (WBS) and exponential decline function (EDF) competition methods. In this study, the values of the SMC distributions are used as an index for evaluating the importance of each wavelength. Then, based on the importance level of each wavelength, SMCPA sequentially selects N subsets of spectral wavelengths through N Monte Carlo sampling runs in an iterative and competitive procedure. In each sampling run, a fixed ratio of samples is used to build a partial least-squares (PLS) calibration model, and SMC is then performed to obtain the score and threshold values. Next, based on the SMC scores, the key variables are selected in two steps: compulsory selection by the exponential decline function and competitive selection by adaptive weighted sampling. Finally, cross-validation (CV) is applied to select the subset with the lowest root mean square error. The method is tested on three NIR spectral datasets and compared against three high-performance variable selection methods. The experimental results show that, among the compared methods, the proposed algorithm has the highest efficiency and the best selection effect, and can usually locate the optimal combination of key wavelength variables in a dataset. Its evaluation results after PLS modeling are also the best.
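The two-step selection described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula of the exponential decline function, so the form used here, r_i = a * exp(-k * i) with r_1 = 1 and r_N = 2/p, is an assumption borrowed from the closely related CARS method. The function names (`edf_ratio`, `weighted_bootstrap`, `select_step`) are hypothetical, and the importance scores are passed in as plain numbers standing in for the SMC scores that the paper derives from a PLS model.

```python
import math
import random


def edf_ratio(i, n_runs, n_vars):
    """Fraction of variables kept at run i (1-based).

    Assumed CARS-style form r_i = a * exp(-k * i), with a and k chosen so
    that r_1 = 1 (all variables) and r_N = 2 / n_vars (two variables).
    """
    k = math.log(n_vars / 2) / (n_runs - 1)
    a = (n_vars / 2) ** (1 / (n_runs - 1))
    return a * math.exp(-k * i)


def weighted_bootstrap(candidates, weights, n_draws, rng):
    """Draw with replacement, probability proportional to weight,
    and return the distinct variables that were drawn."""
    drawn = rng.choices(candidates, weights=weights, k=n_draws)
    return sorted(set(drawn))


def select_step(scores, active, i, n_runs, n_vars, rng):
    """One selection step: forced reduction, then competitive sampling.

    scores: importance per variable (stand-in for SMC scores, all > 0).
    active: indices of variables still in play before this run.
    """
    # Step 1 (compulsory): keep the top-scoring fraction given by the EDF.
    n_keep = max(2, round(edf_ratio(i, n_runs, n_vars) * n_vars))
    ranked = sorted(active, key=lambda j: scores[j], reverse=True)[:n_keep]
    # Step 2 (competitive): weighted bootstrap among the survivors.
    weights = [scores[j] for j in ranked]
    return weighted_bootstrap(ranked, weights, len(ranked), rng)


if __name__ == "__main__":
    rng = random.Random(0)
    p, n_runs = 200, 50
    # Placeholder scores; a real run would recompute them from a PLS model
    # fitted on a Monte Carlo subsample of the calibration set.
    scores = [rng.random() + 0.01 for _ in range(p)]
    active = list(range(p))
    for i in range(1, n_runs + 1):
        active = select_step(scores, active, i, n_runs, p, rng)
    print(len(active), "variables remain after", n_runs, "runs")
```

In a full implementation each run would also fit a PLS model on the retained variables and record its cross-validated root mean square error, so that the subset with the lowest error can be chosen at the end; that part is omitted here.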
Pages: 167195-167209
Page count: 15
Related papers
50 in total
  • [11] Nonlinear multivariate modeling of strand mechanical properties with near-infrared spectroscopy
    Via, Brian K.
    Jiang, W.
    FORESTRY CHRONICLE, 2013, 89 (05): : 621 - 630
  • [12] A variable importance criterion for variable selection in near-infrared spectral analysis
    Jin Zhang
    Xiaoyu Cui
    Wensheng Cai
    Xueguang Shao
    Science China Chemistry, 2019, 62 (02) : 271 - 279
  • [13] A variable importance criterion for variable selection in near-infrared spectral analysis
    Jin Zhang
    Xiaoyu Cui
    Wensheng Cai
    Xueguang Shao
    Science China Chemistry, 2019, 62 : 271 - 279
  • [14] A variable importance criterion for variable selection in near-infrared spectral analysis
    Zhang, Jin
    Cui, Xiaoyu
    Cai, Wensheng
    Shao, Xueguang
    SCIENCE CHINA-CHEMISTRY, 2019, 62 (02) : 271 - 279
  • [15] Variable selection for quantitative determination of glucose concentration with near-infrared spectroscopy
    McShane, MJ
    Cote, GL
    Spiegelman, C
    OPTICAL DIAGNOSTICS OF BIOLOGICAL FLUIDS AND ADVANCED TECHNIQUES IN ANALYTICAL CYTOLOGY, PROCEEDINGS OF, 1997, 2982 : 189 - 197
  • [16] A near infrared wavelength selection method based on the variable stability and population analysis
    Zhang Feng
    Tang Xiao-Jun
    Tong Ang-Xin
    Wang Bin
    Wang Jing-Wei
    JOURNAL OF INFRARED AND MILLIMETER WAVES, 2020, 39 (03) : 318 - 323
  • [17] A new near-infrared spectroscopy informative interval selection method
    Xu, Long
    Lu, Jiangang
    Yang, Qinmin
    Chen, Jinshui
    Shi, Yingzi
    Huagong Xuebao/CIESC Journal, 2013, 64 (12): : 4410 - 4415
  • [18] Variable selection in near-infrared spectroscopy: Benchmarking of feature selection methods on biodiesel data
    Balabin, Roman M.
    Smirnov, Sergey V.
    ANALYTICA CHIMICA ACTA, 2011, 692 (1-2) : 63 - 72
  • [19] Interpretable Perturbator for Variable Selection in Near-Infrared Spectral Analysis
    Duan, Chaoshu
    Liu, Xuyang
    Cai, Wensheng
    Shao, Xueguang
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2023, 64 (07) : 2508 - 2514
  • [20] Multivariate calibration modeling of liver oxygen saturation using near-infrared spectroscopy
    Cingo, NA
    Soller, BR
    Puyana, JC
    BIOMEDICAL DIAGNOSTIC, GUIDANCE, AND SURGICAL-ASSIST SYSTEMS II, 2000, 3911 : 230 - 236