Prediction of Oil Content in Oil Shale by Near-Infrared Spectroscopy Based on Stacking Ensemble Learning

Cited by: 5
Authors
Li, Quan-lun [1]
Chen, Zheng-guang [1]
Jiao, Feng [2]
Affiliations
[1] Heilongjiang Bayi Agr Univ, Coll Informat & Elect Engn, Daqing 163319, Peoples R China
[2] Heilongjiang Bayi Agr Univ, Coll Agr, Daqing 163319, Peoples R China
Keywords
Near-infrared; Integrated learning; Oil content of oil shale; Characteristic wavelength; Random Forest feature selection; YIELD;
DOI
10.3964/j.issn.1000-0593(2023)04-1030-07
Chinese Library Classification number
O433 [Spectroscopy]
Subject classification code
0703; 070302
Abstract
To overcome the difficulty of further improving the prediction accuracy of a single model, this study adopted a heterogeneous ensemble learning model based on the Stacking framework, combined with near-infrared spectroscopy, to determine the oil content of oil shale. A total of 230 oil shale core samples collected from a block in the Songliao Basin were taken as the research object; their oil content was measured by the low-temperature dry distillation method, and the corresponding near-infrared spectrum of each sample was acquired. The Monte Carlo algorithm was employed to eliminate outlier samples, and the 213 samples remaining after outlier removal were randomly divided into a training set and a test set at a ratio of 3:1. Detrending coupled with baseline correction was used to eliminate the influence of noise and baseline drift in the spectral data. The random forest (RF) algorithm was then used to extract characteristic wavelengths according to wavelength importance, and the CARS algorithm was applied to further reduce the data dimension. Finally, PLS, SVM, RF and GBDT, with parameters optimized by grid search, were adopted as primary learners, and a PLS regression model was adopted as the secondary learner to build the Stacking ensemble learning model. The accuracy of the single and ensemble learning models for oil content prediction was compared using R² and RMSE as evaluation indicators. The results show that the RF-CARS method can effectively screen important wavelengths and thereby improve model efficiency. The heterogeneous ensemble learning model based on Stacking has better predictive performance and greater stability than the single models (SVM, PLS) and the homogeneous ensemble learning models (RF, GBDT).
Based on multiple random divisions of the data set, the average R² of the Stacking ensemble learning model is 0.8942, an average increase of 0.0623 compared with the other models, and its RMSEP of 0.5869 is on average 0.1474 lower than that of the other models. These results show that the heterogeneous ensemble learning model based on Stacking can combine the advantages of the primary learners to predict the oil content of oil shale quickly and accurately, which provides a new, fast and portable method for oil shale oil content detection.
Pages: 1030-1036
Number of pages: 7