Prediction of Oil Content in Oil Shale by Near-Infrared Spectroscopy Based on Stacking Ensemble Learning

Cited by: 5
Authors
Li, Quan-lun [1]
Chen, Zheng-guang [1]
Jiao, Feng [2]
Affiliations
[1] Heilongjiang Bayi Agr Univ, Coll Informat & Elect Engn, Daqing 163319, Peoples R China
[2] Heilongjiang Bayi Agr Univ, Coll Agr, Daqing 163319, Peoples R China
Keywords
Near-infrared; Integrated learning; Oil content of oil shale; Characteristic wavelength; Random Forest feature selection; Yield
DOI
10.3964/j.issn.1000-0593(2023)04-1030-07
Chinese Library Classification number
O433 [Spectroscopy]
Discipline classification code
0703; 070302
Abstract
To overcome the difficulty of further improving the prediction accuracy of a single model, this study adopted a heterogeneous ensemble learning model based on the Stacking framework, combined with near-infrared spectroscopy, to detect the oil content of oil shale. A total of 230 oil shale core samples collected from a block in the Songliao Basin were taken as the research object. Their oil content was measured by the low-temperature dry distillation method, and the near-infrared spectrum of each sample was scanned at the same time. The Monte Carlo algorithm was employed to eliminate outlier samples, and the 213 samples remaining after outlier removal were randomly divided into a training set and a test set at a ratio of 3:1. Detrending coupled with baseline correction was used to eliminate the influence of noise and baseline drift in the spectral data. Characteristic wavelengths were then extracted with the random forest (RF) algorithm according to wavelength importance, and the CARS algorithm was then used to further reduce the data dimension. Finally, PLS, SVM, RF and GBDT, whose parameters were optimized by grid search, were adopted as primary learners, and a PLS regression model was adopted as the secondary learner to build the Stacking ensemble learning model. The accuracy of the single models and the ensemble learning models for oil shale oil content prediction was compared using R² and RMSE as evaluation indicators. The results show that the RF-CARS method can effectively screen important wavelengths and thereby improve the efficiency of the model. The heterogeneous ensemble learning model based on Stacking has better predictive performance and greater stability than the single models (SVM, PLS) and the homogeneous ensemble learning models (RF, GBDT).
Based on multiple random divisions of the data set, the average R² of the Stacking ensemble learning model is 0.8942, an average increase of 0.0623 over the other models, and its RMSEP of 0.5869 is on average 0.1474 lower than that of the other models. These results show that the heterogeneous ensemble learning model based on Stacking can combine the advantages of its primary learners to predict the oil content of oil shale quickly and accurately, which provides a new fast and portable method for oil shale oil content detection.
Pages: 1030-1036
Page count: 7