An automatic generation of pre-processing strategy combined with machine learning multivariate analysis for NIR spectral data

Times cited: 8
Authors
Arianti, Nunik Destria [1]
Saputra, Edo [2,3]
Sitorus, Agustami [4,5]
Affiliations
[1] Nusa Putra Univ, Dept Informat Syst, Sukabumi 43155, Indonesia
[2] Univ Riau, Fac Agr, Dept Agr Technol, Pekanbaru 28293, Indonesia
[3] IPB Univ, Agr Engn Study Program, Bogor 16680, Indonesia
[4] Natl Res & Innovat Agcy BRIN, Res Ctr Appropriate Technol, Subang 41213, Indonesia
[5] King Mongkuts Inst Technol Ladkrabang, Sch Engn, Dept Agr Engn, Bangkok 10520, Thailand
Keywords
Ensemble pre-processing; Chemometrics; Machine learning; AGoES
DOI
10.1016/j.jafr.2023.100625
Chinese Library Classification (CLC) number
S [Agricultural sciences]
Subject classification code
09
Abstract
Pre-processing of near-infrared (NIR) spectral data is indispensable in multivariate analysis, because the measured spectra of complex samples are often affected by strong background signals, light scattering, varying noise, and other unexpected factors. Various pre-processing methods have been developed to remove or reduce these interferences. To date, most applications of NIR spectral pre-processing in multivariate calibration have relied on trial and error, with the choice of a suitable method depending on the nature of the data and on the practitioner's expertise and experience. As a result, it is usually difficult to determine the best pre-processing method for a given dataset. To tackle this problem, this study proposes a new data pre-processing concept, the automatic generation of a pre-processing strategy (AGoES). The concept belongs to the family of ensemble pre-processing methods, in which machine learning algorithms (PLSR, SVM, k-NN, DT, AB, and GPR) built on differently pre-processed data are combined using 5-fold cross-validation and grid-search optimization. To evaluate the concept, a public NIR spectral dataset of manure organic waste was used to predict three responses: dry matter content (DM), organic matter content (OM), and ammonium nitrogen content (AN). The results show that SVM combined with AGoES pre-processing is the best algorithm for predicting DM and AN, with a ratio of prediction to deviation (RPD) of 3.619 and 2.996, respectively. AB in tandem with AGoES pre-processing is the best strategy for predicting OM, with an RPD of 3.185. Therefore, within the AGoES framework, pre-processing is unsupervised, which makes multivariate analysis with machine learning algorithms simpler and more feasible to apply.
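The abstract describes evaluating differently pre-processed versions of the spectra with 5-fold cross-validation and grid search, and reports performance as RPD. The Python sketch below illustrates only those generic ingredients; it is not the authors' AGoES implementation. Assumptions: the candidate pre-processing options (SNV, a finite-difference first derivative standing in for Savitzky-Golay, and their combination), the ridge regression model (swapped in for PLSR, SVM, k-NN, DT, AB, and GPR so the example needs only NumPy), the alpha grid, the synthetic data, and all function names are hypothetical illustrative choices. The sketch also selects the single best (pre-processing, hyperparameter) pair rather than building an ensemble. RPD is computed as the standard deviation of the reference values divided by the RMSE of the predictions.

```python
import numpy as np


def snv(X):
    """Standard normal variate: centre and scale each spectrum (row) by its own mean and SD."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)


def first_derivative(X):
    """Finite-difference first derivative along the wavelength axis (stand-in for Savitzky-Golay)."""
    return np.diff(np.asarray(X, dtype=float), axis=1)


# Hypothetical candidate pre-processing strategies (not the paper's list).
PREPROCESSORS = {
    "raw": lambda X: np.asarray(X, dtype=float),
    "snv": snv,
    "d1": first_derivative,
    "snv+d1": lambda X: first_derivative(snv(X)),
}


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression on mean-centred data; returns (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return coef, x_mean, y_mean


def ridge_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean


def cv_predictions(X, y, alpha, k=5, seed=0):
    """Out-of-fold predictions from k-fold cross-validation (5-fold by default)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    y_hat = np.empty_like(y)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        model = ridge_fit(X[train_idx], y[train_idx], alpha)
        y_hat[test_idx] = ridge_predict(model, X[test_idx])
    return y_hat


def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def rpd(y_true, y_pred):
    """Ratio of prediction to deviation: SD of the reference values / RMSE of the predictions."""
    return float(np.std(np.asarray(y_true, dtype=float), ddof=1) / rmse(y_true, y_pred))


def grid_search(X, y, alphas, k=5):
    """Try every (pre-processing, alpha) pair; keep the one with the lowest cross-validated RMSE."""
    best = None
    for name, prep in PREPROCESSORS.items():
        Xp = prep(X)
        for alpha in alphas:
            score = rmse(y, cv_predictions(Xp, y, alpha, k))
            if best is None or score < best[2]:
                best = (name, alpha, score)
    return best


if __name__ == "__main__":
    # Synthetic spectra: one absorption band scaled by concentration, plus
    # multiplicative scatter, an additive baseline offset, and noise.
    rng = np.random.default_rng(42)
    wl = np.linspace(0.0, 1.0, 50)
    conc = rng.uniform(0.0, 1.0, 60)
    band = np.exp(-((wl - 0.5) ** 2) / 0.01)
    scatter = rng.uniform(0.8, 1.2, (60, 1))
    offset = rng.uniform(-0.2, 0.2, (60, 1))
    X = scatter * (conc[:, None] * band + 0.5) + offset + rng.normal(0.0, 0.01, (60, 50))
    name, alpha, cv_rmse = grid_search(X, conc, alphas=(1e-3, 1e-1, 10.0))
    y_cv = cv_predictions(PREPROCESSORS[name](X), conc, alpha)
    print(f"best: {name}, alpha={alpha}, CV RMSE={cv_rmse:.3f}, RPD={rpd(conc, y_cv):.2f}")
```

Because SNV and the derivative act on each spectrum independently, applying them before splitting into folds does not leak information between folds. A pre-processing step fitted on the whole dataset (for example MSC with a mean reference spectrum) would instead have to be fitted inside each training fold.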
Pages: 9