Combining wavelength importance ranking to the random forest classifier to analyze multiclass spectral data

Cited by: 11
Authors
Fontes, Juliana de Abreu [1]
Anzanello, Michel Jose [1]
Brito, Joao B. G. [1]
Bucco, Guilherme Brandelli [2]
Fogliatto, Flavio Sanson [1]
Puglia, Fabio do Prado [1]
Affiliations
[1] Univ Fed Rio Grande do Sul, Dept Engn Prod & Transportes, Av Osvaldo Aranha,99-5 Andar, Porto Alegre, RS, Brazil
[2] Univ Fed Rio Grande do Sul, Escola Adm, Washington Luiz 855, Porto Alegre, RS, Brazil
Keywords
Random Forest classifier; Chi-Squared; Spectroscopy; Wavelength selection; WAVE-NUMBER SELECTION; INFRARED-SPECTROSCOPY; FTIR SPECTROSCOPY; COUNTERFEIT; COCAINE; ADULTERATION; MEDICINES; SAMPLES; FOOD; QUANTIFICATION;
DOI
10.1016/j.forsciint.2021.110998
Chinese Library Classification (CLC) number
DF [Law]; D9 [Law]; R [Medicine and Health]
Discipline classification code
0301; 10
Abstract
Near-infrared (NIR) spectroscopy is a type of vibrational spectroscopy widely used in different areas to characterize substances. NIR datasets comprise absorbance measurements over a range of wavelengths (λ). Because such datasets are typically noisy and correlated, they tend to compromise the performance of several statistical techniques; one way to overcome this is to select the portions of the spectra in which wavelengths are more informative. In this paper we investigate the performance of the Random Forest (RF) classifier combined with several wavelength importance ranking approaches on the task of classifying product samples into categories, such as quality levels or authenticity. Our propositions are tested on six NIR datasets comprising two or more classes of food and pharmaceutical products, as well as illegal drugs. Our proposed classification model, which integrates the χ² ranking score with the RF classifier, substantially reduced the number of wavelengths in the dataset while increasing classification accuracy compared with the use of the complete datasets. Our propositions also performed well when compared with competing methods available in the literature. © 2021 Elsevier B.V. All rights reserved.
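The abstract does not spell out how the χ² ranking score is computed, so the following is only a minimal sketch of one common formulation (the same observed-versus-expected statistic used by scikit-learn's `chi2` scorer), applied to non-negative absorbance values. The function names `chi2_scores` and `top_k_wavelengths`, the toy spectra, and the class labels are all hypothetical and are not taken from the paper. The retained wavelength columns would then be passed to an RF classifier, a step not implemented here.

```python
from typing import List, Sequence


def chi2_scores(X: Sequence[Sequence[float]], y: Sequence[str]) -> List[float]:
    """Return one chi-squared score per wavelength (column of X).

    Assumes non-negative absorbances. For each wavelength, the summed
    absorbance observed in each class is compared with the amount expected
    if the absorbance were spread across classes in proportion to class size.
    """
    n_samples = len(X)
    n_features = len(X[0])
    classes = sorted(set(y))
    class_prob = {c: sum(1 for label in y if label == c) / n_samples for c in classes}

    scores = []
    for j in range(n_features):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = class_prob[c] * total
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def top_k_wavelengths(scores: Sequence[float], k: int) -> List[int]:
    """Return the column indices of the k highest-scoring wavelengths, in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy spectra: 4 samples x 3 wavelengths, two classes.
X = [
    [0.9, 0.5, 0.1],
    [0.8, 0.5, 0.2],
    [0.1, 0.5, 0.9],
    [0.2, 0.5, 0.8],
]
y = ["authentic", "authentic", "counterfeit", "counterfeit"]

scores = chi2_scores(X, y)
selected = top_k_wavelengths(scores, k=2)
X_reduced = [[row[j] for j in selected] for row in X]  # input for a downstream classifier
```

In this toy case the middle wavelength has identical absorbance in both classes, so it receives a score of zero and is dropped, while the two discriminating wavelengths are kept.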
Pages: 10
Related papers
50 in total
  • [21] An effective approach for improving the accuracy of a random forest classifier in the classification of Hyperion data
    Chutia, Dibyajyoti
    Borah, Naiwrita
    Baruah, Diganta
    Bhattacharyya, Dhruba Kumar
    Raju, P. L. N.
    Sarma, K. K.
    [J]. APPLIED GEOMATICS, 2020, 12 (01) : 95 - 105
  • [22] Random forest Algorithm for the Classification of Spectral Data of Astronomical Objects
    Solorio-Ramirez, Jose-Luis
    Jimenez-Cruz, Raul
    Villuendas-Rey, Yenny
    Yanez-Marquez, Cornelio
    [J]. ALGORITHMS, 2023, 16 (06)
  • [23] A random forest classifier based on pixel comparison features for urban LiDAR data
    Wang, Chisheng
    Shu, Qiqi
    Wang, Xinyu
    Guo, Bo
    Liu, Peng
    Li, Qingquan
    [J]. ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2019, 148 : 75 - 86
  • [24] Feature Importance Ranking of Random Forest-Based End-to-End Learning Algorithm
    Yuan, Xiaoguang
    Liu, Shiruo
    Feng, Wei
    Dauphin, Gabriel
    [J]. REMOTE SENSING, 2023, 15 (21)
  • [25] RANKING METHODOLOGY FOR SEQUENTIAL BAND SELECTION COMBINING DATA DISPERSION AND SPECTRAL BAND CORRELATION
    Llaveria, David
    Camps, Adriano
    Park, Hyuk
    Narayan, Ram
    [J]. 2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 775 - 778
  • [26] Image Classification Using RapidEye Data: Integration of Spectral and Textual Features in a Random Forest Classifier (vol 10, pg 5334, 2017)
    Zhang, Huanxue
    Li, Qiangzi
    Liu, Jiangui
    Shang, Jiali
    Du, Xin
    McNairn, Heather
    Champagne, Catherine
    Dong, Taifeng
    Liu, Mingxu
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2018, 11 (07) : 2571 - 2571
  • [27] Combining random forest with multi-block local binary pattern feature selection for multiclass head pose estimation
    Kang, Min-Joo
    Lee, Jung-Kyung
    Kang, Je-Won
    [J]. PLOS ONE, 2017, 12 (07)
  • [28] Gene Selection and Classification Approach for Microarray Data based on Random Forest Ranking and BBHA
    Pashaei, Elnaz
    Ozen, Mustafa
    Aydin, Nizamettin
    [J]. 2016 3RD IEEE EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL AND HEALTH INFORMATICS, 2016, : 308 - 311
  • [29] Modifying Cleaning Method in Big Data Analytics Process using Random Forest Classifier
    Hossen, J.
    Jesmeen, M. Z. H.
    Sayeed, Shohel
    [J]. PROCEEDINGS OF THE 2018 7TH INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION ENGINEERING (ICCCE), 2018, : 208 - 213
  • [30] Event recognition in marine seismological data using Random Forest machine learning classifier
    Domel, Przemyslaw
    Hibert, Clement
    Schlindwein, Vera
    Plaza-Faverola, Andreia
    [J]. GEOPHYSICAL JOURNAL INTERNATIONAL, 2023, 235 (01) : 589 - 609