Combining wavelength importance ranking to the random forest classifier to analyze multiclass spectral data

Cited by: 11
Authors
Fontes, Juliana de Abreu [1 ]
Anzanello, Michel Jose [1 ]
Brito, Joao B. G. [1 ]
Bucco, Guilherme Brandelli [2 ]
Fogliatto, Flavio Sanson [1 ]
Puglia, Fabio do Prado [1 ]
Affiliations
[1] Univ Fed Rio Grande do Sul, Dept Engn Prod & Transportes, Av Osvaldo Aranha,99-5 Andar, Porto Alegre, RS, Brazil
[2] Univ Fed Rio Grande do Sul, Escola Adm, Washington Luiz 855, Porto Alegre, RS, Brazil
Keywords
Random Forest classifier; Chi-Squared; Spectroscopy; Wavelength selection; WAVE-NUMBER SELECTION; INFRARED-SPECTROSCOPY; FTIR SPECTROSCOPY; COUNTERFEIT; COCAINE; ADULTERATION; MEDICINES; SAMPLES; FOOD; QUANTIFICATION;
DOI
10.1016/j.forsciint.2021.110998
Chinese Library Classification
DF [Law]; D9 [Law]; R [Medicine and Health]
Discipline classification code
0301; 10
Abstract
Near-infrared (NIR) spectroscopy is a type of vibrational spectroscopy widely used in different areas to characterize substances. NIR datasets consist of absorbance measurements over a range of wavelengths (λ). Such datasets are typically noisy and correlated, which tends to compromise the performance of several statistical techniques; one way to overcome this is to select the portions of the spectra in which the wavelengths are more informative. In this paper we investigate the performance of the Random Forest (RF) classifier combined with several wavelength importance ranking approaches on the task of classifying product samples into categories, such as quality levels or authenticity. Our propositions are tested on six NIR datasets comprising two or more classes of food and pharmaceutical products, as well as illegal drugs. Our proposed classification model, which integrates the χ² ranking score with the RF classifier, substantially reduced the number of wavelengths in the dataset while increasing classification accuracy compared with the use of the complete datasets. Our propositions also performed well when compared with competing methods available in the literature. © 2021 Elsevier B.V. All rights reserved.
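The abstract does not spell out how the χ² ranking is computed or how many wavelengths are retained, so the following is only a minimal sketch of the ranking step under stated assumptions. It scores each wavelength with a χ² statistic using the common convention for non-negative continuous features (observed value = per-class sum of absorbances, expected value = column total times class proportion), then keeps the top-ranked wavelengths. The function names `chi2_scores` and `top_k_wavelengths`, the toy absorbance matrix, and the choice of k = 2 are all hypothetical and are not taken from the paper. The Random Forest step is not shown; the reduced matrix would simply be passed to an RF classifier.

```python
from typing import List, Sequence


def chi2_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Chi-squared score for each column of a non-negative feature matrix.

    Assumed convention (not confirmed by the paper): for every class, the
    observed value is the sum of the feature over that class's samples and
    the expected value is the column total times the class proportion.
    """
    classes = sorted(set(y))
    n_samples = len(y)
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            n_c = sum(1 for label in y if label == c)
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = total * n_c / n_samples
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def top_k_wavelengths(scores: Sequence[float], k: int) -> List[int]:
    """Return the column indices of the k highest scores, in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data (hypothetical): 6 samples x 4 wavelengths, 3 classes.
# Columns 0 and 2 vary with the class; columns 1 and 3 are constant.
X = [
    [0.9, 0.5, 0.1, 0.5],
    [0.8, 0.5, 0.2, 0.5],
    [0.1, 0.5, 0.9, 0.5],
    [0.2, 0.5, 0.8, 0.5],
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, 0.5, 0.5],
]
y = [0, 0, 1, 1, 2, 2]

scores = chi2_scores(X, y)
selected = top_k_wavelengths(scores, k=2)

# Reduced dataset containing only the retained wavelengths.
X_reduced = [[row[j] for j in selected] for row in X]
print(selected)
```

In this toy example the two class-dependent wavelengths are retained and the two constant ones are dropped, which mirrors the idea of shrinking the spectrum before classification.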
Pages: 10