Random Forest Based Multiclass Classification Approach for Highly Skewed Particle Data

Author
Serpil Yalcin Kuzu
Affiliation
[1] Firat University, Department of Physics, Faculty of Science
Keywords
Imbalanced dataset; Multiclass classification; Random forest classifier; Resampling; Upsilon states; Weighted random forest classifier; 68T05; 68T45
DOI
Not available
Abstract
Data used in particle physics analyses are inherently imbalanced: the events of interest are rare against a broad background. These events can be separated from the bulk through computationally intensive studies that apply sophisticated analysis techniques. Classification algorithms from supervised machine learning (ML) can be used to interpret skewed particle datasets as an alternative to the classic techniques, even for multi-particle-state analysis. In this study, the ground state of bottomonium, Υ(1S), and its excited states, Υ(2S) and Υ(3S), were studied with a multiclass classification approach based on the random forest classifier (RFC), a novel example of an ML approach in particle analysis, combined with resampling techniques for preprocessing the dataset and a modified weighting strategy. For this purpose, five widely used oversampling strategies and two hybrid strategies, which combine over- and undersampling, were adapted to the RFC. In addition, an RFC with class weights applied, the weighted random forest (WRF), was used in the analysis. Because of the data structure, the performance of the applied models was evaluated with metrics derived from the confusion matrix.
The results show that hybrid techniques implemented in the RFC are suitable for handling highly imbalanced classes. The G-mean and balanced accuracy (BAcc) scores of the upsilon states showed that the model reached its highest classification performance, around 90%, with the SMOTETomek strategy, together with high sensitivity, implying the success of the approach for multiclass classification.
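
The confusion-matrix metrics named in the abstract can be illustrated with a minimal sketch. It assumes the common macro definitions: BAcc as the arithmetic mean of the per-class recalls (sensitivities) and G-mean as their geometric mean. The paper itself may define these scores differently (for example per class), and the counts below are hypothetical, not results from the study.

```python
import math


def per_class_recall(confusion):
    """Recall (sensitivity) per class; rows are true classes, columns are predictions."""
    recalls = []
    for i, row in enumerate(confusion):
        total = sum(row)
        recalls.append(row[i] / total if total else 0.0)
    return recalls


def balanced_accuracy(confusion):
    """Arithmetic mean of the per-class recalls (assumed macro definition of BAcc)."""
    recalls = per_class_recall(confusion)
    return sum(recalls) / len(recalls)


def g_mean(confusion):
    """Geometric mean of the per-class recalls (assumed multiclass G-mean)."""
    recalls = per_class_recall(confusion)
    return math.prod(recalls) ** (1.0 / len(recalls))


# Hypothetical 3-class confusion matrix, e.g. for the three upsilon states.
# These counts are illustrative only.
confusion = [
    [90, 6, 4],
    [10, 80, 10],
    [5, 5, 40],
]

recalls = per_class_recall(confusion)
bacc = balanced_accuracy(confusion)
gmean = g_mean(confusion)
print(f"recalls={recalls}, BAcc={bacc:.4f}, G-mean={gmean:.4f}")
```

Both scores weight every class equally regardless of its size, which is why they are preferred over plain accuracy when one class dominates the sample.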