Random Forest Based Multiclass Classification Approach for Highly Skewed Particle Data

Author
Serpil Yalcin Kuzu
Affiliation
[1] Firat University, Department of Physics, Faculty of Science
Keywords
Imbalanced dataset; Multiclass classification; Random forest classifier; Resampling; Upsilon states; Weighted random forest classifier; 68T05; 68T45
Abstract
Data used in particle physics analyses are imbalanced by nature: the events of interest are rare against a broad background. These events can be separated from the bulk through intensive computational studies that apply sophisticated analysis techniques. Classification algorithms from supervised machine learning (ML) can be used to interpret skewed particle datasets as an alternative to the classic techniques, even for multi-particle state analyses. In this study, the ground state of bottomonium, Υ(1S), and its excited states, Υ(2S) and Υ(3S), were studied with a multiclass classification approach based on the random forest classifier (RFC), a novel ML approach in particle analysis, combined with resampling techniques for preprocessing the dataset and a modified weighting strategy. For this purpose, five widely used oversampling strategies and two hybrid strategies, which combine over- and undersampling, were coupled with the RFC. In addition, an RFC with class weights applied, the weighted random forest (WRF), was used in the analysis. Because of the data structure, the performance of the applied models was evaluated with metrics derived from the confusion matrix.
The results show that hybrid techniques implemented in the RFC are suitable for handling highly imbalanced classes. The G-mean and balanced accuracy (BAcc) scores of the upsilon states showed that the model reached its highest classification performance, around 90%, with the SMOTETomek strategy, with high sensitivity, implying the success of the application to multiclass classification.
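The abstract does not spell out how the confusion-matrix-derived scores are computed. As a minimal sketch, assuming the common multiclass definitions (BAcc as the arithmetic mean of the per-class recalls, G-mean as their geometric mean), the two scores can be obtained from a confusion matrix as follows. The function names and the counts in `cm` are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import math


def per_class_recall(cm):
    """Recall (sensitivity) of each class.

    cm[i][j] is the number of events whose true class is i
    and whose predicted class is j.
    """
    recalls = []
    for i, row in enumerate(cm):
        total = sum(row)
        # A class with no true events contributes a recall of 0.0 here.
        recalls.append(row[i] / total if total else 0.0)
    return recalls


def balanced_accuracy(cm):
    """Arithmetic mean of the per-class recalls (assumed BAcc definition)."""
    recalls = per_class_recall(cm)
    return sum(recalls) / len(recalls)


def g_mean(cm):
    """Geometric mean of the per-class recalls (assumed G-mean definition)."""
    recalls = per_class_recall(cm)
    return math.prod(recalls) ** (1.0 / len(recalls))


# Hypothetical 3-class confusion matrix with one majority class and two
# minority classes; the counts are invented for this example.
cm = [
    [90, 5, 5],   # true class 0
    [2, 16, 2],   # true class 1
    [1, 1, 8],    # true class 2
]

recalls = per_class_recall(cm)
bacc = balanced_accuracy(cm)
gmean = g_mean(cm)
print(recalls, round(bacc, 4), round(gmean, 4))
```

Both scores weight every class equally regardless of its size, which is why they are preferred over plain accuracy on skewed data: a model that ignores a minority class drives that class's recall, and hence the G-mean, toward zero.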