Comparative Evaluation of Oversampling Techniques for Balancing Metabolic Profiles

Cited by: 0
Authors
Hernandez, Mikel [1]
Epelde, Gorka [1,2]
Gil-Redondo, Ruben [3]
Embade, Nieves [3]
Alberdi, Ane [4]
Macia, Ivan [1,2]
Millet, Oscar [3]
Affiliations
[1] Vicomtech Fdn, Digital Hlth & Biomed Technol, Donostia San Sebastian 20009, Spain
[2] Biodonostia Hlth Res Inst, EHealth Grp, Donostia San Sebastian 20014, Spain
[3] CIC bioGUNE, Precis Med & Metab Lab, Derio 48160, Spain
[4] Mondragon Univ, Biomed Engn Dept, Arrasate Mondragon 20500, Spain
Keywords
data imbalance; oversampling; synthetic data; metabolomics; data quality; metabolic syndrome; SMOTE;
DOI
10.1145/3569192.3569200
Chinese Library Classification
TP39 [Computer applications];
Subject classification codes
081203 ; 0835 ;
Abstract
Imbalanced data is a common problem when data analytics paradigms, such as statistical analyses, predictive models, and classification metrics that are sensitive to imbalance (e.g., accuracy), are applied to binary and multiclass data. Although pre-processing, algorithmic, and hybrid approaches exist, none of them focuses specifically on balancing metabolic profiles for Metabolic Syndrome analysis. Because the insights and conclusions drawn from analyses of metabolic data matter for this field, an imbalance between the Metabolic Syndrome related subclasses can reduce statistical power. Thus, there is a need to balance metabolic data to improve the insights derived from these types of analyses. In this context, this paper presents a comparative evaluation of six oversampling techniques for balancing metabolic profiles (SMOTE, B-SMOTE, ADASYN, ROS, K-SMOTE, and SVM-SMOTE). The analysis uses an imbalanced dataset with 16 classes, formed by the combinations of 4 binary metabolic conditions. Additionally, a methodology is defined to objectively evaluate and compare the six oversampling techniques in terms of representativity and variety. The results show that ROS and SMOTE were the best oversampling techniques for balancing metabolic data, generating high-quality synthetic profiles that resemble the real ones while balancing all classes equally. These results indicate that metabolomics studies focused on Metabolic Syndrome can rely on these oversampling methods to improve their conclusions.
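As a rough illustration of the setup described in the abstract, the sketch below encodes 4 binary conditions into one of 16 class labels, balances the classes with random oversampling (ROS), and shows the interpolation step that SMOTE-style methods use to create a synthetic point between a sample and one of its neighbours. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names (`class_label`, `random_oversample`, `interpolate`), the toy profiles, and the bit order of the label encoding are all hypothetical, and neighbour selection for SMOTE is omitted.

```python
import random
from collections import Counter


def class_label(conditions):
    """Encode 4 binary conditions as a class id in 0..15 (assumed bit order)."""
    label = 0
    for bit in conditions:
        label = label * 2 + int(bit)
    return label


def random_oversample(profiles, labels, seed=0):
    """ROS: duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, lab in zip(profiles, labels):
        by_class.setdefault(lab, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    out_rows, out_labels = [], []
    for lab, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        out_rows.extend(rows + extra)
        out_labels.extend([lab] * target)
    return out_rows, out_labels


def interpolate(sample, neighbour, gap):
    """SMOTE-style synthetic point: sample + gap * (neighbour - sample),
    with gap in [0, 1]. Choosing the neighbour is not shown here."""
    return [s + gap * (n - s) for s, n in zip(sample, neighbour)]


# Toy data (hypothetical): two features per profile, three of the 16 classes.
profiles = [[0.1, 1.2], [0.2, 1.1], [0.3, 1.0], [0.9, 0.4], [0.8, 0.5], [0.5, 0.7]]
conditions = [(0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0),
              (1, 0, 1, 0), (1, 0, 1, 0), (1, 1, 1, 1)]
labels = [class_label(c) for c in conditions]

balanced_rows, balanced_labels = random_oversample(profiles, labels)
counts = Counter(balanced_labels)  # every class now has the same row count
midpoint = interpolate([0.0, 2.0], [1.0, 4.0], 0.5)
```

ROS only repeats existing profiles, so every balanced row is a real one, whereas the interpolation step produces new values that lie on the segment between two real profiles.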
Pages: 41-47
Page count: 7