Combination of Feature Selection and Resampling Methods to Predict Preterm Birth Based on Electrohysterographic Signals from Imbalance Data

Cited by: 13
Authors
Nieto-del-Amor, Felix [1]
Prats-Boluda, Gema [1]
Garcia-Casado, Javier [1]
Diaz-Martinez, Alba [1]
Jose Diago-Almela, Vicente [2]
Monfort-Ortiz, Rogelio [2]
Hao, Dongmei [3]
Ye-Lin, Yiyao [1]
Affiliations
[1] Univ Politecn Valencia, Ctr Invest & Innovac Bioingn, E-46022 Valencia, Spain
[2] HUP La Fe, Serv Obstet, Valencia 46026, Spain
[3] Beijing Univ Technol, Fac Environm & Life, Beijing Int Sci & Technol Cooperat Base Intellige, Beijing 100124, Peoples R China
Keywords
genetic algorithm; imbalance data learning; electrohysterography; preterm labor prediction; resampling methods; uterine electromyography; machine learning; CLASSIFICATION; CLASSIFIERS; PERFORMANCE; ALGORITHM; ACCURACY; LABOR; TERM; SETS
DOI
10.3390/s22145098
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Discipline classification code
070302; 081704
Abstract
Due to its high sensitivity, electrohysterography (EHG) has emerged as an alternative technique for predicting preterm labor. The main obstacle in designing preterm labor prediction models is the inherent imbalance between preterm and term cases, which can give rise to relatively low performance. Numerous studies have obtained promising preterm labor prediction results using the synthetic minority oversampling technique. However, these studies generally overestimate the models' real generalization capacity because they generate synthetic data before splitting the dataset, which leaks information between the training and testing partitions and thus reduces the complexity of the classification task. In this work, we analyzed the effect of combining feature selection and resampling methods to overcome the class imbalance problem when predicting preterm labor from EHG. We assessed undersampling, oversampling, and hybrid methods applied to the training and validation dataset during feature selection by genetic algorithm, and analyzed the effect of resampling the training data after obtaining the optimized feature subset. The best strategy consisted of undersampling the majority class of the validation dataset to 1:1 during feature selection, without subsequent resampling of the training data, achieving an AUC of 94.5 ± 4.6%, average precision of 84.5 ± 11.7%, maximum F1-score of 79.6 ± 13.8%, and recall of 89.8 ± 12.1%. Our results outperformed the techniques currently used in clinical practice, suggesting that EHG could be used to predict preterm labor in clinics.
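The leakage issue described in the abstract comes down to the order of two operations: split first, then resample inside a single partition only. The following is a minimal sketch of that ordering in plain Python, not the authors' pipeline. The toy labels, the 25% hold-out fraction, and the helper names `stratified_split` and `undersample_majority` are all assumptions made for illustration, and random undersampling stands in for whichever resampling method is under study.

```python
import random


def stratified_split(labels, test_fraction, rng):
    """Return (dev_idx, test_idx), keeping the class ratio in both parts."""
    dev_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        dev_idx.extend(idx[n_test:])
    return dev_idx, test_idx


def undersample_majority(indices, labels, rng):
    """Randomly drop majority-class samples until the classes are 1:1."""
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    n_min = min(len(members) for members in by_class.values())
    balanced = []
    for cls in sorted(by_class):
        balanced.extend(rng.sample(by_class[cls], n_min))
    return balanced


rng = random.Random(0)

# Hypothetical toy labels: 1 = preterm (minority), 0 = term (majority).
labels = [1] * 20 + [0] * 80

# Step 1: split BEFORE any resampling, so the held-out partition never
# influences (and is never influenced by) the resampled data.
dev_idx, test_idx = stratified_split(labels, test_fraction=0.25, rng=rng)

# Step 2: resample only inside the development partition.
balanced_dev_idx = undersample_majority(dev_idx, labels, rng)

# The test partition keeps the original class imbalance.
n_test_preterm = sum(labels[i] for i in test_idx)
n_balanced_preterm = sum(labels[i] for i in balanced_dev_idx)
```

Reversing the two steps, especially with an oversampler that synthesizes new minority samples from existing ones, would let information derived from test samples reach the training data, which is the overestimation the abstract warns about.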
Pages: 18