Combination of Feature Selection and Resampling Methods to Predict Preterm Birth Based on Electrohysterographic Signals from Imbalance Data

Cited by: 13
Authors
Nieto-del-Amor, Felix [1]
Prats-Boluda, Gema [1]
Garcia-Casado, Javier [1]
Diaz-Martinez, Alba [1]
Jose Diago-Almela, Vicente [2]
Monfort-Ortiz, Rogelio [2]
Hao, Dongmei [3]
Ye-Lin, Yiyao [1]
Affiliations
[1] Univ Politecn Valencia, Ctr Invest & Innovac Bioingn, E-46022 Valencia, Spain
[2] HUP La Fe, Serv Obstet, Valencia 46026, Spain
[3] Beijing Univ Technol, Fac Environm & Life, Beijing Int Sci & Technol Cooperat Base Intellige, Beijing 100124, Peoples R China
Keywords
genetic algorithm; imbalance data learning; electrohysterography; preterm labor prediction; resampling methods; uterine electromyography; machine learning; CLASSIFICATION; CLASSIFIERS; PERFORMANCE; ALGORITHM; ACCURACY; LABOR; TERM; SETS;
DOI
10.3390/s22145098
Chinese Library Classification
O65 [Analytical Chemistry];
Discipline Classification Code
070302; 081704;
Abstract
Due to its high sensitivity, electrohysterography (EHG) has emerged as an alternative technique for predicting preterm labor. The main obstacle in designing preterm labor prediction models is the inherent preterm/term imbalance ratio, which can give rise to relatively low performance. Numerous studies obtained promising preterm labor prediction results using the synthetic minority oversampling technique. However, these studies generally overestimate mathematical models' real generalization capacity by generating synthetic data before splitting the dataset, leaking information between the training and testing partitions and thus reducing the complexity of the classification task. In this work, we analyzed the effect of combining feature selection and resampling methods to overcome the class imbalance problem for predicting preterm labor by EHG. We assessed undersampling, oversampling, and hybrid methods applied to the training and validation dataset during feature selection by genetic algorithm, and analyzed the resampling effect on training data after obtaining the optimized feature subset. The best strategy consisted of undersampling the majority class of the validation dataset to 1:1 during feature selection, without subsequent resampling of the training data, achieving an AUC of 94.5 +/- 4.6%, average precision of 84.5 +/- 11.7%, maximum F1-score of 79.6 +/- 13.8%, and recall of 89.8 +/- 12.1%. Our results outperformed the techniques currently used in clinical practice, suggesting the EHG could be used to predict preterm labor in clinics.
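The leakage issue raised in the abstract (resampling before splitting the dataset) can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy labels, the 20% hold-out fraction, and the helper names `stratified_split` and `undersample_to_balance` are assumptions made purely for illustration. The sketch splits first and then undersamples the majority class to 1:1 inside the training partition only, so the test partition keeps its original imbalance and shares no records with the resampled data.

```python
import random
from collections import Counter


def stratified_split(labels, test_fraction, rng):
    """Return (train_idx, test_idx), holding out test_fraction of each class."""
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def undersample_to_balance(indices, labels, rng):
    """Randomly drop majority-class indices until every class has equal size."""
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    n_min = min(len(v) for v in by_class.values())
    kept = []
    for cls in sorted(by_class):
        kept.extend(rng.sample(by_class[cls], n_min))
    return sorted(kept)


rng = random.Random(0)

# Hypothetical toy labels: 1 = preterm (minority), 0 = term (majority).
labels = [1] * 20 + [0] * 180

# Split BEFORE any resampling, so no test record influences training.
train_idx, test_idx = stratified_split(labels, 0.2, rng)

# Resample the training partition only; the test partition stays untouched.
balanced_train_idx = undersample_to_balance(train_idx, labels, rng)

train_counts = Counter(labels[i] for i in balanced_train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print("train:", dict(train_counts), "test:", dict(test_counts))
```

The same ordering applies to oversampling methods such as the synthetic minority oversampling technique: synthetic samples generated before the split can be derived from records that later land in the test partition, which is the overestimation the abstract describes.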
Pages: 18
Related papers
50 in total
  • [31] Developing Decision Tree based Models in Combination with Filter Feature Selection Methods for Direct Marketing
    Obiedat, Ruba
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (01) : 650 - 659
  • [32] An Efficient Approach to Predict Eye Diseases from Symptoms Using Machine Learning and Ranker-Based Feature Selection Methods
    Al Marouf, Ahmed
    Mottalib, Md Mozaharul
    Alhajj, Reda
    Rokne, Jon
    Jafarullah, Omar
    BIOENGINEERING-BASEL, 2023, 10 (01):
  • [33] Using GA-based Feature Selection for Emotion Recognition from Physiological Signals
    Gu, Y.
    Tan, S. L.
    Wong, K. J.
    Ho, M. H. R.
    Qu, L.
    2008 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATIONS SYSTEMS (ISPACS 2008), 2008, : 70 - +
  • [34] Lightweight Feature Selection Methods Based on Standardized Measure of Dispersion for Mining Big Data
    Fong, Simon
    Biuk-Aghai, Robert P.
    Si, Yain-Whar
    2016 IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (CIT), 2016, : 553 - 559
  • [36] New fast feature selection methods based on multiple support vector data description
    Zhang, Li
    Lu, Xingning
    APPLIED INTELLIGENCE, 2018, 48 (07) : 1776 - 1790
  • [37] Centralized vs. distributed feature selection methods based on data complexity measures
    Moran-Fernandez, L.
    Bolon-Canedo, V.
    Alonso-Betanzos, A.
    KNOWLEDGE-BASED SYSTEMS, 2017, 117 : 27 - 45
  • [38] Feature selection from microarray data : Genetic algorithm based approach
    Ram, Pintu Kumar
    Kuila, Pratyay
    JOURNAL OF INFORMATION & OPTIMIZATION SCIENCES, 2019, 40 (08): : 1599 - 1610
  • [39] A new approach for gender detection from voice data: Feature selection with optimization methods
    Ozbay, Feyza Altunbey
    Ozbay, Erdal
    JOURNAL OF THE FACULTY OF ENGINEERING AND ARCHITECTURE OF GAZI UNIVERSITY, 2023, 38 (02): : 1179 - 1192
  • [40] CIS feature selection based dynamic ensemble selection model for human stress detection from EEG signals
    Malviya, Lokesh
    Mal, Sandip
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2023, 26 (04): : 2367 - 2381