Combination of Feature Selection and Resampling Methods to Predict Preterm Birth Based on Electrohysterographic Signals from Imbalance Data

Cited by: 13

Authors:

Nieto-del-Amor, Felix [1]

Prats-Boluda, Gema [1]

Garcia-Casado, Javier [1]

Diaz-Martinez, Alba [1]

Jose Diago-Almela, Vicente [2]

Monfort-Ortiz, Rogelio [2]

Hao, Dongmei [3]

Ye-Lin, Yiyao [1]

Affiliations:

[1] Univ Politecn Valencia, Ctr Invest & Innovac Bioingn, E-46022 Valencia, Spain

[2] HUP La Fe, Serv Obstet, Valencia 46026, Spain

[3] Beijing Univ Technol, Fac Environm & Life, Beijing Int Sci & Technol Cooperat Base Intellige, Beijing 100124, Peoples R China

Source:

SENSORS | 2022 / Vol. 22 / Issue 14

Keywords:

genetic algorithm; imbalance data learning; electrohysterography; preterm labor prediction; resampling methods; uterine electromyography; machine learning; CLASSIFICATION; CLASSIFIERS; PERFORMANCE; ALGORITHM; ACCURACY; LABOR; TERM; SETS;

DOI:

10.3390/s22145098

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry]

Discipline classification codes:

070302; 081704

Abstract:

Due to its high sensitivity, electrohysterography (EHG) has emerged as an alternative technique for predicting preterm labor. The main obstacle in designing preterm labor prediction models is the inherent preterm/term imbalance ratio, which can give rise to relatively low performance. Numerous studies obtained promising preterm labor prediction results using the synthetic minority oversampling technique. However, these studies generally overestimate mathematical models' real generalization capacity by generating synthetic data before splitting the dataset, leaking information between the training and testing partitions and thus reducing the complexity of the classification task. In this work, we analyzed the effect of combining feature selection and resampling methods to overcome the class imbalance problem for predicting preterm labor by EHG. We assessed undersampling, oversampling, and hybrid methods applied to the training and validation dataset during feature selection by genetic algorithm, and analyzed the resampling effect on training data after obtaining the optimized feature subset. The best strategy consisted of undersampling the majority class of the validation dataset to 1:1 during feature selection, without subsequent resampling of the training data, achieving an AUC of 94.5 ± 4.6%, average precision of 84.5 ± 11.7%, maximum F1-score of 79.6 ± 13.8%, and recall of 89.8 ± 12.1%. Our results outperformed the techniques currently used in clinical practice, suggesting that EHG could be used to predict preterm labor in clinics.
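The leakage issue described in the abstract comes down to the order of operations: partition the records first, then resample only inside the partition that is allowed to be modified. The following is a minimal sketch of random undersampling to 1:1 after splitting. It is not the authors' pipeline; the function name `undersample_to_balance`, the toy label counts, the split sizes, and the label coding (0 = term, 1 = preterm) are all assumptions made for illustration.

```python
import numpy as np

def undersample_to_balance(indices, y, rng):
    """Return a subset of `indices` with equal class counts (1:1).

    Assumes binary labels, with 1 as the minority (preterm) class.
    """
    minority = indices[y[indices] == 1]
    majority = indices[y[indices] == 0]
    n = min(len(minority), len(majority))
    # Randomly keep n records from each class, without replacement.
    keep_minor = rng.choice(minority, size=n, replace=False)
    keep_major = rng.choice(majority, size=n, replace=False)
    return np.concatenate([keep_minor, keep_major])

rng = np.random.default_rng(0)

# Toy labels: 270 term (0) and 30 preterm (1). Not the study's data.
y = np.array([0] * 270 + [1] * 30)

# 1) Partition first, so no record can sit on both sides of the split.
order = rng.permutation(len(y))
test_idx, dev_idx = order[:75], order[75:]

# 2) Resample only inside the development partition; the held-out
#    partition keeps its original class ratio.
balanced_idx = undersample_to_balance(dev_idx, y, rng)
```

Because undersampling only discards records, nothing derived from the held-out partition can enter the balanced subset. With synthetic oversampling the same ordering matters more, since new samples are interpolated from existing ones and would carry held-out information if generated before the split.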


Pages: 18
