Employing feature engineering strategies to improve the performance of machine learning algorithms on echocardiogram dataset

Times cited: 0
Authors
Huang, Huang-Nan [1]
Chen, Hong-Ming [1]
Lin, Wei-Wen [2, 3, 4, 9]
Huang, Chau-Jian [5]
Chen, Yung-Cheng [6]
Wang, Yu-Huei [2]
Yang, Chao-Tung [6, 7, 8]
Affiliations
[1] Tunghai Univ, Dept Appl Math, Taichung, Taiwan
[2] Taichung Vet Gen Hosp, Cardiovasc Ctr, Taichung, Taiwan
[3] Natl Chung Hsing Univ, Dept PostBaccalaureate Med, Taichung, Taiwan
[4] Tunghai Univ, Dept Life Sci, Taichung, Taiwan
[5] ShuZen Jr Coll Med & Management, Dept Informat Management, Kaohsiung, Taiwan
[6] Tunghai Univ, Dept Comp Sci, Taichung, Taiwan
[7] Tunghai Univ, Res Ctr Smart Sustainable Circular Econ, Taichung, Taiwan
[8] Tunghai Univ, Res Ctr Smart Sustainable Circular Econ, Dept Comp Sci, Taichung 407224, Taiwan
[9] Tunghai Univ, Natl Chung Hsing Univ, Taichung Vet Gen Hosp, Dept Life Sci, Cardiovasc Ctr, Taichung Dept Postbac, Taichung 407224, Taiwan
Source
DIGITAL HEALTH | 2023 / Vol. 9
Keywords
Precision medicine; feature selection; machine learning; data scrubbing; correlation matrix; ensemble; model
DOI
10.1177/20552076231207589
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)]
Abstract
Objectives: This study uses machine learning (ML) to make predictions from input features during training and inference. Feature selection is an important factor affecting the accuracy of ML models, and the process begins with data extraction, i.e., collecting all the data the ML model requires. It also draws on feature engineering: the raw data of the cardiac ultrasound (echocardiogram) dataset must be labeled with one or more meaningful, informative labels so that the ML model can learn from it and predict target values more accurately. This study therefore seeks to improve feature selection strategies applied to the raw dataset and to address the problem of data scrubbing.

Methods: The ultrasound dataset was cleaned, and critical features were selected, using feature-engineering techniques: data standardization, normalization, and imputation of missing features. Data scrubbing aimed to retain and select the critical features of the echocardiogram dataset while making the ML algorithm's predictions more accurate.

Results: Using commonly applied feature-engineering methods, the study selected four important features. Two ML algorithms available on the Azure platform, Random Forest and CatBoost, were combined through a Voting Ensemble method that served as the training algorithm. Visualization tools were also used to gain a clearer understanding of the raw data and to improve the accuracy of the predictive model.

Conclusion: This paper emphasizes feature engineering, specifically the cleaning and analysis of missing values in the raw echocardiography dataset and the identification of its critical features. The Azure platform is used to predict whether patients have a history of heart disease, distinguishing individuals who have been under surveillance in the past three years from those who have not.
Through data scrubbing and preprocessing methods in feature engineering, the model can more accurately predict the future occurrence of heart disease in patients.
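The Methods paragraph names missing-value imputation, standardization and normalization, and the keywords mention a correlation matrix for picking critical features, but the abstract does not say which imputation rule or selection criterion was used. The Python sketch below assumes mean imputation, z-score standardization, min-max normalization, and ranking features by absolute Pearson correlation with a binary target. The feature names (`lvef`, `la_size`, `noise`, `const`), their values, and all helper names are hypothetical and illustrative, not the authors' code.

```python
import math
from statistics import mean, pstdev

# Assumption: mean imputation, z-score scaling and |Pearson r| ranking stand in
# for the paper's unspecified feature-engineering choices.


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)  # raises StatisticsError if every value is missing
    return [fill if v is None else v for v in column]


def standardize(column):
    """Z-score standardization: (x - mean) / population std; constant columns become 0."""
    mu, sigma = mean(column), pstdev(column)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in column]


def min_max(column):
    """Min-max normalization to [0, 1]; constant columns become 0."""
    lo, hi = min(column), max(column)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in column]


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_top_k(features, target, k=4):
    """Rank features by |r| with the target (the target row of a correlation matrix) and keep k."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def preprocess(raw, target, k=4):
    """Impute, select the k most target-correlated features, then standardize them."""
    imputed = {name: impute_mean(col) for name, col in raw.items()}
    selected = select_top_k(imputed, target, k)
    return {name: standardize(imputed[name]) for name in selected}


# Hypothetical toy data: None marks a missing measurement; target 1 = event.
raw = {
    "lvef": [60.0, None, 40.0, 35.0],
    "la_size": [30.0, 32.0, 40.0, 42.0],
    "noise": [1.0, 2.0, 1.0, 2.0],
    "const": [5.0, 5.0, 5.0, 5.0],
}
target = [0, 0, 1, 1]
result = preprocess(raw, target, k=2)
print(list(result))  # ['la_size', 'lvef']
```

Because Pearson correlation is unchanged by linear rescaling, selection is done on the imputed values before scaling. Z-score standardization and min-max normalization are alternatives; a pipeline would normally apply one of them to the selected columns, not both in sequence.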
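The Results paragraph combines Random Forest and CatBoost through Azure's Voting Ensemble. The abstract gives neither the ensemble weights nor the voting rule, so the sketch below assumes weighted soft voting: each model's predicted probability of heart disease is averaged and the result is thresholded at 0.5. The probability lists are hypothetical stand-ins for the outputs of two already-trained models, and `soft_vote` and `to_labels` are illustrative helpers, not an Azure API.

```python
from typing import List, Sequence


def soft_vote(model_probs: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of each model's positive-class probability, per sample.

    Assumption: weighted soft voting; the paper does not state the weights.
    """
    if not model_probs or len(model_probs) != len(weights):
        raise ValueError("need one weight per model and at least one model")
    total = sum(weights)
    n_samples = len(model_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(n_samples)
    ]


def to_labels(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Convert ensemble probabilities into 0/1 class predictions."""
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical positive-class probabilities for three patients.
rf_probs = [0.9, 0.4, 0.2]        # stand-in for a Random Forest's output
catboost_probs = [0.7, 0.7, 0.1]  # stand-in for a CatBoost model's output

ensemble = soft_vote([rf_probs, catboost_probs], weights=[0.5, 0.5])
print(to_labels(ensemble))  # [1, 1, 0]
```

Soft voting lets a confident model outweigh an uncertain one. With only two members, hard majority voting would tie whenever the models disagree (as for the second patient above), which is one reason probability averaging is a natural choice for a two-model ensemble.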
Pages: 15