Employing feature engineering strategies to improve the performance of machine learning algorithms on echocardiogram dataset

Authors
Huang, Huang-Nan [1]
Chen, Hong-Ming [1]
Lin, Wei-Wen [2, 3, 4, 9]
Huang, Chau-Jian [5]
Chen, Yung-Cheng [6]
Wang, Yu-Huei [2]
Yang, Chao-Tung [6, 7, 8]
Affiliations
[1] Tunghai Univ, Dept Appl Math, Taichung, Taiwan
[2] Taichung Vet Gen Hosp, Cardiovasc Ctr, Taichung, Taiwan
[3] Natl Chung Hsing Univ, Dept PostBaccalaureate Med, Taichung, Taiwan
[4] Tunghai Univ, Dept Life Sci, Taichung, Taiwan
[5] ShuZen Jr Coll Med & Management, Dept Informat Management, Kaohsiung, Taiwan
[6] Tunghai Univ, Dept Comp Sci, Taichung, Taiwan
[7] Tunghai Univ, Res Ctr Smart Sustainable Circular Econ, Taichung, Taiwan
[8] Tunghai Univ, Res Ctr Smart Sustainable Circular Econ, Dept Comp Sci, Taichung 407224, Taiwan
[9] Tunghai Univ, Natl Chung Hsing Univ, Taichung Vet Gen Hosp, Dept Life Sci,Cardiovasc Ctr,Taichung Dept Postbac, Taichung 407224, Taiwan
Source
DIGITAL HEALTH | 2023 / Vol. 9
Keywords
Precision medicine; feature selection; machine learning; data scrubbing; correlation matrix; ENSEMBLE; MODEL;
DOI
10.1177/20552076231207589
Chinese Library Classification number
R19 [Health care organization and services (health services administration)]
Abstract
Objectives: This study uses machine learning (ML) to make predictions from input features during training and inference. Feature selection is an important factor in the accuracy of ML models, and the process includes data extraction, that is, the collection of all data required for ML. The study also draws on feature engineering: the raw data of the cardiac ultrasound dataset must be labeled with one or more meaningful and informative labels so that the ML model can learn from them and predict target values more accurately. This study therefore aims to improve feature selection strategies for the raw dataset and to address data scrubbing.
Methods: The ultrasound dataset was cleaned, and critical features were selected, through data standardization, normalization, and imputation of missing features. The aim of data scrubbing was to retain and select critical features of the echocardiogram dataset while making the predictions of the ML algorithm more accurate.
Results: This paper applies commonly used feature engineering methods and selects four important features. Using ML algorithms available on the Azure platform, namely Random Forest and CatBoost, a Voting Ensemble method serves as the training algorithm. The study also uses visualization tools to gain a clearer understanding of the raw data and to improve the accuracy of the predictive model.
Conclusion: This paper emphasizes feature engineering, specifically the cleaning and analysis of missing values in the raw echocardiography dataset and the identification of its critical features. The Azure platform is used to predict patients with a history of heart disease (individuals who have been under surveillance in the past three years and those who have not). Through data scrubbing and preprocessing methods in feature engineering, the model can more accurately predict the future occurrence of heart disease in patients.
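The preprocessing steps named in the abstract (missing-value imputation, standardization, and correlation-based feature selection) can be illustrated with a minimal sketch. This is not the authors' pipeline, which runs on the Azure platform with a Voting Ensemble of Random Forest and CatBoost. The column names, the toy values, mean imputation, and ranking by absolute Pearson correlation with the target are all assumptions made for illustration only.

```python
from statistics import mean, pstdev


def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def select_features(features, target, k):
    """Clean every column, then keep the k columns most correlated with the target."""
    cleaned = {name: standardize(impute_mean(col)) for name, col in features.items()}
    ranked = sorted(
        cleaned,
        key=lambda name: abs(pearson(cleaned[name], target)),
        reverse=True,
    )
    return ranked[:k], cleaned


# Hypothetical toy data: names and values are invented, not taken from the paper.
features = {
    "feature_a": [55.0, None, 40.0, 35.0, 60.0, 30.0],  # one missing value
    "feature_b": [1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
    "feature_c": [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],        # constant column
}
target = [0, 0, 1, 1, 0, 1]

selected, cleaned = select_features(features, target, k=2)
print(selected)
```

In this toy run the constant column ranks last because its correlation with the target is defined as zero, so it is the one dropped. A real study would choose the imputation rule and the selection criterion from the clinical data rather than from these defaults.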
Pages: 15