Research on the Construction and Realization of Data Pipeline in Machine Learning Regression Prediction

Cited by: 2
Authors
Zhang, Hua [1, 2]
Zheng, Guoxun [1, 2]
Xu, Jun [2]
Yao, Xuekun [1, 2]
Affiliations
[1] Changchun Inst Technol, Sch Comp Technol & Engn, Changchun 130012, Jilin, Peoples R China
[2] Changchun Inst Technol, Jilin Prov Key Lab Changbai Hist Culture & VR Reco, Changchun 130012, Jilin, Peoples R China
DOI
10.1155/2022/7924335
Chinese Library Classification (CLC) number
T [Industrial Technology];
Discipline classification code
08;
Abstract
Data sets used in machine learning usually contain missing values and text-type data, and sometimes attributes in the data set need to be combined. The data set must therefore be cleaned and converted before a machine learning model can be trained. This preparation is frequently a chain of dependent steps, and carried out step by step, the entire procedure is time-consuming and inconvenient. This article examines the data pipeline and recommends using it to process all of the data. We automate the processing and use k-fold cross-validation to evaluate the performance of the model. Experiments demonstrate that the pipeline can lower the regression prediction model's root mean square error and improve prediction accuracy.
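The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name their tooling, data set, or model, so every class, column name, and data value below is hypothetical, and a small k-nearest-neighbour regressor stands in for whatever regression model the paper used. The sketch chains three preparation steps (median imputation of missing values, attribute combination, one-hot encoding of a text column) into one pipeline and scores it with k-fold cross-validation using root mean square error. Each step is fitted on the training fold only, so nothing learned from the held-out fold leaks into the preparation.

```python
import math
import statistics


class MedianImputer:
    """Fill missing (None) values in one numeric column with the training median."""
    def __init__(self, column):
        self.column = column

    def fit(self, rows):
        known = [r[self.column] for r in rows if r[self.column] is not None]
        self.median_ = statistics.median(known)
        return self

    def transform(self, rows):
        return [{**r, self.column: self.median_ if r[self.column] is None else r[self.column]}
                for r in rows]


class RatioFeature:
    """Combine two attributes into a new one (numerator / denominator)."""
    def __init__(self, name, num, den):
        self.name, self.num, self.den = name, num, den

    def fit(self, rows):
        return self  # nothing to learn

    def transform(self, rows):
        return [{**r, self.name: r[self.num] / r[self.den]} for r in rows]


class OneHotEncoder:
    """Replace a text column with 0/1 indicator columns learned from training data."""
    def __init__(self, column):
        self.column = column

    def fit(self, rows):
        self.categories_ = sorted({r[self.column] for r in rows})
        return self

    def transform(self, rows):
        out = []
        for r in rows:
            new = {k: v for k, v in r.items() if k != self.column}
            for c in self.categories_:
                new[f"{self.column}={c}"] = 1.0 if r[self.column] == c else 0.0
            out.append(new)
        return out


class KNNRegressor:
    """Stand-in model: predict the mean target of the k nearest training rows."""
    def __init__(self, k=2):
        self.k = k

    def fit(self, rows, targets):
        self.keys_ = sorted(rows[0])
        self.X_ = [[r[key] for key in self.keys_] for r in rows]
        self.y_ = list(targets)
        return self

    def predict(self, rows):
        preds = []
        for r in rows:
            x = [r[key] for key in self.keys_]
            nearest = sorted(zip(self.X_, self.y_), key=lambda p: math.dist(p[0], x))[:self.k]
            preds.append(sum(y for _, y in nearest) / len(nearest))
        return preds


class Pipeline:
    """Run the preparation steps in order, then the model."""
    def __init__(self, steps, model):
        self.steps, self.model = steps, model

    def fit(self, rows, targets):
        for step in self.steps:
            rows = step.fit(rows).transform(rows)
        self.model.fit(rows, targets)
        return self

    def predict(self, rows):
        for step in self.steps:
            rows = step.transform(rows)
        return self.model.predict(rows)


def build_pipeline():
    steps = [MedianImputer("income"),
             RatioFeature("rooms_per_household", "rooms", "households"),
             OneHotEncoder("region")]
    return Pipeline(steps, KNNRegressor(k=2))


def k_fold_rmse(make_pipeline, rows, targets, k=3):
    """Return one RMSE per fold; a fresh pipeline is fitted on each training split."""
    scores = []
    for fold in range(k):
        test = list(range(fold, len(rows), k))
        train = [i for i in range(len(rows)) if i not in test]
        pipe = make_pipeline().fit([rows[i] for i in train], [targets[i] for i in train])
        preds = pipe.predict([rows[i] for i in test])
        mse = sum((p - targets[i]) ** 2 for p, i in zip(preds, test)) / len(test)
        scores.append(math.sqrt(mse))
    return scores


# Hypothetical toy data: (rooms, households, income, region, price).
RAW = [(880.0, 126.0, 8.3, "coast", 452.0), (7099.0, 1138.0, None, "coast", 358.0),
       (1467.0, 177.0, 7.2, "inland", 352.0), (1274.0, 219.0, 5.6, "inland", 341.0),
       (1627.0, 259.0, 3.8, "coast", 342.0), (919.0, 193.0, None, "inland", 269.0),
       (2535.0, 514.0, 3.6, "coast", 299.0), (3104.0, 647.0, 3.1, "inland", 241.0),
       (2555.0, 595.0, 2.0, "coast", 226.0)]
ROWS = [dict(rooms=a, households=b, income=c, region=d) for a, b, c, d, _ in RAW]
PRICES = [row[-1] for row in RAW]

scores = k_fold_rmse(build_pipeline, ROWS, PRICES, k=3)
print("RMSE per fold:", scores, "mean:", sum(scores) / len(scores))
```

Because the whole chain is one object, the same preparation is replayed identically on every fold and on new data. Libraries such as scikit-learn provide `Pipeline` and `cross_val_score` for the same pattern; whether the authors used them is not stated in the abstract.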
Pages: 5
Source
[J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2022, 2022