Investigation of a Data Split Strategy Involving the Time Axis in Adverse Event Prediction Using Machine Learning

Cited by: 1
Authors
Morita, Katsuhisa [1]
Mizuno, Tadahaya [1]
Kusuhara, Hiroyuki [1]
Affiliations
[1] Univ Tokyo, Grad Sch Pharmaceut Sci, Bunkyo ku, Tokyo 1130033, Japan
Keywords
INDUCED LIVER-INJURY; IN-VITRO; DESCRIPTORS; INFORMATION;
DOI
10.1021/acs.jcim.2c00765
Chinese Library Classification number
R914 [Medicinal Chemistry];
Discipline classification code
100701;
Abstract
Adverse events are a serious issue in drug development, and many prediction methods using machine learning have been developed. Random split cross-validation is the de facto standard for model building and evaluation in machine learning, but care should be taken in adverse event prediction because this approach does not strictly match the real-world situation. The time split, which uses the time axis, is considered suitable for real-world prediction. However, the differences in model performance obtained using the time and random splits are not clear because comparable studies are lacking. To understand the differences, we compared model performance between the time and random splits using nine types of compound information as input, eight adverse events as targets, and six machine learning algorithms. The random split showed higher area under the curve values than the time split for six of the eight targets. The chemical spaces of the training and test datasets of the time split were similar, suggesting that the concept of applicability domain is insufficient to explain the differences derived from the splitting. The area under the curve differences were smaller for the protein interaction dataset than for the other datasets. Subsequent detailed analyses suggested the danger of confounding in the use of knowledge-based information in the time split. These findings indicate the importance of understanding the differences between the time and random splits in adverse event prediction and suggest that appropriate use of the splitting strategies and interpretation of results are necessary for the real-world prediction of adverse events. We provide the analysis code and datasets used in the present study at https://github.com/mizuno-group/AE_prediction.
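The difference between the two splitting strategies described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (their code is in the repository linked above). The function names, the toy compound labels, and the use of a single year per compound as the time axis are all assumptions made for illustration only.

```python
import random
from typing import List, Sequence, Tuple


def random_split(items: Sequence[str], test_fraction: float,
                 seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle the items and hold out a random fraction as the test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def time_split(items: Sequence[str], years: Sequence[int],
               test_fraction: float) -> Tuple[List[str], List[str]]:
    """Train on the oldest items and test on the most recent ones."""
    # Sort indices by time stamp so the newest items end up in the test set.
    order = sorted(range(len(items)), key=lambda i: years[i])
    n_test = int(round(len(items) * test_fraction))
    n_train = len(items) - n_test
    train = [items[i] for i in order[:n_train]]
    test = [items[i] for i in order[n_train:]]
    return train, test


# Hypothetical toy data: compound labels with an assumed year for each.
compounds = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
compound_years = [1985, 1990, 1992, 1998, 2001, 2005, 2009, 2012, 2016, 2020]

time_train, time_test = time_split(compounds, compound_years, 0.2)
rand_train, rand_test = random_split(compounds, 0.2, seed=0)

print("time split test set:  ", time_test)
print("random split test set:", rand_test)
```

In the time split, every test compound is newer than every training compound, which mimics predicting for compounds that did not exist when the model was built. In the random split, old and new compounds are mixed across both sets.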
Pages: 3982-3992
Page count: 11