Investigation of a Data Split Strategy Involving the Time Axis in Adverse Event Prediction Using Machine Learning

Times cited: 1
Authors
Morita, Katsuhisa [1]
Mizuno, Tadahaya [1]
Kusuhara, Hiroyuki [1]
Affiliation
[1] Univ Tokyo, Grad Sch Pharmaceut Sci, Bunkyo-ku, Tokyo 1130033, Japan
Keywords
INDUCED LIVER-INJURY; IN-VITRO; DESCRIPTORS; INFORMATION
DOI
10.1021/acs.jcim.2c00765
Chinese Library Classification (CLC) number
R914 [Medicinal chemistry]
Subject classification code
100701
Abstract
Adverse events are a serious issue in drug development, and many machine learning methods have been developed to predict them. Random-split cross-validation is the de facto standard for building and evaluating machine learning models, but it should be applied with care in adverse event prediction because it does not strictly match the real-world situation. The time split, which divides the data along the time axis, is considered suitable for real-world prediction. However, the differences in model performance between the time and random splits are unclear because comparable studies are lacking. To understand these differences, we compared model performance between the time and random splits using nine types of compound information as input, eight adverse events as targets, and six machine learning algorithms. The random split yielded higher area under the curve (AUC) values than the time split for six of the eight targets. The chemical spaces of the training and test datasets in the time split were similar, suggesting that the concept of applicability domain is insufficient to explain the differences arising from the splitting strategy. The AUC differences were smaller for the protein interaction dataset than for the other datasets. Subsequent detailed analyses suggested a danger of confounding when knowledge-based information is used in the time split. These findings indicate the importance of understanding the differences between the time and random splits in adverse event prediction, and suggest that the splitting strategies must be chosen appropriately, and their results interpreted carefully, for the real-world prediction of adverse events. We provide the analysis code and datasets used in the present study at https://github.com/mizuno-group/AE_prediction.
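The abstract contrasts the two splitting strategies without spelling out how they differ mechanically. The sketch below is illustrative only and is not the authors' code (that is in the linked repository). It assumes each compound is a hypothetical `Compound` record carrying a `year` time stamp (for example, an approval year) and a binary adverse-event `label`; the names `random_split`, `time_split`, and `train_precedes_test` are likewise hypothetical. The only difference between the two splits is whether the hold-out set is drawn at random or by a cutoff on the time axis.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Compound:
    """Hypothetical record for one compound (illustrative, not from the paper)."""
    compound_id: str
    year: int   # assumed time stamp on the time axis, e.g. approval year
    label: int  # 1 if the adverse event is reported for the compound, else 0


Split = Tuple[List[Compound], List[Compound]]


def random_split(data: List[Compound], test_fraction: float, seed: int = 0) -> Split:
    """Random split: the hold-out set is drawn without regard to time."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    n_train = len(shuffled) - n_test
    return shuffled[:n_train], shuffled[n_train:]


def time_split(data: List[Compound], cutoff_year: int) -> Split:
    """Time split: train on compounds before the cutoff, test on the rest."""
    train = [c for c in data if c.year < cutoff_year]
    test = [c for c in data if c.year >= cutoff_year]
    return train, test


def train_precedes_test(train: List[Compound], test: List[Compound]) -> bool:
    """True if every training compound is older than every test compound."""
    if not train or not test:
        return True
    return max(c.year for c in train) < min(c.year for c in test)
```

For example, with ten compounds stamped 2000 to 2009, `time_split(data, cutoff_year=2008)` holds out the 2008 and 2009 compounds and `train_precedes_test` returns `True` by construction. A random split with `test_fraction=0.2` also holds out two compounds, but nothing forces them to be the newest ones, so the test set can contain compounds older than some training compounds, which is the mismatch with prospective, real-world use that the abstract points to.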
Pages: 3982-3992
Number of pages: 11