chemmodlab: a cheminformatics modeling laboratory R package for fitting and assessing machine learning models

Authors
Ash, Jeremy R. [1]
Hughes-Oliver, Jacqueline M. [2]
Affiliations
[1] North Carolina State Univ, Bioinformat Res Ctr, Dept Stat, 335 Ricks Hall,Campus Box 7566, Raleigh, NC 27695 USA
[2] North Carolina State Univ, Dept Stat, 2311 Stinson Dr,Campus Box 8203, Raleigh, NC 27695 USA
Keywords
Machine learning; QSAR; R package; Initial enhancement; Enrichment factor; Accumulation curve; Hit enrichment curve; Repeated cross-validation; CROSS-VALIDATION; SELECTION BIAS; ERROR RATE; PREDICTION; PROPERTY;
DOI
10.1186/s13321-018-0309-4
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
The goal of chemmodlab is to streamline the fitting and assessment pipeline for many machine learning models in R, making it easy for researchers to compare the utility of these models. Although the package focuses on model fitting and assessment methods that have been accepted by experts in the cheminformatics field, all of its methods have broad utility for the machine learning community. chemmodlab contains several assessment utilities, including a plotting function that constructs accumulation curves and a function that computes many performance measures. The most novel feature of chemmodlab is the ease with which statistically significant performance differences among many machine learning models are presented by means of the multiple comparisons similarity plot. Differences are assessed using repeated k-fold cross-validation, where blocking increases precision and multiplicity adjustments are applied. chemmodlab is freely available on CRAN at https://cran.r-project.org/web/packages/chemmodlab/index.html.
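Two ideas from the abstract, the accumulation curve and repeated k-fold cross-validation with shared splits, can be illustrated with a minimal sketch. It is written in plain Python rather than R, and it is not chemmodlab's API: the function names `accumulation_curve` and `repeated_kfold` and the toy scores and labels are assumptions made purely for illustration.

```python
import random

def accumulation_curve(scores, labels):
    """Cumulative number of actives found when compounds are tested
    in order of decreasing predicted score (hypothetical helper)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, curve = 0, []
    for i in order:
        hits += labels[i]      # labels are 1 (active) or 0 (inactive)
        curve.append(hits)
    return curve

def repeated_kfold(n, k, repeats, seed=0):
    """Return one fold assignment per repeat (hypothetical helper).
    fold_ids[i] is the fold of observation i in that repeat."""
    rng = random.Random(seed)
    splits = []
    for _ in range(repeats):
        fold_ids = [i % k for i in range(n)]   # near-equal fold sizes
        rng.shuffle(fold_ids)                  # new random split each repeat
        splits.append(fold_ids)
    return splits

# Toy data (assumed): predicted scores and true activity labels.
scores = [0.9, 0.1, 0.8, 0.4, 0.7, 0.2]
labels = [1, 0, 0, 0, 1, 1]
curve = accumulation_curve(scores, labels)

# Three repeats of 5-fold cross-validation over 10 observations.
splits = repeated_kfold(n=10, k=5, repeats=3)
```

If every model is evaluated on the same list of splits, each split can serve as a block when models are compared, which is one way to read the abstract's remark that blocking increases precision.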
Pages: 20