Outliers detection in the statistical accuracy test of a pKa prediction

Cited by: 0
Authors
Milan Meloun
Sylva Bordovská
Karel Kupka
Affiliations
[1] Pardubice University, Department of Analytical Chemistry, Faculty of Chemical Technology
[2] TriloByte Statistical Software
Keywords
pKa prediction; Dissociation constants; Outliers; Residuals; Goodness-of-fit; Williams graph
DOI: not available
Abstract
The regression diagnostics algorithm REGDIA in S-Plus is introduced to examine the accuracy of pKa values predicted with four programs: PALLAS, MARVIN, PERRIN and SYBYL. On the basis of a statistical analysis of residuals, outlier diagnostics are proposed. Residual analysis in the ADSTAT program examines goodness-of-fit via graphical diagnostics of 15 exploratory data analysis plots, such as bar plots, box-and-whisker plots, dot plots, midsum plots, symmetry plots, kurtosis plots, differential quantile plots, quantile-box plots, frequency polygons, histograms, quantile plots, quantile-quantile plots, rankit plots, scatter plots, and autocorrelation plots. Outliers in pKa correspond to molecules that are poorly characterized by the pKa program under consideration. Of the seven most efficient diagnostic plots (the Williams graph, Graph of predicted residuals, Pregibon graph, Gray L–R graph, Index graph of Atkinson measure, Index graph of diagonal elements of the hat matrix and Rankit Q–Q graph of jackknife residuals), the Williams graph was selected as giving the most reliable detection of outliers. The six statistical characteristics, $F_{\rm exp}$, $R^{2}$, $R_{\rm P}^{2}$, $MEP$, $AIC$, and $s$ in pKa units, successfully examine a specimen of 25 acids and bases from Perrin's data set and classify the four pKa prediction algorithms. The highest values of $F_{\rm exp}$, $R^{2}$ and $R_{\rm P}^{2}$, the lowest values of $MEP$ and $s$, and the most negative $AIC$ were found for the PERRIN algorithm of pKa prediction, so this algorithm achieves the best predictive power and the most accurate results. The proposed accuracy test of the REGDIA program can also be extended to other predicted values, such as log P, log D, aqueous solubility or some physicochemical properties.
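The abstract names the statistics and the Williams graph but gives no formulas, so the following is only a minimal Python sketch of the idea, not the REGDIA or ADSTAT implementation. It assumes a straight-line regression of predicted pKa on experimental pKa and commonly used textbook definitions: MEP as the mean squared predicted (leave-one-out) residual, $R_{\rm P}^{2} = 1 - n \cdot MEP / \sum (y_i - \bar{y})^2$, and $AIC = n \ln(RSS/n) + 2m$. The function names, the cutoff `t_cut = 2.0` and the data are hypothetical; the leverage cutoff $2m/n$ is a common rule of thumb.

```python
import math

def regression_diagnostics(x, y):
    """Fit y = b0 + b1*x by least squares; return summary statistics,
    hat-matrix diagonals and jackknife residuals (assumed definitions)."""
    n, m = len(x), 2                      # m = number of fitted parameters
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean

    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    rss = sum(e * e for e in resid)       # residual sum of squares
    tss = sum((yi - y_mean) ** 2 for yi in y)
    s = math.sqrt(rss / (n - m))          # residual standard deviation

    # Diagonal of the hat matrix for a one-predictor model (leverage).
    hat = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]

    # Predicted (leave-one-out) residuals and their mean square, MEP.
    mep = sum((e / (1.0 - h)) ** 2 for e, h in zip(resid, hat)) / n

    # Jackknife (externally studentized) residuals.
    jack = []
    for e, h in zip(resid, hat):
        es = e / (s * math.sqrt(1.0 - h))             # standardized residual
        jack.append(es * math.sqrt((n - m - 1) / (n - m - es * es)))

    stats = {
        "F": ((tss - rss) / (m - 1)) / (rss / (n - m)),
        "R2": 1.0 - rss / tss,
        "R2_P": 1.0 - n * mep / tss,
        "MEP": mep,
        "AIC": n * math.log(rss / n) + 2 * m,
        "s": s,
    }
    return stats, hat, jack

def williams_flags(hat, jack, m=2, t_cut=2.0):
    """Williams-graph style flags: |jackknife residual| > t_cut marks an
    outlier, hat diagonal > 2m/n marks a high-leverage point.
    t_cut is an illustrative value, not taken from the paper."""
    n = len(hat)
    outliers = [i for i, t in enumerate(jack) if abs(t) > t_cut]
    leverages = [i for i, h in enumerate(hat) if h > 2.0 * m / n]
    return outliers, leverages

# Synthetic data (not Perrin's data set): experimental vs. predicted pKa,
# with the fifth prediction deliberately shifted by about 3 pKa units.
pka_exp = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
pka_pred = [1.1, 1.9, 3.1, 3.9, 8.0, 6.1, 6.9, 8.1, 8.9, 10.1]

stats, hat, jack = regression_diagnostics(pka_exp, pka_pred)
outliers, leverages = williams_flags(hat, jack)
```

On this synthetic set only the shifted point (index 4) exceeds the residual cutoff and no point exceeds the leverage cutoff. In practice the residual cutoff would more likely be a quantile of the Student t distribution with n - m - 1 degrees of freedom than a fixed constant.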
Pages: 891-909
Number of pages: 18