Outliers detection in the statistical accuracy test of a pKa prediction

Cited by: 0
Authors
Milan Meloun
Sylva Bordovská
Karel Kupka
Affiliations
[1] Pardubice University, Department of Analytical Chemistry, Faculty of Chemical Technology
[2] TriloByte Statistical Software
Source
Keywords
pKa prediction; Dissociation constants; Outliers; Residuals; Goodness-of-fit; Williams graph
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
The regression diagnostics algorithm REGDIA in S-Plus is introduced to examine the accuracy of pKa values predicted with four programs: PALLAS, MARVIN, PERRIN and SYBYL. On the basis of a statistical analysis of residuals, outlier diagnostics are proposed. Residual analysis in the ADSTAT program is based on examining goodness-of-fit via graphical diagnostics of 15 exploratory data analysis plots, such as bar plots, box-and-whisker plots, dot plots, midsum plots, symmetry plots, kurtosis plots, differential quantile plots, quantile-box plots, frequency polygons, histograms, quantile plots, quantile-quantile plots, rankit plots, scatter plots, and autocorrelation plots. Outliers in pKa correspond to molecules that are poorly characterized by the pKa program under consideration. Of the seven most efficient diagnostic plots (the Williams graph, Graph of predicted residuals, Pregibon graph, Gray L–R graph, Index graph of Atkinson measure, Index graph of diagonal elements of the hat matrix and Rankit Q–Q graph of jackknife residuals), the Williams graph was selected as giving the most reliable detection of outliers. The six statistical characteristics, $F_{\rm exp}$, $R^{2}$, $R_{\rm P}^{2}$, MEP, AIC, and s in pKa units, successfully examine a sample of 25 acids and bases from Perrin's data set, classifying the four pKa prediction algorithms.
The highest values of $F_{\rm exp}$, $R^{2}$ and $R_{\rm P}^{2}$, the lowest values of MEP and s, and the most negative AIC were found for the PERRIN algorithm of pKa prediction, so this algorithm achieves the best predictive power and the most accurate results. The proposed accuracy test of the REGDIA program can also be extended to other predicted values, such as log P, log D, aqueous solubility or some physicochemical properties.
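The characteristics named in the abstract can be illustrated with a minimal Python sketch. It is not the REGDIA or ADSTAT code, which the abstract does not reproduce. It assumes a straight-line regression of experimental pKa on predicted pKa and the common textbook definitions: MEP as the mean of squared predicted residuals e_i/(1 - h_ii), R_P^2 = 1 - n·MEP/TSS, and AIC = n·ln(RSS/n) + 2m. The function name `regression_diagnostics`, the cutoff `t_cut` and the sample numbers are hypothetical. The Williams graph plots jackknife residuals against hat-matrix diagonals, so the two flag lists correspond to points beyond its cutoff lines.

```python
import math

def regression_diagnostics(x, y, t_cut=2.0):
    """Fit y = b0 + b1*x by least squares and return accuracy statistics
    plus the outlier / leverage flags a Williams graph would show.
    t_cut is an illustrative cutoff for jackknife residuals (assumption);
    a t-distribution quantile would normally be used instead."""
    n, m = len(x), 2                      # m = number of fitted parameters
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean

    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    rss = sum(e * e for e in resid)       # residual sum of squares
    tss = sum((yi - y_mean) ** 2 for yi in y)
    s = math.sqrt(rss / (n - m))          # residual standard deviation

    # Diagonal elements of the hat matrix for a straight-line model.
    hat = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]

    # Predicted (leave-one-out) residuals and the statistics built on them.
    pred_resid = [e / (1.0 - h) for e, h in zip(resid, hat)]
    mep = sum(p * p for p in pred_resid) / n
    r2 = 1.0 - rss / tss
    r2_pred = 1.0 - n * mep / tss
    aic = n * math.log(rss / n) + 2 * m
    f_exp = (tss - rss) / (m - 1) / (rss / (n - m))

    # Jackknife (externally studentized) residuals.
    jack = []
    for e, h in zip(resid, hat):
        es = e / (s * math.sqrt(1.0 - h))
        jack.append(es * math.sqrt((n - m - 1) / (n - m - es * es)))

    return {
        "F_exp": f_exp, "R2": r2, "R2_P": r2_pred,
        "MEP": mep, "AIC": aic, "s": s,
        "hat": hat, "jackknife": jack,
        "outliers": [i for i, t in enumerate(jack) if abs(t) > t_cut],
        "high_leverage": [i for i, h in enumerate(hat) if h > 2.0 * m / n],
    }

# Made-up numbers for illustration only; not taken from Perrin's data set.
pka_pred = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
pka_exp = [2.1, 2.9, 4.0, 5.1, 8.5, 6.9, 8.1, 9.0, 9.9]
result = regression_diagnostics(pka_pred, pka_exp)
print(result["outliers"], result["high_leverage"])
```

With these made-up values, only the fifth compound (index 4) has a jackknife residual beyond the cutoff, and no point exceeds the leverage limit 2m/n. Comparing F_exp, R2, R2_P, MEP, AIC and s across several prediction programs on the same compounds would follow the ranking logic described in the abstract.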
Pages: 891-909
Page count: 18