Toward better QSAR/QSPR modeling: simultaneous outlier detection and variable selection using distribution of model features

Cited by: 0
Authors
Dongsheng Cao
Yizeng Liang
Qingsong Xu
Yifeng Yun
Hongdong Li
Affiliations
[1] Central South University, Research Center of Modernization of Traditional Chinese Medicines
[2] Central South University, School of Mathematical Sciences and Computing Technology
Keywords
QSAR/QSPR; Outlier detection; Variable selection; Monte Carlo; Statistical distribution
DOI
Not available
Abstract
Building a robust and reliable QSAR/QSPR model requires attention to two tasks: selecting an optimal variable subset from a large pool of molecular descriptors, and detecting outliers among the samples. The two problems are similar and, to some extent, complementary. For a given learning algorithm and data set, one should therefore consider how variable selection and outlier detection interact. In this paper, we describe a consistent methodology for performing variable subset selection and outlier detection simultaneously, based on statistical distributions that are simulated by building many cross-predictive linear models. The approach exploits the fact that the distribution of linear model coefficients provides a mechanism for ranking and interpreting the effects of variables, while the distribution of prediction errors provides a mechanism for distinguishing outliers from normal samples. Summary statistics of these distributions, namely the mean and the standard deviation, provide a feasible way to describe the information contained in the original samples. Several examples demonstrate the predictive ability of the proposed approach through comparison with different approaches as well as their combinations.
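The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes ordinary least squares as the linear model, random train/test splits as the Monte Carlo step, synthetic data, and an assumed ranking criterion (|mean| / standard deviation of each coefficient). All function and variable names are hypothetical.

```python
import numpy as np

def mc_distributions(X, y, n_models=500, train_frac=0.8, seed=0):
    """Fit many linear models on random subsets and collect two distributions:
    the coefficients of each variable and the held-out errors of each sample."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_train = int(train_frac * n)
    coefs = np.empty((n_models, p))
    errors = [[] for _ in range(n)]  # held-out prediction errors per sample
    for m in range(n_models):
        perm = rng.permutation(n)
        train, test = perm[:n_train], perm[n_train:]
        # Ordinary least squares with an intercept column (an assumption).
        A = np.column_stack([np.ones(n_train), X[train]])
        beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        coefs[m] = beta[1:]
        pred = beta[0] + X[test] @ beta[1:]
        for i, e in zip(test, y[test] - pred):
            errors[i].append(e)
    # Samples that were never held out get NaN statistics.
    err_mean = np.array([np.mean(e) if e else np.nan for e in errors])
    err_std = np.array([np.std(e) if e else np.nan for e in errors])
    return coefs.mean(axis=0), coefs.std(axis=0), err_mean, err_std

# Synthetic data: only variables 0 and 1 matter; sample 5 is a planted outlier.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=60)
y[5] += 10.0

coef_mean, coef_std, err_mean, err_std = mc_distributions(X, y)

# Assumed criterion: rank variables by |mean| / std of their coefficients.
ranking = np.argsort(-np.abs(coef_mean) / coef_std)
# Assumed criterion: the sample with the largest mean held-out error is suspect.
suspect = int(np.nanargmax(np.abs(err_mean)))
print("variable ranking:", ranking)
print("most suspicious sample:", suspect)
```

In this sketch a sample is scored only by models that did not see it during fitting, which is what makes the errors cross-predictive; the split fraction and number of models are arbitrary choices.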
Pages: 67 - 80
Number of pages: 13
Related papers
44 in total
  • [1] Toward better QSAR/QSPR modeling: simultaneous outlier detection and variable selection using distribution of model features
    Cao, Dongsheng
    Liang, Yizeng
    Xu, Qingsong
    Yun, Yifeng
    Li, Hongdong
    [J]. JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 2011, 25 (01) : 67 - 80
  • [2] Simultaneous outlier detection and variable selection for spatial Durbin model
    Cheng, Yi
    Song, Yunquan
    [J]. BRAZILIAN JOURNAL OF PROBABILITY AND STATISTICS, 2023, 37 (03) : 596 - 618
  • [3] Simultaneous variable selection and outlier detection using a robust genetic algorithm
    Wiegand, Patrick
    Pell, Randy
    Comas, Enric
    [J]. CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2009, 98 (02) : 108 - 114
  • [4] Ensemble partial least squares regression for descriptor selection, outlier detection, applicability domain assessment, and ensemble modeling in QSAR/QSPR modeling
    Cao, Dong-Sheng
    Deng, Zhen-Ke
    Zhu, Min-Feng
    Yao, Zhi-Jiang
    Dong, Jie
    Zhao, Rui-Gang
    [J]. JOURNAL OF CHEMOMETRICS, 2017, 31 (11)
  • [5] Simultaneous variable selection and outlier identification in linear regression using the mean-shift outlier model
    Kim, Sung-Soo
    Park, Sung H.
    Krzanowski, W. J.
    [J]. JOURNAL OF APPLIED STATISTICS, 2008, 35 (03) : 283 - 291
  • [6] Simultaneous outlier detection and variable selection via difference-based regression model and stochastic search variable selection
    Park, Jong Suk
    Park, Chun Gun
    Lee, Kyeong Eun
    [J]. COMMUNICATIONS FOR STATISTICAL APPLICATIONS AND METHODS, 2019, 26 (02) : 149 - 161
  • [7] Robust Moderately Clipped LASSO for Simultaneous Outlier Detection and Variable Selection
    Yang Peng
    Bin Luo
    Xiaoli Gao
    [J]. Sankhya B, 2022, 84 : 694 - 707
  • [8] Robust Moderately Clipped LASSO for Simultaneous Outlier Detection and Variable Selection
    Peng, Yang
    Luo, Bin
    Gao, Xiaoli
    [J]. SANKHYA-SERIES B-APPLIED AND INTERDISCIPLINARY STATISTICS, 2022, 84 (02) : 694 - 707
  • [9] Toward an optimal procedure for variable selection and QSAR model building
    Yasri, A
    Hartsough, D
    [J]. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2001, 41 (05) : 1218 - 1227
  • [10] Joint outlier detection and variable selection using discrete optimization
    Jammal, Mahdi
    Canu, Stephane
    Abdallah, Maher
    [J]. SORT-STATISTICS AND OPERATIONS RESEARCH TRANSACTIONS, 2021, 45 (01) : 47 - 66