Statistical analysis of unlabeled point sets: Comparing molecules in chemoinformatics

Cited by: 24
Authors
Dryden, Ian L.
Hirst, Jonathan D.
Melville, James L.
Affiliations
[1] Univ Nottingham, Sch Mat Sci, Nottingham NG7 2RD, England
[2] Univ Nottingham, Sch Chem, Nottingham NG7 2RD, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
alignment; Bayesian; bioinformatics; chemoinformatics; Markov chain Monte Carlo; mixture model; Procrustes; Riemannian metric; rigid body transformations; shape; size and shape; steroids;
DOI
10.1111/j.1541-0420.2006.00622.x
Chinese Library Classification (CLC) number
Q [Biological sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
We consider Bayesian methodology for comparing two or more unlabeled point sets. Application of the technique to a set of steroid molecules illustrates its potential utility involving the comparison of molecules in chemoinformatics and bioinformatics. We initially match a pair of molecules, where one molecule is regarded as random and the other fixed. A type of mixture model is proposed for the point set coordinates, and the parameters of the distribution are a labeling matrix (indicating which pairs of points match) and a concentration parameter. An important property of the likelihood is that it is invariant under rotations and translations of the data. Bayesian inference for the parameters is carried out using Markov chain Monte Carlo simulation, and it is demonstrated that the procedure works well on the steroid data. The posterior distribution is difficult to simulate from, due to multiple local modes, and we also use additional data (partial charges on atoms) to help with this task. An approximation is considered for speeding up the simulation algorithm, and the approximating fast algorithm leads to essentially identical inference to that under the exact method for our data. Extensions to multiple molecule alignment are also introduced, and an algorithm is described which also works well on the steroid data set. After all the steroid molecules have been matched, exploratory data analysis is carried out to examine which molecules are similar. Also, further Bayesian inference for the multiple alignment problem is considered.
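The abstract describes MCMC over a labeling matrix with a concentration parameter, driven by a likelihood that is invariant under rotations and translations. The sketch below illustrates that general idea and is not the authors' model. As a simpler stand-in for their coordinate mixture model, it scores a one-to-one matching by how well the pairwise intra-set distances agree. Because only distances are used, the score is invariant to rotating or translating either point set. It then runs a Metropolis sampler with swap proposals whose target is proportional to exp(-kappa * energy). Several choices here are assumptions made for illustration: the one-to-one restriction, the energy, the helper names `distance_energy`, `sample_matchings` and `rigid_motion`, and the toy 5-point "molecule".

```python
import math
import random


def distance_energy(a, b, perm):
    """Sum of squared differences between intra-set pairwise distances.

    perm[i] is the index of the point in b matched to point i of a. Only
    pairwise distances enter, so the energy is unchanged when either point
    set is rotated or translated.
    """
    n = len(a)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            da = math.dist(a[i], a[j])
            db = math.dist(b[perm[i]], b[perm[j]])
            total += (da - db) ** 2
    return total


def sample_matchings(a, b, kappa=1.0, n_iter=20000, seed=0):
    """Metropolis sampler over one-to-one matchings, target ~ exp(-kappa * energy).

    The proposal swaps the partners of two randomly chosen points of a. It is
    symmetric, so the acceptance ratio depends only on the energy difference.
    Returns the lowest-energy matching visited, its energy, and the
    acceptance rate.
    """
    rng = random.Random(seed)
    n = len(a)
    perm = list(range(n))
    rng.shuffle(perm)
    energy = distance_energy(a, b, perm)
    best, best_energy = perm[:], energy
    accepted = 0
    for _ in range(n_iter):
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]
        new_energy = distance_energy(a, b, perm)
        # Accept with probability min(1, exp(-kappa * (new_energy - energy))).
        if new_energy <= energy or rng.random() < math.exp(-kappa * (new_energy - energy)):
            energy = new_energy
            accepted += 1
            if energy < best_energy:
                best, best_energy = perm[:], energy
        else:
            perm[i], perm[j] = perm[j], perm[i]  # rejected: undo the swap
    return best, best_energy, accepted / n_iter


def rigid_motion(p, theta, phi, shift):
    """Rotate p about the z-axis by theta, then about the x-axis by phi, then translate."""
    x, y, z = p
    x, y = x * math.cos(theta) - y * math.sin(theta), x * math.sin(theta) + y * math.cos(theta)
    y, z = y * math.cos(phi) - z * math.sin(phi), y * math.sin(phi) + z * math.cos(phi)
    return (x + shift[0], y + shift[1], z + shift[2])


# Toy example (hypothetical data): b is a rotated, translated and re-ordered copy of a.
rng = random.Random(42)
a = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(5)]
moved = [rigid_motion(p, 0.7, -1.2, (3.0, -2.0, 0.5)) for p in a]
order = [3, 0, 4, 1, 2]  # b[k] is the moved copy of a[order[k]]
b = [moved[i] for i in order]

best, best_energy, rate = sample_matchings(a, b, kappa=0.1)
print("matching:", best, "energy:", best_energy, "acceptance:", round(rate, 3))
```

Here the correct labeling is recovered even though `b` was rotated, translated and shuffled, because the energy ignores rigid motions. A larger `kappa` concentrates the sampler on low-energy matchings but also makes it easier to become trapped in a local mode. This echoes the abstract's remark that the posterior has multiple local modes, which the authors address in part with extra information such as atomic partial charges.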
Pages: 237-251
Page count: 15
Related papers
50 in total
  • [31] Statistical Analysis of Maximally Similar Sets in Ecological Research
    Roberts, David W.
    MATHEMATICS, 2018, 6 (12):
  • [32] Self-Contained Statistical Analysis of Gene Sets
    Torres, David J.
    Cannon, Judy L.
    Ricoy, Ulises M.
    Johnson, Christopher
    PLOS ONE, 2016, 11 (10):
  • [33] STATISTICAL-ANALYSIS OF HISTORICAL CLIMATE DATA SETS
    MOBLEY, CD
    PREISENDORFER, RW
    JOURNAL OF CLIMATE AND APPLIED METEOROLOGY, 1985, 24 (06) : 555 - 567
  • [34] THE USE OF STATISTICAL POINT PROCESSES IN GEOINFORMATION ANALYSIS
    Stein, Alfred
    Tolpekin, Valentyn
    Spatenkova, Olga
    JOINT INTERNATIONAL CONFERENCE ON THEORY, DATA HANDLING AND MODELLING IN GEOSPATIAL INFORMATION SCIENCE, 2010, 38 : 109 - 113
  • [35] POINT SETS SIMPLIFICATION USING LOCAL SURFACE ANALYSIS
    Guo Xianglin
    Pang Mingyong
    PROCEEDINGS OF 2009 2ND IEEE INTERNATIONAL CONFERENCE ON BROADBAND NETWORK & MULTIMEDIA TECHNOLOGY, 2009 : 575 - 579
  • [36] Ray-Shooting Depth: Computing Statistical Data Depth of Point Sets in the Plane
    Mustafa, Nabil H.
    Ray, Saurabh
    Shabbir, Mudassir
    ALGORITHMS - ESA 2011, 2011, 6942 : 506 - 517
  • [37] An Analysis of Mixed Integer Linear Sets Based on Lattice Point Free Convex Sets
    Andersen, Kent
    Louveaux, Quentin
    Weismantel, Robert
    MATHEMATICS OF OPERATIONS RESEARCH, 2010, 35 (01) : 233 - 256
  • [38] p-SAGE: Parametric Statistical Analysis of Gene Sets
    Hwang Bo
    Li Wen-Ting
    Li Wen
    Xia Xue-Feng
    Sun Zhi-Rong
    PROGRESS IN BIOCHEMISTRY AND BIOPHYSICS, 2009, 36 (11) : 1415 - 1422
  • [39] A STATISTICAL-ANALYSIS OF PARTICULATE DATA SETS IN BRISBANE, AUSTRALIA
    SIMPSON, RW
    ATMOSPHERIC ENVIRONMENT PART B-URBAN ATMOSPHERE, 1992, 26 (01) : 99 - 105
  • [40] Comparing Visual and Statistical Analysis of Multiple Baseline Design Graphs
    Wolfe, Katie
    Dickenson, Tammiee S.
    Miller, Bridget
    McGrath, Kathleen V.
    BEHAVIOR MODIFICATION, 2019, 43 (03) : 361 - 388