The tau-path test for monotone association in an unspecified subpopulation. Application to chemogenomic data mining

Cited by: 6
Authors
Yu, Li [1]
Verducci, Joseph S. [1, 2]
Blower, Paul E. [3, 4]
Affiliations
[1] Ohio State Univ, Dept Stat, Columbus, OH 43210 USA
[2] Ohio State Univ, Math Biosci Inst, Columbus, OH 43210 USA
[3] Ohio State Univ, Program Pharmacogen, Dept Pharmacol, Columbus, OH 43210 USA
[4] Ohio State Univ, Ctr Comprehens Canc, Coll Med, Columbus, OH 43210 USA
Keywords
Concordance matrix; Copula; Drug assay; Microarray; Mixture; Permutation; Quassinoids; NF-KAPPA-B; GENE-EXPRESSION; CANCER CELLS; TUMOR-CELLS; INHIBITORS; DIFFERENTIATION; APOPTOSIS; PROTEIN; LINES
DOI
10.1016/j.stamet.2010.01.006
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
In data mining and other settings, there is sometimes a need to identify relationships between variables when the relationship may hold only over a subset of the observations available. For example, expression of a particular gene may cause resistance to an anticancer drug, but only over certain types of cancer cell-lines. It may not be known in advance which types of cancer cell-lines (e.g., estrogen-regulated, newly differentiated, central nervous system) employ such a method of resistance. This situation differs from the usual setting in which partial correlations are estimated conditional on a known selection, such as the value of another variable. For any pair of variables of interest, the goal is to test whether they are associated in some unspecified subpopulation that is represented by a subsample of the available data. Previous approaches rely heavily on bivariate normal assumptions, which are not easily adapted to non-linear association. We have tried several parametric and non-parametric approaches, and for both inferential and computational reasons have chosen to present a procedure based on a sequential development of Kendall's tau measure of monotone association. The sequence is achieved by reordering observations so that the sample tau coefficients {T_k} for the first k = 2, ..., n of the n observations form a monotone decreasing path, ending at Kendall's tau coefficient T_n. Boundaries are constructed so that 95% of the paths remain within the boundaries under the null hypothesis of independence. A boundary crossing at any point k is evidence of a stronger than expected association amongst a subpopulation represented by the k observations involved. The method is used to screen for association between gene expression and compound activity amongst types of cancer cell-lines in the NCI-60 database.
More generally, the method may be used to deconvolve a mixture of absolutely continuous bivariate distributions with the same margins but differing in strength of association. We prove that a particular method of reordering the observations is optimal against any other ordering for simultaneously identifying the highest tau association in subsets of size k (k = 2, ..., n). Furthermore, assuming a subpopulation of k, we present a way of quantifying how likely any observation is to be in that subpopulation. © 2010 Elsevier B.V. All rights reserved.
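The reordering idea in the abstract can be illustrated with a minimal sketch. It assumes a greedy backward-elimination heuristic: starting from all n observations, repeatedly set aside the one whose row sum in the concordance matrix is smallest. This is one plausible way to obtain a non-increasing tau path and is not claimed to be the ordering the authors prove optimal. The function names (`concordance_matrix`, `tau_from_concordance`, `backward_tau_path`) are hypothetical, and the permutation-based 95% boundaries are not covered here.

```python
import numpy as np


def concordance_matrix(x, y):
    """C[i, j] = +1 if pair (i, j) is concordant, -1 if discordant, 0 for ties and i == j."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sign(np.subtract.outer(x, x)) * np.sign(np.subtract.outer(y, y))


def tau_from_concordance(c):
    """Kendall's tau-a for the observations indexed by the rows/columns of c."""
    k = c.shape[0]
    # c is symmetric, so c.sum() counts every pair twice; k*(k-1) is twice C(k, 2).
    return c.sum() / (k * (k - 1))


def backward_tau_path(x, y):
    """Greedy backward elimination (an assumed heuristic, not the paper's algorithm).

    Returns (order, taus): `order` lists observation indices from first to last,
    and taus[k - 2] is tau-a computed on the first k observations, k = 2..n.
    """
    c = concordance_matrix(x, y)
    n = c.shape[0]
    remaining = list(range(n))
    dropped = []
    while len(remaining) > 2:
        sub = c[np.ix_(remaining, remaining)]
        # The observation with the smallest row sum contributes least to tau.
        worst = int(np.argmin(sub.sum(axis=1)))
        dropped.append(remaining.pop(worst))
    # Observations dropped last sit closest to the front of the ordering.
    order = remaining + dropped[::-1]
    taus = np.array(
        [tau_from_concordance(c[np.ix_(order[:k], order[:k])]) for k in range(2, n + 1)]
    )
    return order, taus


if __name__ == "__main__":
    # Synthetic mixture: 15 monotonically related points plus 25 independent ones.
    rng = np.random.default_rng(0)
    x = rng.normal(size=40)
    y = np.concatenate([x[:15] + 0.1 * rng.normal(size=15), rng.normal(size=25)])
    order, taus = backward_tau_path(x, y)
    print(order[:15])
    print(np.round(taus, 2))
```

Because the dropped observation always has a row sum no larger than the average, removing it cannot lower tau, so the resulting path is non-increasing in k and ends at the full-sample Kendall's tau.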
Pages: 97-111
Page count: 15