The top-K tau-path screen for monotone association in subpopulations

Cited by: 2
Authors
Sampath, Srinath [1]
Caloiaro, Adriano [2]
Johnson, Wayne [3]
Verducci, Joseph S. [4]
Affiliations
[1] Hamilton Capital Management, Columbus, OH 43220 USA
[2] Greenhouse Software Inc, New York, NY USA
[3] Myatt & Johnson Inc, Miami Beach, FL USA
[4] Ohio State Univ, Columbus, OH 43210 USA
Keywords
algorithmic complexity; big data; mixtures of copulas; nonparametric correlation; ranking models; unsupervised classification
DOI
10.1002/wics.1382
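Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract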
Two variables that tend to rise and fall either together or in opposition are said to be monotonically associated. For certain phenomena, this tendency is causally restricted to a subpopulation; for example, the severity of an allergic reaction trends with the concentration of an air pollutant. Previously, Yu et al. (Stat Methodol 2011, 8:97-111) devised a method that rearranges the observations in paired data to test whether such an association might be present in a subpopulation. However, the computational intensity of the method limited its application to relatively small samples, and the test itself only judges whether association is present in some subpopulation; it does not clearly identify the subsample that came from this subpopulation, especially when the whole sample tests positive. The present study adds a 'top-K' feature (Sampath S, Verducci JS. Stat Anal Data Min 2013, 6:458-471), based on a multistage ranking model, that identifies a concise subsample likely to contain a high proportion of observations from the subpopulation in which the association is supported. Computational improvements incorporated into this top-K tau-path algorithm now allow the method to be extended to thousands of pairs of variables measured on sample sizes in the thousands. A description of the new algorithm, along with measures of computational complexity and practical efficiency, helps to gauge its potential use in different settings. Simulation studies catalog its accuracy in various settings, and an example from finance illustrates its step-by-step use. © 2016 Wiley Periodicals, Inc.
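The motivating phenomenon in the abstract, a monotone association that holds in a subpopulation but is diluted in the full sample, can be illustrated with a minimal sketch. This is not the top-K tau-path algorithm itself; it only computes Kendall's tau (tau-a, assuming no ties) on hypothetical toy data in which the subpopulation membership `sub_idx` is assumed known in advance, whereas the method described above aims to identify it. The function name `kendall_tau` and all data values are illustrative assumptions.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant pairs - discordant pairs) / total pairs."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        if prod > 0:      # pair ordered the same way in x and y
            score += 1
        elif prod < 0:    # pair ordered in opposite ways
            score -= 1
    return score / (n * (n - 1) / 2)


# Hypothetical toy data: observations at even positions are perfectly
# monotone (y == x); the remaining observations show no such pattern.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 8, 3, 6, 5, 2, 7, 4]
sub_idx = [0, 2, 4, 6]  # assumed-known subpopulation membership

tau_sub = kendall_tau([x[i] for i in sub_idx], [y[i] for i in sub_idx])
tau_all = kendall_tau(x, y)

print(f"tau within subsample: {tau_sub:.3f}")
print(f"tau over full sample: {tau_all:.3f}")
```

In this toy example the full-sample coefficient is close to zero even though the association within the subsample is perfect, which is the situation a subpopulation screen is meant to detect.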
Pages: 206-218
Number of pages: 13
Related papers
38 in total
  • [1] The tau-path test for monotone association in an unspecified subpopulation. Application to chemogenomic data mining
    Yu, Li
    Verducci, Joseph S.
    Blower, Paul E.
    STATISTICAL METHODOLOGY, 2011, 8 (01) : 97 - 111
  • [2] Top-k critical Vertices Query on Shortest Path
    Ma, Jing
    Yao, Bin
    Gao, Xiaofeng
    Shen, Yanyan
    Guo, Minyi
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2018, 30 (10) : 1999 - 2012
  • [3] Finding Top-k Approximate Answers to Path Queries
    Hurtado, Carlos A.
    Poulovassilis, Alexandra
    Wood, Peter T.
    FLEXIBLE QUERY ANSWERING SYSTEMS: 8TH INTERNATIONAL CONFERENCE, FQAS 2009, 2009, 5822 : 465 - 476
  • [4] Holistic Top-k Simple Shortest Path Join in Graphs
    Gao, Jun
    Yu, Jeffrey Xu
    Qiu, Huida
    Jiang, Xiao
    Wang, Tengjiao
    Yang, Dongqing
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (04) : 665 - 677
  • [5] Top-k shortest-path query on RDF graphs
    Zhang, Deng-Yi
    Wu, Wen-Li
    Ouyang, Chu-Fei
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2015, 43 (08) : 1531 - 1537
  • [6] An Improved Algorithm for Mining Top-k Association Rules
    Nguyen, Linh T. T.
    Nguyen, Loan T. T.
    Bay Vo
    ADVANCED COMPUTATIONAL METHODS FOR KNOWLEDGE ENGINEERING, ICCSAMA 2017, 2018, 629 : 117 - 128
  • [7] Mining top-k granular association rules for recommendation
    Min, Fan
    Zhu, William
    PROCEEDINGS OF THE 2013 JOINT IFSA WORLD CONGRESS AND NAFIPS ANNUAL MEETING (IFSA/NAFIPS), 2013, : 1372 - 1376
  • [8] On efficient top-k transaction path query processing in blockchain database
    Hao, Kun
    Xin, Junchang
    Wang, Zhiqiong
    Yao, Zhongming
    Wang, Guoren
    DATA & KNOWLEDGE ENGINEERING, 2022, 141
  • [9] ETARM: an efficient top-k association rule mining algorithm
    Nguyen, Linh T. T.
    Bay Vo
    Nguyen, Loan T. T.
    Fournier-Viger, Philippe
    Selamat, Ali
    APPLIED INTELLIGENCE, 2018, 48 (05) : 1148 - 1160