The top-K tau-path screen for monotone association in subpopulations

Cited by: 2
Authors
Sampath, Srinath [1]
Caloiaro, Adriano [2]
Johnson, Wayne [3]
Verducci, Joseph S. [4]
Affiliations
[1] Hamilton Capital Management, Columbus, OH 43220 USA
[2] Greenhouse Software Inc, New York, NY USA
[3] Myatt & Johnson Inc, Miami Beach, FL USA
[4] Ohio State Univ, Columbus, OH 43210 USA
Keywords
algorithmic complexity; big data; mixtures of copulas; nonparametric correlation; ranking models; unsupervised classification
DOI
10.1002/wics.1382
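Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract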
A pair of variables that tend to rise and fall either together or in opposition are said to be monotonically associated. For certain phenomena, this tendency is causally restricted to a subpopulation, as when the severity of an allergic reaction trends with the concentration of an air pollutant. Previously, Yu et al. (Stat Methodol 2011, 8:97-111) devised a method of rearranging observations in paired data to test whether such an association might be present in a subpopulation. However, the computational intensity of the method limited its application to relatively small samples, and the test itself judges only whether association is present in some subpopulation; it does not clearly identify the subsample that came from this subpopulation, especially when the whole sample tests positive. The present study adds a 'top-K' feature (Sampath S, Verducci JS. Stat Anal Data Min 2013, 6:458-471), based on a multistage ranking model, that identifies a concise subsample likely to contain a high proportion of observations from the subpopulation in which the association is supported. Computational improvements incorporated into this top-K tau-path algorithm now allow the method to be extended to thousands of pairs of variables measured on sample sizes in the thousands. A description of the new algorithm, along with measures of computational complexity and practical efficiency, helps to gauge its potential use in different settings. Simulation studies catalog its accuracy in various settings, and an example from finance illustrates its step-by-step use. © 2016 Wiley Periodicals, Inc.
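The idea of rearranging observations so that the most mutually concordant ones come first can be illustrated with a small sketch. The code below is not the authors' algorithm: it assumes a simple greedy backward elimination (repeatedly dropping the observation least concordant with those remaining) and uses Kendall's tau without tie correction. The function names `concordance`, `kendall_tau`, and `greedy_tau_path` are hypothetical, and the multistage ranking model that selects K is not implemented.

```python
def concordance(x, y):
    """Pairwise concordance matrix: +1 concordant, -1 discordant, 0 tied."""
    n = len(x)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            c[i][j] = c[j][i] = (s > 0) - (s < 0)
    return c


def kendall_tau(c, idx):
    """Kendall's tau (no tie correction) restricted to the observations in idx."""
    m = len(idx)
    if m < 2:
        return 0.0
    total = sum(c[i][j] for a, i in enumerate(idx) for j in idx[a + 1:])
    return total / (m * (m - 1) / 2)


def greedy_tau_path(x, y):
    """Assumed greedy heuristic, not the published method.

    Repeatedly removes the observation with the lowest concordance sum
    against the remaining ones; the reversed removal order places the
    most mutually concordant observations first.
    Returns (order, taus), where taus[k - 2] is tau on the first k observations.
    """
    c = concordance(x, y)
    remaining = list(range(len(x)))
    score = {i: sum(c[i]) for i in remaining}
    removed = []
    while remaining:
        worst = min(remaining, key=lambda i: score[i])
        remaining.remove(worst)
        removed.append(worst)
        for i in remaining:
            score[i] -= c[i][worst]  # discount the pair that just left
    order = removed[::-1]
    taus = [kendall_tau(c, order[:k]) for k in range(2, len(order) + 1)]
    return order, taus


# Four observations rise together; the fifth runs against the trend.
x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 0]
order, taus = greedy_tau_path(x, y)
print(order)  # the discordant observation (index 4) is placed last
print(taus)   # tau stays at 1.0 until the last observation is included
```

In this toy case the path of tau values stays high over the concordant observations and drops once the discordant one enters, which is the kind of pattern a top-K cutoff would aim to detect. The sketch stores a full n-by-n concordance matrix, so it is only suitable for small samples.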
Pages: 206-218
Page count: 13