A multiple testing protocol for exploratory data analysis and the local misclassification rate

Cited by: 1
Authors
Watts, David D. [1]
Habiger, Joshua D. [1]
Affiliation
[1] Oklahoma State Univ, Dept Stat, Stillwater, OK 74078 USA
Keywords
Classification; False discovery rate; Local false discovery rate; Local misclassification rate; Statistical significance; P-values; Hypothesis
DOI
10.1080/03610926.2017.1361982
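Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract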
A false discovery rate (FDR) procedure is often employed in exploratory data analysis to determine which among thousands or millions of attributes are worthy of follow-up analysis. However, such procedures tend to discover the most statistically significant attributes, which need not be the most worthy of further exploration. This article provides a new FDR-controlling method that allows the nature of the exploratory analysis to be considered when determining which attributes are discovered. To illustrate, a study in which the objective is to classify discoveries into one of several clusters is considered, and a new FDR method that minimizes the misclassification rate is developed. It is shown analytically and with simulation that the proposed method performs better than competing methods.
Pages: 3588-3604
Page count: 17