Towards Exploratory Hypothesis Testing and Analysis

Authors
Liu, Guimei [1]
Feng, Mengling [2]
Wang, Yue [3]
Wong, Limsoon [1]
Ng, See-Kiong [2]
Mah, Tzia Liang [2]
Lee, Edmund Jon Deoon [4]
Affiliations
[1] Natl Univ Singapore, Dept Comp Sci, Singapore 117548, Singapore
[2] Inst Infocomm Res, Data Min Dept, Singapore, Singapore
[3] Natl Univ Singapore, Graduate Sch Integrat Sci & Engn, Singapore, Singapore
[4] Natl Univ Singapore, Dept Pharmacol, Singapore, Singapore
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Hypothesis testing is a well-established tool for scientific discovery. Conventional hypothesis testing is carried out in a hypothesis-driven manner. A scientist must first formulate a hypothesis based on his/her knowledge and experience, and then devise a variety of experiments to test it. Given the rapid growth of data, it has become virtually impossible for a person to manually inspect all the data to find all the interesting hypotheses for testing. In this paper, we propose and develop a data-driven system for automatic hypothesis testing and analysis. We define a hypothesis as a comparison between two or more sub-populations. We find sub-populations for comparison using frequent pattern mining techniques and then pair them up for statistical testing. We also generate additional information for further analysis of the hypotheses that are deemed significant. We conducted a set of experiments to show the efficiency of the proposed algorithms, and the usefulness of the generated hypotheses. The results show that our system can help users (1) identify significant hypotheses; (2) isolate the reasons behind significant hypotheses; and (3) find confounding factors that form Simpson's Paradoxes with discovered significant hypotheses.
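The pipeline described in the abstract (mine frequent sub-populations, pair them up, test each pair) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the brute-force pattern enumeration, the pooled two-proportion z-test, and all names (`frequent_patterns`, `two_proportion_p`, `generate_hypotheses`, the `drug`/`sex`/`outcome` columns) are assumptions chosen for illustration, and the toy data is made up. A real system would use an efficient frequent pattern miner and would also need to correct for multiple comparisons.

```python
from itertools import combinations
from math import erfc, sqrt


def frequent_patterns(rows, attrs, min_sup, max_len=2):
    """Brute-force enumeration of attribute=value patterns that
    cover at least min_sup rows. Returns {pattern: covered rows}."""
    items = sorted({(a, r[a]) for r in rows for a in attrs})
    patterns = {}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            # A pattern may fix each attribute to one value only.
            if len({a for a, _ in combo}) < k:
                continue
            cover = [r for r in rows if all(r[a] == v for a, v in combo)]
            if len(cover) >= min_sup:
                patterns[frozenset(combo)] = cover
    return patterns


def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (x1 / n1 - x2 / n2) / se
    return erfc(abs(z) / sqrt(2))


def generate_hypotheses(rows, attrs, target, positive, min_sup=5, alpha=0.05):
    """Pair sub-populations that share a context and differ in the value
    of exactly one attribute, then test the difference in the proportion
    of rows whose target equals `positive`."""
    pats = frequent_patterns(rows, attrs, min_sup)
    results = []
    for p, q in combinations(pats, 2):
        only_p, only_q = p - q, q - p
        if len(only_p) != 1 or len(only_q) != 1:
            continue
        (attr_p, _), = only_p
        (attr_q, _), = only_q
        if attr_p != attr_q:
            continue
        x1 = sum(r[target] == positive for r in pats[p])
        x2 = sum(r[target] == positive for r in pats[q])
        pval = two_proportion_p(x1, len(pats[p]), x2, len(pats[q]))
        if pval <= alpha:
            results.append((dict(p), dict(q), pval))
    # Most significant hypotheses first.
    return sorted(results, key=lambda t: t[2])


# Toy data (made up): drug A has a higher rate of good outcomes than drug B.
rows = []
for drug, sex, good, bad in [("A", "M", 9, 1), ("A", "F", 9, 1),
                             ("B", "M", 3, 7), ("B", "F", 3, 7)]:
    rows += [{"drug": drug, "sex": sex, "outcome": "good"}] * good
    rows += [{"drug": drug, "sex": sex, "outcome": "bad"}] * bad

hypotheses = generate_hypotheses(rows, ["drug", "sex"], "outcome", "good")
for left, right, pval in hypotheses:
    print(left, "vs", right, f"p={pval:.4g}")
```

On this toy data the sketch reports the comparison of drug A against drug B overall and within each sex, while the comparisons between sexes are not significant. Restricting pairs to sub-populations that differ in a single attribute keeps each hypothesis interpretable, since the shared items act as the context of the comparison.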
Pages: 745-756
Number of pages: 12