A TREE-BASED APPROACH FOR ADDRESSING SELF-SELECTION IN IMPACT STUDIES WITH BIG DATA

Times cited: 20
Authors
Yahav, Inbal [1]
Shmueli, Galit [2]
Mani, Deepa [3]
Affiliations
[1] Bar Ilan Univ, Grad Sch Business Adm, Dept Informat Syst, IL-52900 Ramat Gan, Israel
[2] Natl Tsing Hua Univ, Coll Technol Management, Inst Serv Sci, Hsinchu 30013, Taiwan
[3] Indian Sch Business, Hyderabad 500032, Andhra Pradesh, India
Keywords
Self-selection; classification and regression trees; intervention; decision-making; e-governance; outsourcing; analytics; PROPENSITY SCORE ESTIMATION; ESTIMATION NEURAL-NETWORKS; MATCHING METHODS; REGRESSION; ALTERNATIVES; CONTRACTS; INFERENCE; PRIVACY; MARKET; BIAS;
DOI
10.25300/MISQ/2016/40.4.02
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this paper, we introduce a tree-based approach that adjusts for observable self-selection bias in intervention studies in management research. In contrast to traditional propensity score (PS) matching methods, including those that use classification trees as a subcomponent, our tree-based approach provides a standalone, automated, data-driven methodology that allows for (1) the examination of nascent interventions whose selection is difficult and costly to specify theoretically a priori, (2) detection of heterogeneous intervention effects for different pre-intervention profiles, (3) identification of pre-intervention variables that correlate with the self-selected intervention, and (4) visual presentation of intervention effects that is easy to discern and understand. As such, the tree-based approach is a useful tool for analyzing observational impact studies as well as for post-analysis of experimental data. It is particularly advantageous in the analysis of big data, that is, data with large sample sizes and a large number of variables. It outperforms PS in terms of computational time, data loss, and automatic capture of nonlinear relationships and heterogeneous intervention effects. It also requires fewer user specifications and choices than PS, reducing the potential for data dredging. We discuss the performance of our method in the context of such big data and present results for very large simulated samples with many variables. We illustrate the method and the insights it yields in three impact studies with different study designs: a reanalysis of a field study on the effect of training on earnings, an analysis of the impact of an electronic governance service in India based on a quasi-experiment, and a performance comparison of contract pricing mechanisms and durations in IT outsourcing using observational data.
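The abstract describes the approach only at a high level. The Python sketch below illustrates the general idea under stated assumptions: a tree is grown to predict the self-selected intervention from pre-intervention covariates, each terminal node is treated as a group of comparable units, and treated and control outcomes are compared within each node. It is not the authors' implementation. A plain Gini-impurity split criterion with simple depth, leaf-size, and minimum-gain stopping rules stands in for whatever tree algorithm and stopping rule the paper uses, and the names `Unit`, `best_split`, `grow`, and `leaf_effect`, the parameters `max_depth`, `min_leaf`, and `min_gain`, and the data are all hypothetical.

```python
# Hypothetical sketch: a Gini-based tree stands in for the paper's tree algorithm.
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Unit:
    x: dict[str, float]  # pre-intervention covariates
    treated: int         # 1 if the unit self-selected into the intervention, else 0
    y: float             # post-intervention outcome


def gini(units: list[Unit]) -> float:
    """Gini impurity of the treatment indicator within a group of units."""
    if not units:
        return 0.0
    p = sum(u.treated for u in units) / len(units)
    return 2 * p * (1 - p)


def best_split(units: list[Unit], min_leaf: int):
    """Best (gain, variable, threshold, left, right) split of the treatment indicator, or None."""
    best = None
    parent = gini(units)
    for var in units[0].x:
        for t in sorted({u.x[var] for u in units}):
            left = [u for u in units if u.x[var] <= t]
            right = [u for u in units if u.x[var] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(units)
            if best is None or parent - child > best[0]:
                best = (parent - child, var, t, left, right)
    return best


def grow(units, depth=0, path=(), max_depth=3, min_leaf=4, min_gain=0.01):
    """Recursively split on the covariates that best separate treated from control units.

    Returns a list of (path, members) leaves; each path records the splits taken.
    Splitting stops at max_depth or when no split reduces impurity by at least min_gain.
    """
    split = best_split(units, min_leaf) if depth < max_depth else None
    if split is None or split[0] < min_gain:
        return [(path, units)]
    _, var, t, left, right = split
    kw = dict(max_depth=max_depth, min_leaf=min_leaf, min_gain=min_gain)
    return (grow(left, depth + 1, path + (f"{var} <= {t}",), **kw)
            + grow(right, depth + 1, path + (f"{var} > {t}",), **kw))


def leaf_effect(units: list[Unit]) -> Optional[float]:
    """Mean outcome of treated units minus mean outcome of control units in one group."""
    treated = [u.y for u in units if u.treated]
    control = [u.y for u in units if not u.treated]
    if not treated or not control:
        return None  # no overlap: the effect is not estimable in this leaf
    return mean(treated) - mean(control)


# Hypothetical synthetic data: older units self-select more often,
# and the intervention effect differs by age group.
young = [(20, 0), (21, 0), (22, 0), (23, 1), (24, 1), (25, 0), (26, 0), (27, 0)]
old = [(40, 1), (41, 1), (42, 1), (43, 0), (44, 0), (45, 1), (46, 1), (47, 1)]
units = [Unit({"age": a, "region": i % 2}, d, 12.0 if d else 10.0)
         for i, (a, d) in enumerate(young)]
units += [Unit({"age": a, "region": i % 2}, d, 25.0 if d else 20.0)
          for i, (a, d) in enumerate(old)]

for path, members in grow(units):
    print(" AND ".join(path), "->", leaf_effect(members))
# age <= 27 -> 2.0
# age > 27 -> 5.0
print("naive difference:", leaf_effect(units))  # naive difference: 9.25
```

The tree is grown on the treatment indicator, not on the outcome. Its single split picks out `age` as the pre-intervention variable associated with self-selection, while the uninformative `region` is never used, and the two leaves yield different within-group effects (2.0 and 5.0). The naive treated-versus-control difference over all units (9.25) mixes the two age groups and overstates both. A real analysis would also need a principled stopping rule and inference on the per-leaf differences, which this sketch omits.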
Pages: 819 / +
Number of pages: 39
Related papers
50 in total
  • [1] Differentially Private Tree-Based Contextual Online Learning for Service Big Data Selection in IoT
    Zhao, Weiguang
    Chen, Mingxuan
    Mu, Difan
    Zhou, Pan
    Wang, Kehao
    [J]. 2019 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2019,
  • [2] A Tree-Based Approach to Data Flow Proofs
    Hoenicke, Jochen
    Nutz, Alexander
    Podelski, Andreas
    [J]. VERIFIED SOFTWARE: THEORIES, TOOLS, AND EXPERIMENTS, (VSTTE 2018), 2018, 11294 : 1 - 16
  • [3] Tree-based Approach to Missing Data Imputation
    Vateekul, Peerapon
    Sarinnapakorn, Kanoksri
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2009), 2009 : 70 - +
  • [4] Wart Treatment Selection with a Decision Tree-Based Approach
    Yanik, Huseyin
    Comert, Mustafa
    [J]. 2019 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP 2019), 2019,
  • [5] SELF-SELECTION BIASES IN CORRELATIONAL STUDIES BASED ON QUESTIONNAIRES
    BOWDEN, RJ
    [J]. PSYCHOMETRIKA, 1986, 51 (02) : 313 - 325
  • [6] An Efficient Tree-based Fuzzy Data Mining Approach
    Lin, Chun-Wei
    Hong, Tzung-Pei
    Lu, Wen-Hsiang
    [J]. INTERNATIONAL JOURNAL OF FUZZY SYSTEMS, 2010, 12 (02) : 150 - 157
  • [7] Vertical handoff research based on cognitive self-selection decision tree
    Fan, Cun-Qun
    Wang, Shang-Guang
    Sun, Qi-Bo
    Zou, Hua
    Yang, Fang-Chun
    [J]. Tongxin Xuebao/Journal on Communications, 2013, 34 (11) : 58 - 80
  • [8] The predictability of tree-based machine learning algorithms in the big data context
    Qolipour, F.
    Ghasemzadeh, M.
    Mohammad-Karimi, N.
    [J]. International Journal of Engineering, Transactions A: Basics, 2021, 34 (01) : 82 - 89
  • [9] The Predictability of Tree-based Machine Learning Algorithms in the Big Data Context
    Qolipour, F.
    Ghasemzadeh, M.
    Mohammad-Karimi, N.
    [J]. INTERNATIONAL JOURNAL OF ENGINEERING, 2021, 34 (01) : 82 - 89
  • [10] A tree-based algorithm for attribute selection
    José Augusto Baranauskas
    Oscar Picchi Netto
    Sérgio Ricardo Nozawa
    Alessandra Alaniz Macedo
    [J]. Applied Intelligence, 2018, 48 : 821 - 833