A TREE-BASED APPROACH FOR ADDRESSING SELF-SELECTION IN IMPACT STUDIES WITH BIG DATA

Cited by: 20
Authors
Yahav, Inbal [1]
Shmueli, Galit [2]
Mani, Deepa [3]
Affiliations
[1] Bar Ilan Univ, Grad Sch Business Adm, Dept Informat Syst, IL-52900 Ramat Gan, Israel
[2] Natl Tsing Hua Univ, Coll Technol Management, Inst Serv Sci, Hsinchu 30013, Taiwan
[3] Indian Sch Business, Hyderabad 500032, Andhra Pradesh, India
Keywords
Self-selection; classification and regression trees; intervention; decision-making; e-governance; outsourcing; analytics; PROPENSITY SCORE ESTIMATION; ESTIMATION NEURAL-NETWORKS; MATCHING METHODS; REGRESSION; ALTERNATIVES; CONTRACTS; INFERENCE; PRIVACY; MARKET; BIAS;
DOI
10.25300/MISQ/2016/40.4.02
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this paper, we introduce a tree-based approach adjusting for observable self-selection bias in intervention studies in management research. In contrast to traditional propensity score (PS) matching methods, including those using classification trees as a subcomponent, our tree-based approach provides a standalone, automated, data-driven methodology that allows for (1) the examination of nascent interventions whose selection is difficult and costly to theoretically specify a priori, (2) detection of heterogeneous intervention effects for different pre-intervention profiles, (3) identification of pre-intervention variables that correlate with the self-selected intervention, and (4) visual presentation of intervention effects that is easy to discern and understand. As such, the tree-based approach is a useful tool for analyzing observational impact studies as well as for post-analysis of experimental data. The tree-based approach is particularly advantageous in the analyses of big data, or data with large sample sizes and a large number of variables. It outperforms PS in terms of computational time, data loss, and automatic capture of nonlinear relationships and heterogeneous interventions. It also requires fewer user specifications and choices than PS, reducing potential data dredging. We discuss the performance of our method in the context of such big data and present results for very large simulated samples with many variables. We illustrate the method and the insights it yields in the context of three impact studies with different study designs: reanalysis of a field study on the effect of training on earnings, analysis of the impact of an electronic governance service in India based on a quasi-experiment, and performance comparison of contract pricing mechanisms and durations in IT outsourcing using observational data.
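The general idea of adjusting for self-selection with a tree can be illustrated with a small sketch. The abstract does not spell out the algorithm, so everything below is an assumption made for illustration: a classification tree is grown with the intervention indicator as the response and the pre-intervention variables as predictors (Gini impurity, a hypothetical `min_leaf` and `max_depth`), and the intervention effect is then estimated separately within each leaf as a difference in mean outcomes. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean

def gini(labels):
    """Gini impurity of a list of 0/1 intervention indicators."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, min_leaf):
    """Find the covariate/threshold that best separates treated from untreated.

    rows: list of (covariates, treated, outcome) tuples.
    Returns (gain, column, threshold, left_rows, right_rows) or None.
    """
    parent = gini([t for _, t, _ in rows])
    best = None
    for j in range(len(rows[0][0])):
        values = sorted({x[j] for x, _, _ in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for r in rows if r[0][j] <= thr]
            right = [r for r in rows if r[0][j] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            child = (len(left) * gini([t for _, t, _ in left])
                     + len(right) * gini([t for _, t, _ in right])) / len(rows)
            gain = parent - child
            if gain > 1e-12 and (best is None or gain > best[0]):
                best = (gain, j, thr, left, right)
    return best

def leaf_effects(rows, depth=0, max_depth=3, min_leaf=4, path=()):
    """Grow the tree on the intervention indicator; report the effect per leaf."""
    split = best_split(rows, min_leaf) if depth < max_depth else None
    if split is None:
        treated = [y for _, t, y in rows if t == 1]
        control = [y for _, t, y in rows if t == 0]
        # The effect is undefined if a leaf lacks treated or untreated units.
        effect = mean(treated) - mean(control) if treated and control else None
        return [{"path": path, "n": len(rows), "effect": effect}]
    _, j, thr, left, right = split
    return (leaf_effects(left, depth + 1, max_depth, min_leaf, path + ((j, "<=", thr),))
            + leaf_effects(right, depth + 1, max_depth, min_leaf, path + ((j, ">", thr),)))

def make_group(x0, n_treated, n_control, y_treated, y_control):
    """Toy data: x0 drives self-selection, x1 is an unrelated covariate."""
    rows = [([x0, i % 2], 1, y_treated) for i in range(n_treated)]
    rows += [([x0, i % 2], 0, y_control) for i in range(n_control)]
    return rows

# Units with x0 = 1 select the intervention more often AND have higher outcomes.
rows = make_group(0.0, 2, 6, 11.0, 10.0) + make_group(1.0, 6, 2, 23.0, 20.0)

naive = (mean(y for _, t, y in rows if t == 1)
         - mean(y for _, t, y in rows if t == 0))
leaves = leaf_effects(rows)

print("naive difference in means:", naive)
for leaf in leaves:
    print(leaf)
```

In this toy data the naive comparison of treated and untreated units mixes the selection pattern into the estimate, whereas the per-leaf estimates compare units with the same pre-intervention profile and can differ from leaf to leaf, which is one way heterogeneous effects could surface.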
Pages: 819+
Page count: 39