Optimal subset selection for causal inference using machine learning ensembles and particle swarm optimization

Authors
Dhruv Sharma
Christopher Willy
John Bischoff
Affiliations
[1] George Washington University
[2] George Washington University
[3] George Washington University
Keywords
Analytics; Evolutionary computing; Swarm optimization; Machine learning
DOI
Not available
Abstract
We suggest and evaluate a method for optimal construction of synthetic treatment and control samples for the purpose of drawing causal inference. The balance optimization subset selection problem, which formulates minimization of aggregate imbalance in covariate distributions to reduce bias in data, is a new area of study in operations research. We investigate a novel metric, cross-validated area under the receiver operating characteristic curve (AUC), as a measure of balance between treatment and control groups. The proposed approach provides direct and automatic balancing of covariate distributions. In addition, the AUC-based approach is able to detect subtler distributional differences than existing measures, such as simple empirical mean/variance and count-based metrics. Thus, optimizing AUCs achieves a greater balance than the existing methods. Using 5 widely used real data sets and 7 synthetic data sets, we show that optimization of samples using existing methods (Chi-square, mean variance differences, Kolmogorov–Smirnov, and Mahalanobis) results in samples containing imbalance that is detectable using machine learning ensembles. We minimize covariate imbalance by minimizing the absolute value of the distance of the maximum cross-validated AUC on $M$ folds from 0.50, using evolutionary optimization. We demonstrate that particle swarm optimization (PSO) outperforms modified cuckoo swarm (MCS) for a gradient-free, non-linear noisy cost function. To compute AUCs, we use supervised binary classification approaches from the machine learning and credit scoring literature. Using superscore ensembles adds to the classifier-based two-sample testing literature.
If the mean cross-validated AUC based on machine learning is 0.50, the two groups are indistinguishable and suitable for causal inference.
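The balance criterion described in the abstract can be illustrated with a minimal sketch. It trains a classifier to predict group membership from covariates, computes the AUC on each of M held-out folds, and returns the absolute distance of the maximum fold AUC from 0.50. Everything here is an assumption made for illustration: the function names are hypothetical, the classifier is a simple mean-difference linear scorer standing in for the machine learning ensembles the abstract mentions, and every fold is assumed to contain both groups. The swarm optimizer that would search over candidate subsets is not shown.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # A treated/control pair counts 1 if ranked correctly, 0.5 if tied.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_direction(X, t):
    """Stand-in classifier (assumption): vector from control mean to treated mean."""
    mu_t = column_means([x for x, y in zip(X, t) if y == 1])
    mu_c = column_means([x for x, y in zip(X, t) if y == 0])
    return [a - b for a, b in zip(mu_t, mu_c)]

def balance_objective(X, t, m_folds=5, seed=0):
    """Return |max fold AUC - 0.5| when predicting treatment t from covariates X."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    fold_aucs = []
    for k in range(m_folds):
        test = idx[k::m_folds]            # every m-th shuffled index is held out
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        w = fit_direction([X[i] for i in train], [t[i] for i in train])
        scores = [sum(wj * xj for wj, xj in zip(w, X[i])) for i in test]
        fold_aucs.append(auc(scores, [t[i] for i in test]))
    return abs(max(fold_aucs) - 0.5)

# Illustrative data: one sample with matching covariates, one with a shifted covariate.
rng = random.Random(1)
t = [i % 2 for i in range(400)]
balanced = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in t]
shifted = [[rng.gauss(2 * y, 1), rng.gauss(0, 1)] for y in t]
print(balance_objective(balanced, t), balance_objective(shifted, t))
```

A smaller value means the classifier has a harder time telling the groups apart, so a subset-selection search would treat this quantity as the cost to minimize.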
Pages: 41 - 59
Page count: 18
Source
Sharma, Dhruv; Willy, Christopher; Bischoff, John. Optimal subset selection for causal inference using machine learning ensembles and particle swarm optimization [J]. Complex & Intelligent Systems, 2021, 7 (01): 41 - 59