Data Sculpting: Interpretable Algorithm for End-to-End Cohort Selection

Cited by: 0
Authors
Liu, Ruishan [1]
Zou, James [2]
Affiliations
[1] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Biomed Data Sci, Stanford, CA 94305 USA
Keywords
cohort selection; odds ratio; logistic regression
DOI
10.1109/IEEECONF56349.2022.10052001
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many scientific and medical analyses involve fitting a parametric model to a heterogeneous data set. The model is often chosen to be low capacity (e.g., logistic regression) so that statistical inference can be made about the association between each feature and the outcome (e.g., the odds ratio). However, such a simple model often cannot capture the heterogeneity in the data. For example, a subset of the data might follow a clean logistic relation while other data points follow different relations, so that fitting a logistic regression over the entire set may not find any association. In this paper, we propose a novel algorithm, Data Sculpting, for simultaneously learning to select a subset of the data while fitting the desired parametric model on the selected cohort. Data Sculpting retains the statistical inference convenience of the original model while leveraging end-to-end differentiable optimization (via the Concrete selector) to learn interpretable rules for selecting the cohort. Extensive experiments demonstrate that Data Sculpting is efficient, robust, and substantially improves over the standard approaches.
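The motivating example in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes synthetic data, a hand-set selection rule (z > 0) in place of a learned one, and hypothetical helper names (`concrete_gate`, `fit_weighted_logistic`, `make_data`). It shows only how a relaxed (binary Concrete) selection gate can weight a logistic regression loss and change the estimated odds ratio; the joint end-to-end training described in the paper is omitted.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def concrete_gate(logit, temperature, rng):
    # Binary Concrete (relaxed Bernoulli) sample: add logistic noise to the
    # selection logit, then squash with a temperature-scaled sigmoid.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    noise = math.log(u) - math.log(1.0 - u)
    return sigmoid((logit + noise) / temperature)


def fit_weighted_logistic(xs, ys, weights, steps=500, lr=0.5):
    # Gradient descent on the weight-averaged logistic loss.
    # Returns (intercept, slope) for a single feature x.
    b0, b1 = 0.0, 0.0
    total = sum(weights)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y, w in zip(xs, ys, weights):
            err = sigmoid(b0 + b1 * x) - y
            g0 += w * err
            g1 += w * err * x
        b0 -= lr * g0 / total
        b1 -= lr * g1 / total
    return b0, b1


def make_data(n, rng):
    # Assumed synthetic data: rows with z > 0 follow a logistic relation in x
    # (true slope 2); for the other rows the label is independent of x.
    xs, zs, ys = [], [], []
    for _ in range(n):
        x, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        p = sigmoid(2.0 * x) if z > 0 else 0.5
        xs.append(x)
        zs.append(z)
        ys.append(1 if rng.random() < p else 0)
    return xs, zs, ys


rng = random.Random(0)
xs, zs, ys = make_data(600, rng)

# Baseline: one logistic regression over the whole heterogeneous data set.
_, slope_all = fit_weighted_logistic(xs, ys, [1.0] * len(xs))

# Gated fit: the selection logit 10 * z encodes the hand-set rule "z > 0";
# a low temperature pushes each gate close to 0 or 1.
gates = [concrete_gate(10.0 * z, 0.1, rng) for z in zs]
_, slope_sel = fit_weighted_logistic(xs, ys, gates)

# For logistic regression, exp(coefficient) is the odds ratio per unit of x.
odds_ratio_all = math.exp(slope_all)
odds_ratio_sel = math.exp(slope_sel)
print(f"odds ratio, all data: {odds_ratio_all:.2f}")
print(f"odds ratio, gated cohort: {odds_ratio_sel:.2f}")
```

Because the gates are smooth functions of the selection logit, the same weighted loss could in principle be differentiated with respect to the rule's parameters; any regularization needed to keep the selector from discarding all data is left out of this sketch.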
Pages: 263-270
Number of pages: 8