sparsesurv: a Python package for fitting sparse survival models via knowledge distillation

Authors
Wissel, David [1 ,2 ,3 ]
Janakarajan, Nikita [1 ,4 ]
Schulte, Julius [1 ]
Rowson, Daniel [1 ,3 ]
Yuan, Xintian [1 ]
Boeva, Valentina [1 ,3 ,5 ]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, CH-8092 Zurich, Switzerland
[2] Univ Zurich, Dept Mol Life Sci, CH-8057 Zurich, Switzerland
[3] SIB Swiss Inst Bioinformat, Lausanne, Switzerland
[4] IBM Res Europe, CH-8803 Zurich, Switzerland
[5] Univ Paris, Inst Cochin, UMR S1016, F-75014 Paris, France
Funding
Swiss National Science Foundation;
Keywords
REGULARIZATION PATHS; EFFICIENT ESTIMATION; REGRESSION; SELECTION; LASSO;
DOI
10.1093/bioinformatics/btae521
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Motivation
Sparse survival models are statistical models that select a subset of predictor variables while modeling the time until an event occurs, which can improve interpretability and transportability. The subset of important features is often obtained with regularized models, such as the Cox proportional hazards model with Lasso regularization, which limit the number of non-zero coefficients. However, such models can be sensitive to the choice of the regularization hyperparameter.
Results
In this work, we develop a software package and demonstrate how knowledge distillation, a powerful machine learning technique that transfers knowledge from a complex teacher model to a simpler student model, can be leveraged to learn sparse survival models while mitigating this challenge. For this purpose, we present sparsesurv, a Python package that contains a set of teacher-student model pairs, including the semi-parametric accelerated failure time and the extended hazards models as teachers, neither of which currently has a Python implementation. It also contains in-house survival function estimators, removing the need for external packages. sparsesurv is validated against R-based Elastic Net regularized linear Cox proportional hazards models as implemented in the commonly used glmnet package. Our results reveal that knowledge distillation-based approaches achieve discriminative performance competitive with glmnet across the regularization path while making the choice of the regularization hyperparameter significantly easier.
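The teacher-student idea described above can be sketched in a few lines, under explicit assumptions: a fitted teacher supplies a linear predictor η for every sample, and the student is an L1-penalized least-squares regression on those predictions, fit by cyclic coordinate descent. The dense linear teacher (`teacher_beta`) is a hypothetical stand-in for the Cox, semi-parametric AFT or extended hazards teachers, and `lasso_student` and `lambda_max` are illustrative names, not sparsesurv's API.

```python
import random


def soft_threshold(z: float, lam: float) -> float:
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lambda_max(X, target):
    """Smallest penalty at which the all-zero student is optimal."""
    n, p = len(X), len(X[0])
    return max(abs(sum(X[i][j] * target[i] for i in range(n))) / n for j in range(p))


def lasso_student(X, target, lam, n_sweeps=200):
    """Fit a sparse linear student to the teacher's linear predictor.

    Minimises (1 / (2n)) * ||target - X b||^2 + lam * ||b||_1 by cyclic
    coordinate descent (no intercept term).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(target)  # target - X @ beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of feature j with the partial residual (beta_j added back in).
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                # Keep the full residual in sync with the updated coefficient.
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Toy data; the dense linear "teacher" is a hypothetical stand-in
# for a fitted survival teacher model.
rng = random.Random(0)
n_samples, n_features = 200, 5
X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
teacher_beta = [3.0, -2.0, 0.05, 0.0, 0.01]
teacher_eta = [sum(x * b for x, b in zip(row, teacher_beta)) for row in X]

student_beta = lasso_student(X, teacher_eta, lam=0.5)
print("teacher:", teacher_beta)
print("student:", [round(b, 3) for b in student_beta])
```

In this sketch the teacher is fit once, and sparsity is controlled by `lam` alone; sweeping `lam` downward from `lambda_max(X, teacher_eta)` traces a regularization path comparable in spirit to the one glmnet computes, with the small teacher coefficients zeroed out first.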
All of these features, combined with a scikit-learn-like API, make sparsesurv an easy-to-use Python package that enables survival analysis of high-dimensional datasets by fitting sparse survival models via knowledge distillation.
Availability and implementation
sparsesurv is freely available under a BSD 3-Clause license on GitHub (https://github.com/BoevaLab/sparsesurv) and the Python Package Index (PyPI) (https://pypi.org/project/sparsesurv/).
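The in-house survival function estimators mentioned in the abstract can be illustrated with the Breslow estimator, the standard way to turn a proportional hazards linear predictor η into a survival curve: H0(t) = Σ over event times t_i ≤ t of d_i / Σ_{j: T_j ≥ t_i} exp(η_j), and S(t | x) = exp(−H0(t) · exp(η)). This is a minimal sketch assuming right-censored data; whether sparsesurv uses exactly this estimator for every teacher-student pair is an assumption, and `breslow_cumulative_hazard` and `survival_at` are hypothetical names.

```python
import math


def breslow_cumulative_hazard(times, events, eta):
    """Breslow estimate of the baseline cumulative hazard H0(t).

    Returns (event_time, H0(event_time)) pairs, one per unique event time;
    each jump is d(t) / sum_{j: T_j >= t} exp(eta_j), so tied events share a jump.
    """
    risk = [math.exp(e) for e in eta]
    event_times = sorted({t for t, d in zip(times, events) if d})
    cum_hazard, path = 0.0, []
    for t in event_times:
        d = sum(1 for ti, di in zip(times, events) if di and ti == t)
        at_risk = sum(r for ti, r in zip(times, risk) if ti >= t)
        cum_hazard += d / at_risk
        path.append((t, cum_hazard))
    return path


def survival_at(t, path, eta_new):
    """S(t | x) = exp(-H0(t) * exp(eta_new)), with H0 a right-continuous step function."""
    h0 = 0.0
    for event_time, h in path:
        if event_time <= t:
            h0 = h
        else:
            break
    return math.exp(-h0 * math.exp(eta_new))


# Toy right-censored data: the third subject is censored at t = 3.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
path = breslow_cumulative_hazard(times, events, eta=[0.0] * 4)
print(path)
print(survival_at(2.5, path, eta_new=0.0), survival_at(2.5, path, eta_new=1.0))
```

With all η equal to zero the estimate reduces to the Nelson-Aalen estimator; in a distillation setting, this sketch would take the sparse student's linear predictor as `eta`, so a single baseline hazard serves every sample.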
Pages: 5