sparsesurv: a Python package for fitting sparse survival models via knowledge distillation

Authors
Wissel, David [1 ,2 ,3 ]
Janakarajan, Nikita [1 ,4 ]
Schulte, Julius [1 ]
Rowson, Daniel [1 ,3 ]
Yuan, Xintian [1 ]
Boeva, Valentina [1 ,3 ,5 ]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, CH-8092 Zurich, Switzerland
[2] Univ Zurich, Dept Mol Life Sci, CH-8057 Zurich, Switzerland
[3] SIB Swiss Inst Bioinformat, Lausanne, Switzerland
[4] IBM Res Europe, CH-8803 Zurich, Switzerland
[5] Univ Paris, Inst Cochin, UMR S1016, S1016, F-75014 Paris, France
Funding
Swiss National Science Foundation
Keywords
REGULARIZATION PATHS; EFFICIENT ESTIMATION; REGRESSION; SELECTION; LASSO;
DOI
10.1093/bioinformatics/btae521
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Sparse survival models are statistical models that select a subset of predictor variables while modeling the time until an event occurs, which can in turn improve interpretability and transportability. The subset of important features is often obtained with regularized models, such as the Cox Proportional Hazards model with Lasso regularization, which limit the number of non-zero coefficients. However, such models can be sensitive to the choice of regularization hyperparameter.

Results: In this work, we develop a software package and demonstrate how knowledge distillation, a machine learning technique that aims to transfer knowledge from a complex teacher model to a simpler student model, can be leveraged to learn sparse survival models while mitigating this challenge. For this purpose, we present sparsesurv, a Python package that contains a set of teacher-student model pairs, including the semi-parametric accelerated failure time and the extended hazards models as teachers, which currently do not have Python implementations. It also contains in-house survival function estimators, removing the need for external packages. sparsesurv is validated against R-based Elastic Net regularized linear Cox proportional hazards models as implemented in the commonly used glmnet package. Our results reveal that knowledge distillation-based approaches achieve competitive discriminative performance relative to glmnet across the regularization path while making the choice of the regularization hyperparameter significantly easier. All of these features, combined with a sklearn-like API, make sparsesurv an easy-to-use Python package that enables survival analysis for high-dimensional datasets through fitting sparse survival models via knowledge distillation.

Availability and implementation: sparsesurv is freely available under a BSD 3 license on GitHub (https://github.com/BoevaLab/sparsesurv) and the Python Package Index (PyPI) (https://pypi.org/project/sparsesurv/).
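The distillation idea described in the abstract can be sketched in a few lines. This is a minimal illustration and does not use the sparsesurv API: the teacher's risk scores are simulated stand-ins for the output of a fitted survival model, and `soft_threshold` and `distill_lasso` are hypothetical helper names. The student is a Lasso-penalized linear model fit to the teacher's scores by coordinate descent, so only the features needed to reproduce the teacher keep non-zero coefficients.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator used in Lasso coordinate descent."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def distill_lasso(X, teacher_scores, alpha, n_iter=200):
    """Fit a sparse linear student to teacher scores (hypothetical helper).

    Minimizes (1 / (2n)) * sum_i (s_i - x_i . beta)^2 + alpha * ||beta||_1,
    where s_i is the teacher's score for sample i.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(teacher_scores)  # equals s - X @ beta while beta is zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)
            ) / n
            new_bj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


random.seed(0)
n, p = 200, 10
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]

# Assumption: a dense teacher with two strong and eight weak coefficients,
# standing in for the linear predictor of any fitted survival model.
teacher_coef = [2.0, -1.5] + [0.01] * (p - 2)
teacher_scores = [sum(c * x for c, x in zip(teacher_coef, row)) for row in X]

student_coef = distill_lasso(X, teacher_scores, alpha=0.1)
selected = [j for j, b in enumerate(student_coef) if b != 0.0]
print("selected features:", selected)
```

Because the student regresses on the teacher's scores rather than on censored event times, the sparsity penalty is applied to an ordinary least-squares problem, which is one way to read the abstract's claim that the regularization hyperparameter becomes easier to choose.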
Pages: 5