sparsesurv: a Python package for fitting sparse survival models via knowledge distillation

Authors
Wissel, David [1 ,2 ,3 ]
Janakarajan, Nikita [1 ,4 ]
Schulte, Julius [1 ]
Rowson, Daniel [1 ,3 ]
Yuan, Xintian [1 ]
Boeva, Valentina [1 ,3 ,5 ]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, CH-8092 Zurich, Switzerland
[2] Univ Zurich, Dept Mol Life Sci, CH-8057 Zurich, Switzerland
[3] SIB Swiss Inst Bioinformat, Lausanne, Switzerland
[4] IBM Res Europe, CH-8803 Zurich, Switzerland
[5] Univ Paris, Inst Cochin, UMR S1016, S1016, F-75014 Paris, France
Funding
Swiss National Science Foundation
Keywords
REGULARIZATION PATHS; EFFICIENT ESTIMATION; REGRESSION; SELECTION; LASSO;
DOI
10.1093/bioinformatics/btae521
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: Sparse survival models are statistical models that select a subset of predictor variables while modeling the time until an event occurs, which can subsequently help interpretability and transportability. The subset of important features is often obtained with regularized models, such as the Cox Proportional Hazards model with Lasso regularization, which limit the number of non-zero coefficients. However, such models can be sensitive to the choice of regularization hyperparameter.
Results: In this work, we develop a software package and demonstrate how knowledge distillation, a powerful technique in machine learning that aims to transfer knowledge from a complex teacher model to a simpler student model, can be leveraged to learn sparse survival models while mitigating this challenge. For this purpose, we present sparsesurv, a Python package that contains a set of teacher-student model pairs, including the semi-parametric accelerated failure time and the extended hazards models as teachers, which currently do not have Python implementations. It also contains in-house survival function estimators, removing the need for external packages. Sparsesurv is validated against R-based Elastic Net regularized linear Cox proportional hazards models as implemented in the commonly used glmnet package. Our results reveal that knowledge distillation-based approaches achieve competitive discriminative performance relative to glmnet across the regularization path while making the choice of the regularization hyperparameter significantly easier. All of these features, combined with a sklearn-like API, make sparsesurv an easy-to-use Python package that enables survival analysis for high-dimensional datasets through fitting sparse survival models via knowledge distillation.
Availability and implementation: sparsesurv is freely available under a BSD 3 license on GitHub (https://github.com/BoevaLab/sparsesurv) and the Python Package Index (PyPI) (https://pypi.org/project/sparsesurv/).
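The teacher-student idea in the abstract can be illustrated with a minimal sketch. It assumes one common form of distillation, in which a sparse linear student is fitted by Lasso-penalized least squares to the scores (linear predictors) produced by a teacher. This is an illustration only and is not taken from the sparsesurv code base: the function names, the toy data, and the stand-in teacher scores are all hypothetical, and the package's actual API and loss functions may differ.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by Lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def distill_lasso(X, teacher_scores, lam, n_iter=200):
    """Fit a sparse linear student to teacher scores (hypothetical helper).

    Minimizes (1 / 2n) * ||teacher_scores - X beta||^2 + lam * ||beta||_1
    by cyclic coordinate descent.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(teacher_scores)  # residual = teacher_scores - X beta
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Toy data: 200 samples, 10 features, only the first two are informative.
random.seed(0)
X = [[random.gauss(0.0, 1.0) for _ in range(10)] for _ in range(200)]

# Stand-in for a teacher's predictions (assumption: in practice these would
# come from a fitted survival model, not from a known formula).
teacher_scores = [2.0 * row[0] - 1.0 * row[1] + random.gauss(0.0, 0.1) for row in X]

student = distill_lasso(X, teacher_scores, lam=0.1)
selected = [j for j, b in enumerate(student) if b != 0.0]
print("selected features:", selected)
```

Because the student regresses on the teacher's scores rather than on censored event times, its fit is an ordinary penalized regression, and features whose coefficients are shrunk exactly to zero are dropped from the model.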
Pages: 5
Related papers
50 in total
  • [31] xlogit: An open-source Python package for GPU-accelerated estimation of Mixed Logit models
    Arteaga, Cristian
    Park, JeeWoong
    Beeramoole, Prithvi Bhat
    Paz, Alexander
    JOURNAL OF CHOICE MODELLING, 2022, 42
  • [32] Computer Programs Physics pyMCD: Python package for searching transition states via the multicoordinate driven method
    Lee, Kyunghoon
    Kim, Jun Hyeong
    Kim, Woo Youn
    COMPUTER PHYSICS COMMUNICATIONS, 2023, 291
  • [33] AlphaMap: an open-source Python package for the visual annotation of proteomics data with sequence-specific knowledge
    Voytik, Eugenia
    Bludau, Isabell
    Willems, Sander
    Hansen, Fynn M.
    Brunner, Andreas-David
    Strauss, Maximilian T.
    Mann, Matthias
    BIOINFORMATICS, 2022, 38 (03) : 849 - 852
  • [34] py-fmas: A python package for ultrashort optical pulse propagation in terms of forward models for the analytic signal
    Melchert, O.
    Demircan, A.
    COMPUTER PHYSICS COMMUNICATIONS, 2022, 273
  • [35] PySysML2: Building Knowledge from Models with SysML v2 and Python
    Lucas, Keith L.
    Ford, Thomas C.
    Stern, Jordan L.
    Situ, John X.
    PROCEEDINGS OF THE 2023 CONFERENCE ON SYSTEMS ENGINEERING RESEARCH, CSER 2023, 2024, : 3 - 17
  • [36] pySAPC, a python package for sparse affinity propagation clustering: Application to odontogenesis whole genome time series gene-expression data
    Cao, Huojun
    Amendt, Brad A.
    BIOCHIMICA ET BIOPHYSICA ACTA-GENERAL SUBJECTS, 2016, 1860 (11) : 2613 - 2618
  • [37] PSLSA v2.0: An automatic Python package integrating machine learning models for regional landslide susceptibility assessment
    Guo, Zizheng
    Wang, Haojie
    He, Jun
    Huang, Da
    Song, Yixiang
    Wang, Tengfei
    Liu, Yuanbo
    Ferrer, Joaquin V.
    ENVIRONMENTAL MODELLING & SOFTWARE, 2025, 186
  • [38] GeoFabrics 1.0.0: An open-source Python package for automatic hydrological conditioning of digital elevation models for flood modelling
    Pearson, Rose A.
    Smart, Graeme
    Wilkins, Matt
    Lane, Emily
    Harang, Alice
    Bosserelle, Cyprien
    Cattoe, Celine
    Measures, Richard
    ENVIRONMENTAL MODELLING & SOFTWARE, 2023, 170
  • [39] Noisecut: a python package for noise-tolerant classification of binary data using prior knowledge integration and max-cut solutions
    Samadi, Moein E.
    Mirzaieazar, Hedieh
    Mitsos, Alexander
    Schuppert, Andreas
    BMC BIOINFORMATICS, 2024, 25 (01)
  • [40] py2PowerDEVS: CONSTRUCTION AND MANIPULATION OF LARGE COMPLEX STRUCTURES FOR PowerDEVS MODELS VIA PYTHON SCRIPTING
    Pecker-Marcosig, Ezequiel
    Bonaventura, Matias
    Lanzarotti, Esteban
    Santi, Lucio
    Castro, Rodrigo
    2022 WINTER SIMULATION CONFERENCE (WSC), 2022, : 2594 - 2605