sparsesurv: a Python package for fitting sparse survival models via knowledge distillation

Authors
Wissel, David [1, 2, 3]
Janakarajan, Nikita [1, 4]
Schulte, Julius [1]
Rowson, Daniel [1, 3]
Yuan, Xintian [1]
Boeva, Valentina [1, 3, 5]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, CH-8092 Zurich, Switzerland
[2] Univ Zurich, Dept Mol Life Sci, CH-8057 Zurich, Switzerland
[3] SIB Swiss Inst Bioinformat, Lausanne, Switzerland
[4] IBM Res Europe, CH-8803 Zurich, Switzerland
[5] Univ Paris, Inst Cochin, UMR S1016, F-75014 Paris, France
Funding
Swiss National Science Foundation
Keywords
REGULARIZATION PATHS; EFFICIENT ESTIMATION; REGRESSION; SELECTION; LASSO
DOI
10.1093/bioinformatics/btae521
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Sparse survival models are statistical models that select a subset of predictor variables while modeling the time until an event occurs, which can improve interpretability and transportability. The subset of important features is often obtained with regularized models, such as the Cox proportional hazards model with Lasso regularization, which limit the number of non-zero coefficients. However, such models can be sensitive to the choice of the regularization hyperparameter.

Results: In this work, we develop a software package and demonstrate how knowledge distillation, a powerful machine learning technique that transfers knowledge from a complex teacher model to a simpler student model, can be leveraged to learn sparse survival models while mitigating this challenge. For this purpose, we present sparsesurv, a Python package that contains a set of teacher-student model pairs, including the semi-parametric accelerated failure time and the extended hazards models as teachers, neither of which currently has a Python implementation. It also contains in-house survival function estimators, removing the need for external packages. sparsesurv is validated against R-based Elastic Net regularized linear Cox proportional hazards models as implemented in the commonly used glmnet package. Our results reveal that knowledge distillation-based approaches achieve competitive discriminative performance relative to glmnet across the regularization path while making the choice of the regularization hyperparameter significantly easier. All of these features, combined with a scikit-learn-like API, make sparsesurv an easy-to-use Python package that enables survival analysis of high-dimensional datasets by fitting sparse survival models via knowledge distillation.

Availability and implementation: sparsesurv is freely available under a BSD 3-Clause license on GitHub (https://github.com/BoevaLab/sparsesurv) and the Python Package Index (PyPI) (https://pypi.org/project/sparsesurv/).
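To make the teacher-student idea in the abstract concrete, here is a minimal NumPy sketch of distilling a dense survival model into a sparse one. It is not sparsesurv's API and does not use its actual teachers (the semi-parametric accelerated failure time and extended hazards models). As an assumption for illustration only, a ridge-penalized Cox model fitted by plain gradient descent (assuming no tied event times) stands in for the teacher, and the student is a Lasso regression fitted by coordinate descent to the teacher's linear predictor. The function names, simulated data, and hyperparameter values are all hypothetical.

```python
import numpy as np


def fit_cox_ridge_teacher(X, time, event, alpha=0.1, lr=0.1, n_iter=1000):
    """Hypothetical teacher: ridge-penalized Cox model fitted by gradient
    descent on the mean negative log partial likelihood.
    Assumes no tied event times."""
    n, p = X.shape
    order = np.argsort(-time)          # descending time: risk set of row i = rows 0..i
    Xs, ev = X[order], event[order]
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = Xs @ beta
        w = np.exp(eta - eta.max())    # shift for numerical stability (ratios unchanged)
        risk_w = np.cumsum(w)          # sum of weights over each risk set
        risk_wx = np.cumsum(w[:, None] * Xs, axis=0)
        grad = -(ev[:, None] * (Xs - risk_wx / risk_w[:, None])).sum(axis=0) / n
        beta -= lr * (grad + alpha * beta)
    return beta


def fit_lasso_student(X, y, lam, n_iter=200):
    """Hypothetical student: Lasso fitted by cyclic coordinate descent to
    minimize (1 / 2n) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                       # residual y - X b (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]        # remove feature j's current contribution
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]        # add back the updated contribution
    return b


# Simulated data: 3 informative features out of 50, standardized columns.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -1.0, 0.5]
t_event = rng.exponential(1.0 / np.exp(X @ beta_true))  # hazard exp(x' beta_true)
t_cens = rng.exponential(2.0, size=n)
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(float)

# Distillation: the student regresses on the teacher's linear predictor.
beta_teacher = fit_cox_ridge_teacher(X, time, event)
beta_student = fit_lasso_student(X, X @ beta_teacher, lam=0.15)
print("teacher non-zeros:", np.count_nonzero(beta_teacher))
print("student non-zeros:", np.count_nonzero(beta_student))
```

Sweeping `lam` over a grid traces out the student's regularization path; every point on that path is a sparse approximation of the same dense teacher, rather than a separate fit to the censored outcomes.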
Pages: 5