StackTTCA: a stacking ensemble learning-based framework for accurate and high-throughput identification of tumor T cell antigens

Cited by: 2
Authors
Charoenkwan, Phasit [1]
Schaduangrat, Nalini [2]
Shoombuatong, Watshara [2]
Affiliations
[1] Chiang Mai Univ, Coll Arts Media & Technol, Modern Management & Informat Technol, Chiang Mai 50200, Thailand
[2] Mahidol Univ, Fac Med Technol, Ctr Res Innovat & Biomed Informat, Bangkok 10700, Thailand
Keywords
T-cell antigen; Bioinformatics; Stacking strategy; Feature selection; Machine learning; Prediction
DOI
10.1186/s12859-023-05421-x
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: The identification of tumor T cell antigens (TTCAs) is crucial for providing insights into their functional mechanisms and for exploiting their potential in anticancer vaccine development. In this context, TTCAs are highly promising. However, experimental technologies for discovering and characterizing new TTCAs are expensive and time-consuming. Although many machine learning (ML)-based models have been proposed for identifying new TTCAs, there is still a need for a robust model that achieves higher accuracy and precision.
Results: In this study, we propose a new stacking ensemble learning-based framework, termed StackTTCA, for accurate and large-scale identification of TTCAs. Firstly, we constructed 156 different baseline models by combining 12 different feature encoding schemes with 13 popular ML algorithms. Secondly, these baseline models were trained and employed to create a new probabilistic feature vector. Finally, the optimal probabilistic feature vector was determined based on the feature selection strategy and then used to construct our stacked model. Comparative benchmarking experiments indicated that StackTTCA clearly outperformed several ML classifiers and the existing methods on the independent test, with an accuracy of 0.932 and a Matthews correlation coefficient of 0.866.
Conclusions: In summary, the proposed stacking ensemble learning-based framework, StackTTCA, could help to precisely and rapidly identify true TTCAs for follow-up experimental verification. In addition, we developed an online web server () to maximize user convenience for high-throughput screening of novel TTCAs.
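The stacking procedure summarized in the Results can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. It assumes two toy encodings (`aac`, `physchem`) in place of the 12 schemes, a single hypothetical nearest-centroid learner (`CentroidModel`) in place of the 13 ML algorithms, a logistic-regression meta-learner (the abstract does not name one), and synthetic peptides whose residue bias is arbitrary. The feature selection step is omitted.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AILMFVWY")
CHARGED = set("DEKR")

def aac(seq):
    """Amino acid composition: fraction of each of the 20 residues."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def physchem(seq):
    """Toy 2-d encoding (hypothetical): hydrophobic and charged fractions."""
    n = len(seq)
    return [sum(c in HYDROPHOBIC for c in seq) / n,
            sum(c in CHARGED for c in seq) / n]

class CentroidModel:
    """Nearest-centroid base learner that returns a pseudo-probability."""
    def fit(self, X, y):
        self.c = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.c[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x):
        d0, d1 = math.dist(x, self.c[0]), math.dist(x, self.c[1])
        # Closer to the class-1 centroid gives a score nearer to 1.
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5

def out_of_fold_probs(encode, seqs, y, k=5):
    """One probabilistic feature per sequence, predicted by a model
    that never saw that sequence during training (k-fold)."""
    X = [encode(s) for s in seqs]
    probs = [0.0] * len(seqs)
    for fold in range(k):
        train = [i for i in range(len(seqs)) if i % k != fold]
        model = CentroidModel().fit([X[i] for i in train],
                                    [y[i] for i in train])
        for i in range(fold, len(seqs), k):
            probs[i] = model.predict_proba(X[i])
    return probs

def fit_logistic(P, y, lr=0.5, epochs=200):
    """Meta-learner: logistic regression trained by stochastic gradient descent."""
    w, b = [0.0] * len(P[0]), 0.0
    for _ in range(epochs):
        for p, t in zip(P, y):
            z = b + sum(wi * pi for wi, pi in zip(w, p))
            err = 1 / (1 + math.exp(-z)) - t
            w = [wi - lr * err * pi for wi, pi in zip(w, p)]
            b -= lr * err
    return w, b

def predict(w, b, p):
    return 1 if b + sum(wi * pi for wi, pi in zip(w, p)) > 0 else 0

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Synthetic toy peptides (NOT real TTCA data); the residue bias is arbitrary.
rng = random.Random(0)

def toy_peptide(bias):
    return "".join(rng.choice(bias * 3 + AMINO_ACIDS) for _ in range(9))

seqs, labels = [], []
for _ in range(20):
    seqs += [toy_peptide("AILMFVWY"), toy_peptide("DEKR")]
    labels += [1, 0]

# Each (encoding, learner) pair contributes one column of the
# probabilistic feature vector; the paper builds 12 x 13 = 156 such columns.
columns = [out_of_fold_probs(enc, seqs, labels) for enc in (aac, physchem)]
meta_features = [list(row) for row in zip(*columns)]

w, b = fit_logistic(meta_features, labels)
preds = [predict(w, b, p) for p in meta_features]
print("training MCC of the stacked model:", round(mcc(labels, preds), 3))
```

The out-of-fold step is the key design choice in this sketch: the meta-learner is trained on probabilities from models that did not see the corresponding sequence, which limits label leakage from the base models into the stacked model. A score printed on training data, as here, says nothing about independent-test performance.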
Pages: 16
Related papers
50 in total
  • [1] StackTTCA: a stacking ensemble learning-based framework for accurate and high-throughput identification of tumor T cell antigens
    Phasit Charoenkwan
    Nalini Schaduangrat
    Watshara Shoombuatong
    BMC Bioinformatics, 24
  • [2] Proteins as T cell antigens: Methods for high-throughput identification
    Grubaugh, Daniel
    Flechtner, Jessica Baker
    Higgins, Darren E.
    VACCINE, 2013, 31 (37) : 3805 - 3810
  • [3] CRISPRCasStack: a stacking strategy-based ensemble learning framework for accurate identification of Cas proteins
    Zhang, Tianjiao
    Jia, Yuran
    Li, Hongfei
    Xu, Dali
    Zhou, Jie
    Wang, Guohua
    BRIEFINGS IN BIOINFORMATICS, 2022, 23 (05)
  • [4] Stacking based ensemble learning framework for identification of nitrotyrosine sites
    Parvez, Aiman
    Ali, Syed Danish
    Tayara, Hilal
    Chong, Kil To
    Computers in Biology and Medicine, 2024, 183
  • [5] Stacking Ensemble Learning-based Gender Identification for User Profiling in Smart Education
    Fu, Qiang
    Wen, Yiping
    Tan, Zheng
    Fu, Qi
    IEEE TALE2021: IEEE INTERNATIONAL CONFERENCE ON ENGINEERING, TECHNOLOGY AND EDUCATION, 2021, : 986 - 991
  • [6] A deep learning-based multivariate decomposition and ensemble framework for container throughput forecasting
    Kulshrestha, Anurag
    Yadav, Abhishek
    Sharma, Himanshu
    Suman, Shikha
    JOURNAL OF FORECASTING, 2024, 43 (07) : 2685 - 2704
  • [7] SAPPHIRE: A stacking-based ensemble learning framework for accurate prediction of thermophilic proteins
    Charoenkwan, Phasit
    Schaduangrat, Nalini
    Moni, Mohammad Ali
    Lio, Pietro
    Manavalan, Balachandran
    Shoombuatong, Watshara
    COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 146
  • [8] A Deep Learning-Based Approach for High-Throughput Hypocotyl Phenotyping
    Dobos, Orsolya
    Horvath, Peter
    Nagy, Ferenc
    Danka, Tivadar
    Viczian, Andras
    PLANT PHYSIOLOGY, 2019, 181 (04) : 1415 - 1424
  • [9] High-throughput analysis of T cell cytokine responses to a variety of antigens
    Inokuma, M
    Suni, MA
    Ghanekar, SA
    Gladding, D
    Maino, VC
    Maecker, HT
    Dunne, JF
    CYTOMETRY PART A, 2004, 59A (01) : 32 - 32
  • [10] A Bayesian framework for high-throughput T cell receptor pairing
    Holec, Patrick V.
    Berleant, Joseph
    Bathe, Mark
    Birnbaum, Michael E.
    BIOINFORMATICS, 2019, 35 (08) : 1318 - 1325