HyperCys: A Structure- and Sequence-Based Predictor of Hyper-Reactive Druggable Cysteines

Cited by: 2
Authors
Gao, Mingjie [1]
Guenther, Stefan [1]
Affiliations
[1] Albert Ludwigs Univ Freiburg, Inst Pharmaceut Sci, Hermann Herder Str 9, D-79104 Freiburg, Germany
Keywords
machine learning; structure and sequence based; druggable cysteine; reactivity prediction; web server; protein; generation
DOI
10.3390/ijms24065960
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The cysteine side chain has a free thiol group, making it the amino acid residue most often covalently modified by small molecules that carry weakly electrophilic warheads; such covalent binding prolongs on-target residence time and reduces the risk of idiosyncratic drug toxicity. However, not all cysteines are equally reactive or accessible. To identify targetable cysteines, we propose HyperCys, a novel ensemble stacked machine learning (ML) model that predicts hyper-reactive druggable cysteines. First, the pocket, conservation, structural and energy profiles, and physicochemical properties of (non)covalently bound cysteines were collected from both protein sequences and 3D structures of protein-ligand complexes. Then, we established the HyperCys ensemble stacked model by integrating six different ML models: K-nearest neighbors, support vector machine, light gradient boosting machine, multi-layer perceptron classifier, and random forest as base models, with logistic regression as the meta-classifier. Finally, the results for different feature group combinations were compared on the basis of classification accuracy for hyper-reactive cysteines and other metrics. After 10-fold cross-validation (CV) with the best window size, HyperCys achieves an accuracy of 0.784, an F1 score of 0.754, a recall of 0.742, and a ROC AUC of 0.824. Compared to traditional ML models that use only sequence-based features or only 3D structural features, HyperCys is more accurate at predicting hyper-reactive druggable cysteines. It is anticipated that HyperCys will be an effective tool for discovering new potential reactive cysteines in a wide range of nucleophilic proteins and will provide an important contribution to the design of targeted covalent inhibitors with high potency and selectivity.
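The stacking scheme described in the abstract (five base classifiers feeding a logistic-regression meta-classifier, scored by 10-fold cross-validation) could be sketched roughly as follows with scikit-learn. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names `build_stacked_model` and `evaluate` are hypothetical, all hyperparameters are arbitrary, scikit-learn's `GradientBoostingClassifier` is swapped in for the light gradient boosting machine to keep the example dependency-light, and the feature matrix is expected to be supplied by the caller rather than derived from real cysteine descriptors.

```python
"""Illustrative stacking sketch; NOT the HyperCys authors' code."""
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stacked_model(random_state=0):
    """Return five base learners stacked under a logistic-regression meta-classifier."""
    base_learners = [
        # Distance- and gradient-based learners are scaled inside their own pipelines.
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
        ("svm", make_pipeline(StandardScaler(), SVC(random_state=random_state))),
        # Assumption: scikit-learn gradient boosting stands in for LightGBM here.
        ("gbm", GradientBoostingClassifier(n_estimators=50, random_state=random_state)),
        ("mlp", make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                          random_state=random_state),
        )),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=random_state)),
    ]
    # Out-of-fold predictions of the base learners (inner 5-fold CV) train the
    # meta-classifier, which limits leakage from base learners to the final model.
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )


def evaluate(model, X, y, folds=10, random_state=0):
    """Return mean accuracy, F1, recall, and ROC AUC over stratified k-fold CV."""
    splitter = StratifiedKFold(n_splits=folds, shuffle=True, random_state=random_state)
    metrics = ["accuracy", "f1", "recall", "roc_auc"]
    results = cross_validate(model, X, y, cv=splitter, scoring=metrics)
    return {name: float(results[f"test_{name}"].mean()) for name in metrics}
```

Calling `evaluate(build_stacked_model(), X, y)` on a binary-labelled feature matrix returns the same four metrics the abstract reports. The values will depend entirely on the data and hyperparameters supplied, so they are not expected to reproduce the published figures.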
Pages: 11