HyperCys: A Structure- and Sequence-Based Predictor of Hyper-Reactive Druggable Cysteines

Cited by: 2
Authors
Gao, Mingjie [1]
Guenther, Stefan [1]
Affiliations
[1] Albert Ludwigs Univ Freiburg, Inst Pharmaceut Sci, Hermann Herder Str 9, D-79104 Freiburg, Germany
Keywords
machine learning; structure and sequence based; druggable cysteine; reactivity prediction; WEB SERVER; PROTEIN; GENERATION;
DOI
10.3390/ijms24065960
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification code
071010 ; 081704 ;
Abstract
The cysteine side chain has a free thiol group, making it the amino acid residue most often covalently modified by small molecules possessing weakly electrophilic warheads, thereby prolonging on-target residence time and reducing the risk of idiosyncratic drug toxicity. However, not all cysteines are equally reactive or accessible. Hence, to identify targetable cysteines, we propose a novel ensemble stacked machine learning (ML) model, named HyperCys, to predict hyper-reactive druggable cysteines. First, the pocket, conservation, structural and energy profiles, and physicochemical properties of covalently and non-covalently bound cysteines were collected from both protein sequences and 3D structures of protein-ligand complexes. Then, we established the HyperCys ensemble stacked model by integrating six ML models: five base learners (K-nearest neighbors, support vector machine, light gradient boosting machine, multi-layer perceptron classifier, and random forest) and logistic regression as the meta-classifier. Finally, the results for different feature group combinations were compared on the basis of classification accuracy for hyper-reactive cysteines and other metrics. The results show that the accuracy, F1 score, recall, and ROC AUC of HyperCys are 0.784, 0.754, 0.742, and 0.824, respectively, after 10-fold cross-validation with the best window size. Compared to traditional ML models with only sequence-based features or only 3D structural features, HyperCys is more accurate at predicting hyper-reactive druggable cysteines. It is anticipated that HyperCys will be an effective tool for discovering new potential reactive cysteines in a wide range of nucleophilic proteins and will provide an important contribution to the design of targeted covalent inhibitors with high potency and selectivity.
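The stacked architecture described in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' implementation: the data are synthetic (the HyperCys feature table is not reproduced here), scikit-learn's `GradientBoostingClassifier` is swapped in for the light gradient boosting machine to avoid an extra dependency, and the helper name `build_stacked_model` and all hyperparameters are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a per-cysteine feature table (NOT the HyperCys data).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=8, random_state=0
)


def build_stacked_model(seed=0):
    """Hypothetical helper: five base learners plus a logistic-regression meta-classifier."""
    base_learners = [
        # Distance- and margin-based learners are scaled inside a pipeline.
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
        ("svm", make_pipeline(StandardScaler(), SVC(random_state=seed))),
        # Assumption: stand-in for the light gradient boosting machine.
        ("gbm", GradientBoostingClassifier(n_estimators=50, random_state=seed)),
        ("mlp", make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(16,), max_iter=300, random_state=seed),
        )),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
    ]
    # The meta-classifier is trained on out-of-fold predictions of the base learners.
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    )


# 10-fold cross-validation with the four metrics named in the abstract.
metrics = ["accuracy", "f1", "recall", "roc_auc"]
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_validate(build_stacked_model(), X, y, cv=cv, scoring=metrics)
mean_scores = {m: float(scores[f"test_{m}"].mean()) for m in metrics}
print(mean_scores)
```

Because the inputs are synthetic, the printed scores say nothing about cysteine reactivity; the sketch only shows how base-learner outputs feed a logistic-regression meta-classifier and how the four reported metrics can be averaged over 10 folds.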
Pages: 11