HyperCys: A Structure- and Sequence-Based Predictor of Hyper-Reactive Druggable Cysteines

Cited by: 2
Authors
Gao, Mingjie [1]
Guenther, Stefan [1]
Affiliations
[1] Albert Ludwigs Univ Freiburg, Inst Pharmaceut Sci, Hermann Herder Str 9, D-79104 Freiburg, Germany
Keywords
machine learning; structure and sequence based; druggable cysteine; reactivity prediction; web server; protein; generation
DOI
10.3390/ijms24065960
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
The cysteine side chain has a free thiol group, making it the amino acid residue most often covalently modified by small molecules possessing weakly electrophilic warheads, thereby prolonging on-target residence time and reducing the risk of idiosyncratic drug toxicity. However, not all cysteines are equally reactive or accessible. Hence, to identify targetable cysteines, we propose a novel ensemble stacked machine learning (ML) model, named HyperCys, to predict hyper-reactive druggable cysteines. First, the pocket, conservation, structural and energy profiles, and physicochemical properties of covalently and noncovalently bound cysteines were collected from both protein sequences and 3D structures of protein-ligand complexes. Then, we established the HyperCys ensemble stacked model by integrating six ML models: K-nearest neighbors, support vector machine, light gradient boosting machine (LightGBM), multi-layer perceptron classifier, and random forest as base models, and logistic regression as the meta-classifier. Finally, the results for different feature group combinations were compared on the basis of hyper-reactive cysteine classification accuracy and other metrics. The results show that the accuracy, F1 score, recall score, and ROC AUC values of HyperCys are 0.784, 0.754, 0.742, and 0.824, respectively, after performing 10-fold cross-validation (CV) with the best window size. Compared to traditional ML models with only sequence-based features or only 3D structural features, HyperCys is more accurate at predicting hyper-reactive druggable cysteines. It is anticipated that HyperCys will be an effective tool for discovering new potential reactive cysteines in a wide range of nucleophilic proteins and will provide an important contribution to the design of targeted covalent inhibitors with high potency and selectivity.
Pages: 11