Improving protein structure prediction with extended sequence similarity searches and deep-learning-based refinement in CASP15

Cited by: 2
Author
Oda, Toshiyuki [1]
Affiliation
[1] PEZY Comp KK, Tokyo, Japan
Keywords
AlphaFold; CASP; deep learning; metagenomes; protein structure prediction; sequence similarity search; GENERATION; RESOURCE;
DOI
10.1002/prot.26551
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The human predictor team PEZYFoldings got first place with the assessor's formulae (3rd place with Global Distance Test Total Score [GDT-TS]) in the single-domain category and 10th place in the multimer category in Critical Assessment of Structure Prediction 15. In this paper, I describe the exact method used by PEZYFoldings in the competition. As AlphaFold2 and AlphaFold-Multimer, developed by DeepMind, were state-of-the-art structure prediction tools, it was assumed that enhancing the input and output of the tools was an effective strategy to obtain the highest accuracy for structure prediction. Therefore, I used additional tools and databases to collect evolutionarily related sequences and introduced a deep-learning-based model in the refinement step. In addition to these modifications, manual interventions were performed to address various tasks. Detailed analyses were performed after the competition to identify the main contributors to performance. Comparing the number of evolutionarily related sequences I used with those of the other teams that provided AlphaFold2's baseline predictions revealed that an extensive sequence similarity search was one of the main contributors. Nonetheless, there were specific targets for which I could not identify any evolutionarily related sequences, resulting in my inability to construct accurate structures for these targets. Notably, I noticed that I had gained large Z-scores with the subunits of H1137, for which I performed manual domain parsing considering the interfaces between the subunits. This finding implies that the manual intervention contributed to my performance. The influence of the refinement model on the accuracy of structure prediction was minimal. I could have predicted structures with a similar level of accuracy without employing the refinement model. However, from the perspective of accuracy self-estimate, many structures demonstrated improvement after refinement. 
This improvement likely had a substantial influence on improving my position in the assessor's formulae rankings. These results highlight the opportunities for improvement in (1) multimer prediction, (2) building of larger and more diverse databases, and (3) developing tools to predict structures from primary sequences alone. In addition, transferring the manual intervention process to automation is a future concern.
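For readers unfamiliar with the GDT-TS metric named in the abstract, the following is a minimal sketch of how the score is commonly defined: the average, over distance cutoffs of 1, 2, 4, and 8 Å, of the percentage of residues whose Cα atoms lie within that cutoff of the reference structure. The function name `gdt_ts` and its input format are assumptions made for illustration. The sketch also assumes the per-residue distances come from a single, already chosen superposition, whereas the full GDT procedure searches over many superpositions for each cutoff. It is not the scoring code used by the CASP assessors or by the author.

```python
from typing import Sequence


def gdt_ts(distances: Sequence[float]) -> float:
    """Simplified GDT-TS on a 0-100 scale.

    `distances` holds one C-alpha deviation (in angstroms) per residue,
    measured after a single fixed superposition (an assumption of this
    sketch; the full GDT algorithm optimizes the superposition per cutoff).
    """
    if not distances:
        raise ValueError("at least one residue distance is required")

    cutoffs = (1.0, 2.0, 4.0, 8.0)  # standard GDT-TS cutoffs in angstroms
    n_residues = len(distances)

    # Fraction of residues within each cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / n_residues for cutoff in cutoffs
    ]

    # GDT-TS is the mean of the four fractions, expressed as a percentage.
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    # Hypothetical deviations for a four-residue toy model.
    print(gdt_ts([0.5, 1.5, 3.0, 9.0]))
```

Because the superposition is fixed here, the value can only underestimate or match what a full GDT search would report for the same pair of structures.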
Pages: 1712-1723
Page count: 12