Advances in High-throughput Protein Structural Bioinformatics

Cited by: 0
Authors
Zhu, Yun-Chi [1]
Lu, Zu-Hong [1]
Affiliations
[1] Southeast Univ, Sch Biol Sci & Med Engn, State Key Lab Digital Med Engn, Nanjing 211189, Peoples R China
Keywords
protein structural bioinformatics; high-throughput; AlphaFold-like system; structural proteomics; STRUCTURE ALIGNMENT; WEB SERVER; CRYO-EM; PREDICTION; DOCKING; EFFICIENT; ACCURACY; SEQUENCE; MODELS
DOI
10.16476/j.pibb.2024.0082
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
This review summarizes recent advances in high-throughput protein structural bioinformatics, a field transformed by deep learning-based protein structure prediction systems such as AlphaFold2. These systems have markedly increased the accuracy, speed, and scale of protein structure prediction, leading to exponential growth in the number of protein structures available for analysis. Notably, the AlphaFold Protein Structure Database (AFDB) has amassed over 214 million protein structures, surpassing the PDB's 50-year cumulative data by over 1,000-fold within several months. Big data is driving a comprehensive upgrade of protein structural bioinformatics. This review focuses on three main areas: structure data management, tool development, and structure data mining. For structure data management, the review describes optimization strategies for AlphaFold-like systems, which significantly reduce the resource requirements of protein folding, enabling more researchers to make custom structure predictions and further enlarging the data scale. The resulting "data explosion" has increased the pressure on storage and bandwidth, prompting the development of tools such as Foldcomp, PDC, and ProteStAr for compressing PDB files. The review also underscores the role of public repositories such as ModelArchive and PDB-Dev in archiving and sharing third-party AlphaFold models, and highlights the use of independent services such as MineProt and 3D-Beacons to create more interactive and accessible data portals. For tool development, the review covers recent breakthroughs in structure alignment algorithms, represented by Foldseek, which enable ultra-fast searching of large protein structure databases. It also covers tools for structure-based functional annotation of proteins, including AlphaFill for ligand annotation, DeepFRI for Gene Ontology (GO) annotation, and TT3D for protein-protein interaction (PPI) prediction, among others. The review proposes that 3Di sequences, introduced together with Foldseek, can enhance many sequence-based deep learning models developed in the pre-AlphaFold era, enabling them to be applied to structure-based function prediction. The challenges that traditional molecular docking methods face in the high-throughput era are discussed last, to draw researchers' attention to them. Finally, the review explores the emerging field of structure data mining. Structure modeling of whole proteomes has become feasible in recent years, and scientists are processing large structure datasets from an omics viewpoint, continuously identifying analyzable elements, optimizing methodologies, and using newly developed tools to push the boundaries. Notable examples include the identification of new protein families, the development of protein structure clustering, and the integration of AlphaFold with conventional experimental techniques to solve large structures. These advances are paving the way for a deeper understanding of protein structure and function and may unlock new discoveries in the life sciences. The review also acknowledges the challenges and limitations that persist in the field, including the lack of diversity in high-throughput software for protein structural bioinformatics and the remaining bottleneck in rapidly predicting protein complex structures. Overall, structural bioinformatics is expected to play an even more crucial role in the life sciences as high-throughput methodology develops.
Pages: 1989-1999
Number of pages: 11