Advances in High-throughput Protein Structural Bioinformatics

Cited by: 0

Authors:

Zhu, Yun-Chi [1]

Lu, Zu-Hong [1]

Affiliations:

[1] Southeast Univ, Sch Biol Sci & Med Engn, State Key Lab Digital Med Engn, Nanjing 211189, Peoples R China

Source:

PROGRESS IN BIOCHEMISTRY AND BIOPHYSICS | 2024 / Vol. 51 / No. 09

Keywords:

protein structural bioinformatics; high-throughput; AlphaFold-like system; structural proteomics; STRUCTURE ALIGNMENT; WEB SERVER; CRYO-EM; PREDICTION; DOCKING; EFFICIENT; ACCURACY; SEQUENCE; MODELS;

DOI:

10.16476/j.pibb.2024.0082

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Discipline classification codes:

071010; 081704;

Abstract:

This review provides a comprehensive summary of the latest advancements in high-throughput protein structural bioinformatics, a field that has undergone a revolutionary transformation with the advent of deep learning-based protein structure prediction systems like AlphaFold2. These systems have significantly increased the accuracy, speed, and scale of protein structure prediction, resulting in an exponential growth in the number of protein structures available for analysis. Notably, the AlphaFold Protein Structure Database (AFDB) has amassed over 214 million protein structures, surpassing the PDB's 50-year cumulative data by over 1 000-fold within several months. Big data is driving the comprehensive upgrade of protein structural bioinformatics. This review focuses on three main areas: structure data management, tool development, and structure data mining. In the realm of structure data management, the review spotlights the optimization strategy of AlphaFold-like systems, which significantly reduces the resource requirements for protein folding, enabling more researchers to make custom structure predictions and further enlarging the data scale. The resulting "data explosion" has exerted increased pressure on storage and bandwidth, prompting the development of cutting-edge tools such as Foldcomp, PDC, and ProteStAr for compressing PDB files. Moreover, the review underscores the critical role of public repositories like ModelArchive and PDB-Dev in archiving and sharing third-party AlphaFold models. It also highlights the utilization of independent services like MineProt and 3D-Beacons to create more interactive and accessible data portals. In terms of tool development, the review spotlights recent breakthroughs in structure alignment algorithms, represented by Foldseek, which enable ultra-fast searching of large protein structure databases. 
It also covers tools for structure-based functional annotation of proteins, including AlphaFill for ligand annotation, DeepFRI for Gene Ontology (GO) annotation, and TT3D for protein-protein interaction (PPI) prediction, among others. The review proposes that 3Di sequences, introduced together with Foldseek, can enhance many sequence-based deep learning models developed in the pre-AlphaFold era, enabling them to be applied to structure-based function prediction. It also notes the challenges that traditional molecular docking methods face in the high-throughput era, in order to draw researchers' attention to them. Finally, the review explores the burgeoning field of structure data mining. Whole-proteome structuring has become feasible in recent years, and scientists are processing large structure datasets from an omics viewpoint, continuously identifying analyzable elements and optimizing methodologies, as well as using newly developed tools to push the boundaries. Notable examples include the identification of new protein families, the development of protein structure clustering, and the integration of AlphaFold with conventional experimental techniques to solve large structures. These advancements are paving the way for a deeper understanding of protein structure and function and have the potential to unlock new discoveries in the life sciences. However, the review also acknowledges the challenges and limitations that persist in the field, including the lack of diversity in high-throughput software for protein structural bioinformatics and the existing bottleneck in rapidly predicting protein complex structures. Overall, structural bioinformatics is expected to play an even more crucial role in the life sciences as high-throughput methodology develops.

Citation

Pages: 1989-1999

Number of pages: 11

79 references in total

[1] Ahdritz G., 2022, bioRxiv, DOI 10.1101/2022.11.20.517210
[2] Accurate prediction of protein structures and interactions using a three-track neural network. Baek, Minkyung; DiMaio, Frank; Anishchenko, Ivan; Dauparas, Justas; Ovchinnikov, Sergey; Lee, Gyu Rie; Wang, Jue; Cong, Qian; Kinch, Lisa N.; Schaeffer, R. Dustin; Millan, Claudia; Park, Hahnbeom; Adams, Carson; Glassman, Caleb R.; DeGiovanni, Andy; Pereira, Jose H.; Rodrigues, Andria V.; van Dijk, Alberdina A.; Ebrecht, Ana C.; Opperman, Diederik J.; Sagmeister, Theo; Buhlheller, Christoph; Pavkov-Keller, Tea; Rathinaswamy, Manoj K.; Dalwadi, Udit; Yip, Calvin K.; Burke, John E.; Garcia, K. Christopher; Grishin, Nick V.; Adams, Paul D.; Read, Randy J.; Baker, David. [J]. SCIENCE, 2021, 373 (6557): 871-+
[3] Clustering predicted structures at the scale of the known protein universe. Barrio-Hernandez, Inigo; Yeo, Jingi; Janes, Jurgen; Mirdita, Milot; Gilchrist, Cameron L. M.; Wein, Tanita; Varadi, Mihaly; Velankar, Sameer; Beltrao, Pedro; Steinegger, Martin. [J]. NATURE, 2023, 622 (7983): 637-+
[4] The structural context of posttranslational modifications at a proteome-wide scale. Bludau, Isabell; Willems, Sander; Zeng, Wen-Feng; Strauss, Maximilian T.; Hansen, Fynn M.; Tanzer, Maria C.; Karayel, Ozge; Schulman, Brenda A.; Mann, Matthias. [J]. PLOS BIOLOGY, 2022, 20 (05)
[5] Large-scale clustering of AlphaFold2 3D models shines light on the structure and function of proteins. Bordin, Nicola; Lau, Andy M.; Orengo, Christine. [J]. MOLECULAR CELL, 2023, 83 (22): 3950-3952
[6] Bozitao Zhong, 2022, HPCAsia 2022 Workshop: International Conference on High Performance Computing in Asia-Pacific Region Workshops, P1, DOI 10.1145/3503470.3503471
[7] PROTEIN-FOLDING CONTEST SEEKS NEXT BIG BREAKTHROUGH. Callaway, Ewen. [J]. NATURE, 2023, 613 (7942): 13-14
[8] Unified access to up-to-date residue-level annotations from UniProtKB and other biological databases for PDB data. Choudhary, Preeti; Anyango, Stephen; Berrisford, John; Tolchard, James; Varadi, Mihaly; Velankar, Sameer. [J]. SCIENTIFIC DATA, 2023, 10 (01)
[9] Integrating AlphaFold and deep learning for atomistic interpretation of cryo-EM maps. Dai, Xin; Wu, Longlong; Yoo, Shinjae; Liu, Qun. [J]. BRIEFINGS IN BIOINFORMATICS, 2023, 24 (06)
[10] SIFTS: updated Structure Integration with Function, Taxonomy and Sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins. Dana, Jose M.; Gutmanas, Aleksandras; Tyagi, Nidhi; Qi, Guoying; O'Donovan, Claire; Martin, Maria; Velankar, Sameer. [J]. NUCLEIC ACIDS RESEARCH, 2019, 47 (D1): D482-D489
