ProtPlat: an efficient pre-training platform for protein classification based on FastText

Cited by: 4
Authors
Jin, Yuan
Yang, Yang [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Protein sequence classification; ProtPlat; Pre-training; Web server; SUBCELLULAR-LOCALIZATION; PREDICTION;
DOI
10.1186/s12859-022-04604-2
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
Background: Over the past decades, benefiting from the rapid growth of protein sequence data in public databases, many machine learning methods have been developed to predict physicochemical properties or functions of proteins from amino acid sequence features. However, prediction performance often suffers from the lack of labeled data. In recent years, pre-training methods have been widely studied to address the small-sample issue in computer vision and natural language processing, whereas pre-training techniques designed specifically for protein sequences remain few.

Results: In this paper, we propose a pre-training platform for representing protein sequences, called ProtPlat, which uses the Pfam database to train a three-layer neural network and then uses task-specific training data to fine-tune the model for downstream tasks. ProtPlat learns good representations for amino acids while achieving efficient classification. We conduct experiments on three protein classification tasks: the identification of type III secreted effectors, the prediction of subcellular localization, and the recognition of signal peptides. The experimental results show that pre-training enhances model performance effectively and that ProtPlat is competitive with state-of-the-art predictors, especially on small datasets. We implement the ProtPlat platform as a publicly accessible web service (https://compbio.sjtu.edu.cn/protplat).

Conclusions: To enhance the feature representation of protein amino acid sequences and improve the performance of sequence-based classification tasks, we develop ProtPlat, a general platform for the pre-training of protein sequences, featuring large-scale supervised training based on the Pfam database and an efficient learning model, FastText. The experimental results on three downstream classification tasks demonstrate the efficacy of ProtPlat.
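The workflow described in the abstract, FastText-style supervised pre-training on Pfam family labels followed by fine-tuning on a downstream protein classification task, can be sketched with the official fasttext Python bindings. This is a minimal illustration only: the file names, the 3-mer tokenization, and all hyperparameters below are assumptions made for the sketch, not the exact ProtPlat configuration.

Python sketch (illustrative assumptions only):

import fasttext

def to_kmer_words(seq, k=3):
    # Turn an amino acid sequence into space-separated overlapping k-mers,
    # so FastText treats each k-mer as a "word". The k=3 choice is an assumption.
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))

# 1) Supervised pre-training on Pfam: each line of pfam_train.txt (an assumed,
#    pre-built file) holds a Pfam family label plus the k-mer words, e.g.
#    __label__PF00069 MKV KVK VKT ...
pretrain = fasttext.train_supervised(
    input="pfam_train.txt",
    dim=100, epoch=5, lr=0.5, loss="hs",  # hierarchical softmax for the many Pfam labels
)

# Dump the learned k-mer embeddings in .vec format so they can seed a downstream model.
words = pretrain.get_words()
with open("pfam_kmers.vec", "w") as out:
    out.write(f"{len(words)} {pretrain.get_dimension()}\n")
    for w in words:
        vec = " ".join(f"{v:.6f}" for v in pretrain.get_word_vector(w))
        out.write(f"{w} {vec}\n")

# 2) Fine-tuning on a downstream task (e.g. type III secreted effector identification):
#    t3se_train.txt is an assumed file in the same label + k-mer format.
classifier = fasttext.train_supervised(
    input="t3se_train.txt",
    pretrainedVectors="pfam_kmers.vec",  # initialise from the Pfam-trained vectors
    dim=100, epoch=25, lr=0.1,
)

# Predict the class of a new sequence (toy example sequence).
print(classifier.predict(to_kmer_words("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")))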
Pages: 17
Related papers
50 items in total
  • [21] MULTIMODAL PRE-TRAINING MODEL FOR SEQUENCE-BASED PREDICTION OF PROTEIN-PROTEIN INTERACTION
    Xue, Yang
    Liu, Zijing
    Fang, Xiaomin
    Wang, Fan
    MACHINE LEARNING IN COMPUTATIONAL BIOLOGY, VOL 165, 2021, 165 : 34 - 46
  • [22] On Efficient Transformer-Based Image Pre-training for Low-Level Vision
    Li, Wenbo
    Lu, Xin
    Qian, Shengju
    Lu, Jiangbo
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 1089 - 1097
  • [23] Efficient learning for spoken language understanding tasks with word embedding based pre-training
    Luan, Yi
    Watanabe, Shinji
    Harsham, Bret
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1398 - 1402
  • [24] Parameter-Efficient Log Anomaly Detection based on Pre-training model and LORA
    He, Shiming
    Lei, Ying
    Zhang, Ying
    Xie, Kun
    Sharma, Pradip Kumar
    2023 IEEE 34TH INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING, ISSRE, 2023, : 207 - 217
  • [25] Efficient Pre-training for Localized Instruction Generation of Procedural Videos
    Batra, Anil
    Moltisanti, Davide
    Sevilla-Lara, Laura
    Rohrbach, Marcus
    Keller, Frank
    COMPUTER VISION - ECCV 2024, PT XXXIX, 2025, 15097 : 347 - 363
  • [26] CyclicFL: Efficient Federated Learning with Cyclic Model Pre-Training
    Zhang, Pengyu
    Zhou, Yingbo
    Hu, Ming
    Wei, Xian
    Chen, Mingsong
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2025,
  • [27] Efficient Image Pre-training with Siamese Cropped Masked Autoencoders
    Eymael, Alexandre
    Vandeghen, Renaud
    Cioppa, Anthony
    Giancola, Silvio
    Ghanem, Bernard
    Van Droogenbroeck, Marc
    COMPUTER VISION - ECCV 2024, PT XXIII, 2025, 15081 : 348 - 366
  • [28] Does the Fairness of Your Pre-Training Hold Up? Examining the Influence of Pre-Training Techniques on Skin Tone Bias in Skin Lesion Classification
    Seth, Pratinav
    Pai, Abhilash K.
    2024 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS, WACVW 2024, 2024, : 580 - 587
  • [29] Benchmarking the influence of pre-training on explanation performance in MR image classification
    Oliveira, Marta
    Wilming, Rick
    Clark, Benedict
    Budding, Celine
    Eitel, Fabian
    Ritter, Kerstin
    Haufe, Stefan
    FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2024, 7
  • [30] Progress in protein pre-training models integrating structural Knowledge
    Tang, Tian-Yi
    Xiong, Yi-Ming
    Zhang, Rui-Ge
    Zhang, Jian
    Li, Wen-Fei
    Wang, Jun
    Wang, Wei
    ACTA PHYSICA SINICA, 2024, 73 (18)