ProtPlat: an efficient pre-training platform for protein classification based on FastText

Cited by: 4
Authors
Jin, Yuan
Yang, Yang [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Protein sequence classification; ProtPlat; Pre-training; Web server; SUBCELLULAR-LOCALIZATION; PREDICTION;
DOI
10.1186/s12859-022-04604-2
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
Background: Over the past decades, benefiting from the rapid growth of protein sequence data in public databases, many machine learning methods have been developed to predict physicochemical properties or functions of proteins from amino acid sequence features. However, prediction performance often suffers from the lack of labeled data. In recent years, pre-training methods have been widely studied to address the small-sample issue in computer vision and natural language processing, whereas pre-training techniques designed specifically for protein sequences remain few.

Results: In this paper, we propose a pre-training platform for representing protein sequences, called ProtPlat, which uses the Pfam database to train a three-layer neural network and then uses task-specific training data to fine-tune the model for downstream tasks. ProtPlat learns good representations for amino acids while achieving efficient classification. We conduct experiments on three protein classification tasks: the identification of type III secreted effectors, the prediction of subcellular localization, and the recognition of signal peptides. The experimental results show that pre-training enhances model performance effectively and that ProtPlat is competitive with state-of-the-art predictors, especially on small datasets. We implement the ProtPlat platform as a publicly accessible web service (https://compbio.sjtu.edu.cn/protplat).

Conclusions: To enhance the feature representation of protein amino acid sequences and improve the performance of sequence-based classification tasks, we develop ProtPlat, a general platform for the pre-training of protein sequences, featuring large-scale supervised training based on the Pfam database and an efficient learning model, FastText. The experimental results on three downstream classification tasks demonstrate the efficacy of ProtPlat.
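The workflow described in the abstract, FastText-style supervised pre-training on Pfam family labels followed by fine-tuning on a downstream protein classification task, can be sketched with the official fasttext Python bindings. This is a minimal illustration only: the file names, the 3-mer tokenization, and all hyperparameters below are assumptions made for the sketch, not the exact ProtPlat configuration.

Python sketch (illustrative assumptions only):

import fasttext

def to_kmer_words(seq, k=3):
    # Turn an amino acid sequence into space-separated overlapping k-mers,
    # so FastText treats each k-mer as a "word". The k=3 choice is an assumption.
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))

# 1) Supervised pre-training on Pfam: each line of pfam_train.txt (an assumed,
#    pre-built file) holds a Pfam family label plus the k-mer words, e.g.
#    __label__PF00069 MKV KVK VKT ...
pretrain = fasttext.train_supervised(
    input="pfam_train.txt",
    dim=100, epoch=5, lr=0.5, loss="hs",  # hierarchical softmax for the many Pfam labels
)

# Dump the learned k-mer embeddings in .vec format so they can seed a downstream model.
words = pretrain.get_words()
with open("pfam_kmers.vec", "w") as out:
    out.write(f"{len(words)} {pretrain.get_dimension()}\n")
    for w in words:
        vec = " ".join(f"{v:.6f}" for v in pretrain.get_word_vector(w))
        out.write(f"{w} {vec}\n")

# 2) Fine-tuning on a downstream task (e.g. type III secreted effector identification):
#    t3se_train.txt is an assumed file in the same label + k-mer format.
classifier = fasttext.train_supervised(
    input="t3se_train.txt",
    pretrainedVectors="pfam_kmers.vec",  # initialise from the Pfam-trained vectors
    dim=100, epoch=25, lr=0.1,
)

# Predict the class of a new sequence (toy example sequence).
print(classifier.predict(to_kmer_words("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")))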
Pages: 17
Related papers
50 items in total
  • [21] MULTIMODAL PRE-TRAINING MODEL FOR SEQUENCE-BASED PREDICTION OF PROTEIN-PROTEIN INTERACTION
    Xue, Yang
    Liu, Zijing
    Fang, Xiaomin
    Wang, Fan
    MACHINE LEARNING IN COMPUTATIONAL BIOLOGY, VOL 165, 2021, 165 : 34 - 46
  • [22] On Efficient Transformer-Based Image Pre-training for Low-Level Vision
    Li, Wenbo
    Lu, Xin
    Qian, Shengju
    Lu, Jiangbo
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 1089 - 1097
  • [23] Efficient learning for spoken language understanding tasks with word embedding based pre-training
    Luan, Yi
    Watanabe, Shinji
    Harsham, Bret
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1398 - 1402
  • [24] Parameter-Efficient Log Anomaly Detection based on Pre-training model and LORA
    He, Shiming
    Lei, Ying
    Zhang, Ying
    Xie, Kun
    Sharma, Pradip Kumar
    2023 IEEE 34TH INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING, ISSRE, 2023, : 207 - 217
  • [25] Efficient Pre-training for Localized Instruction Generation of Procedural Videos
    Batra, Anil
    Moltisanti, Davide
    Sevilla-Lara, Laura
    Rohrbach, Marcus
    Keller, Frank
    COMPUTER VISION - ECCV 2024, PT XXXIX, 2025, 15097 : 347 - 363
  • [26] CyclicFL: Efficient Federated Learning with Cyclic Model Pre-Training
    Zhang, Pengyu
    Zhou, Yingbo
    Hu, Ming
    Wei, Xian
    Chen, Mingsong
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2025,
  • [27] Efficient Image Pre-training with Siamese Cropped Masked Autoencoders
    Eymael, Alexandre
    Vandeghen, Renaud
    Cioppa, Anthony
    Giancola, Silvio
    Ghanem, Bernard
    Van Droogenbroeck, Marc
    COMPUTER VISION - ECCV 2024, PT XXIII, 2025, 15081 : 348 - 366
  • [28] Does the Fairness of Your Pre-Training Hold Up? Examining the Influence of Pre-Training Techniques on Skin Tone Bias in Skin Lesion Classification
    Seth, Pratinav
    Pai, Abhilash K.
    2024 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS, WACVW 2024, 2024, : 580 - 587
  • [29] Benchmarking the influence of pre-training on explanation performance in MR image classification
    Oliveira, Marta
    Wilming, Rick
    Clark, Benedict
    Budding, Celine
    Eitel, Fabian
    Ritter, Kerstin
    Haufe, Stefan
    FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2024, 7
  • [30] Progress in protein pre-training models integrating structural Knowledge
    Tang, Tian-Yi
    Xiong, Yi-Ming
    Zhang, Rui-Ge
    Zhang, Jian
    Li, Wen-Fei
    Wang, Jun
    Wang, Wei
    ACTA PHYSICA SINICA, 2024, 73 (18)