ProLanGO2: Protein Function Prediction with Ensemble of Encoder-Decoder Networks

Cited by: 2
Authors
Hippe, Kyle [1]
Gbenro, Sola [1]
Cao, Renzhi [1]
Affiliation
[1] Pacific Lutheran Univ, Dept Comp Sci, Tacoma, WA 98447 USA
Keywords
Protein function prediction; Recurrent Neural Network; Machine learning; AUTOMATED PREDICTION; ANNOTATIONS; SEQUENCES; DATABASE
DOI
10.1145/3388440.3414701
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Predicting protein function from sequence is a major challenge in computational biology. Traditional methods that search a query sequence against existing databases may perform poorly in practice, particularly when the database contains few or no homologs. We introduce ProLanGO2, a method that applies natural language processing and machine learning techniques to predict protein function from protein sequence alone. ProLanGO2 was benchmarked blindly in the latest Critical Assessment of protein Function Annotation algorithms (CAFA 4) experiment. It differs from the original ProLanGO in several ways. First, it uses the latest release of the UniProt database. Second, UniProt is filtered through a newly created fragment sequence database (FSD) to build the protein sequence "language". Third, models are trained on the dataset with an Encoder-Decoder network, which consists of two recurrent neural networks (RNNs): an encoder and a decoder. Fourth, when none of a protein sequence's k-mers occurs in the FSD, the method falls back on a backup prediction: the ten GO terms with the highest probability across all UniProt sequences that contain no FSD k-mers. Finally, we selected the 100 best-performing models and explored all combinations of them to find the best-performing ensemble. We benchmarked these combinations on the CAFA 3 dataset and selected the three top-performing ensembles for prediction in CAFA 4 under the name CaoLab. We also evaluated ProLanGO2 on 253 unseen sequences taken from UniProt and compared it with several other protein function prediction methods; the results show that it achieves strong performance among sequence-based protein function prediction methods.
Our method is available on GitHub: https://github.com/caorenzhi/ProLanGO2.git.
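The k-mer fallback rule described in the abstract can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the ProLanGO2 code from the repository: the k-mer length, representing the FSD as a plain set of strings, treating a GO term's "probability" as the fraction of FSD-uncovered sequences annotated with it, and the function names (`kmers`, `covered_by_fsd`, `build_fallback`, `predict`) are all hypothetical. The `model` argument is a placeholder for the trained encoder-decoder ensemble, and the toy sequences and annotations are invented.

```python
from collections import Counter
from typing import Callable, Iterable, List, Set, Tuple

Prediction = List[Tuple[str, float]]  # (GO term, probability) pairs

def kmers(seq: str, k: int) -> Set[str]:
    """Return every overlapping length-k substring of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def covered_by_fsd(seq: str, fsd: Set[str], k: int) -> bool:
    """True if at least one k-mer of seq occurs in the fragment database."""
    return not kmers(seq, k).isdisjoint(fsd)

def build_fallback(training: Iterable[Tuple[str, Set[str]]],
                   fsd: Set[str], k: int, top_n: int = 10) -> Prediction:
    """Rank GO terms over training sequences that contain no FSD k-mer.

    Assumption for this sketch: a term's probability is the fraction of
    those uncovered sequences that carry the term as an annotation.
    """
    uncovered = [terms for seq, terms in training
                 if not covered_by_fsd(seq, fsd, k)]
    counts = Counter(term for terms in uncovered for term in terms)
    # Sort by descending count, then by term ID so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(term, n / len(uncovered)) for term, n in ranked[:top_n]]

def predict(seq: str, fsd: Set[str], k: int,
            model: Callable[[str], Prediction],
            fallback: Prediction) -> Prediction:
    """Use the (hypothetical) model when the FSD covers seq, else the fallback."""
    if covered_by_fsd(seq, fsd, k):
        return model(seq)
    return fallback

# Toy usage; sequences and annotations are invented for illustration.
fsd = {"MKV", "AAA"}
training = [("MKVLA", {"GO:0005515"}),
            ("GGHHT", {"GO:0005737"}),
            ("TTPPQ", {"GO:0005737", "GO:0016020"})]
fallback = build_fallback(training, fsd, k=3)
# fallback == [("GO:0005737", 1.0), ("GO:0016020", 0.5)]
print(predict("QQQQQ", fsd, 3, model=lambda s: [], fallback=fallback))
```

Because the fallback list depends only on the training data and the FSD, it can be computed once and reused for every query that has no FSD k-mers; sorting ties by term ID keeps the top ten stable across runs.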
Pages: 6
Related papers (50 total)
  • [41] Multimodal Encoder-Decoder Attention Networks for Visual Question Answering
    Chen, Chongqing
    Han, Dezhi
    Wang, Jun
    IEEE ACCESS, 2020, 8 : 35662 - 35671
  • [42] Encoder-decoder with densely convolutional networks for monocular depth estimation
    Chen, Songnan
    Tang, Mengxia
    Kan, Jiangming
    JOURNAL OF THE OPTICAL SOCIETY OF AMERICA A-OPTICS IMAGE SCIENCE AND VISION, 2019, 36 (10) : 1709 - 1718
  • [43] EEG Channel Interpolation Using Deep Encoder-decoder Networks
    Saba-Sadiya, Sari
    Alhanai, Tuka
    Liu, Taosheng
    Ghassemi, Mohammad M.
    2020 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2020: 2432 - 2439
  • [44] Attention-based encoder-decoder networks for workflow recognition
    Min Zhang
    Haiyang Hu
    Zhongjin Li
    Jie Chen
    Multimedia Tools and Applications, 2021, 80 : 34973 - 34995
  • [45] Fetal electrocardiography extraction with residual convolutional encoder-decoder networks
    Zhong, Wei
    Liao, Lijuan
    Guo, Xuemei
    Wang, Guoli
    AUSTRALASIAN PHYSICAL & ENGINEERING SCIENCES IN MEDICINE, 2019, 42 (04) : 1081 - 1089
  • [46] PottsMGNet: A Mathematical Explanation of Encoder-Decoder Based Neural Networks
    Tai, Xue-Cheng
    Liu, Hao
    Chan, Raymond
    SIAM JOURNAL ON IMAGING SCIENCES, 2024, 17 (01) : 540 - 594
  • [47] Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning
    Chen, Jingwen
    Pan, Yingwei
    Li, Yehao
    Yao, Ting
    Chao, Hongyang
    Mei, Tao
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019: 8167 - 8174
  • [48] Electricity Price Prediction Using Encoder-Decoder Recurrent Neural Networks in Turkish Dayahead Market
    Gunduz, Salih
    Ugurlu, Umut
    Oksuz, Ilkay
    2020 28TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2020
  • [49] Deep encoder-decoder networks for belt longitudinal tear detection
    You, Lei
    Luo, Minghua
    Zhu, Xinglin
    Zhou, Bin
    MEASUREMENT & CONTROL, 2024
  • [50] Video to Text Study using an Encoder-Decoder Networks Approach
    Ismael Orozco, Carlos
    Elena Buemi, Maria
    Jacobo Berlles, Julio
    2018 37TH INTERNATIONAL CONFERENCE OF THE CHILEAN COMPUTER SCIENCE SOCIETY (SCCC), 2018