ProLanGO2: Protein Function Prediction with Ensemble of Encoder-Decoder Networks

Cited by: 2
Authors
Hippe, Kyle [1]
Gbenro, Sola [1]
Cao, Renzhi [1]
Affiliations
[1] Pacific Lutheran Univ, Dept Comp Sci, Tacoma, WA 98447 USA
Keywords
Protein function prediction; Recurrent neural network; Machine learning; Automated prediction; Annotations; Sequences; Database
DOI
10.1145/3388440.3414701
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Predicting protein function from protein sequence is a major challenge in computational biology. Traditional methods that search protein sequences against existing databases may not work well in practice, particularly when the database contains little or no homology to the query. We introduce ProLanGO2, a method that uses natural language processing and machine learning techniques to predict protein function with the protein sequence as input. The method has been benchmarked blindly in the latest Critical Assessment of protein Function Annotation algorithms (CAFA 4) experiment. ProLanGO2 differs from the earlier ProLanGO in several ways. First, it uses the latest version of the UniProt database. Second, the UniProt database is filtered by the newly created fragment sequence database (FSD) to prepare the protein sequence language. Third, an Encoder-Decoder network, a model consisting of two RNNs (an encoder and a decoder), is trained on the dataset. Fourth, if no k-mers of a protein sequence exist in the FSD, we select the ten GO terms with the highest probability across all UniProt sequences that contain no k-mers in the FSD, and use those ten GO terms as a backup prediction for the new protein sequence. Finally, we selected the 100 best-performing models and explored all combinations of those models to find the best-performing ensemble model. We benchmarked these combinations on the CAFA 3 dataset and selected the three top-performing ensemble models for prediction in the latest CAFA 4 experiment under the name CaoLab. We also evaluated ProLanGO2 on 253 unseen sequences taken from the UniProt database and compared it with several other protein function prediction methods. The results show that our method achieves strong performance among sequence-based protein function prediction methods.
Our method is available on GitHub: https://github.com/caorenzhi/ProLanGO2.git.
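
Two of the steps described in the abstract, the backup GO terms for sequences with no k-mer in the FSD and the search over model combinations, can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the GitHub repository for that). Every name below (kmers, has_fsd_hit, fallback_go_terms, predict, best_ensemble) is hypothetical, and several details are assumptions the abstract does not state: the k-mer lengths (3, 4, 5), the use of relative frequency as the "probability" of a GO term, majority voting as the rule for combining models, and mean F1 as the benchmark score.

```python
from collections import Counter
from itertools import combinations


def kmers(seq, k):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def has_fsd_hit(seq, fsd, ks=(3, 4, 5)):
    """True if any k-mer of seq appears in the fragment set fsd (k sizes assumed)."""
    return any(kmers(seq, k) & fsd for k in ks)


def fallback_go_terms(annotated, fsd, top_n=10, ks=(3, 4, 5)):
    """Most frequent GO terms among annotated sequences with no k-mer in fsd.

    annotated: iterable of (sequence, list of GO terms).
    Frequency stands in for probability here (an assumption).
    """
    counts = Counter()
    for seq, terms in annotated:
        if not has_fsd_hit(seq, fsd, ks):
            counts.update(set(terms))
    return [term for term, _ in counts.most_common(top_n)]


def predict(seq, fsd, model, fallback, ks=(3, 4, 5)):
    """Use the model when seq shares a k-mer with fsd, else the backup terms."""
    return model(seq) if has_fsd_hit(seq, fsd, ks) else list(fallback)


def f1(pred, true):
    """F1 score between a predicted and a true set of GO terms."""
    tp = len(pred & true)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(true)
    return 2 * precision * recall / (precision + recall)


def best_ensemble(preds_by_model, truth, max_size=3):
    """Exhaustively score model subsets up to max_size; return (score, subset).

    preds_by_model: {model name: [set of GO terms per protein]}.
    truth: [set of GO terms per protein], in the same protein order.
    A term is kept when a strict majority of the subset predicts it (assumed rule).
    """
    best_score, best_combo = -1.0, ()
    for size in range(1, max_size + 1):
        min_votes = size // 2 + 1
        for combo in combinations(sorted(preds_by_model), size):
            scores = []
            for i, true in enumerate(truth):
                votes = Counter(t for m in combo for t in preds_by_model[m][i])
                kept = {t for t, c in votes.items() if c >= min_votes}
                scores.append(f1(kept, true))
            mean = sum(scores) / len(scores)
            if mean > best_score:
                best_score, best_combo = mean, combo
    return best_score, best_combo
```

The max_size cap is a practical choice for the sketch: the number of subsets grows exponentially with the number of candidate models, so an exhaustive search over 100 models is only feasible when the subset size is bounded.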
Pages: 6