Learned protein embeddings for machine learning

Cited by: 166
Authors
Yang, Kevin K. [1]
Wu, Zachary [1]
Bedbrook, Claire N. [2]
Arnold, Frances H. [1, 2]
Affiliations
[1] CALTECH, Div Chem & Chem Engn, Pasadena, CA 91125 USA
[2] CALTECH, Div Biol & Biol Engn, Pasadena, CA 91125 USA
Funding
US National Institutes of Health; US National Science Foundation
Keywords
RECOMBINATION;
DOI
10.1093/bioinformatics/bty178
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Motivation: Machine-learning models trained on protein sequences and their measured functions can infer biological properties of unseen sequences without requiring an understanding of the underlying physical or biological mechanisms. Such models enable the prediction and discovery of sequences with optimal properties. Machine-learning models generally require that their inputs be vectors, and the conversion from a protein sequence to a vector representation affects the model's ability to learn. We propose to learn embedded representations of protein sequences that take advantage of the vast quantity of unmeasured protein sequence data available. These embeddings are low-dimensional and can greatly simplify downstream modeling.
Results: The predictive power of Gaussian process models trained using embeddings is comparable to that of models trained on existing representations, which suggests that embeddings enable accurate predictions despite having orders of magnitude fewer dimensions. Moreover, embeddings are simpler to obtain because they do not require alignments, structural data, or selection of informative amino-acid properties. Visualizing the embedding vectors shows that meaningful relationships between the embedded proteins are captured.
Availability and implementation: The embedding vectors and code to reproduce the results are available at https://github.com/fhalab/embeddings_reproduction/.
Contact: frances@cheme.caltech.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
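Illustrative note: the abstract's point that a model needs vector inputs, and that the choice of representation sets the input dimensionality, can be made concrete with a small Python sketch. One-hot encoding is used here only as a familiar baseline; it is an assumption for illustration and is not taken from the paper's code. The names `one_hot_encode`, `AMINO_ACIDS`, and `toy_sequence` are hypothetical.

```python
# Illustrative only: one-hot encoding is a common baseline representation,
# not necessarily one of the encodings evaluated in the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Flatten a protein sequence into a 0/1 vector of length 20 * len(sequence)."""
    vector = [0] * (len(AMINO_ACIDS) * len(sequence))
    for position, residue in enumerate(sequence):
        # Each position owns a block of 20 slots; set the slot for its residue.
        vector[position * len(AMINO_ACIDS) + AA_INDEX[residue]] = 1
    return vector


toy_sequence = "MKTAYIAK"  # hypothetical 8-residue sequence
encoded = one_hot_encode(toy_sequence)
print(len(encoded))  # 160 dimensions for only 8 residues
print(sum(encoded))  # 8: exactly one active entry per position
```

Because the vector grows by 20 entries per residue, a protein of a few hundred residues already yields thousands of sparse dimensions. This is the kind of representation that a low-dimensional learned embedding is meant to replace.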
Pages: 2642-2648
Page count: 7
Related papers
  • [1] Learned protein embeddings for machine learning (vol 34, pg 2642, 2018)
    Yang, Kevin K.
    Wu, Zachary
    Bedbrook, Claire N.
    Arnold, Frances H.
    [J]. BIOINFORMATICS, 2018, 34(23): 4138