Learned protein embeddings for machine learning

Cited by: 166

Authors:

Yang, Kevin K. [1]

Wu, Zachary [1]

Bedbrook, Claire N. [2]

Arnold, Frances H. [1,2]

Affiliations:

[1] CALTECH, Div Chem & Chem Engn, Pasadena, CA 91125 USA

[2] CALTECH, Div Biol & Biol Engn, Pasadena, CA 91125 USA

Source:

BIOINFORMATICS | 2018 / Vol. 34 / Issue 15

Funding:

US National Institutes of Health; US National Science Foundation;

Keywords:

RECOMBINATION;

DOI:

10.1093/bioinformatics/bty178

Chinese Library Classification:

Q5 [Biochemistry];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Motivation: Machine-learning models trained on protein sequences and their measured functions can infer biological properties of unseen sequences without requiring an understanding of the underlying physical or biological mechanisms. Such models enable the prediction and discovery of sequences with optimal properties. Machine-learning models generally require that their inputs be vectors, and the conversion from a protein sequence to a vector representation affects the model's ability to learn. We propose to learn embedded representations of protein sequences that take advantage of the vast quantity of unmeasured protein sequence data available. These embeddings are low-dimensional and can greatly simplify downstream modeling.

Results: The predictive power of Gaussian process models trained using embeddings is comparable to those trained on existing representations, which suggests that embeddings enable accurate predictions despite having orders of magnitude fewer dimensions. Moreover, embeddings are simpler to obtain because they do not require alignments, structural data, or selection of informative amino-acid properties. Visualizing the embedding vectors shows meaningful relationships between the embedded proteins are captured.

Availability and implementation: The embedding vectors and code to reproduce the results are available at https://github.com/fhalab/embeddings_reproduction/.

Contact: frances@cheme.caltech.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
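
The abstract notes that models need vector inputs and that the choice of representation affects learning. As a rough illustration of that point, the sketch below one-hot encodes a protein sequence. One-hot encoding is a common baseline; the abstract does not say which existing representations were compared. The names `AMINO_ACIDS`, `AA_INDEX` and `one_hot_encode` and the toy sequence are hypothetical and are not taken from the authors' repository.

```python
# Illustrative sketch only: names here are hypothetical, not from the paper's code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return a flat one-hot vector of length 20 * len(sequence)."""
    n_letters = len(AMINO_ACIDS)
    vector = [0] * (n_letters * len(sequence))
    for position, residue in enumerate(sequence):
        if residue not in AA_INDEX:
            raise ValueError(f"unknown residue {residue!r} at position {position}")
        # Set the single slot for this residue within its position's block of 20.
        vector[position * n_letters + AA_INDEX[residue]] = 1
    return vector


toy_sequence = "MKTAYIAK"  # made-up 8-residue sequence
encoded = one_hot_encode(toy_sequence)
print(len(encoded))  # 20 features per residue
```

Under this encoding the vector length grows with sequence length (a 300-residue protein would need 6000 features). A learned embedding of the kind the abstract describes would typically be a much shorter vector, which is the dimensionality gap the Results paragraph refers to.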


Pages: 2642-2648

Page count: 7

Related record: Learned protein embeddings for machine learning (vol 34, pg 2642, 2018). Yang, Kevin K.; Wu, Zachary; Bedbrook, Claire N.; Arnold, Frances H. BIOINFORMATICS, 2018, 34 (23): 4138-4138