Building on the transformer architecture that has revolutionized language modeling in natural language processing, protein language models (PLMs) are emerging as a powerful tool for learning over the large numbers of sequences in protein databases and for linking protein sequence to function. PLMs have been shown to learn useful, task-agnostic sequence representations that enable prediction of protein secondary structure, protein subcellular localization, and evolutionary relationships within protein families. However, existing models are trained strictly on protein sequences and miss the opportunity to leverage and integrate information from heterogeneous data sources. In this paper, inspired by the intrinsic role of three-dimensional (tertiary) protein structure in determining a broad range of protein properties, we propose a PLM that integrates and attends to both protein sequence and tertiary structure. In particular, this paper posits that learning joint sequence-structure representations yields better representations for function-related prediction tasks. A detailed experimental evaluation shows that such joint sequence-structure representations are more powerful than sequence-only representations, yield better performance on superfamily membership prediction across various metrics, and capture interesting relationships in the PLM-learned embedding space.
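To make the idea of "integrating and attending to both protein sequence and tertiary structure" concrete, below is a minimal, hypothetical sketch rather than the paper's actual architecture: per-residue sequence embeddings from a PLM cross-attend to per-residue structure features, and the fused output serves as a joint sequence-structure representation. The module name, tensor dimensions, and fusion scheme are all assumptions made purely for illustration.

# Illustrative sketch only: cross-attention fusion of per-residue sequence
# embeddings with per-residue structure features. NOT the paper's model;
# names, dimensions, and the fusion scheme are assumptions for exposition.
import torch
import torch.nn as nn

class JointSeqStructBlock(nn.Module):
    """Fuse sequence-token embeddings with tertiary-structure features."""

    def __init__(self, d_model: int = 256, n_heads: int = 4, d_struct: int = 16):
        super().__init__()
        # Project raw per-residue structure features (e.g. backbone geometry)
        # into the same width as the sequence embeddings.
        self.struct_proj = nn.Linear(d_struct, d_model)
        # Sequence tokens attend over structure tokens (cross-attention).
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, seq_emb: torch.Tensor, struct_feat: torch.Tensor) -> torch.Tensor:
        # seq_emb:     (batch, length, d_model)  per-residue PLM embeddings
        # struct_feat: (batch, length, d_struct) per-residue structural descriptors
        s = self.struct_proj(struct_feat)
        fused, _ = self.cross_attn(query=seq_emb, key=s, value=s)
        return self.norm(seq_emb + fused)  # residual joint representation

if __name__ == "__main__":
    block = JointSeqStructBlock()
    seq = torch.randn(2, 128, 256)      # dummy PLM embeddings
    struct = torch.randn(2, 128, 16)    # dummy structure features
    print(block(seq, struct).shape)     # torch.Size([2, 128, 256])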
Authors and affiliations:
Zhang, Zuobai: Mila Quebec AI Inst, Montreal, PQ, Canada; Univ Montreal, Montreal, PQ, Canada
Xu, Minghao: Mila Quebec AI Inst, Montreal, PQ, Canada; Univ Montreal, Montreal, PQ, Canada
Lozano, Aurelie: IBM Res, Yorktown Hts, NY, USA
Chenthamarakshan, Vijil: IBM Res, Yorktown Hts, NY, USA
Das, Payel: IBM Res, Yorktown Hts, NY, USA
Tang, Jian: Mila Quebec AI Inst, Montreal, PQ, Canada; HEC Montreal, Montreal, PQ, Canada; CIFAR AI Chair, Toronto, ON, Canada
Published in: ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023.
Authors and affiliations:
Jing, Xiaoyang: MoleculeMind Ltd, Beijing 100084, Peoples R China
Wu, Fandi: MoleculeMind Ltd, Beijing 100084, Peoples R China; Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
Luo, Xiao: Toyota Technol Inst Chicago, Chicago, IL 60637, USA; Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
Xu, Jinbo: MoleculeMind Ltd, Beijing 100084, Peoples R China; Toyota Technol Inst Chicago, Chicago, IL 60637, USA