Learning distributed representations of RNA and protein sequences and its application for predicting lncRNA-protein interactions

Times cited: 32
Authors
Yi, Hai-Cheng [1, 2]
You, Zhu-Hong [1]
Cheng, Li [1]
Zhou, Xi [1]
Jiang, Tong-Hai [1]
Li, Xiao [1]
Wang, Yan-Bin [1]
Affiliations
[1] Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi 830011, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Distribution representation; Natural language processing; Word2vec; RNA-protein interaction; LONG NONCODING RNA
DOI
10.1016/j.csbj.2019.11.004
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification codes
071010; 081704
Abstract
Long noncoding RNAs (lncRNAs) are ubiquitous in organisms and play crucial roles in a variety of biological processes and complex diseases. Emerging evidence suggests that lncRNAs interact with corresponding proteins to perform their regulatory functions. Identifying interacting lncRNA-protein pairs is therefore a key first step in understanding the functions and mechanisms of lncRNAs. Because determining lncRNA-protein interactions by high-throughput experiments is time-consuming and expensive, more robust and accurate computational methods are needed. In this study, we developed LPI-Pred, a new method for predicting potential lncRNA-protein interactions based on learning distributed representations of sequences, inspired by the similarity between natural language and biological sequences. Specifically, lncRNA and protein sequences were segmented into k-mers, which can be regarded as "words" in natural language processing. We then trained the RNA2vec and Pro2vec models with word2vec on human genome-wide lncRNA and protein sequences to learn distributed representations of RNAs and proteins. Next, the dimensionality of the resulting complex features was reduced by feature selection based on the Gini impurity measure. Finally, these discriminative features were used to train a Random Forest classifier to predict lncRNA-protein interactions. Five-fold cross-validation was used to evaluate the performance of LPI-Pred on three benchmark datasets: RPI369, RPI488 and RPI2241. The results demonstrate that LPI-Pred can be a useful tool for providing reliable guidance for biological research.
(C) 2019 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology.
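The k-mer "word" idea described in the abstract can be illustrated with a short Python sketch. This is not the authors' LPI-Pred code: the function names, the choice of k = 3, the stride-1 overlapping segmentation, the averaging of k-mer vectors into one sequence vector, and the concatenation of RNA and protein vectors are all illustrative assumptions. In practice the embedding table would come from a word2vec model trained on the k-mer "sentences" (for example gensim's `Word2Vec`, whose trained vectors are exposed through `model.wv`); a toy dictionary stands in for it here. The `gini_impurity` helper shows the measure 1 - Σ_i p_i^2 on which Gini-based feature importance in a Random Forest is built.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical helpers sketching the k-mer / embedding feature idea; not the authors' code.


def kmer_words(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers (stride 1), the "words" of the corpus."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_corpus(seqs: Sequence[str], k: int = 3) -> List[List[str]]:
    """One "sentence" (list of k-mer words) per sequence, the input shape word2vec trains on."""
    return [kmer_words(s, k) for s in seqs]


def sequence_vector(words: List[str], embedding: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of k-mers found in the embedding table (assumed aggregation).

    Unknown k-mers are skipped; a sequence with no known k-mers maps to the zero vector.
    """
    known = [embedding[w] for w in words if w in embedding]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def pair_features(rna_vec: List[float], protein_vec: List[float]) -> List[float]:
    """Concatenate RNA and protein vectors into one feature vector for a candidate pair (assumed)."""
    return list(rna_vec) + list(protein_vec)


def gini_impurity(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_i p_i^2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


if __name__ == "__main__":
    rna_words = kmer_words("AUGGCU", k=3)
    print(rna_words)  # ['AUG', 'UGG', 'GGC', 'GCU']

    # Toy 2-dimensional table standing in for trained RNA2vec vectors.
    toy_rna_embedding = {"AUG": [1.0, 0.0], "UGG": [0.0, 1.0]}
    rna_vec = sequence_vector(rna_words, toy_rna_embedding, dim=2)
    print(rna_vec)  # [0.5, 0.5]

    print(gini_impurity([1, 1, 0, 0]))  # 0.5
```

Averaging keeps the feature length fixed regardless of sequence length, which a Random Forest requires; other pooling schemes would serve the same purpose, and the abstract does not specify which one LPI-Pred uses.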
Pages: 20-26
Page count: 7