Learning distributed representations of RNA and protein sequences and its application for predicting lncRNA-protein interactions

Cited by: 32
Authors
Yi, Hai-Cheng [1 ,2 ]
You, Zhu-Hong [1 ]
Cheng, Li [1 ]
Zhou, Xi [1 ]
Jiang, Tong-Hai [1 ]
Li, Xiao [1 ]
Wang, Yan-Bin [1 ]
Affiliations
[1] Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi 830011, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Distributed representation; Natural language processing; Word2vec; RNA-protein interaction; Long noncoding RNA
DOI
10.1016/j.csbj.2019.11.004
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Long noncoding RNAs (lncRNAs) are ubiquitous in organisms and play crucial roles in a variety of biological processes and complex diseases. Emerging evidence suggests that lncRNAs interact with corresponding proteins to perform their regulatory functions. Identifying interacting lncRNA-protein pairs is therefore the first step toward understanding the functions and mechanisms of lncRNAs. Because determining lncRNA-protein interactions by high-throughput experiments is time-consuming and expensive, more robust and accurate computational methods are needed. In this study, we developed LPI-Pred, a new method for predicting potential lncRNA-protein interactions based on distributed representation learning of sequences, inspired by the similarity between natural language and biological sequences. Specifically, lncRNA and protein sequences were segmented into k-mers, which can be regarded as "words" in natural language processing. We then trained RNA2vec and Pro2vec models with word2vec on human genome-wide lncRNA and protein sequences to learn distributed representations of RNAs and proteins. Next, the dimensionality of the resulting complex features was reduced by feature selection based on the Gini impurity measure. Finally, the selected discriminative features were used to train a Random Forest classifier to predict lncRNA-protein interactions. Five-fold cross-validation was used to evaluate the performance of LPI-Pred on three benchmark datasets: RPI369, RPI488 and RPI2241. The results demonstrate that LPI-Pred can serve as a useful tool that provides reliable guidance for biological research. (C) 2019 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology.
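The preprocessing steps summarized in the abstract can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' LPI-Pred code. The following are all assumptions: the k-mer size, overlapping (stride-1) segmentation, mean-pooling of k-mer vectors into one sequence vector, the toy embedding table standing in for a trained RNA2vec/Pro2vec model, the single-split Gini score used to rank features (a simplified proxy for a Random Forest's impurity-based importance), and the unshuffled fold split. Training word2vec and the Random Forest itself would normally be left to libraries such as gensim's `Word2Vec` and scikit-learn's `RandomForestClassifier`.

```python
from collections import Counter
from typing import Dict, List, Sequence


def kmer_words(seq: str, k: int) -> List[str]:
    """Split a sequence into k-mer 'words' for word2vec.
    Assumption: overlapping k-mers with stride 1."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def sequence_vector(seq: str, k: int, embeddings: Dict[str, List[float]], dim: int) -> List[float]:
    """Assumption: a sequence vector is the mean of its k-mer vectors.
    K-mers missing from the embedding table are skipped."""
    vecs = [embeddings[w] for w in kmer_words(seq, k) if w in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[j] for v in vecs) / len(vecs) for j in range(dim)]


def gini_impurity(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_c p_c^2 over the class proportions p_c."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_gain(values: Sequence[float], labels: Sequence[int]) -> float:
    """Largest weighted Gini decrease from a single threshold split on one feature."""
    parent = gini_impurity(labels)
    n = len(labels)
    best = 0.0
    for t in sorted(set(values))[:-1]:  # thresholds between distinct values
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        child = (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n
        best = max(best, parent - child)
    return best


def select_features(X: List[List[float]], y: List[int], top: int) -> List[int]:
    """Return the column indices of the `top` features with the largest Gini gain."""
    gains = [best_split_gain([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)[:top]


def kfold_indices(n: int, k: int = 5) -> List[List[int]]:
    """Partition indices 0..n-1 into k contiguous folds of near-equal size.
    A real evaluation would usually shuffle (and stratify) first."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return folds


if __name__ == "__main__":
    print(kmer_words("AUGGCU", 3))  # -> ['AUG', 'UGG', 'GGC', 'GCU']
    toy = {"AUG": [1.0, 0.0], "UGG": [0.0, 1.0]}  # hypothetical stand-in for RNA2vec vectors
    print(sequence_vector("AUGG", 3, toy, 2))  # -> [0.5, 0.5]
    X = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [0.8, 2.0]]
    y = [0, 0, 1, 1]
    print(select_features(X, y, 1))  # -> [0]
```

In this toy example, feature 0 separates the two classes perfectly at threshold 0.2, giving a Gini decrease of 0.5. Feature 1 reaches only about 0.17, so feature 0 is kept.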
Pages: 20-26
Number of pages: 7