Informative RNA base embedding for RNA structural alignment and clustering by deep representation learning

Cited by: 29
Authors
Akiyama, Manato [1]
Sakakibara, Yasubumi [1]
Affiliation
[1] Keio Univ, Dept Biosci & Informat, Tokyo 2238522, Japan
Funding
Japan Society for the Promotion of Science
Keywords
SECONDARY STRUCTURE PREDICTION; SEQUENCE;
DOI
10.1093/nargab/lqac012
Chinese Library Classification number
Q3 [Genetics]
Discipline classification code
071007 ; 090102 ;
Abstract
Deep learning is being actively applied to biomolecular data to learn effective embeddings. Better embeddings improve the quality of downstream analyses, such as DNA sequence motif detection and protein function prediction. In this study, we adopt a pre-training algorithm for the effective embedding of RNA bases to acquire semantically rich representations, and we apply this algorithm to two fundamental RNA sequence problems: structural alignment and clustering. The pre-training algorithm embeds the four RNA bases in a position-dependent manner, using a large number of RNA sequences from various RNA families, and thereby yields a context-sensitive embedding representation. As a result, each base embedding carries not only base information but also the secondary structure and context information of the RNA sequence. We call this 'informative base embedding' and use it to achieve accuracies superior to those of existing state-of-the-art methods on RNA structural alignment and RNA family clustering tasks. Furthermore, by combining this informative base embedding with a simple Needleman-Wunsch alignment algorithm, we succeed in calculating structural alignments with a time complexity of O(n^2), instead of the O(n^6) time complexity of the naive implementation of a Sankoff-style algorithm, for input RNA sequences of length n.
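The quadratic-time alignment step mentioned in the abstract can be illustrated with a minimal sketch of Needleman-Wunsch over per-base vectors. This is not the authors' implementation: the cosine-similarity match score, the linear gap penalty of -0.5, and the `one_hot` stand-in for the pre-trained embeddings are all assumptions made for illustration. A real informative base embedding would be context-dependent, so the same base could receive different vectors at different positions.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def one_hot(seq, alphabet="ACGU"):
    """Hypothetical placeholder for learned embeddings: one-hot vector per base."""
    return [[1.0 if base == a else 0.0 for a in alphabet] for base in seq]


def needleman_wunsch(emb_a, emb_b, gap=-0.5):
    """Global alignment of two embedding sequences.

    Returns (score, pairs), where pairs lists aligned (index_a, index_b).
    The two nested loops fill an (n+1) x (m+1) table, hence O(n*m) time.
    """
    n, m = len(emb_a), len(emb_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]  # traceback pointers
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = score[i - 1][j - 1] + cosine(emb_a[i - 1], emb_b[j - 1])
            delete = score[i - 1][j] + gap   # gap in sequence b
            insert = score[i][j - 1] + gap   # gap in sequence a
            if match >= delete and match >= insert:
                score[i][j], move[i][j] = match, "diag"
            elif delete >= insert:
                score[i][j], move[i][j] = delete, "up"
            else:
                score[i][j], move[i][j] = insert, "left"

    # Follow the pointers back from the bottom-right corner.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if move[i][j] == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move[i][j] == "up":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


if __name__ == "__main__":
    total, aligned = needleman_wunsch(one_hot("GGAUCC"), one_hot("GGUCC"))
    print(total, aligned)
```

With one-hot vectors the sketch reduces to ordinary sequence alignment; any structural signal would have to come from the embedding itself, which is why the scoring function only needs to compare two vectors per cell.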
Pages: 11