Identification and analysis of consensus RNA motifs binding to the genome regulator CTCF

Cited by: 10
Authors
Kuang, Shuzhen [1, 2]
Wang, Liangjiang [1]
Affiliations
[1] Clemson Univ, Dept Genet & Biochem, Clemson, SC 29634 USA
[2] Clemson Univ, Dept Biol Sci, Clemson, SC 29634 USA
Keywords
LONG NONCODING RNAS; CELL LUNG-CANCER; GENE-EXPRESSION; ORGANIZATION; PROGRESSION; PROTEINS; SEQUENCE; PROLIFERATION; TRANSCRIPTION; INTERACTOME
DOI
10.1093/nargab/lqaa031
Chinese Library Classification number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
CCCTC-binding factor (CTCF) is a key regulator of 3D genome organization and gene expression. Recent studies suggest that RNA transcripts, mostly long non-coding RNAs (lncRNAs), can serve as locus-specific factors to bind and recruit CTCF to the chromatin. However, it remains unclear whether specific sequence patterns are shared by the CTCF-binding RNA sites, and no RNA motif has been reported so far for CTCF binding. In this study, we have developed DeepLncCTCF, a new deep learning model based on a convolutional neural network and a bidirectional long short-term memory network, to discover the RNA recognition patterns of CTCF and identify candidate lncRNAs binding to CTCF. When evaluated on two datasets, a human U2OS dataset and a mouse ESC dataset, DeepLncCTCF accurately predicted CTCF-binding RNA sites from nucleotide sequence. By examining the sequence features learned by DeepLncCTCF, we discovered a novel RNA motif with the consensus sequence AGAUNGGA for potential CTCF binding in humans. Furthermore, the applicability of DeepLncCTCF was demonstrated by identifying nearly 5000 candidate lncRNAs that might bind to CTCF in the nucleus. Our results provide useful information for understanding the molecular mechanisms of CTCF function in 3D genome organization.
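
The consensus AGAUNGGA uses the IUPAC code N, which stands for any nucleotide at that position. As an illustration only, the sketch below scans a sequence for exact matches to such a consensus using a regular expression. It is not DeepLncCTCF and does not reproduce the paper's deep learning approach; the function name `find_motif` and the example sequence are hypothetical, and DNA-style T is assumed to be equivalent to U.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes (RNA alphabet).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]",
    "K": "[GU]", "M": "[AC]", "B": "[CGU]", "D": "[AGU]",
    "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}


def find_motif(sequence, motif="AGAUNGGA"):
    """Return 0-based start positions of all motif matches, overlaps included.

    Hypothetical helper for illustration; a plain consensus scan,
    not the model described in the abstract.
    """
    # Assumption: input may be written as DNA, so treat T as U.
    rna = sequence.upper().replace("T", "U")
    core = "".join(IUPAC_RNA[base] for base in motif.upper())
    # A lookahead lets the scan report overlapping occurrences.
    pattern = re.compile("(?=(" + core + "))")
    return [m.start() for m in pattern.finditer(rna)]


# Made-up example sequence containing two matches to the consensus.
example = "CCAGAUCGGAUUAGATGGGACC"
print(find_motif(example))
```

An exact consensus scan like this ignores position-specific preferences and sequence context, so it would likely be far less sensitive and specific than a trained model; it only makes the meaning of the consensus string concrete.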
Pages: 13