Tree polynomials identify a link between co-transcriptional R-loops and nascent RNA folding

Cited by: 0
Authors
Liu, Pengyu [1]
Lusk, Jacob [1]
Jonoska, Natasa [2]
Vazquez, Mariel [1, 3]
Affiliations
[1] Univ Calif Davis, Dept Microbiol & Mol Genet, Davis, CA 95616 USA
[2] Univ S Florida, Dept Math & Stat, Tampa, FL USA
[3] Univ Calif Davis, Dept Math, Davis, CA 95616 USA
Funding
National Science Foundation (USA);
Keywords
DNA sequences - Polynomials - Trees (mathematics);
DOI
10.1371/journal.pcbi.1012669
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
R-loops are a class of non-canonical nucleic acid structures that typically form during transcription when the nascent RNA hybridizes with the DNA template strand, leaving the non-template DNA strand unpaired. These structures are abundant in nature and play important physiological and pathological roles. Recent research shows that DNA sequence and topology affect R-loops, yet it remains unclear how these and other factors contribute to R-loop formation. In this work, we investigate the link between nascent RNA folding and the formation of R-loops. We introduce tree-polynomials, a new class of representations of RNA secondary structures. A tree-polynomial representation consists of a rooted tree associated with an RNA secondary structure together with a polynomial that is uniquely identified with the rooted tree. Tree-polynomials enable accurate, interpretable and efficient data analysis of RNA secondary structures without pseudoknots. We develop a computational pipeline for investigating and predicting R-loop formation from a genomic sequence. The pipeline obtains nascent RNA secondary structures from co-transcriptional RNA folding software and computes the tree-polynomial representations of the structures. By applying this pipeline to plasmid sequences that contain R-loop forming genes, we establish a strong correlation between the coefficient sums of tree-polynomials and the experimental probability of R-loop formation. This strong correlation indicates that the pipeline can be used for accurate R-loop prediction. Furthermore, the interpretability of tree-polynomials allows us to characterize the features of RNA secondary structure associated with R-loop formation. In particular, we identify that branches with short stems separated by bulges and interior loops are associated with R-loops.
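The abstract does not spell out how the rooted tree is built or how the polynomial is defined, so the following Python sketch is only an illustration under stated assumptions, not the authors' pipeline. It assumes (a) a pseudoknot-free structure given in dot-bracket notation, (b) a tree in which every base pair is an internal node and every unpaired base is a leaf, and (c) a recursive rule in which a leaf maps to x and any other node maps to y plus the product of its children's polynomials. All function names are hypothetical.

```python
from collections import Counter


def dot_bracket_to_tree(structure):
    """Parse dot-bracket notation into a rooted tree (assumed construction).

    A node is the list of its children; the root stands for the exterior
    loop, '(' opens a base-pair node, '.' adds an unpaired-base leaf.
    """
    root = []
    stack = [root]
    for ch in structure:
        if ch == "(":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')' in structure")
            stack.pop()
        elif ch == ".":
            stack[-1].append([])
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced '(' in structure")
    return root


def poly_mul(p, q):
    """Multiply two polynomials stored as {(deg_x, deg_y): coefficient}."""
    out = Counter()
    for (a, b), c in p.items():
        for (d, e), f in q.items():
            out[(a + d, b + e)] += c * f
    return out


def tree_polynomial(node):
    """Assumed rule: leaf -> x; otherwise y + product of child polynomials."""
    if not node:
        return Counter({(1, 0): 1})  # the monomial x
    product = Counter({(0, 0): 1})   # the constant 1
    for child in node:
        product = poly_mul(product, tree_polynomial(child))
    product[(0, 1)] += 1             # add the monomial y
    return product


def coefficient_sum(poly):
    """Sum of all coefficients, i.e. the polynomial evaluated at x = y = 1."""
    return sum(poly.values())


tree = dot_bracket_to_tree("(..)")
poly = tree_polynomial(tree)
print(dict(poly), coefficient_sum(poly))
```

For the toy hairpin `(..)`, the base-pair node yields x^2 + y and the root yields x^2 + 2y under these assumptions. The coefficient sum is the quantity the abstract reports as correlating with R-loop formation, although the published definition may differ from this sketch.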
Pages: 24