Capacity and Expressiveness of Genomic Tandem Duplication

Authors
Jain, Siddharth [1]
Farnoud, Farzad [1]
Bruck, Jehoshua [1]
Affiliations
[1] CALTECH, Elect Engn, Pasadena, CA 91125 USA
Keywords
Expressiveness; tandem repeats; finite automata; square-free strings; repeats
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The majority of the human genome consists of repeated sequences. An important type of repeat common in the human genome is the tandem repeat, where identical copies appear next to each other. For example, in the sequence AGTCTGTGC, the substring TGTG is a tandem repeat, generated from AGTCTGC by a tandem duplication of length 2. In this work, we investigate the possibility of generating a large number of sequences from a small initial string (called the seed) by tandem duplications of bounded length. Our results include exact capacity values for certain tandem duplication string systems with alphabet sizes 2, 3, and 4. In addition, motivated by the role of DNA sequences in expressing proteins via RNA and the genetic code, we define the notion of the expressiveness of a tandem duplication system as the feasibility of expressing arbitrary substrings. We then completely characterize the expressiveness of tandem duplication systems for general alphabet sizes and duplication lengths. Noticing that a system with capacity = 1 is expressive, we prove that for an alphabet size >= 4, the capacity is strictly smaller than 1, independent of the seed and the duplication lengths. The proof of this limit on the capacity (note that the genomic alphabet size is 4) is related to an interesting result by Axel Thue from 1906, which states that there exist arbitrarily long sequences with no tandem repeats (square-free sequences) for alphabet size >= 3. Finally, our results illustrate that duplication lengths play a more significant role than the seed in generating a large number of sequences for these systems.
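The tandem duplication operation described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `tandem_duplicate`, `descendants`, and `is_square_free` are hypothetical and chosen only for illustration. The sketch reproduces the abstract's example, enumerates the strings reachable from a seed under a bound on the duplication length, and checks whether a string is square-free (contains no tandem repeat).

```python
def tandem_duplicate(s: str, i: int, k: int) -> str:
    """Copy the length-k substring starting at index i and insert the copy right after it."""
    if k <= 0 or i < 0 or i + k > len(s):
        raise ValueError("substring s[i:i+k] must lie inside s and be non-empty")
    return s[:i + k] + s[i:i + k] + s[i + k:]


def descendants(seed: str, max_dup_len: int, steps: int) -> set:
    """Strings reachable from seed using at most `steps` tandem duplications,
    each of length at most max_dup_len (brute-force enumeration)."""
    seen = {seed}
    frontier = {seed}
    for _ in range(steps):
        next_frontier = set()
        for s in frontier:
            for k in range(1, max_dup_len + 1):
                for i in range(len(s) - k + 1):
                    t = tandem_duplicate(s, i, k)
                    if t not in seen:
                        seen.add(t)
                        next_frontier.add(t)
        frontier = next_frontier
    return seen


def is_square_free(s: str) -> bool:
    """True if s has no substring of the form ww with w non-empty."""
    n = len(s)
    for k in range(1, n // 2 + 1):
        for i in range(n - 2 * k + 1):
            if s[i:i + k] == s[i + k:i + 2 * k]:
                return False
    return True


# The abstract's example: duplicating "TG" (length 2) in AGTCTGC.
example = tandem_duplicate("AGTCTGC", 4, 2)
print(example)                    # AGTCTGTGC
print(is_square_free("AGTCTGC"))  # True
print(is_square_free(example))    # False
```

Because every duplication step creates a square, any string obtained from a seed by at least one duplication fails the square-free check. This is the intuition behind the abstract's link between the capacity bound and Thue's square-free sequences; the enumeration here is exponential and only meant for small seeds.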
Pages: 1946-1950
Number of pages: 5