Capacity and Expressiveness of Genomic Tandem Duplication

Times cited: 0
Authors
Jain, Siddharth [1]
Farnoud, Farzad [1]
Bruck, Jehoshua [1]
Affiliation
[1] CALTECH, Elect Engn, Pasadena, CA 91125 USA
Keywords
Expressiveness; tandem repeats; finite automata; square-free strings; REPEATS;
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The majority of the human genome consists of repeated sequences. An important type of repeat common in the human genome is the tandem repeat, in which identical copies appear next to each other. For example, in the sequence AGTCTGTGC, TGTG is a tandem repeat; it can be generated from AGTCTGC by a tandem duplication of length 2. In this work, we investigate the possibility of generating a large number of sequences from a small initial string (called the seed) by tandem duplications of bounded length. Our results include exact capacity values for certain tandem duplication string systems with alphabet sizes 2, 3, and 4. In addition, motivated by the role of DNA sequences in expressing proteins via RNA and the genetic code, we define the expressiveness of a tandem duplication system as its ability to express arbitrary substrings. We then completely characterize the expressiveness of tandem duplication systems for general alphabet sizes and duplication lengths. Noting that a system with capacity 1 is expressive, we prove that for alphabet sizes of at least 4, the capacity is strictly smaller than 1, independent of the seed and the duplication lengths. The proof of this limit on the capacity (note that the genomic alphabet size is 4) is related to a 1906 result of Axel Thue, which states that for alphabet sizes of at least 3 there exist arbitrarily long sequences with no tandem repeats (square-free sequences). Finally, our results illustrate that, for these systems, the duplication lengths play a more significant role than the seed in generating a large number of sequences.
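The tandem duplication operation and the notion of a square-free string can be illustrated with a small brute-force sketch. This is not the authors' method: the function names `tandem_duplicate`, `descendants`, and `is_square_free`, and the length cap `max_len` used to keep the search finite, are hypothetical choices made for illustration only. The sketch reproduces the AGTCTGC to AGTCTGTGC example and tests strings for tandem repeats.

```python
from collections import deque


def tandem_duplicate(s: str, i: int, k: int) -> str:
    """Insert a copy of the length-k block s[i:i+k] right after itself.

    Hypothetical helper: a tandem duplication of length k at position i.
    """
    if k < 1 or i < 0 or i + k > len(s):
        raise ValueError("block s[i:i+k] must lie inside s")
    return s[: i + k] + s[i : i + k] + s[i + k :]


def descendants(seed: str, max_dup_len: int, max_len: int) -> set[str]:
    """Breadth-first search over all strings of length <= max_len reachable
    from seed by tandem duplications of length at most max_dup_len.

    max_len is an assumed cutoff so the (otherwise infinite) set is finite.
    """
    seen = {seed}
    queue = deque([seed])
    while queue:
        s = queue.popleft()
        for k in range(1, max_dup_len + 1):
            if len(s) + k > max_len:
                break  # longer blocks would only exceed the cutoff further
            for i in range(len(s) - k + 1):
                t = tandem_duplicate(s, i, k)
                if t not in seen:
                    seen.add(t)
                    queue.append(t)
    return seen


def is_square_free(s: str) -> bool:
    """True if s contains no tandem repeat ww with w non-empty."""
    n = len(s)
    for k in range(1, n // 2 + 1):
        for i in range(n - 2 * k + 1):
            if s[i : i + k] == s[i + k : i + 2 * k]:
                return False
    return True


if __name__ == "__main__":
    print(tandem_duplicate("AGTCTGC", 4, 2))     # AGTCTGTGC
    print(sorted(descendants("ACG", 1, 4)))      # ['AACG', 'ACCG', 'ACG', 'ACGG']
    print(is_square_free("AGTCTGTGC"))           # False (contains TGTG)
```

The exhaustive search grows quickly with `max_len`, so it is only suitable for toy examples. Over a binary alphabet, `is_square_free` rejects every string of length 4 or more, which matches why Thue's result needs an alphabet of size at least 3.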
Pages: 1946-1950
Number of pages: 5