Capacity and Expressiveness of Genomic Tandem Duplication

Authors
Jain, Siddharth [1]
Farnoud, Farzad [1]
Bruck, Jehoshua [1]
Affiliations
[1] CALTECH, Elect Engn, Pasadena, CA 91125 USA
Keywords
Expressiveness; tandem repeats; finite automata; square-free strings; REPEATS;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The majority of the human genome consists of repeated sequences. An important type of repeat common in the human genome is the tandem repeat, where identical copies appear next to each other. For example, in the sequence AGTCTGTGC, TGTG is a tandem repeat that may have been generated from AGTCTGC by a tandem duplication of length 2. In this work, we investigate the possibility of generating a large number of sequences from a small initial string (called the seed) by tandem duplications of bounded length. Our results include exact capacity values for certain tandem duplication string systems with alphabet sizes 2, 3, and 4. In addition, motivated by the role of DNA sequences in expressing proteins via RNA and the genetic code, we define the expressiveness of a tandem duplication system as the feasibility of expressing arbitrary substrings. We then completely characterize the expressiveness of tandem duplication systems for general alphabet sizes and duplication lengths. Noticing that a system with capacity = 1 is expressive, we prove that for an alphabet size >= 4, the capacity is strictly smaller than 1, independent of the seed and the duplication lengths. The proof of this limit on the capacity (note that the genomic alphabet size is 4) is related to an interesting result by Axel Thue from 1906, which states that there exist arbitrarily long sequences with no tandem repeats (square-free sequences) for alphabet size >= 3. Finally, our results illustrate that duplication lengths play a more significant role than the seed in generating a large number of sequences for these systems.
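The duplication operation and the square-free property mentioned in the abstract can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the authors' method: the function names `tandem_duplicate`, `descendants`, and `is_square_free` are hypothetical, and the brute-force enumeration is practical only for very short seeds and a few steps.

```python
def tandem_duplicate(s: str, start: int, length: int) -> str:
    """Copy the block s[start:start+length] and insert the copy right after it."""
    if length <= 0 or start < 0 or start + length > len(s):
        raise ValueError("duplicated block must lie inside the string")
    block = s[start:start + length]
    return s[:start + length] + block + s[start + length:]


def descendants(seed: str, max_dup_len: int, steps: int) -> set:
    """Strings reachable from `seed` by at most `steps` tandem duplications,
    each of length at most `max_dup_len` (brute-force enumeration)."""
    seen = {seed}
    frontier = {seed}
    for _ in range(steps):
        new_strings = set()
        for s in frontier:
            for k in range(1, max_dup_len + 1):
                for i in range(len(s) - k + 1):
                    t = tandem_duplicate(s, i, k)
                    if t not in seen:
                        new_strings.add(t)
        seen |= new_strings
        frontier = new_strings
    return seen


def is_square_free(s: str) -> bool:
    """True if s contains no tandem repeat, i.e. no substring of the form xx."""
    for k in range(1, len(s) // 2 + 1):
        for i in range(len(s) - 2 * k + 1):
            if s[i:i + k] == s[i + k:i + 2 * k]:
                return False
    return True


# The abstract's example: duplicating the block "TG" (length 2) in AGTCTGC.
print(tandem_duplicate("AGTCTGC", 4, 2))
print(is_square_free("AGTCTGC"), is_square_free("AGTCTGTGC"))
```

Here `tandem_duplicate("AGTCTGC", 4, 2)` reproduces the abstract's example string, and `is_square_free` distinguishes the seed, which has no tandem repeat, from its duplicated descendant, which does.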
Pages: 1946-1950
Page count: 5