Enrichment of regulatory signals in conserved non-coding genomic sequence

Cited by: 122
Authors
Levy, S
Hannenhalli, S
Workman, C
Affiliations
[1] Celera Genom Corp, Informat Res, Rockville, MD 20850 USA
[2] Tech Univ Denmark, Ctr Biol Sequence Anal, DK-2800 Lyngby, Denmark
DOI
10.1093/bioinformatics/17.10.871
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Whole genome shotgun sequencing strategies generate sequence data before assembly methods are applied to produce contiguous sequence. Sequence reads can be used to indicate regions of conservation between closely related species for which only one genome has been assembled. Consequently, pairwise sequence alignment methods make it possible to identify novel, non-repetitive, conserved segments in non-coding sequence shared between the assembled human genome and mouse whole genome shotgun sequencing fragments. Conserved non-coding regions identify potentially functional DNA that could be involved in transcriptional regulation.
Results: Local sequence alignment methods were applied to mouse fragments and the assembled human genome. In addition, transcription factor binding sites were detected by aligning their corresponding positional weight matrices to the sequence regions. These methods were applied to a set of transcripts corresponding to 502 genes associated with a variety of human diseases, taken from the Online Mendelian Inheritance in Man database. Using statistical arguments, we show that conserved non-coding segments are enriched in transcription factor binding sites compared with the sequence background in which the conserved segments are located. This enrichment of binding sites was not observed in coding sequence. Conserved non-coding segments are not extensively repeated in the genome, so their identification provides a rapid means of finding genes with related conserved regions and, consequently, potentially related regulatory mechanisms. Conserved segments in upstream regions are found to contain binding sites that are co-localized in a manner consistent with experimentally known pairwise co-occurrences of transcription factors, and they afford the identification of novel co-occurring transcription factor (TF) pairs.
This study provides a methodology and further evidence that conserved non-coding regions are biologically significant, since they contain a statistical enrichment of regulatory signals and pairs of signals that enable the construction of regulatory models for human genes.
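The two computational steps named in the abstract, scoring sequence against a positional weight matrix and asking whether hits are enriched relative to background, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the count matrix `COUNTS` is invented, the uniform background, pseudocount, and score threshold are arbitrary choices, and the one-sided binomial tail is a stand-in for the "statistical arguments" the abstract mentions without specifying.

```python
import math

# Hypothetical 4-position count matrix (one dict per motif position).
# The counts are invented for illustration and describe no real factor.
COUNTS = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 0, "G": 0, "T": 9},
]


def log_odds(counts, background=0.25, pseudocount=1.0):
    """Turn per-position counts into log2-odds scores (uniform background assumed)."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append(
            {b: math.log2((column[b] + pseudocount) / total / background) for b in "ACGT"}
        )
    return pwm


def scan(sequence, pwm, threshold):
    """Return start offsets of windows whose summed score reaches the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append(start)
    return hits


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of at least k hits in n windows."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    pwm = log_odds(COUNTS)
    conserved = "TTACGTGGACGTAA"  # toy conserved segment
    hits = scan(conserved, pwm, threshold=5.0)
    windows = len(conserved) - len(pwm) + 1
    background_rate = 0.02  # assumed per-window hit rate in the flanking background
    print(hits, binomial_upper_tail(len(hits), windows, background_rate))
```

In practice the background hit rate would be estimated by scanning the sequence surrounding each conserved segment with the same matrix and threshold. Overlapping windows are not independent, so the binomial tail here is only a rough guide.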
Pages: 871-877
Number of pages: 7