Assembly and Analysis of Unmapped Genome Sequence Reads Reveal Novel Sequence and Variation in Dogs

Cited by: 5
Authors
Holden, Lindsay A. [1 ]
Arumilli, Meharji [2 ,3 ,4 ]
Hytonen, Marjo K. [2 ,3 ,4 ]
Hundi, Sruthi [2 ,3 ,4 ]
Salojarvi, Jarkko [5 ,6 ]
Brown, Kim H. [1 ]
Lohi, Hannes [2 ,3 ,4 ]
Affiliations
[1] Portland State Univ, Dept Biol, Portland, OR 97207 USA
[2] Univ Helsinki, Res Programs Unit, Mol Neurol, Helsinki, Finland
[3] Univ Helsinki, Dept Vet Biosci, Helsinki, Finland
[4] Folkhalsan Inst Genet, Helsinki, Finland
[5] Univ Helsinki, Fac Biol & Environm Sci, Res Programme Individuals & Populat, Helsinki, Finland
[6] Nanyang Technol Univ, Sch Biol Sci, Singapore, Singapore
Source
SCIENTIFIC REPORTS | 2018 / Vol. 8
Funding
Academy of Finland; European Research Council;
Keywords
LINKAGE DISEQUILIBRIUM; ALIGNMENT;
DOI
10.1038/s41598-018-29190-3
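Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];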
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Dogs are excellent animal models for human disease. They have extensive veterinary histories, pedigrees, and a unique genetic system due to breeding practices. Despite these advantages, one factor limiting their usefulness is the canine genome reference (CGR), which was assembled using a single purebred Boxer. Although a common practice, this results in many high-quality reads remaining unmapped. To address this, whole-genome sequence data from three breeds, Border Collie (n = 26), Bearded Collie (n = 7), and Entlebucher Sennenhund (n = 8), were analyzed to identify novel, non-CGR genomic contigs using the previously validated pseudo-de novo assembly pipeline. We identified 256,957 novel contigs, and paired-end relationships together with BLAT scores provided 126,555 (49%) high-quality contigs with genomic coordinates containing 4.6 Mb of novel sequence absent from the CGR. These contigs close 12,503 known gaps, including 2.4 Mb containing partially missing sequences for 11.5% of Ensembl, 16.4% of RefSeq, and 12.2% of canFam3.1+ CGR annotated genes, and 1,748 unmapped contigs containing 2,366 novel gene variants. Examples for six disease-associated genes (SCARF2, RD3, COL9A3, FAM161A, RASGRP1, and DLX6) containing gaps or alternate splice variants missing from the CGR are also presented. These findings from non-reference breeds support the need for improvement of the current Boxer-only CGR to avoid missing important biological information. The inclusion of the missing gene sequences into the CGR will facilitate identification of putative disease mutations across diverse breeds and phenotypes.
Pages: 11