Assembly and Analysis of Unmapped Genome Sequence Reads Reveal Novel Sequence and Variation in Dogs

Cited by: 5
Authors
Holden, Lindsay A. [1 ]
Arumilli, Meharji [2 ,3 ,4 ]
Hytonen, Marjo K. [2 ,3 ,4 ]
Hundi, Sruthi [2 ,3 ,4 ]
Salojarvi, Jarkko [5 ,6 ]
Brown, Kim H. [1 ]
Lohi, Hannes [2 ,3 ,4 ]
Affiliations
[1] Portland State Univ, Dept Biol, Portland, OR 97207 USA
[2] Univ Helsinki, Res Programs Unit, Mol Neurol, Helsinki, Finland
[3] Univ Helsinki, Dept Vet Biosci, Helsinki, Finland
[4] Folkhalsan Inst Genet, Helsinki, Finland
[5] Univ Helsinki, Fac Biol & Environm Sci, Res Programme Individuals & Populat, Helsinki, Finland
[6] Nanyang Technol Univ, Sch Biol Sci, Singapore, Singapore
Source
SCIENTIFIC REPORTS | 2018 / Vol. 8
Funding
Academy of Finland; European Research Council
Keywords
LINKAGE DISEQUILIBRIUM; ALIGNMENT
DOI
10.1038/s41598-018-29190-3
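Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]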
Subject classification codes
07; 0710; 09
Abstract
Dogs are excellent animal models for human disease. They have extensive veterinary histories, pedigrees, and a unique genetic system due to breeding practices. Despite these advantages, one factor limiting their usefulness is the canine genome reference (CGR), which was assembled from a single purebred Boxer. Although this is common practice, it leaves many high-quality reads unmapped. To address this, whole-genome sequence data from three breeds, Border Collie (n = 26), Bearded Collie (n = 7), and Entlebucher Sennenhund (n = 8), were analyzed with the previously validated pseudo-de novo assembly pipeline to identify novel, non-CGR genomic contigs. We identified 256,957 novel contigs; paired-end relationships together with BLAT scores yielded 126,555 (49%) high-quality contigs with genomic coordinates, containing 4.6 Mb of novel sequence absent from the CGR. These contigs close 12,503 known gaps, including 2.4 Mb containing partially missing sequences for 11.5% of Ensembl, 16.4% of RefSeq and 12.2% of canFam3.1+ CGR-annotated genes, and 1,748 unmapped contigs containing 2,366 novel gene variants. Examples are also presented for six disease-associated genes (SCARF2, RD3, COL9A3, FAM161A, RASGRP1 and DLX6) that contain gaps or alternative splice variants missing from the CGR. These findings from non-reference breeds support the need to improve the current Boxer-only CGR to avoid missing important biological information. Including the missing gene sequences in the CGR will facilitate the identification of putative disease mutations across diverse breeds and phenotypes.
Pages: 11