Assembly and Analysis of Unmapped Genome Sequence Reads Reveal Novel Sequence and Variation in Dogs

Cited by: 5
Authors
Holden, Lindsay A. [1]
Arumilli, Meharji [2, 3, 4]
Hytonen, Marjo K. [2, 3, 4]
Hundi, Sruthi [2, 3, 4]
Salojarvi, Jarkko [5, 6]
Brown, Kim H. [1]
Lohi, Hannes [2, 3, 4]
Affiliations
[1] Portland State Univ, Dept Biol, Portland, OR 97207 USA
[2] Univ Helsinki, Res Programs Unit, Mol Neurol, Helsinki, Finland
[3] Univ Helsinki, Dept Vet Biosci, Helsinki, Finland
[4] Folkhalsan Inst Genet, Helsinki, Finland
[5] Univ Helsinki, Fac Biol & Environm Sci, Res Programme Individuals & Populat, Helsinki, Finland
[6] Nanyang Technol Univ, Sch Biol Sci, Singapore, Singapore
Source
SCIENTIFIC REPORTS | 2018 / Volume 8
Funding
Academy of Finland; European Research Council
Keywords
LINKAGE DISEQUILIBRIUM; ALIGNMENT
DOI
10.1038/s41598-018-29190-3
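Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract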
Dogs are excellent animal models for human disease. They have extensive veterinary histories, pedigrees, and a unique genetic system due to breeding practices. Despite these advantages, one factor limiting their usefulness is the canine genome reference (CGR), which was assembled using a single purebred Boxer. Although a common practice, this results in many high-quality reads remaining unmapped. To address this, whole-genome sequence data from three breeds, Border Collie (n = 26), Bearded Collie (n = 7), and Entlebucher Sennenhund (n = 8), were analyzed to identify novel, non-CGR genomic contigs using the previously validated pseudo-de novo assembly pipeline. We identified 256,957 novel contigs, and paired-end relationships together with BLAT scores provided 126,555 (49%) high-quality contigs with genomic coordinates containing 4.6 Mb of novel sequence absent from the CGR. These contigs close 12,503 known gaps, including 2.4 Mb containing partially missing sequences for 11.5% of Ensembl, 16.4% of RefSeq and 12.2% of canFam3.1+ CGR annotated genes and 1,748 unmapped contigs containing 2,366 novel gene variants. Examples for six disease-associated genes (SCARF2, RD3, COL9A3, FAM161A, RASGRP1 and DLX6) containing gaps or alternate splice variants missing from the CGR are also presented. These findings from non-reference breeds support the need for improvement of the current Boxer-only CGR to avoid missing important biological information. The inclusion of the missing gene sequences into the CGR will facilitate identification of putative disease mutations across diverse breeds and phenotypes.
Pages: 11