CONGA: Copy number variation genotyping in ancient genomes and low-coverage sequencing data

Cited by: 0
Authors
Soylev, Arda [1 ,2 ]
Cokoglu, Sevim Seda [3 ]
Koptekin, Dilek [4 ]
Alkan, Can [5 ]
Somel, Mehmet [3 ]
Affiliations
[1] Konya Food & Agr Univ, Dept Comp Engn, Konya, Turkey
[2] Heinrich Heine Univ, Med Fac, Inst Med Biometry & Bioinformat, Dusseldorf, Germany
[3] Middle East Tech Univ, Dept Biol, Ankara, Turkey
[4] Middle East Tech Univ, Grad Sch Informat, Dept Hlth Informat, Ankara, Turkey
[5] Bilkent Univ, Dept Comp Engn, Ankara, Turkey
Funding
European Research Council;
Keywords
STRUCTURAL VARIATION; ADAPTIVE EVOLUTION; EARLY FARMERS; ADMIXTURE; DNA; DISCOVERY; HISTORY; POLYMORPHISM; FRAMEWORK; DELETION;
DOI
10.1371/journal.pcbi.1010788
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
To date, ancient genome analyses have been largely confined to the study of single nucleotide polymorphisms (SNPs). Copy number variants (CNVs) are a major contributor to disease and to evolutionary adaptation, but identifying CNVs in ancient shotgun-sequenced genomes is hampered by typically low genome coverage (<1x) and short fragments (<80 bp), which prevents standard CNV detection software from being applied effectively to ancient genomes. Here we present CONGA, tailored for genotyping CNVs at low coverage. Simulations and down-sampling experiments suggest that CONGA can genotype deletions >1 kbp with F-scores >0.75 at >=1x coverage, and can distinguish between heterozygous and homozygous states. We used CONGA to genotype 10,002 outgroup-ascertained deletions across a heterogeneous set of 71 ancient human genomes spanning the last 50,000 years, produced using variable experimental protocols. A fraction of these (21/71) display divergent deletion profiles unrelated to their population origin, but attributable to technical factors such as coverage and read length. The majority of the sample (50/71), despite originating from nine different laboratories and having coverages ranging from 0.44x to 26x (median 4x) and average read lengths of 52-121 bp (median 69), exhibit coherent deletion frequencies. Across these 50 genomes, inter-individual genetic diversity estimates based on SNPs and on CONGA-genotyped deletions are highly correlated. CONGA-genotyped deletions also display purifying selection signatures, as expected. CONGA thus paves the way for systematic CNV analyses in ancient genomes, despite the technical challenges posed by low and variable genome coverage.
Author summary
In parallel with developments in genomic technologies over the last decades, ancient genomics has opened a new era in understanding the evolutionary history of populations and species. However, the field still needs novel computational methods for accurate and effective use of ancient genome data, which is mostly low-coverage and more challenging to analyse than modern-day genomes. Single nucleotide polymorphisms (SNPs) have so far been the main source of information analysed in ancient genome studies. This is despite copy number variants (CNVs) harboring at least as much information as SNPs, especially with respect to natural selection. Here we developed CONGA, an algorithm for genotyping deletions and duplications in low-coverage genomes. We assessed its accuracy using simulations (with ancient-like data), and also studied its performance among 71 real ancient human genomes from different laboratories. We found that the common practice of authors filtering their ancient genome data before publishing prevents the reliable identification of duplications. Meanwhile, large (>1,000 base-pair) deletions can be detected even at quite low coverage (e.g. 0.5x). Deletions called in ancient genomes reflect population history and also show signs of negative selection.
Pages: 32
Related papers
50 in total
  • [1] ACE: absolute copy number estimation from low-coverage whole-genome sequencing data
    Poell, Jos B.
    Mendeville, Matias
    Sie, Daoud
    Brink, Arjen
    Brakenhoff, Ruud H.
    Ylstra, Bauke
    [J]. BIOINFORMATICS, 2019, 35 (16) : 2847 - 2849
  • [2] dpGMM: A Dirichlet Process Gaussian Mixture Model for Copy Number Variation Detection in Low-Coverage Whole-Genome Sequencing Data
    Li, Yaoyao
    Zhang, Junying
    Yuan, Xiguo
    Li, Junping
    [J]. IEEE ACCESS, 2020, 8 : 27973 - 27985
  • [3] SNP genotyping and parameter estimation in polyploids using low-coverage sequencing data
    Blischak, Paul D.
    Kubatko, Laura S.
    Wolfe, Andrea D.
    [J]. BIOINFORMATICS, 2018, 34 (03) : 407 - 415
  • [4] Induction and recovery of copy number variation in banana through gamma irradiation and low-coverage whole-genome sequencing
    Datta, Sneha
    Jankowicz-Cieslak, Joanna
    Nielen, Stephan
    Ingelbrecht, Ivan
    Till, Bradley J.
    [J]. PLANT BIOTECHNOLOGY JOURNAL, 2018, 16 (09) : 1644 - 1653
  • [5] Reveel: large-scale population genotyping using low-coverage sequencing data
    Huang, Lin
    Wang, Bo
    Chen, Ruitang
    Bercovici, Sivan
    Batzoglou, Serafim
    [J]. BIOINFORMATICS, 2016, 32 (11) : 1686 - 1696
  • [6] SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples
    Le, Si Quang
    Durbin, Richard
    [J]. GENOME RESEARCH, 2011, 21 (06) : 952 - 960
  • [7] Low-Coverage Sequencing of Urine Sediment DNA for Detection of Copy Number Aberrations in Bladder Cancer
    Cai, Yun-xi
    Yang, Xu
    Lin, Sheng
    Xu, Ya-wen
    Zhu, Shan-wen
    Fan, Dong-mei
    Zhao, Min
    Zhang, Yuan-bin
    Yang, Xue-xi
    Li, Xin
    [J]. CANCER MANAGEMENT AND RESEARCH, 2021, 13 : 1943 - 1953
  • [8] Hierarchical discovery of large-scale and focal copy number alterations in low-coverage cancer genomes
    Khalil, Ahmed Ibrahim Samir
    Khyriem, Costerwell
    Chattopadhyay, Anupam
    Sanyal, Amartya
    [J]. BMC BIOINFORMATICS, 2020, 21 (01)
  • [9] Hierarchical discovery of large-scale and focal copy number alterations in low-coverage cancer genomes
    Ahmed Ibrahim Samir Khalil
    Costerwell Khyriem
    Anupam Chattopadhyay
    Amartya Sanyal
    [J]. BMC Bioinformatics, 21
  • [10] Washout DNA copy number analysis by low-coverage whole genome sequencing for assessment of thyroid FNAs
    Wu, Linfeng
    Zhou, Yuying
    Guan, Yaoyao
    Xiao, Rongyao
    Cai, Jiaohao
    Chen, Weike
    Zheng, Mengmeng
    Sun, Kaiting
    Chen, Chao
    Huang, Guanli
    Zhang, Xiaogang
    Zhai, Lijuan
    Qian, Ziliang
    Shen, Shu-rong
    [J]. FRONTIERS IN ENDOCRINOLOGY, 2022, 13