Population clustering based on copy number variations detected from next generation sequencing data

Authors
Duan, Junbo [1]
Zhang, Ji-Gang [2]
Wan, Mingxi [1]
Deng, Hong-Wen [2,3]
Wang, Yu-Ping [2,3]
Affiliations
[1] Xi An Jiao Tong Univ, Dept Biomed Engn, Xian 710049, Peoples R China
[2] Tulane Univ, Dept Biostat & Bioinformat, New Orleans, LA 70118 USA
[3] Tulane Univ, Dept Biomed Engn, New Orleans, LA 70118 USA
Keywords
Next generation sequencing; copy number variations; non-negative matrix factorization; 1000 Genomes Project; STRUCTURAL VARIATION; GENOME; DIVERSITY; GENOTYPE; GENES; SEQ;
DOI
10.1142/S0219720014500218
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Copy number variations (CNVs) can serve as significant biomarkers, and next generation sequencing (NGS) enables high-resolution detection of them. However, extracting features from CNVs and applying them to genomic studies such as population clustering remains a major challenge. In this paper, we propose a novel method for population clustering based on CNVs detected from NGS data. First, CNVs are extracted from each sample to form a feature matrix. This feature matrix is then decomposed into a source matrix and a weight matrix by non-negative matrix factorization (NMF). The source matrix consists of common CNVs shared by all samples from the same group, and the weight matrix indicates the corresponding level of CNVs in each sample. Applying NMF to CNVs can therefore differentiate samples from different ethnic groups, i.e., perform population clustering. To validate the approach, we applied it to both simulated data and two real data sets from the 1000 Genomes Project. The results on simulated data demonstrate that the proposed method can recover the true common CNVs with high quality. The first real data analysis shows that the method can cluster two family trios with different ancestries into two ethnic groups, and the second shows that it can be applied to the whole genome with a large sample size consisting of multiple groups. Both results demonstrate the potential of the proposed method for population clustering.
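The decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the NMF algorithm, the matrix orientation, or the clustering rule, so the sketch assumes a features-by-samples matrix `V`, standard Lee-Seung multiplicative updates for the Frobenius objective, and assignment of each sample to the source with the largest rescaled weight. The function names `nmf` and `cluster_samples` and the toy data are hypothetical.

```python
import numpy as np


def nmf(V, k, n_iter=500, eps=1e-9, seed=0):
    """Factorize a non-negative matrix V (features x samples) as S @ W.

    S (features x k) plays the role of the source matrix and
    W (k x samples) the role of the weight matrix. Uses Lee-Seung
    multiplicative updates (an assumption; the abstract does not
    say which NMF solver was used).
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    S = rng.random((n_features, k)) + eps
    W = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative.
        W *= (S.T @ V) / (S.T @ S @ W + eps)
        S *= (V @ W.T) / (S @ W @ W.T + eps)
    return S, W


def cluster_samples(S, W):
    """Assign each sample to the source with the largest rescaled weight."""
    scale = S.sum(axis=0)  # total mass of each source column
    return np.argmax(W * scale[:, None], axis=0)


# Toy feature matrix (hypothetical): 30 genomic bins, 6 samples per group.
rng = np.random.default_rng(1)
V = rng.random((30, 12)) * 0.05   # small background noise
V[0:10, 0:6] += 1.0               # CNVs common to the first group
V[15:25, 6:12] += 1.0             # CNVs common to the second group

S, W = nmf(V, k=2)
labels = cluster_samples(S, W)
print(labels)
```

Rescaling the weights by the column sums of `S` removes the scaling ambiguity between the two factors before comparing weights across sources. The number of sources `k` is chosen by hand here; how the paper selects it is not stated in the abstract.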
Pages: 18