Population clustering based on copy number variations detected from next generation sequencing data

Cited by: 0
Authors
Duan, Junbo [1]
Zhang, Ji-Gang [2]
Wan, Mingxi [1]
Deng, Hong-Wen [2, 3]
Wang, Yu-Ping [2, 3]
Affiliations
[1] Xi An Jiao Tong Univ, Dept Biomed Engn, Xian 710049, Peoples R China
[2] Tulane Univ, Dept Biostat & Bioinformat, New Orleans, LA 70118 USA
[3] Tulane Univ, Dept Biomed Engn, New Orleans, LA 70118 USA
Keywords
Next generation sequencing; copy number variations; non-negative matrix factorization; 1000 Genomes Project; STRUCTURAL VARIATION; GENOME; DIVERSITY; GENOTYPE; GENES; SEQ;
DOI
10.1142/S0219720014500218
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Copy number variations (CNVs) can serve as significant biomarkers, and next generation sequencing (NGS) enables high-resolution detection of these CNVs. However, extracting features from CNVs and applying them to genomic studies such as population clustering has become a major challenge. In this paper, we propose a novel method for population clustering based on CNVs from NGS. First, CNVs are extracted from each sample to form a feature matrix. Then, this feature matrix is decomposed into a source matrix and a weight matrix with non-negative matrix factorization (NMF). The source matrix consists of common CNVs that are shared by all the samples from the same group, and the weight matrix indicates the corresponding level of CNVs from each sample. Therefore, using NMF of CNVs one can differentiate samples from different ethnic groups, i.e., perform population clustering. To validate the approach, we applied it to both simulated data and two real data sets from the 1000 Genomes Project. The results on simulated data demonstrate that the proposed method can recover the true common CNVs with high quality. The results of the first real data analysis show that the proposed method can cluster two family trios with different ancestries into two ethnic groups, and the results of the second real data analysis show that the proposed method can be applied to the whole genome with a large sample size consisting of multiple groups. Both results demonstrate the potential of the proposed method for population clustering.
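The decomposition step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a feature matrix with genomic bins as rows and samples as columns, uses the standard Lee-Seung multiplicative updates for squared error as a stand-in NMF solver, and assigns each sample to the component with the largest weight. The toy matrix, the function names (`nmf`, `cluster_samples`) and all parameter values are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, rank, n_iter=500, seed=0, eps=1e-9):
    """Approximate v (bins x samples) as source (bins x rank) times
    weight (rank x samples), all entries non-negative.

    Uses Lee-Seung multiplicative updates (an assumed solver, chosen
    for brevity; the paper's solver is not given in the abstract).
    """
    rng = random.Random(seed)
    n_bins, n_samples = len(v), len(v[0])
    source = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_bins)]
    weight = [[rng.random() + 0.1 for _ in range(n_samples)] for _ in range(rank)]
    for _ in range(n_iter):
        # weight <- weight * (S^T V) / (S^T S W)
        st = transpose(source)
        num = matmul(st, v)
        den = matmul(matmul(st, source), weight)
        weight = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
                  for wr, nr, dr in zip(weight, num, den)]
        # source <- source * (V W^T) / (S W W^T)
        wt = transpose(weight)
        num = matmul(v, wt)
        den = matmul(source, matmul(weight, wt))
        source = [[s * n / (d + eps) for s, n, d in zip(sr, nr, dr)]
                  for sr, nr, dr in zip(source, num, den)]
    # Remove the scale ambiguity: each source column sums to 1,
    # and the scale is moved into the matching weight row.
    for k in range(rank):
        scale = sum(row[k] for row in source) or 1.0
        for row in source:
            row[k] /= scale
        weight[k] = [w * scale for w in weight[k]]
    return source, weight


def cluster_samples(weight):
    """Label each sample (column) by its largest-weight component."""
    return [max(range(len(col)), key=lambda k: col[k]) for col in zip(*weight)]


# Hypothetical toy data: 6 genomic bins x 4 samples. Samples 0-1 share
# CNVs in bins 0-2; samples 2-3 share CNVs in bins 3-5.
features = [
    [2.0, 1.0, 0.0, 0.0],
    [2.0, 1.0, 0.0, 0.0],
    [4.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 2.0, 6.0],
]
source, weight = nmf(features, rank=2)
labels = cluster_samples(weight)
print(labels)
```

In this sketch each column of `source` plays the role of a common CNV profile and each row of `weight` gives the level of that profile in every sample, so samples that share a profile receive the same label. Which group is numbered 0 or 1 depends on the random initialization. Pure Python is used only to keep the sketch dependency-free; a numerical library would be the usual choice for real data.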
Pages: 18
Related papers
50 in total
  • [1] Detection of common copy number variation with application to population clustering from next generation sequencing data
    Duan, Junbo
    Zhang, Ji-Gang
    Deng, Hong-Wen
    Wang, Yu-Ping
    2012 ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2012, : 1246 - 1249
  • [2] A Cluster-Based Approach for the Discovery of Copy Number Variations From Next-Generation Sequencing Data
    Liu, Guojun
    Zhang, Junying
    FRONTIERS IN GENETICS, 2021, 12
  • [3] Detection of copy number variations based on a local distance using next-generation sequencing data
    Liu, Guojun
    Yang, Hongzhi
    He, Zongzhen
    FRONTIERS IN GENETICS, 2023, 14
  • [4] A Density Peak-Based Method to Detect Copy Number Variations From Next-Generation Sequencing Data
    Xie, Kun
    Tian, Ye
    Yuan, Xiguo
    FRONTIERS IN GENETICS, 2021, 11
  • [5] Detection of Significant Copy Number Variations From Multiple Samples in Next-Generation Sequencing Data
    Yuan, Xiguo
    Zhang, Junying
    Yang, Liying
    Bai, Jun
    Fan, Peizhen
    IEEE TRANSACTIONS ON NANOBIOSCIENCE, 2018, 17 (01) : 12 - 20
  • [6] MFCNV: A New Method to Detect Copy Number Variations From Next-Generation Sequencing Data
    Zhao, Haiyong
    Huang, Tihao
    Li, Junqing
    Liu, Guojun
    Yuan, Xiguo
    FRONTIERS IN GENETICS, 2020, 11
  • [7] Detection of copy number variations with next generation sequencing: laboratory validation
    Klancar, G.
    Dragos, V. Setrajcic
    Novakovic, S.
    EUROPEAN JOURNAL OF HUMAN GENETICS, 2018, 26 : 639 - 639
  • [8] Detecting copy number variations in routine diagnostic samples using next generation sequencing data
    Singh, Ashish Kumar
    Johansen, Jostein
    Ravi, Anuradha
    Misund, Kristine
    EUROPEAN JOURNAL OF HUMAN GENETICS, 2024, 32 : 639 - 639
  • [9] CNV-PCC: An efficient method for detecting copy number variations from next-generation sequencing data
    Zhang, Tong
    Dong, Jinxin
    Jiang, Hua
    Zhao, Zuyao
    Zhou, Mengjiao
    Yuan, Tianting
    FRONTIERS IN BIOENGINEERING AND BIOTECHNOLOGY, 2022, 10