Population clustering based on copy number variations detected from next generation sequencing data

Authors
Duan, Junbo [1]
Zhang, Ji-Gang [2]
Wan, Mingxi [1]
Deng, Hong-Wen [2, 3]
Wang, Yu-Ping [2, 3]
Affiliations
[1] Xi An Jiao Tong Univ, Dept Biomed Engn, Xian 710049, Peoples R China
[2] Tulane Univ, Dept Biostat & Bioinformat, New Orleans, LA 70118 USA
[3] Tulane Univ, Dept Biomed Engn, New Orleans, LA 70118 USA
Keywords
Next generation sequencing; copy number variations; non-negative matrix factorization; 1000 Genomes Project; STRUCTURAL VARIATION; GENOME; DIVERSITY; GENOTYPE; GENES; SEQ;
DOI
10.1142/S0219720014500218
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Copy number variations (CNVs) can serve as significant biomarkers, and next generation sequencing (NGS) enables their detection at high resolution. However, how to extract features from CNVs and apply them to genomic studies such as population clustering has become a major challenge. In this paper, we propose a novel method for population clustering based on CNVs detected from NGS data. First, CNVs are extracted from each sample to form a feature matrix. Then, this feature matrix is decomposed into a source matrix and a weight matrix by non-negative matrix factorization (NMF). The source matrix consists of common CNVs that are shared by all the samples from the same group, and the weight matrix indicates the corresponding level of CNVs in each sample. Therefore, NMF of CNVs can differentiate samples from different ethnic groups, i.e., perform population clustering. To validate the approach, we applied it to both simulated data and two real data sets from the 1000 Genomes Project. The results on simulated data demonstrate that the proposed method can recover the true common CNVs with high quality. The results on the first real data set show that the proposed method can cluster two family trios with different ancestries into two ethnic groups, and the results on the second real data set show that the proposed method can be applied to the whole genome with a large sample size consisting of multiple groups. Both results demonstrate the potential of the proposed method for population clustering.
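The decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy feature matrix (genomic regions by samples) built from two hypothetical group-specific CNV patterns, uses the standard multiplicative-update rule for NMF, and assigns each sample to the component with the largest weight. The function names `nmf` and `make_toy_data`, the matrix sizes, and the noise levels are all illustrative assumptions.

```python
import numpy as np


def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Approximate non-negative V (regions x samples) as W @ H.

    W (regions x rank) plays the role of the source matrix and
    H (rank x samples) the role of the weight matrix. Uses the
    multiplicative updates for the Frobenius-norm objective.
    """
    rng = np.random.default_rng(seed)
    n_regions, n_samples = V.shape
    W = rng.random((n_regions, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def make_toy_data(seed=0):
    """Build a hypothetical CNV feature matrix with two sample groups."""
    rng = np.random.default_rng(seed)
    n_regions = 40
    pattern_a = np.zeros(n_regions)
    pattern_a[:20] = 1.0   # CNVs assumed common to group 0
    pattern_b = np.zeros(n_regions)
    pattern_b[20:] = 1.0   # CNVs assumed common to group 1
    labels = np.array([0] * 6 + [1] * 6)
    columns = []
    for group in labels:
        base = pattern_a if group == 0 else pattern_b
        # Per-sample scaling plus small non-negative background noise.
        columns.append(base * rng.uniform(0.8, 1.2)
                       + rng.uniform(0.0, 0.05, n_regions))
    return np.column_stack(columns), labels


V, labels = make_toy_data()
W, H = nmf(V, rank=2)

# Cluster each sample by its dominant component in the weight matrix.
clusters = H.argmax(axis=0)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("cluster assignments:", clusters)
print("relative reconstruction error:", rel_error)
```

Cluster indices returned by NMF are arbitrary, so they should be compared with known group labels only up to a permutation. Choosing the rank (the number of groups) is a separate modelling decision that this sketch fixes at 2.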
Pages: 18