Population clustering based on copy number variations detected from next generation sequencing data

Cited by: 0
Authors
Duan, Junbo [1]
Zhang, Ji-Gang [2]
Wan, Mingxi [1]
Deng, Hong-Wen [2,3]
Wang, Yu-Ping [2,3]
Affiliations
[1] Xi'an Jiaotong University, Department of Biomedical Engineering, Xi'an 710049, People's Republic of China
[2] Tulane University, Department of Biostatistics and Bioinformatics, New Orleans, LA 70118, USA
[3] Tulane University, Department of Biomedical Engineering, New Orleans, LA 70118, USA
Keywords
Next generation sequencing; copy number variations; non-negative matrix factorization; 1000 Genomes Project; STRUCTURAL VARIATION; GENOME; DIVERSITY; GENOTYPE; GENES; SEQ
DOI
10.1142/S0219720014500218
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Copy number variations (CNVs) can serve as significant biomarkers, and next generation sequencing (NGS) enables high-resolution detection of these CNVs. However, extracting features from CNVs and applying them to genomic studies such as population clustering remains a major challenge. In this paper, we propose a novel method for population clustering based on CNVs detected from NGS data. First, CNVs are extracted from each sample to form a feature matrix. Then, this feature matrix is decomposed into a source matrix and a weight matrix by non-negative matrix factorization (NMF). The source matrix consists of common CNVs shared by all samples from the same group, and the weight matrix indicates the corresponding level of these CNVs in each sample. Therefore, NMF of CNVs makes it possible to differentiate samples from different ethnic groups, i.e. to perform population clustering. To validate the approach, we applied it to both simulated data and two real data sets from the 1000 Genomes Project. The results on simulated data demonstrate that the proposed method recovers the true common CNVs with high quality. The results of the first real data analysis show that the proposed method can cluster two family trios with different ancestries into two ethnic groups, and the results of the second show that the method can be applied to whole-genome data with a large sample size consisting of multiple groups. Both results demonstrate the potential of the proposed method for population clustering.
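The pipeline described in the abstract (feature matrix, NMF into source and weight matrices, clustering by weights) can be illustrated with a minimal NumPy sketch. It rests on several assumptions not taken from the paper: the feature matrix X has genomic bins as rows and samples as columns; each entry is a hypothetical non-negative CNV feature, here |copy number - 2|; the factorization X ≈ SW minimizes the Frobenius loss ||X - SW||_F^2 with the standard Lee-Seung multiplicative updates (the authors' exact NMF variant may differ); and each sample is assigned to the source with the largest normalized weight. The function names `nmf` and `cluster_samples` and the toy data are illustrative only.

```python
import numpy as np


def nmf(X, k, n_iter=1000, eps=1e-10, seed=0):
    """Approximate a non-negative matrix X (bins x samples) as S @ W.

    Uses Lee-Seung multiplicative updates for ||X - S W||_F^2.
    S (bins x k) holds k "source" CNV profiles; W (k x samples) holds
    the weight of each source in each sample.
    """
    rng = np.random.default_rng(seed)
    n_bins, n_samples = X.shape
    S = rng.random((n_bins, k)) + eps
    W = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        W *= (S.T @ X) / (S.T @ S @ W + eps)
        S *= (X @ W.T) / (S @ W @ W.T + eps)
    # Rescale so each source column sums to 1 and move the scale into W,
    # leaving S @ W unchanged and making weights comparable across sources.
    scale = S.sum(axis=0) + eps
    S /= scale
    W *= scale[:, None]
    return S, W


def cluster_samples(W):
    """Assign each sample (column of W) to the source with the largest weight."""
    return np.argmax(W, axis=0)


# Hypothetical toy data: 20 genomic bins, 6 samples in two groups.
# Assumed feature encoding: |copy number - 2| (deviation from diploid).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 0.05, size=(20, 6))  # small background noise
X[0:5, 0:3] += 1.0    # group A shares a CNV in bins 0-4
X[10:15, 3:6] += 1.0  # group B shares a CNV in bins 10-14

S, W = nmf(X, k=2)
labels = cluster_samples(W)
print(labels)  # samples 0-2 share one label, samples 3-5 the other
```

Because NMF only fixes S and W up to a per-source scale, the column normalization step is what makes the argmax over weights meaningful; the label numbering itself (which group is 0 or 1) is arbitrary.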
Pages: 18