seGMM: A New Tool for Gender Determination From Massively Parallel Sequencing Data

Cited by: 1
Authors
Liu, Sihan [1]
Zeng, Yuanyuan [2]
Wang, Chao [1]
Zhang, Qian [1]
Chen, Meilin [1]
Wang, Xiaolu [1]
Wang, Lanchen [1]
Lu, Yu [1]
Guo, Hui [3, 4]
Bu, Fengxiao [1]
Affiliations
[1] Sichuan Univ, Inst Rare Dis, West China Hosp, Chengdu, Peoples R China
[2] Xiamen Univ, Natl Inst Data Sci Hlth & Med, Sch Med, Xiamen, Peoples R China
[3] Cent South Univ, Ctr Med Genet, Sch Life Sci, Changsha, Peoples R China
[4] Cent South Univ, Sch Life Sci, Hunan Prov Key Lab Med Genet, Changsha, Peoples R China
Keywords
massively parallel sequencing data; Gaussian mixture model; gender; sex chromosomal abnormality; aneuploidy; GENE; FORMAT;
DOI
10.3389/fgene.2022.850804
Chinese Library Classification (CLC) number
Q3 [Genetics];
Discipline classification code
071007; 090102;
Abstract
In clinical genetic testing, checking the concordance between self-reported gender and gender inferred from genomic data is an important quality control measure, because a mismatch caused by sex chromosomal abnormalities or misregistered clinical information can significantly affect molecular diagnosis and treatment decisions. Targeted gene sequencing (TGS) is widely recommended as a first-tier diagnostic step in clinical genetic testing. However, existing gender-inference tools are optimized for whole genome and whole exome data and are neither adequate nor accurate for analyzing TGS data. In this study, we validated a new gender-inference tool, seGMM, which uses unsupervised clustering (a Gaussian mixture model) to determine the gender of a sample. seGMM can also identify sex chromosomal abnormalities in samples by aligning the sequencing reads from the genotype data. seGMM consistently demonstrated > 99% gender-inference accuracy in a publicly available 1,000-gene panel dataset from the 1,000 Genomes project, an in-house 785 hearing loss gene panel dataset of 16,387 samples, and a 187 autism risk gene panel dataset from the Autism Clinical and Genetic Resources in China (ACGC) database. On TGS, whole exome sequencing (WES), and whole genome sequencing (WGS) datasets, the performance and accuracy of seGMM were significantly higher than those of existing gender-inference tools such as PLINK, seXY, and XYalign. The results of seGMM were confirmed by short tandem repeat analysis of the sex chromosome marker gene amelogenin. Furthermore, our data showed that seGMM accurately identified sex chromosomal abnormalities in the samples. In conclusion, seGMM shows great potential in clinical genetics for determining the sex chromosomal karyotypes of samples from massively parallel sequencing data with high accuracy.
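The clustering idea described in the abstract can be illustrated with a minimal sketch. This is not the seGMM implementation: the abstract does not list the tool's input features, so the single feature used here (a per-sample "Y ratio", meaning normalized Y-chromosome read depth), the simulated values, the 0.9 posterior cutoff, and all function names are assumptions made for illustration only. The sketch fits a two-component, one-dimensional Gaussian mixture with expectation-maximization and labels each sample by its posterior probability.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(values, n_iter=100, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances), each a list of length 2.
    """
    n = len(values)
    overall_mean = sum(values) / n
    overall_var = max(sum((v - overall_mean) ** 2 for v in values) / n, min_var)
    # Initialize the two components at the extremes of the data.
    means = [min(values), max(values)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each sample.
        resp = []
        for x in values:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component received no mass; keep its old parameters
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var, min_var)  # floor to avoid a collapsed component
    return weights, means, variances


def classify(values, weights, means, variances, min_posterior=0.9):
    """Label samples from the posterior of the high-Y-ratio component.

    Assumption: the component with the larger mean Y ratio is the male cluster.
    Samples below the posterior cutoff for both clusters are left "ambiguous".
    """
    high = 0 if means[0] > means[1] else 1
    calls = []
    for x in values:
        p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
        total = sum(p)
        post_high = p[high] / total if total > 0 else 0.5
        if post_high >= min_posterior:
            calls.append("male")
        elif post_high <= 1 - min_posterior:
            calls.append("female")
        else:
            calls.append("ambiguous")
    return calls


# Simulated Y ratios (hypothetical values, not data from the study).
random.seed(0)
female_ratios = [max(0.0, random.gauss(0.02, 0.01)) for _ in range(50)]
male_ratios = [random.gauss(0.50, 0.05) for _ in range(50)]
ratios = female_ratios + male_ratios

weights, means, variances = fit_gmm_1d(ratios)
calls = classify(ratios, weights, means, variances)
print({label: calls.count(label) for label in sorted(set(calls))})
```

Because the mixture is fit without labels, the inferred calls can then be compared against self-reported gender to flag mismatches for review, which is the quality control use described above. A real pipeline would likely combine several features and would need separate handling for sex chromosomal abnormalities.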
Pages: 11