Identification of sample annotation errors in gene expression datasets

Authors
Miriam Lohr
Birte Hellwig
Karolina Edlund
Johanna S. M. Mattsson
Johan Botling
Marcus Schmidt
Jan G. Hengstler
Patrick Micke
Jörg Rahnenführer
Affiliations
[1] TU Dortmund University, Department of Statistics
[2] Leibniz Research Centre for Working Environment and Human Factors (IfADo) at Dortmund TU
[3] Uppsala University, Department of Immunology, Genetics and Pathology
[4] University Hospital, Department of Obstetrics and Gynecology
Source
Archives of Toxicology | 2015 / Vol. 89
Keywords
Gene expression; Microarray; Misannotation; Quality control; Male–female classifier
Abstract
The comprehensive transcriptomic analysis of clinically annotated human tissue has found widespread use in oncology, cell biology, immunology, and toxicology. In cancer research, microarray-based gene expression profiling has successfully been applied to subclassify disease entities, predict therapy response, and identify cellular mechanisms. Public accessibility of raw data, together with corresponding information on clinicopathological parameters, offers the opportunity to reuse previously analyzed data and to gain statistical power by combining multiple datasets. However, results and conclusions obviously depend on the reliability of the available information. Here, we propose gene expression-based methods for identifying sample misannotations in public transcriptomic datasets. Sample mix-up can be detected by a classifier that differentiates between samples from male and female patients. Correlation analysis identifies multiple measurements of material from the same sample. The analysis of 45 datasets (including 4913 patients) revealed that erroneous sample annotation, affecting 40 % of the analyzed datasets, may be a more widespread phenomenon than previously thought. Removal of erroneously labelled samples may influence the results of the statistical evaluation in some datasets. Our methods may help to identify individual datasets that contain numerous discrepancies and could be routinely included into the statistical analysis of clinical gene expression data.
Pages: 2265-2272
Number of pages: 7
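
The two checks described in the abstract, a male/female classifier to detect sample mix-ups and a correlation analysis to detect repeated measurements of the same sample, can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of one female-specific and one male-specific marker gene (XIST and RPS4Y1 are used here only as plausible examples), the zero threshold on their log2 difference, the 0.99 correlation cut-off, and all function names and toy values are assumptions made for illustration.

```python
import numpy as np


def predict_sex(female_marker, male_marker, threshold=0.0):
    """Predict sex per sample from two log2 marker-gene profiles.

    Assumption: a sample is called 'male' when the male marker exceeds
    the female marker by more than `threshold`, otherwise 'female'.
    """
    score = np.asarray(male_marker, dtype=float) - np.asarray(female_marker, dtype=float)
    return np.where(score > threshold, "male", "female")


def flag_sex_mismatches(annotated, predicted):
    """Return indices of samples whose annotated sex differs from the prediction."""
    return np.flatnonzero(np.asarray(annotated) != np.asarray(predicted)).tolist()


def find_duplicate_pairs(expression, min_corr=0.99):
    """Return (i, j, r) for sample pairs with Pearson correlation >= min_corr.

    `expression` is a samples x genes matrix; the cut-off is an assumed value.
    """
    corr = np.corrcoef(np.asarray(expression, dtype=float))
    rows, cols = np.triu_indices_from(corr, k=1)  # each unordered pair once
    return [
        (int(i), int(j), float(corr[i, j]))
        for i, j in zip(rows, cols)
        if corr[i, j] >= min_corr
    ]


# Toy data (hypothetical values): four samples, third one annotated as male
# although its marker profile looks female.
xist = [10.0, 4.0, 9.5, 4.2]
rps4y1 = [3.0, 11.0, 3.5, 10.5]
annotated_sex = ["female", "male", "male", "male"]

predicted_sex = predict_sex(xist, rps4y1)
mismatches = flag_sex_mismatches(annotated_sex, predicted_sex)
print("Possible sex misannotations at sample indices:", mismatches)

# Toy expression matrix: sample 2 is a near copy of sample 0.
expression = np.array([
    [1.0, 5.0, 2.0, 8.0, 3.0],
    [7.0, 1.0, 6.0, 2.0, 9.0],
    [1.1, 5.0, 2.1, 7.9, 3.0],
])
duplicates = find_duplicate_pairs(expression)
print("Candidate duplicate pairs:", [(i, j) for i, j, _ in duplicates])
```

In practice, both thresholds would need to be calibrated per platform and dataset, and a flagged sample is only a candidate for manual review, since a sex mismatch or a very high correlation can also have biological or technical explanations.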