Leveraging heterogeneity across multiple datasets increases cell-mixture deconvolution accuracy and reduces biological and technical biases

Cited by: 97
Authors
Vallania, Francesco [1,2]
Tam, Andrew [1,3]
Lofgren, Shane [1,2]
Schaffert, Steven [1,2]
Azad, Tej D. [1]
Bongen, Erika [1]
Haynes, Winston [2]
Alsup, Meia [1,3]
Alonso, Michael [4]
Davis, Mark [1]
Engleman, Edgar [4]
Khatri, Purvesh [1,2]
Affiliations
[1] Stanford Univ, Inst Immun Transplantat & Infect, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Med, Stanford Ctr Biomed Informat Res, Stanford, CA 94305 USA
[3] Stanford Univ, Stanford Inst Med Summer Res Program, Stanford, CA 94305 USA
[4] Stanford Univ, Dept Pathol, Stanford, CA 94305 USA
Source
NATURE COMMUNICATIONS | 2018 / Vol. 9
Keywords
GENE-EXPRESSION; CANCER;
DOI
10.1038/s41467-018-07242-6
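Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code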
07 ; 0710 ; 09 ;
Abstract
In silico quantification of cell proportions from mixed-cell transcriptomics data (deconvolution) requires a reference expression matrix, called a basis matrix. We hypothesize that matrices created using only healthy samples from a single microarray platform would introduce biological and technical biases in deconvolution. We show the presence of such biases in two existing matrices, IRIS and LM22, irrespective of deconvolution method. Here, we present immunoStates, a basis matrix built using 6160 samples with different disease states across 42 microarray platforms. We find that immunoStates significantly reduces biological and technical biases. Importantly, we find that the choice of deconvolution method has minimal or virtually no effect once the basis matrix is chosen. We further show that cellular proportion estimates using immunoStates are consistently more correlated with measured proportions than those using IRIS and LM22, across all methods. Our results demonstrate the need for, and importance of, incorporating biological and technical heterogeneity in a basis matrix to achieve consistently high accuracy.
Pages: 8