Atlas of primary cell-type-specific sequence models of gene expression and variant effects
Cited by: 4
Authors:
Sokolova, Ksenia [1,2]
Theesfeld, Chandra L. [2]
Wong, Aaron K. [3]
Zhang, Zijun [3,4]
Dolinski, Kara [2]
Troyanskaya, Olga G. [1,2,3]
Affiliations:
[1] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA
[2] Princeton Univ, Lewis Sigler Inst Integrat Genom, Princeton, NJ 08544 USA
[3] Simons Fdn, Flatiron Inst, New York, NY 10001 USA
[4] Cedars Sinai Med Ctr, Div Artificial Intelligence Med, 116 N Robertson Blvd, Los Angeles, CA 90048 USA
Human biology is rooted in highly specialized cell types programmed by a common genome, 98% of which is outside of genes. Genetic variation in the enormous noncoding space is linked to the majority of disease risk. To address the problem of linking these variants to expression changes in primary human cells, we introduce ExPectoSC, an atlas of modular deep-learning-based models for predicting cell-type-specific gene expression directly from sequence. We provide models for 105 primary human cell types covering 7 organ systems, demonstrate their accuracy, and then apply them to prioritize relevant cell types for complex human diseases. The resulting atlas of sequence-based gene expression and variant effects is publicly available in a user-friendly interface and readily extensible to any primary cell types. We demonstrate the accuracy of our approach through systematic evaluations and apply the models to prioritize ClinVar clinical variants of uncertain significance, verifying our top predictions experimentally.