Quality assessment and refinement of chromatin accessibility data using a sequence-based predictive model

Cited by: 3

Authors:

Han, Seong Kyu ^{[1,5]}

Muto, Yoshiharu ^{[2]}

Wilson, Parker C. ^{[3]}

Humphreys, Benjamin D. ^{[2,4]}

Sampson, Matthew G. ^{[1,5]}

Chakravarti, Aravinda ^{[6]}

Lee, Dongwon ^{[1,7]}

Affiliations:

[1] Boston & Harvard Med Sch, Boston Childrens Hosp, Dept Pediat, Div Nephrol, Boston, MA 02115 USA

[2] Washington Univ St Louis, Dept Med, Div Nephrol, St Louis, MO 63130 USA

[3] Washington Univ St Louis, Dept Pathol & Immunol, St Louis, MO 63130 USA

[4] Washington Univ St Louis, Dept Dev Biol, St Louis, MO 63130 USA

[5] Broad Inst & Harvard, Kidney Dis Initiat, Cambridge, MA 02142 USA

[6] New York Univ, Ctr Human Genet & Genom, Grossman Sch Med, New York, NY 10016 USA

[7] Boston Childrens Hosp, Manton Ctr Orphan Res, Boston, MA 02115 USA

Source:

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA | 2022 / Vol. 119 / No. 51

Keywords:

quality control; chromatin accessibility; sequence-based model; gkmQC; GENOME-WIDE ASSOCIATION; BINDING PROTEINS; DNA; VISUALIZATION; ENHANCERS; VARIANTS; ENCODE; LMX1B; CHIP;

DOI:

10.1073/pnas.2212810119

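Chinese Library Classification (CLC):

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];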

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Chromatin accessibility assays are central to the genome-wide identification of gene regulatory elements associated with transcriptional regulation. However, the data have highly variable quality arising from several biological and technical factors. To surmount this problem, we developed a sequence-based machine learning method to evaluate and refine chromatin accessibility data. Our framework, gapped k-mer SVM quality check (gkmQC), provides the quality metrics for a sample based on the prediction accuracy of the trained models. We tested 886 DNase-seq samples from the ENCODE/Roadmap projects to demonstrate that gkmQC can effectively identify "high-quality" (HQ) samples with low conventional quality scores owing to marginal read depths. Peaks identified in HQ samples are more accurately aligned at functional regulatory elements, show greater enrichment of regulatory elements harboring functional variants, and explain greater heritability of phenotypes from their relevant tissues. Moreover, gkmQC can optimize the peak-calling threshold to identify additional peaks, especially for rare cell types in single-cell chromatin accessibility data.
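
Illustrative note: the abstract states that the quality metric is derived from the prediction accuracy of trained sequence models. The sketch below shows one way such a score could be computed, assuming (this record does not specify it) that accuracy is summarized as the area under the ROC curve for separating peak sequences from background sequences. The function name `auc` and the score lists are hypothetical and are not part of gkmQC.

```python
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Returns the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical held-out classifier scores (assumed values for illustration):
# scores for sequences under called peaks vs. matched background sequences.
peak_scores = [0.9, 0.8, 0.7, 0.4]
background_scores = [0.6, 0.3, 0.2, 0.1]

# A value near 1.0 means peak sequences are highly predictable from sequence;
# a value near 0.5 means the model cannot tell peaks from background.
quality = auc(peak_scores, background_scores)
print(f"sequence-based quality score (AUC): {quality:.4f}")
```

Under this assumption, a sample whose peaks are well predicted from sequence alone would score close to 1.0 regardless of its read depth, which is the kind of sample the abstract describes as "high-quality" despite low conventional scores.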


Pages: 11
