Ranking genomic features using an information-theoretic measure of epigenetic discordance

Cited by: 8
Authors
Jenkinson, Garrett [1, 2, 3]
Abante, Jordi [1]
Koldobskiy, Michael A. [2, 4]
Feinberg, Andrew P. [2, 5, 6]
Goutsias, John [1]
Affiliations
[1] Johns Hopkins Univ, Whitaker Biomed Engn Inst, Baltimore, MD 21218 USA
[2] Johns Hopkins Sch Med, Ctr Epigenet, Baltimore, MD USA
[3] Mayo Clin, Dept Hlth Sci Res, Rochester, MN USA
[4] Johns Hopkins Univ, Sch Med, Pediat Oncol, Sidney Kimmel Comprehens Canc Ctr, Baltimore, MD USA
[5] Johns Hopkins Univ, Dept Biomed Engn, Baltimore, MD USA
[6] Johns Hopkins Sch Med, Dept Med, Baltimore, MD USA
Keywords
DNA methylation; Genomic feature analysis; Information theory; Mutual Information; Gene ranking; Methylation analysis; WGBS data analysis; ALTER GENE-EXPRESSION; METHYLATION PROFILES; THERAPEUTIC TARGET; CELL PROLIFERATION; DNA; ASSOCIATION; PROGNOSIS; POWERFUL; H3K27ME3; GROWTH;
DOI
10.1186/s12859-019-2777-6
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Background: Establishment and maintenance of DNA methylation throughout the genome is an important epigenetic mechanism that regulates gene expression, and its disruption has been implicated in human diseases like cancer. It is therefore crucial to know which genes, or other genomic features of interest, exhibit significant discordance in DNA methylation between two phenotypes. We have previously proposed an approach for ranking genes based on methylation discordance within their promoter regions, determined by centering a window of fixed size at their transcription start sites. However, we cannot use this method to identify statistically significant genomic features, nor to handle features of variable length or with missing data.
Results: We present a new approach for computing the statistical significance of methylation discordance within genomic features of interest in single and multiple test/reference studies. We base the proposed method on a well-articulated hypothesis testing problem that produces p- and q-values for each genomic feature, which we then use to identify and rank features based on the statistical significance of their epigenetic dysregulation. We employ the information-theoretic concept of mutual information to derive a novel test statistic, which we can evaluate by computing Jensen-Shannon distances between the probability distributions of methylation in a test and a reference sample. We design the proposed methodology to simultaneously handle biological, statistical, and technical variability in the data, as well as variable feature lengths and missing data, thus enabling its widespread use on any list of genomic features. This is accomplished by estimating, from reference data, the null distribution of the test statistic as a function of feature length using generalized additive regression models. Differential assessment, using normal/cancer data from healthy fetal tissue and pediatric high-grade glioma patients, illustrates the potential of our approach to greatly facilitate the exploratory phases of clinically and biologically relevant methylation studies.
Conclusions: The proposed approach provides the first computational tool for statistically testing and ranking genomic features of interest based on observed DNA methylation discordance in comparative studies that accounts, in a rigorous manner, for biological, statistical, and technical variability in methylation data, as well as for variability in feature length and for missing data.
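The abstract mentions two generic building blocks: Jensen-Shannon distances between methylation probability distributions, and q-values for ranking features. The Python sketch below illustrates both in their textbook form only. It is not the authors' test statistic or software: the function names are hypothetical, the distributions are assumed to be discrete and defined over the same set of methylation levels, and the Benjamini-Hochberg procedure is an assumption, since the abstract does not say how q-values are computed.

```python
import math


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base-2 logs), in [0, 1].

    p and q are assumed to be discrete probability distributions
    over the same set of methylation levels.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Mixture distribution M = (P + Q) / 2
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i == 0 contribute 0
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative round-off


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: used here only as one common way to obtain q-values.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


if __name__ == "__main__":
    # Hypothetical methylation-level distributions for one feature
    reference = [0.7, 0.2, 0.1]
    test = [0.1, 0.2, 0.7]
    print(jensen_shannon_distance(reference, test))
    # Hypothetical per-feature p-values
    print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

With base-2 logarithms the distance is 0 for identical distributions and 1 for distributions with disjoint support, which makes values comparable across features. Features would then be ranked by ascending q-value.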
Pages: 17
Related papers
50 in total
  • [1] Ranking genomic features using an information-theoretic measure of epigenetic discordance
    Garrett Jenkinson
    Jordi Abante
    Michael A. Koldobskiy
    Andrew P. Feinberg
    John Goutsias
    [J]. BMC Bioinformatics, 20
  • [2] On testing uniformity using an information-theoretic measure
    Mahdizadeh, M.
    [J]. COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2017, 46 (08) : 6173 - 6196
  • [3] AN INFORMATION-THEORETIC MEASURE OF TERM SPECIFICITY
    WONG, SKM
    YAO, YY
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, 1992, 43 (01) : 54 - 61
  • [4] Top-K Ranking: An Information-theoretic Perspective
    Chen, Yuxin
    Suh, Changho
    [J]. 2015 IEEE INFORMATION THEORY WORKSHOP - FALL (ITW), 2015, : 212 - 213
  • [5] Information-theoretic measure of genuine multiqubit entanglement
    Cai, Jian-Ming
    Zhou, Zheng-Wei
    Zhou, Xing-Xiang
    Guo, Guang-Can
    [J]. PHYSICAL REVIEW A, 2006, 74 (04):
  • [6] An Information-Theoretic Measure for Patterning in Epithelial Tissues
    Waites, William
    Cavaliere, Matteo
    Cachat, Elise
    Danos, Vincent
    Davies, Jamie A.
    [J]. IEEE ACCESS, 2018, 6 : 40302 - 40312
  • [7] Bladder Wall Motion Compensation Using a New Information-Theoretic Measure
    Lin, Q.
    Liang, Z.
    Ma, J.
    Li, H.
    Harrington, D.
    Waltzer, W.
    He, X.
    [J]. 2012 IEEE NUCLEAR SCIENCE SYMPOSIUM AND MEDICAL IMAGING CONFERENCE RECORD (NSS/MIC), 2012, : 3620 - 3623
  • [8] An Information-Theoretic Quantification of Discrimination with Exempt Features
    Dutta, Sanghamitra
    Venkatesh, Praveen
    Mardziel, Piotr
    Datta, Anupam
    Grover, Pulkit
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 3825 - 3833
  • [9] Nonextensive information-theoretic measure for image edge detection
    Ben Hamza, A.
    [J]. JOURNAL OF ELECTRONIC IMAGING, 2006, 15 (01)
  • [10] An Information-Theoretic Measure for the Computational Fidelity of Physical Processes
    Anderson, Neal G.
    [J]. 2008 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY PROCEEDINGS, VOLS 1-6, 2008, : 2356 - 2360