cTAP: A Machine Learning Framework for Predicting Target Genes of a Transcription Factor using a Cohort of Gene Expression Data Sets

Cited by: 1
Authors
Wang, Honglin [1]
Joshi, Pujan [1]
Hong, Seung-Hyun [1]
Maye, Peter F. [2]
Rowe, David W. [2]
Shin, Dong-Guk [1]
Affiliations
[1] Univ Connecticut, Comp Sci & Engn Dept, Storrs, CT 06269 USA
[2] Univ Connecticut, Dept Reconstruct Sci, Hlth Ctr, Farmington, CT 06030 USA
Keywords
TF target analysis; Cohort analysis; Osteoclast differentiation; Gene abundance analysis; Machine learning
DOI
10.1109/BIBM49941.2020.9313303
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Identifying the target genes of a transcription factor (TF) is crucial in biomedical research. Thanks to ChIP-seq technology, scientists can estimate the potential genome-wide target genes of a transcription factor. However, finding the targets that a transcription factor consistently up- or down-regulates in a given biological context is difficult, because it requires analyzing a large number of studies conducted under the same or comparable contexts. We present a transcription target prediction method called the Cohort-based TF target prediction system (cTAP). The method assumes that the pathway involving the transcription factor of interest is characterized by multiple functional groups of marker genes pertaining to the biological process under study. For prediction, it uses the notions of gene presence and gene absence in addition to log2 ratios of gene expression values. Targets are predicted by applying multiple machine-learning models that learn patterns of gene presence and gene absence from log2 ratios and four types of Z scores computed from the cohort's normalized gene expression data. The learned patterns are then associated with the putative targets of the transcription factor to identify genes that exhibit Up/Down regulation patterns "consistently" within the cohort. A total of 11 publicly available GEO data sets related to osteoclastogenesis are used in our experiment. The models trained with gene presence and gene absence predict target genes, such as CASP1, BID, and IRF5, that differ from those obtained using log2 ratios alone. Our literature survey reveals that all of these predicted targets have known roles in bone remodeling, specifically in relation to the immune system and osteoclasts, which lends confidence to our method and suggests that these targets merit wet-lab validation.
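The abstract names its input features (log2 ratios, Z scores, gene presence/absence) and the goal of finding genes that behave "consistently" across a cohort, but does not define them precisely. The Python sketch below only illustrates those general ideas; it is not the cTAP implementation. The pseudocount, the single generic Z score (the paper uses four Z-score types that the abstract does not specify), the threshold-based presence/absence call, and the 80% sign-agreement rule are all hypothetical choices made for illustration, and the function names are invented here.

```python
import math
import statistics
from typing import List, Optional, Sequence


def log2_ratio(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """log2 fold change of one gene between a treated and a control sample.

    The pseudocount is a hypothetical choice that avoids taking log2 of zero.
    """
    return math.log2((treated + pseudocount) / (control + pseudocount))


def z_scores(values: Sequence[float]) -> List[float]:
    """Standard Z score of each value against the mean and population SD of `values`.

    This is one generic Z score; the four Z-score variants used by cTAP are
    not described in the abstract and are not reproduced here.
    """
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def presence_call(z: float, threshold: float = 1.0) -> int:
    """Hypothetical presence/absence discretization of a Z score.

    +1 = 'present' (z >= threshold), -1 = 'absent' (z <= -threshold), 0 = undecided.
    """
    if z >= threshold:
        return 1
    if z <= -threshold:
        return -1
    return 0


def consistent_direction(ratios: Sequence[float], min_fraction: float = 0.8) -> Optional[str]:
    """Label a gene 'Up' or 'Down' if at least `min_fraction` of the cohort's
    data sets agree on the sign of its log2 ratio; otherwise return None.
    """
    if not ratios:
        return None
    n = len(ratios)
    up = sum(1 for r in ratios if r > 0)
    down = sum(1 for r in ratios if r < 0)
    if up / n >= min_fraction:
        return "Up"
    if down / n >= min_fraction:
        return "Down"
    return None


if __name__ == "__main__":
    # Made-up (treated, control) expression values for one gene in five data sets.
    pairs = [(12.0, 5.0), (30.0, 10.0), (8.0, 4.0), (3.0, 4.0), (20.0, 6.0)]
    ratios = [log2_ratio(t, c) for t, c in pairs]
    print([round(r, 2) for r in ratios])
    print([presence_call(z) for z in z_scores(ratios)])
    print(consistent_direction(ratios))
```

In cTAP, per-data-set features like these are fed to multiple machine-learning models; the simple sign-agreement rule above stands in for that learned step only to make the idea of "consistent within the cohort" concrete, and should not be read as the paper's classifier.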
Pages: 164-167
Page count: 4