cTAP: A Machine Learning Framework for Predicting Target Genes of a Transcription Factor using a Cohort of Gene Expression Data Sets

Cited by: 1
Authors
Wang, Honglin [1]
Joshi, Pujan [1]
Hong, Seung-Hyun [1]
Maye, Peter F. [2]
Rowe, David W. [2]
Shin, Dong-Guk [1]
Affiliations
[1] Univ Connecticut, Comp Sci & Engn Dept, Storrs, CT 06269 USA
[2] Univ Connecticut, Dept Reconstruct Sci, Hlth Ctr, Farmington, CT 06030 USA
Keywords
TF target analysis; Cohort analysis; Osteoclast differentiation; Gene abundance analysis; Machine learning
DOI
10.1109/BIBM49941.2020.9313303
CLC number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Identifying the target genes of a transcription factor is crucial in biomedical research. Thanks to ChIP-seq technology, scientists can estimate the potential genome-wide target genes of a transcription factor. However, finding the Up/Down targets of a transcription factor that behave consistently in a given biological context is difficult, because it requires analyzing a large number of studies conducted under the same or comparable contexts. We present a transcription factor target prediction method called the Cohort-based TF target prediction system (cTAP). The method assumes that the pathway involving the transcription factor of interest is characterized by multiple functional groups of marker genes pertaining to the biological process under study. In addition to log2 ratios of gene expression values, it uses the notions of gene-presence and gene-absence for prediction. Targets are predicted by applying multiple machine learning models that learn patterns of gene-presence and gene-absence from log2 ratios and four types of Z scores computed from the cohort's normalized gene expression data. The learned patterns are then associated with the putative targets of the transcription factor to elicit genes that exhibit Up/Down regulation patterns "consistently" within the cohort. In total, 11 publicly available GEO data sets related to osteoclastogenesis are used in our experiment. The models trained with gene-presence and gene-absence produce target genes, such as CASP1, BID, and IRF5, that differ from those obtained using log2 ratios alone. Our literature survey reveals that all of these predicted targets have known roles in bone remodeling, specifically in relation to immune function and osteoclasts, which lends confidence to our method and suggests that they merit wet-lab validation.
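The abstract names its ingredients without giving formulas: log2 ratios, Z scores, gene-presence/absence calls, and a consistency requirement across the cohort. The sketch below illustrates them under our own assumptions and is not cTAP itself. It differs from the paper in three ways:

- It computes only one kind of Z score (log2 ratios standardized within a data set) rather than the paper's four.
- It uses a fixed expression threshold as a hypothetical presence call.
- It replaces the learned machine learning models with a simple thresholded vote across data sets.

All names and parameter values are hypothetical. These include `PSEUDOCOUNT`, `PRESENCE_THRESHOLD`, `Z_CUTOFF`, `CONSISTENCY`, `call_directions`, and `consistent_targets`.

```python
import math
import statistics

# Hypothetical parameters: the abstract does not give cTAP's actual values.
PSEUDOCOUNT = 1.0         # keeps log2 finite when a value is zero
PRESENCE_THRESHOLD = 5.0  # level at or above which a gene counts as "present"
Z_CUTOFF = 1.0            # |Z| needed to call a gene Up or Down in one data set
CONSISTENCY = 0.75        # fraction of data sets that must agree on a direction


def log2_ratio(treated: float, control: float) -> float:
    """log2 fold change of treated over control, with a pseudocount."""
    return math.log2((treated + PSEUDOCOUNT) / (control + PSEUDOCOUNT))


def z_scores(values: dict[str, float]) -> dict[str, float]:
    """Standardize per-gene values within one data set (mean 0, SD 1)."""
    vals = list(values.values())
    mu = statistics.mean(vals)
    sd = statistics.pstdev(vals)
    if sd == 0:
        return {gene: 0.0 for gene in values}
    return {gene: (v - mu) / sd for gene, v in values.items()}


def is_present(level: float) -> bool:
    """Hypothetical presence call: expression at or above a fixed threshold."""
    return level >= PRESENCE_THRESHOLD


def call_directions(treated: dict[str, float],
                    control: dict[str, float]) -> dict[str, str]:
    """Label each gene in one data set as 'Up', 'Down', or 'None'."""
    ratios = {g: log2_ratio(treated[g], control[g]) for g in treated}
    z = z_scores(ratios)
    calls = {}
    for g in treated:
        if z[g] >= Z_CUTOFF and is_present(treated[g]):
            calls[g] = "Up"    # rises and is present in the treated sample
        elif z[g] <= -Z_CUTOFF and is_present(control[g]):
            calls[g] = "Down"  # falls and was present in the control sample
        else:
            calls[g] = "None"
    return calls


def consistent_targets(cohort, candidates):
    """Keep candidate genes whose Up/Down call agrees across the cohort.

    cohort is a list of (treated, control) dicts; candidates stands in for
    putative TF targets (e.g. genes near ChIP-seq peaks). This thresholded
    vote is an assumption standing in for cTAP's learned models.
    """
    per_set = [call_directions(t, c) for t, c in cohort]
    result = {}
    for g in candidates:
        calls = [calls_in_set.get(g, "None") for calls_in_set in per_set]
        for direction in ("Up", "Down"):
            if calls.count(direction) / len(calls) >= CONSISTENCY:
                result[g] = direction
    return result
```

Take a toy cohort of four data sets in which gene A rises in three and gene B falls in all four. Here `consistent_targets` returns `{"A": "Up", "B": "Down"}`. With a cohort where A rises in only one of three data sets, A is dropped and only B is kept.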
Pages: 164-167
Number of pages: 4