Machine learning methods for transcription data integration

Cited by: 7
Authors
Holloway, D. T. [1]
Kon, M. A.
DeLisi, C.
Affiliations
[1] Boston Univ, Dept Mol Biol Cell Biol & Biochem, Boston, MA 02215 USA
[2] Boston Univ, Dept Math & Stat, Boston, MA 02215 USA
[3] Boston Univ, Dept Bioinformat & Syst Biol, Boston, MA 02215 USA
DOI
10.1147/rd.506.0631
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Gene expression is modulated by transcription factors (TFs), which are proteins that generally bind to DNA adjacent to coding regions and initiate transcription. Each target gene can be regulated by more than one TF, and each TF can regulate many targets. For a complete molecular understanding of transcriptional regulation, researchers must first associate each TF with the set of genes that it regulates. Here we present a summary of completed work on associating 104 TFs with their binding sites using support vector machines (SVMs), which are classification algorithms based on statistical learning theory. We use several types of genomic datasets to train classifiers that predict TF binding in the yeast genome. We consider motif matches, subsequence counts, motif conservation, functional annotation, and expression profiles. A simple weighting scheme varies the contribution of each type of genomic data when building a final SVM classifier, which we evaluate using known binding sites published in the literature and in online databases. The SVM algorithm works best when all datasets are combined, producing 73% coverage of known interactions, with a prediction accuracy of almost 0.9. We discuss new ideas and preliminary work for improving SVM classification of biological data.
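The abstract does not spell out the weighting scheme. One common way to vary the contribution of several data types in a kernel method is a weighted sum of per-dataset kernel (Gram) matrices, and the sketch below illustrates only that idea; it is not the authors' implementation. The function names, the toy feature values, and the weights are all hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


def linear_kernel(features: List[List[float]]) -> Matrix:
    """Gram matrix of dot products between every pair of feature rows."""
    return [[sum(a * b for a, b in zip(x, y)) for y in features] for x in features]


def combine_kernels(kernels: Dict[str, Matrix], weights: Dict[str, float]) -> Matrix:
    """Weighted sum of Gram matrices; weights are normalised to sum to 1."""
    total = sum(weights[name] for name in kernels)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n = len(next(iter(kernels.values())))
    combined = [[0.0] * n for _ in range(n)]
    for name, gram in kernels.items():
        w = weights[name] / total
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * gram[i][j]
    return combined


# Hypothetical toy data: three genes described by two data types.
MOTIF_FEATURES = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
EXPRESSION_FEATURES = [[2.0], [0.0], [1.0]]

KERNELS = {
    "motif": linear_kernel(MOTIF_FEATURES),
    "expression": linear_kernel(EXPRESSION_FEATURES),
}
# Assumed weights: motif evidence counts three times as much as expression.
WEIGHTS = {"motif": 3.0, "expression": 1.0}

COMBINED = combine_kernels(KERNELS, WEIGHTS)
```

A combined matrix of this kind could then be handed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`. A weighted sum with non-negative weights keeps the result a valid kernel, which is why this form is a frequent choice for data integration.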
Pages: 631-643
Page count: 13
Citation
Holloway, Dustin T.; Kon, Mark A.; DeLisi, Charles. Machine learning methods for transcription data integration [J]. IBM Journal of Research and Development, 2006, 50 (06): 631-643