A machine learning framework for the prediction of chromatin folding in Drosophila using epigenetic features

Authors
Rozenwald, Michal B. [1]
Galitsyna, Aleksandra A. [2]
Sapunov, Grigory V. [1, 3]
Khrameeva, Ekaterina E. [2]
Gelfand, Mikhail S. [2, 4]
Affiliations
[1] Faculty of Computer Science, National Research University Higher School of Economics, Moscow, Russia
[2] Skolkovo Institute of Science and Technology, Moscow, Russia
[3] Intento, Inc., Berkeley, CA, United States
[4] A.A. Kharkevich Institute for Information Transmission Problems, RAS, Moscow, Russia
Funding
Russian Science Foundation
Keywords
Chromatin immunoprecipitation - Gradient boosting - Histone modification - Linear regression models - On-machines - Recurrent neural network (RNN) - Relevant features - Technological advances
DOI
10.7717/PEERJ-CS.307
Abstract
Technological advances have led to the creation of large epigenetic datasets, including information about DNA-binding proteins and DNA spatial structure. Hi-C experiments have revealed that chromosomes are subdivided into sets of self-interacting domains called Topologically Associating Domains (TADs). TADs are involved in the regulation of gene expression, but the mechanisms of their formation are not yet fully understood. Here, we focus on machine learning methods to characterize DNA folding patterns in Drosophila based on chromatin marks across three cell lines. We present linear regression models with four types of regularization, gradient boosting, and recurrent neural networks (RNNs) as tools to study chromatin folding characteristics associated with TADs, given epigenetic chromatin immunoprecipitation (ChIP) data. The bidirectional long short-term memory (LSTM) RNN architecture produced the best prediction scores and identified biologically relevant features. The distributions of the protein Chriz (Chromator) and the histone modification H3K4me3 were selected as the most informative features for the prediction of TAD characteristics. This approach may be adapted to any similar biological dataset of chromatin features across various cell lines and species. The code for the implemented pipeline, Hi-ChiP-ML, is publicly available: https://github.com/MichalRozenwald/Hi-ChIP-ML
© 2020, Rozenwald et al. All Rights Reserved.
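The simplest model family named in the abstract is a regularized linear regression from chromatin marks to a TAD-related characteristic. The following is a minimal sketch of one such variant, L2-regularized (ridge) regression, assuming a matrix X of per-bin ChIP signal (genomic bins x marks) and a per-bin numeric target y. The function names, the synthetic data, the generic mark names (mark_A and so on) and the choice of ranking features by absolute standardized coefficient are all illustrative assumptions; this is not the Hi-ChiP-ML code and not the procedure the authors used to single out Chriz and H3K4me3.

```python
import numpy as np

# Hypothetical sketch: ridge (L2-regularized) linear regression of a per-bin
# TAD-related target on per-bin ChIP signal. NOT the Hi-ChiP-ML implementation.


def ridge_fit(X, y, alpha=1.0):
    """Fit ridge regression on z-scored features; the intercept is not penalized."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant features: avoid division by zero
    Z = (X - mean) / std
    y_mean = y.mean()
    # Solve (Z^T Z + alpha * I) w = Z^T (y - mean(y)) for the coefficients w
    A = Z.T @ Z + alpha * np.eye(Z.shape[1])
    coef = np.linalg.solve(A, Z.T @ (y - y_mean))
    return coef, y_mean, mean, std


def ridge_predict(X, coef, intercept, mean, std):
    """Apply the training-set scaling, then the fitted linear model."""
    return ((X - mean) / std) @ coef + intercept


def rank_features(names, coef):
    """Order features by absolute standardized coefficient, largest first."""
    order = np.argsort(-np.abs(coef))
    return [names[i] for i in order]


# Usage on synthetic data: 500 genomic bins x 4 hypothetical chromatin marks.
rng = np.random.default_rng(0)
names = ["mark_A", "mark_B", "mark_C", "mark_D"]
X = rng.normal(size=(500, 4))
# Synthetic target driven mainly by mark_B and, to a lesser extent, mark_A.
y = 2.0 * X[:, 1] - 1.0 * X[:, 0] + 0.1 * rng.normal(size=500)

coef, intercept, mean, std = ridge_fit(X, y, alpha=1.0)
pred = ridge_predict(X, coef, intercept, mean, std)
mse = float(np.mean((y - pred) ** 2))
ranking = rank_features(names, coef)
print("in-sample MSE:", round(mse, 4))
print("features by |coefficient|:", ranking)
```

Ranking by coefficient magnitude is only comparable across marks because each feature is z-scored before fitting. Swapping the penalty (for example L1 instead of L2) or replacing the linear model with gradient boosting or a bidirectional LSTM over neighbouring bins changes the model but keeps the same bins-by-marks input layout.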