Sequence-based multiscale modeling for high-throughput chromosome conformation capture (Hi-C) data analysis

Cited by: 2
Author
Xia, Kelin [1, 2]
Affiliations
[1] Nanyang Technol Univ, Sch Phys & Math Sci, Div Math Sci, Singapore 637371, Singapore
[2] Nanyang Technol Univ, Sch Biol Sci, Singapore 637371, Singapore
Source
PLOS ONE | 2018 / Vol. 13 / No. 02
Keywords
TOPOLOGICAL DOMAINS; FUNCTIONAL-ORGANIZATION; GENOME; PRINCIPLES; IDENTIFICATION; SHAPE;
DOI
10.1371/journal.pone.0191899
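Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09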
Abstract
In this paper, we introduce sequence-based multiscale modeling for biomolecular data analysis. We employ the spectral clustering method in our modeling and reveal the difference between sequence-based global scale clustering and local scale clustering. Essentially, two types of distances, i.e., Euclidean (or spatial) distance and genomic (or sequential) distance, can be used in data clustering. Clusters from sequence-based global scale models optimize spatial distances, meaning that spatially adjacent loci are more likely to be assigned to the same cluster. Sequence-based local scale models, on the other hand, produce clusters that optimize genomic distances; that is, in these models, sequentially adjoining loci tend to be clustered together. We propose two sequence-based multiscale models (SeqMMs) for the study of hierarchical chromosome structures, including genomic compartments and topologically associating domains (TADs). We find that genomic compartments are determined only by global scale information in the Hi-C data: removing all local interactions within a band region as large as 10 Mb in genomic distance has almost no influence on the final compartment results. Further, in TAD analysis, we find that when the sequential scale is small, a tiny variation of the diagonal band region in a contact map results in a large change in the predicted TAD boundaries. When the scale value is larger than a threshold value, the TAD boundaries become very consistent. This threshold value is highly related to TAD sizes. By comparing our results with those previously obtained using a spectral clustering model, we find that our method is more robust and reliable. Finally, we demonstrate that almost all TAD boundaries from both clustering methods are local minima of a TAD summation function.
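The contrast between global scale and local scale clustering can be illustrated with a small NumPy sketch. It assumes that the sequential scale s is measured in bins, that a global scale model simply zeroes every contact with |i - j| < s while a local scale model keeps only contacts with |i - j| <= s, and it uses a plain two-way spectral bipartition based on the symmetric normalized Laplacian. The helper names (band_filter, spectral_bipartition, toy_contact_map), the synthetic contact map and its contrast parameter are hypothetical illustrations, not the paper's SeqMM implementation.

```python
import numpy as np


def band_filter(contacts: np.ndarray, scale: int, keep: str) -> np.ndarray:
    """Mask a contact map by genomic (sequential) distance |i - j| (assumed definition).

    keep="global": remove the local band, i.e. drop contacts with |i - j| < scale.
    keep="local":  keep only the band, i.e. contacts with |i - j| <= scale.
    """
    n = contacts.shape[0]
    dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    if keep == "global":
        mask = dist >= scale
    elif keep == "local":
        mask = dist <= scale
    else:
        raise ValueError("keep must be 'global' or 'local'")
    return np.where(mask, contacts, 0.0)


def spectral_bipartition(affinity: np.ndarray) -> np.ndarray:
    """Two-way spectral clustering using the symmetric normalized Laplacian."""
    w = affinity.astype(float).copy()
    np.fill_diagonal(w, 0.0)                  # ignore self-contacts
    deg = w.sum(axis=1)
    if np.any(deg <= 0):
        raise ValueError("every locus needs at least one remaining contact")
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)             # eigenvalues in ascending order
    fiedler = vecs[:, 1] * d_inv_sqrt         # undo the D^{1/2} scaling
    return (fiedler > 0).astype(int)          # cluster label = sign of Fiedler vector


def toy_contact_map(n_bins: int = 40, segment: int = 10, contrast: float = 0.01):
    """Hypothetical toy map: alternating A/B segments with 1/(1 + d) distance decay.

    Same-type pairs get weight 1, A-B pairs get weight `contrast`.
    """
    idx = np.arange(n_bins)
    comp = (idx // segment) % 2               # 0 = "A" segment, 1 = "B" segment
    dist = np.abs(np.subtract.outer(idx, idx))
    same = comp[:, None] == comp[None, :]
    contacts = np.where(same, 1.0, contrast) / (1.0 + dist)
    return contacts, comp


if __name__ == "__main__":
    contacts, comp = toy_contact_map()
    global_labels = spectral_bipartition(band_filter(contacts, 3, "global"))
    local_labels = spectral_bipartition(band_filter(contacts, 3, "local"))
    print("segment type :", comp)
    print("global scale :", global_labels)
    print("local scale  :", local_labels)
```

On this toy map the global scale labels group the two non-adjacent A segments together (a compartment-like split driven by spatial proximity), whereas the local scale labels split the chain into two contiguous halves (a domain-like split driven by genomic adjacency). The cluster label values themselves (0 or 1) are arbitrary, since the sign of an eigenvector is not fixed.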
Pages: 16