Estimating the Length Distributions of Genomic Micro-satellites from Next Generation Sequencing Data

Cited by: 0
Authors
Feng, Xuan [1, 2]
Hu, Huan [1, 2]
Zhao, Zhongmeng [1, 2]
Zhang, Xuanping [1, 2]
Wang, Jiayin [1, 2]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Elect & Informat Engn, Xian 710049, Shaanxi, Peoples R China
[2] Xi An Jiao Tong Univ, Inst Data Sci & Informat Qual, Shaanxi Engn Res Ctr Med & Hlth Big Data, Xian 710049, Shaanxi, Peoples R China
Funding
US National Science Foundation;
Keywords
Genomic micro-satellite; Length distribution; Estimation approach; Next generation sequencing data; MICROSATELLITE INSTABILITY DETECTION;
DOI
10.1007/978-3-319-78723-7_40
Chinese Library Classification number
TP31 [Computer software];
Subject classification code
081202 ; 0835 ;
Abstract
Genomic micro-satellites are genomic regions that consist of short, repetitive DNA motifs. In contrast to unique genomic regions, micro-satellites exhibit high intrinsic polymorphism, which derives mainly from variability in length. Length distributions are widely used to represent this polymorphism. Recent studies report that the length distributions of some micro-satellites differ significantly between tumor tissue samples and normal samples, which has become a hot topic in cancer genomics. Several state-of-the-art approaches have been proposed to identify length distributions from sequencing data. However, the existing approaches can only handle micro-satellites shorter than one read length, which limits research on long micro-satellite events. In this article, we propose a probabilistic approach, implemented as ELMSI, that estimates the length distributions of micro-satellites longer than one read length. The core algorithm works on a set of mapped reads. It first clusters the reads and applies a k-mer extension algorithm to detect the repeat unit and the breakpoints. It then runs an expectation-maximization algorithm to approximate the true length distributions. According to the experiments, ELMSI is able to handle micro-satellites whose lengths range from shorter than one read length to the 10 kbp scale. In a series of comparison experiments that vary the number of micro-satellite regions, the read length, and the sequencing coverage, ELMSI outperforms MSIsensor in most cases.
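To make the notion of a length distribution concrete, the following minimal Python sketch covers only the simple case the abstract attributes to existing approaches: reads that fully span a micro-satellite shorter than one read. It is not ELMSI and does not reproduce the clustering, k-mer extension, or expectation-maximization steps. The function name, the flank sequences, and the toy reads are hypothetical and chosen for illustration.

```python
import re
from collections import Counter


def length_distribution(reads, unit, left_flank, right_flank):
    """Return {repeat_count: fraction} over reads that fully span the repeat.

    Assumption: the repeat unit and both flanking sequences are known in
    advance and appear without sequencing errors (a toy simplification).
    """
    # Match: left flank, one or more copies of the unit, right flank.
    pattern = re.compile(
        re.escape(left_flank)
        + "((?:" + re.escape(unit) + ")+)"
        + re.escape(right_flank)
    )
    counts = Counter()
    for read in reads:
        match = pattern.search(read)
        if match:
            # Number of unit copies observed in this spanning read.
            counts[len(match.group(1)) // len(unit)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {n: c / total for n, c in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical reads: two carry 3 copies of "CA", one carries 5 copies,
    # and one does not cover the locus at all.
    toy_reads = [
        "ACGT" + "CA" * 3 + "TTGA",
        "ACGT" + "CA" * 3 + "TTGA",
        "ACGT" + "CA" * 5 + "TTGA",
        "GGGGGGGG",
    ]
    print(length_distribution(toy_reads, "CA", "ACGT", "TTGA"))
```

Reads that do not contain both flanks are simply skipped here, which is why this kind of counting cannot handle micro-satellites longer than one read, the case the abstract targets.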
Pages: 461-472
Number of pages: 12