Katdetectr: an R/bioconductor package utilizing unsupervised changepoint analysis for robust kataegis detection

Authors
Hazelaar, Daan M. [1]
van Riet, Job [1, 2, 5]
Hoogstrate, Youri [3]
van de Werken, Harmen J. G. [2, 4]
Affiliations
[1] Univ Med Ctr, Erasmus MC Canc Inst, Dept Med Oncol, NL-3015 GD Rotterdam, Netherlands
[2] Univ Med Ctr, Erasmus MC Canc Inst, Dept Urol, NL-3015 GD Rotterdam, Netherlands
[3] Univ Med Ctr, Erasmus MC Canc Inst, Dept Neurol, NL-3015 GD Rotterdam, Netherlands
[4] Univ Med Ctr, Erasmus MC Canc Inst, Dept Immunol, NL-3015 GD Rotterdam, Netherlands
[5] German Canc Res Ctr, Div Oncol, Neuenheimer Feld 280, D-69120 Heidelberg, Germany
Source
GigaScience | 2023 / Volume 12
Keywords
kataegis; R-package; Bioconductor; changepoint analysis; cancer; mutational processes; signatures
DOI
10.1093/gigascience/giad081
Chinese Library Classification
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Background: Kataegis refers to the occurrence of regional genomic hypermutation in cancer and is a phenomenon that has been observed in a wide range of malignancies. A kataegis locus constitutes a genomic region with a high mutation rate (i.e., a higher frequency of closely interspersed somatic variants than the overall mutational background). It has been shown that kataegis is of biological significance and possibly clinically relevant. Therefore, an accurate and robust workflow for kataegis detection is paramount.

Findings: Here we present Katdetectr, an open-source R/Bioconductor-based package for the robust yet flexible and fast detection of kataegis loci in genomic data. In addition, Katdetectr houses functionalities to characterize and visualize kataegis and provides results in a standardized format useful for subsequent analysis. In brief, Katdetectr imports industry-standard formats (MAF, VCF, and VRanges), determines the intermutation distance of the genomic variants, and performs unsupervised changepoint analysis utilizing the Pruned Exact Linear Time search algorithm followed by kataegis calling according to user-defined parameters. We used synthetic data and an a priori labeled pan-cancer dataset of whole-genome sequenced malignancies for the performance evaluation of Katdetectr and 5 publicly available kataegis detection packages. Our performance evaluation shows that Katdetectr is robust regarding tumor mutational burden and shows the fastest mean computation time. Additionally, Katdetectr reveals the highest accuracy (0.99, 0.99) and normalized Matthews correlation coefficient (0.98, 0.92) of all evaluated tools for both datasets.

Conclusions: Katdetectr is a robust workflow for the detection, characterization, and visualization of kataegis and is available on Bioconductor: https://doi.org/doi:10.18129/B9.bioc.katdetectr
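
The workflow summarized in the abstract (intermutation distances, changepoint search with Pruned Exact Linear Time, then kataegis calling) can be illustrated with a minimal Python sketch. This is not Katdetectr's R implementation or API. The function names, the exponential cost function, the penalty value, and the calling thresholds (mean intermutation distance of at most 1000 bp over at least 6 distances) are all assumptions chosen for illustration, and the sketch assumes strictly positive distances on a single chromosome.

```python
import math


def intermutation_distances(positions):
    """Distances between consecutive variant positions on one chromosome."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def pelt_exponential(values, penalty):
    """Minimal PELT search for changes in the rate of exponential data.

    Returns the sorted changepoint indices: each index is the start of a
    new segment in `values`. Assumes every value is strictly positive.
    """
    n = len(values)
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)

    def cost(s, t):
        # Twice the negative maximised exponential log-likelihood of
        # values[s:t], with additive constants dropped.
        m = t - s
        return 2.0 * m * (math.log(prefix[t] - prefix[s]) - math.log(m))

    best = [0.0] * (n + 1)   # best[t]: optimal penalised cost of values[:t]
    best[0] = -penalty
    last_cp = [0] * (n + 1)  # last_cp[t]: start of the final segment
    candidates = [0]
    for t in range(1, n + 1):
        scores = [(best[s] + cost(s, t) + penalty, s) for s in candidates]
        best[t], last_cp[t] = min(scores)
        # Pruning step: drop starts that can never be optimal again.
        candidates = [s for score, s in scores if score - penalty <= best[t]]
        candidates.append(t)

    changepoints = []
    t = n
    while last_cp[t] > 0:
        t = last_cp[t]
        changepoints.append(t)
    return sorted(changepoints)


def call_kataegis(distances, changepoints, max_mean_imd=1000.0, min_size=6):
    """Label segments as kataegis loci using illustrative thresholds."""
    bounds = [0] + list(changepoints) + [len(distances)]
    loci = []
    for start, end in zip(bounds, bounds[1:]):
        segment = distances[start:end]
        if len(segment) >= min_size and sum(segment) / len(segment) <= max_mean_imd:
            loci.append((start, end))
    return loci


# Synthetic example: sparse background with one dense cluster of variants.
distances = [10000] * 20 + [10] * 10 + [10000] * 20
changepoints = pelt_exponential(distances, penalty=2 * math.log(len(distances)))
loci = call_kataegis(distances, changepoints)
```

On this synthetic input the search isolates the dense run of short distances as its own segment, and only that segment passes the calling thresholds. Real data would need per-chromosome handling and a considered choice of cost function and penalty, which the sketch leaves out.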
Pages: 11