Analysis of machine learning algorithms as integrative tools for validation of next generation sequencing data

Cited by: 9
Authors
Marceddu, G. [1 ]
Dallavilla, T. [2 ]
Guerri, G. [2 ]
Zulian, A. [2 ]
Marinelli, C. [2 ]
Bertelli, M. [1 ]
Affiliations
[1] MAGI Euregio, Bolzano, Italy
[2] MAGIs LAB, Rovereto, TN, Italy
Keywords
NGS; Validation; Diagnostics; Genome analysis; Bio-informatics; Variants
DOI
10.26355/eurrev_201909_19034
Chinese Library Classification number
R9 [Pharmacy];
Discipline classification code
1007;
Abstract
OBJECTIVE: While next generation sequencing (NGS) has become the technology of choice for clinical diagnostics, most genetic laboratories still use Sanger sequencing for orthogonal confirmation of NGS results. Previous studies have shown that when the quality of NGS data is high, most calls are confirmed by Sanger sequencing, making confirmation redundant. We aimed to establish a set of criteria that distinguish NGS calls that need orthogonal confirmation from those that do not, which would significantly decrease the amount of work necessary to reach a diagnosis. MATERIALS AND METHODS: A data set of 7976 NGS calls confirmed as true or false positive by Sanger sequencing was used to train and test different machine learning (ML) approaches. By varying the size and class balance of the training dataset, we measured the performance of the different algorithms to determine the conditions under which ML is a valid approach for confirming NGS calls in a diagnostic environment. RESULTS: Our results indicate that machine learning is a valid approach to identify variant calls that need further investigation, but in order to reach the high accuracy required in a clinical environment, the training data set must include enough observations and these observations must be well balanced between true and false positive NGS calls. CONCLUSIONS: Our results show that it is possible to integrate the diagnostic NGS validation workflow with a machine learning approach to reduce the number of Sanger confirmations of high-quality NGS calls, reducing the time and costs of diagnosis.
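The experimental design described in the abstract (train on subsets of differing size and class balance, then compare performance) can be illustrated with a minimal sketch. The abstract does not name the algorithms or the features that were used, so everything below is an assumption made for illustration: the calls are synthetic, the `quality` and `depth` features are hypothetical, and a nearest-centroid classifier stands in for whichever ML methods the authors evaluated.

```python
import random
from statistics import mean


def make_calls(n, frac_true, rng):
    """Generate synthetic NGS calls as ((quality, depth), is_true_positive).

    Assumption for illustration only: Sanger-confirmed calls tend to have
    higher quality and depth than false positives.
    """
    calls = []
    for _ in range(n):
        is_true = rng.random() < frac_true
        quality = rng.gauss(60 if is_true else 30, 10)
        depth = rng.gauss(100 if is_true else 40, 25)
        calls.append(((quality, depth), is_true))
    return calls


def subsample(calls, size, frac_true, rng):
    """Draw a training set with a chosen size and class balance."""
    pos = [c for c in calls if c[1]]
    neg = [c for c in calls if not c[1]]
    n_pos = round(size * frac_true)
    return rng.sample(pos, n_pos) + rng.sample(neg, size - n_pos)


def fit_centroids(train):
    """Stand-in classifier: one mean feature vector per class."""
    centroids = {}
    for label in (True, False):
        rows = [x for x, y in train if y == label]
        centroids[label] = tuple(mean(col) for col in zip(*rows))
    return centroids


def predict(centroids, x):
    """Return True if x is closer to the true-positive centroid."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return dist2(centroids[True]) < dist2(centroids[False])


def evaluate(centroids, test):
    """Return (sensitivity, specificity) on a held-out test set."""
    tp = fn = tn = fp = 0
    for x, y in test:
        guess = predict(centroids, x)
        if y and guess:
            tp += 1
        elif y:
            fn += 1
        elif guess:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def run_experiment(seed=0):
    """Vary training size and class balance; score each on the same test set."""
    rng = random.Random(seed)
    pool = make_calls(4000, 0.9, rng)   # calls available for training
    test = make_calls(1000, 0.9, rng)   # held-out calls
    results = {}
    for size in (20, 200):
        for frac_true in (0.5, 0.9):
            train = subsample(pool, size, frac_true, rng)
            results[(size, frac_true)] = evaluate(fit_centroids(train), test)
    return results


if __name__ == "__main__":
    for (size, frac_true), (sens, spec) in run_experiment().items():
        print(f"size={size} frac_true={frac_true} "
              f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Specificity is reported alongside sensitivity because, in this setting, a false positive call that the classifier accepts would skip Sanger confirmation. The numbers this sketch prints describe only the synthetic data and say nothing about the study's results.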
Pages: 8139-8147
Number of pages: 9