Analysis of machine learning algorithms as integrative tools for validation of next generation sequencing data

Cited by: 9
Authors
Marceddu, G. [1]
Dallavilla, T. [2]
Guerri, G. [2]
Zulian, A. [2]
Marinelli, C. [2]
Bertelli, M. [1]
Affiliations
[1] MAGI Euregio, Bolzano, Italy
[2] MAGIs LAB, Rovereto, TN, Italy
Keywords
NGS; Validation; Diagnostics; Genome analysis; Bio-informatics; VARIANTS;
DOI
10.26355/eurrev_201909_19034
Chinese Library Classification number
R9 [Pharmacy];
Discipline classification code
1007;
Abstract
OBJECTIVE: While next generation sequencing (NGS) has become the technology of choice for clinical diagnostics, most genetic laboratories still use Sanger sequencing for orthogonal confirmation of NGS results. Previous studies have shown that when the quality of NGS data is high, most calls are confirmed by Sanger sequencing, making confirmation redundant. We aimed to establish a set of criteria for distinguishing NGS calls that need orthogonal confirmation from those that do not, which would significantly decrease the amount of work necessary to reach a diagnosis. MATERIALS AND METHODS: A dataset of 7976 NGS calls confirmed as true or false positive by Sanger sequencing was used to train and test different machine learning (ML) approaches. By varying the size and class balance of the training dataset, we measured the performance of the different algorithms to determine the conditions under which ML is a valid approach for confirming NGS calls in a diagnostic environment. RESULTS: Our results indicate that machine learning is a valid approach for finding variant calls that need more investigation, but to reach the high accuracy required in a clinical environment, the training dataset must include enough observations, and these observations must be well balanced between true and false positive NGS calls. CONCLUSIONS: Our results show that it is possible to integrate the diagnostic NGS validation workflow with a machine learning approach to reduce the number of Sanger confirmations of high-quality NGS calls, reducing the time and cost of diagnosis.
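The abstract describes varying the size and class balance of the training dataset. As a minimal sketch of what such a sampling step could look like (not the authors' implementation), the Python snippet below draws a subset of labelled calls with a chosen total size and fraction of true positives. The function name `balanced_subset`, the feature names `depth` and `qual`, and the toy data are all hypothetical and used only for illustration.

```python
import random
from collections import Counter


def balanced_subset(calls, n_total, true_fraction=0.5, seed=0):
    """Draw n_total labelled calls with the requested fraction of true positives.

    `calls` is a list of (features, label) pairs, where label is True for a
    call confirmed by Sanger sequencing and False for a false positive.
    (Hypothetical helper; the data layout is an assumption.)
    """
    rng = random.Random(seed)
    true_calls = [c for c in calls if c[1]]
    false_calls = [c for c in calls if not c[1]]
    n_true = round(n_total * true_fraction)
    n_false = n_total - n_true
    if n_true > len(true_calls) or n_false > len(false_calls):
        raise ValueError("not enough calls of one class for the requested subset")
    # Sample each class separately without replacement, then shuffle together.
    subset = rng.sample(true_calls, n_true) + rng.sample(false_calls, n_false)
    rng.shuffle(subset)
    return subset


# Toy, imbalanced data: 900 true positives and 100 false positives.
# The feature values are random placeholders, not real sequencing metrics.
toy_rng = random.Random(42)
calls = [
    ({"depth": toy_rng.randint(10, 200), "qual": toy_rng.uniform(20, 60)}, i % 10 != 0)
    for i in range(1000)
]

subset = balanced_subset(calls, n_total=100, true_fraction=0.5)
counts = Counter(label for _, label in subset)
print(counts[True], counts[False])
```

Calling the helper repeatedly with different `n_total` and `true_fraction` values would produce the family of training sets on which a classifier could then be trained and compared.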
Pages: 8139-8147
Number of pages: 9