Variation benchmark datasets: update, criteria, quality and applications

Cited by: 28
Authors
Sarkar, Anasua [1]
Yang, Yang [2, 3]
Vihinen, Mauno [1]
Affiliations
[1] Lund Univ, Dept Expt Med Sci, BMC B13, SE-22184 Lund, Sweden
[2] Soochow Univ, Sch Comp Sci & Technol, 1 Shizi St, Suzhou 215006, Jiangsu, Peoples R China
[3] Soochow Univ, Prov Key Lab Comp Informat Proc Technol, 1 Shizi St, Suzhou 215006, Jiangsu, Peoples R China
Funding
Swedish Research Council; National Natural Science Foundation of China
Keywords
AMINO-ACID SUBSTITUTIONS; PREDICTING PROTEIN STABILITY; HUMAN-DISEASE GENES; COMPUTATIONAL TOOLS; MISSENSE VARIANTS; NUCLEOTIDE STRUCTURE; ACCURATE PREDICTION; MUTATION PATTERN; DATABASE; SEQUENCE
DOI
10.1093/database/baz117
Chinese Library Classification
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Developing new computational methods and testing their performance have to be carried out using experimental data. Only in comparison to existing knowledge can method performance be assessed. For that purpose, benchmark datasets with known and verified outcomes are needed. High-quality benchmark datasets are valuable, but they may be difficult, laborious and time-consuming to generate. VariBench and VariSNP are the two existing databases for sharing variation benchmark datasets, which are used mainly for variation interpretation. They have been used for training and benchmarking predictors for various types of variations and their effects. VariBench was updated with 419 new datasets from 109 papers containing a total of 329 014 152 variants; however, there is considerable redundancy between the datasets. VariBench is freely available at http://structure.bmc.lu.se/VariBench/. The contents of the datasets vary depending on the information in the original sources. The available datasets have been categorized into 20 groups and subgroups. There are datasets for insertions and deletions, for substitutions in coding and non-coding regions, and for structure-mapped, synonymous and benign variants. Effect-specific datasets cover DNA regulatory elements, RNA splicing, and protein properties (aggregation, binding free energy, disorder and stability). There are also several datasets for molecule-specific and disease-specific applications, as well as one dataset for variation phenotype effects. Variants are often described at three molecular levels (DNA, RNA and protein), and sometimes also at the protein structure level, together with relevant cross-references and variant descriptions. The updated VariBench facilitates the development and testing of new methods and the comparison of their performance with that of previously published methods.
We compared the performance of the pathogenicity/tolerance predictor PON-P2 with several benchmark studies and show that such comparisons are feasible and useful; however, they may be limited by a lack of reported details and shared data.
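Comparing a predictor against a benchmark dataset with known outcomes comes down to tallying agreements and disagreements between predicted and verified labels. The following is a minimal sketch, assuming a binary benchmark in which each variant is labelled pathogenic (True) or tolerated/neutral (False). The function name `benchmark_measures`, the particular set of measures and the toy labels are hypothetical illustrations, not the procedure or data used in the paper.

```python
import math
from typing import Dict, Sequence


def benchmark_measures(truth: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Score binary predictions against verified benchmark labels.

    Hypothetical convention: True = pathogenic, False = tolerated/neutral.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if not truth:
        raise ValueError("benchmark dataset is empty")

    # Confusion-matrix counts (bools sum as 0/1).
    tp = sum(t and p for t, p in zip(truth, predicted))
    tn = sum((not t) and (not p) for t, p in zip(truth, predicted))
    fp = sum((not t) and p for t, p in zip(truth, predicted))
    fn = sum(t and (not p) for t, p in zip(truth, predicted))

    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    accuracy = (tp + tn) / len(truth)

    # Matthews correlation coefficient; set to 0.0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy benchmark: three pathogenic and three neutral variants (invented labels).
    truth = [True, True, True, False, False, False]
    predicted = [True, True, False, False, False, True]
    print(benchmark_measures(truth, predicted))
```

Reporting several measures together, rather than accuracy alone, makes results easier to compare across benchmark studies whose pathogenic/neutral proportions differ. Defining MCC as 0 when a confusion-matrix margin is empty is a convention chosen for this sketch.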
Pages: 16