Variation benchmark datasets: update, criteria, quality and applications

Times cited: 28

Authors:

Sarkar, Anasua [1]

Yang, Yang [2,3]

Vihinen, Mauno [1]

Affiliations:

[1] Lund Univ, Dept Expt Med Sci, BMC B13, SE-22184 Lund, Sweden

[2] Soochow Univ, Sch Comp Sci & Technol, 1 Shizi St, Suzhou 215006, Jiangsu, Peoples R China

[3] Soochow Univ, Prov Key Lab Comp Informat Proc Technol, 1 Shizi St, Suzhou 215006, Jiangsu, Peoples R China

Source:

DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION | 2020

Funding:

Swedish Research Council; National Natural Science Foundation of China

Keywords:

AMINO-ACID SUBSTITUTIONS; PREDICTING PROTEIN STABILITY; HUMAN-DISEASE GENES; COMPUTATIONAL TOOLS; MISSENSE VARIANTS; NUCLEOTIDE STRUCTURE; ACCURATE PREDICTION; MUTATION PATTERN; DATABASE; SEQUENCE;

DOI:

10.1093/database/baz117

CLC number:

Q [Biological sciences]

Discipline codes:

07; 0710; 09

Abstract:

Development of new computational methods and testing of their performance have to be carried out using experimental data. Only in comparison to existing knowledge can method performance be assessed. For that purpose, benchmark datasets with known and verified outcomes are needed. High-quality benchmark datasets are valuable and may be difficult, laborious and time-consuming to generate. VariBench and VariSNP are the two existing databases for sharing variation benchmark datasets, used mainly for variation interpretation. They have been used for training and benchmarking predictors for various types of variations and their effects. VariBench was updated with 419 new datasets from 109 papers containing a total of 329 014 152 variants; however, there is considerable redundancy between the datasets. VariBench is freely available at http://structure.bmc.lu.se/VariBench/. The contents of the datasets vary depending on the information in the original source. The available datasets have been categorized into 20 groups and subgroups. There are datasets for insertions and deletions, substitutions in coding and non-coding regions, and structure-mapped, synonymous and benign variants. Effect-specific datasets cover DNA regulatory elements, RNA splicing, and the protein properties aggregation, binding free energy, disorder and stability. There are also several datasets for molecule-specific and disease-specific applications, as well as one dataset for variation phenotype effects. Variants are often described at three molecular levels (DNA, RNA and protein), and sometimes also at the protein structure level, together with relevant cross-references and variant descriptions. The updated VariBench facilitates the development and testing of new methods and the comparison of their performance to previously published methods.
We compared the performance of the pathogenicity/tolerance predictor PON-P2 with several benchmark studies and show that such comparisons are feasible and useful; however, they may be limited by a lack of reported details and shared data.
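The abstract does not state which performance measures were used in the PON-P2 comparison. As a minimal sketch, assuming a binary benchmark with pathogenic variants as the positive class and benign variants as the negative class, the measures commonly reported for such predictors (sensitivity, specificity, positive and negative predictive value, accuracy and the Matthews correlation coefficient) can all be derived from the four confusion-matrix counts. The names `ConfusionCounts` and `performance_measures` are hypothetical illustrations and are not part of VariBench or PON-P2.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container: pathogenic = positive class, benign = negative class."""
    tp: int  # pathogenic variants predicted pathogenic
    tn: int  # benign variants predicted benign
    fp: int  # benign variants predicted pathogenic
    fn: int  # pathogenic variants predicted benign


def _ratio(num: float, den: float) -> float:
    """Divide, returning NaN instead of raising when the denominator is zero."""
    return num / den if den else float("nan")


def performance_measures(c: ConfusionCounts) -> Dict[str, float]:
    """Compute common binary-classification measures from confusion-matrix counts."""
    # Denominator of the Matthews correlation coefficient.
    mcc_den = math.sqrt(
        (c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn)
    )
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn),
        "mcc": _ratio(c.tp * c.tn - c.fp * c.fn, mcc_den),
    }


if __name__ == "__main__":
    # Illustrative counts only; not taken from the paper.
    counts = ConfusionCounts(tp=80, tn=70, fp=30, fn=20)
    for name, value in performance_measures(counts).items():
        print(f"{name}: {value:.3f}")
```

Recomputing measures this way is only possible when a study reports all four counts or shares its predictions, which is one reason missing details limit cross-study comparisons. PPV and NPV also depend on the ratio of positive to negative cases, so their values are not directly comparable between differently composed benchmarks.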

Pages: 16
