Benchmark AFLOW Data Sets for Machine Learning

Cited by: 0

Authors:

Conrad L. Clement

Steven K. Kauwe

Taylor D. Sparks

Affiliation:

[1] University of Utah, Department of Materials Science and Engineering

Source:

Integrating Materials and Manufacturing Innovation | 2020, Vol. 9

Keywords:

AFLOW; Benchmark data sets; Machine learning; Materials informatics

DOI:

Not available

Abstract:

Materials informatics is increasingly finding ways to exploit machine learning algorithms. Techniques such as decision trees, ensemble methods, support vector machines, and a variety of neural network architectures are used to predict likely material characteristics and property values. Supplemented with laboratory synthesis, applications of machine learning to compound discovery and characterization represent one of the most promising research directions in materials informatics. A shortcoming of this trend, in its current form, is a lack of standardized materials data sets on which to train, validate, and test model effectiveness. Applied machine learning research depends on benchmark data to make sense of its results. Fixed, predetermined data sets allow for rigorous model assessment and comparison. Machine learning publications that do not refer to benchmarks are often hard to contextualize and reproduce. In this data descriptor article, we present a collection of data sets of different material properties taken from the AFLOW database. We describe them, the procedures that generated them, and their use as potential benchmarks. We provide a compressed ZIP file containing the data sets and a GitHub repository of associated Python code. Finally, we discuss opportunities for future work incorporating the data sets and creating similar benchmark collections.

Pages: 153-156

Page count: 3
