A benchmark dataset for machine learning in ecotoxicology

Authors
Christoph Schür
Lilian Gasser
Fernando Perez-Cruz
Kristin Schirmer
Marco Baity-Jesi
Affiliations
[1] Eawag, EPF Lausanne, School of Architecture
[2] Swiss Federal Institute of Aquatic Science and Technology
[3] Swiss Data Science Center (SDSC)
[4] ETH Zürich: Department of Computer Science
[5] ETH Zürich: Department of Environmental Systems Science
[6] Civil and Environmental Engineering
Abstract
Machine learning is a promising but underutilized approach to predicting ecotoxicological outcomes. Curating data with informative features requires expertise in machine learning together with a strong biological and ecotoxicological background, which we consider a barrier to entry for this kind of research. Additionally, model performances can only be compared across studies when the same dataset, cleaning steps, and data splits are used. Therefore, we provide ADORE, an extensive and well-described dataset on acute aquatic toxicity in three relevant taxonomic groups (fish, crustaceans, and algae). The core dataset describes ecotoxicological experiments and is expanded with phylogenetic and species-specific data on the species as well as chemical properties and molecular representations. Apart from challenging other researchers to achieve the best model performance across the whole dataset, we propose specific relevant challenges on subsets of the data and include datasets and splits corresponding to each of these challenges, as well as an in-depth characterization and discussion of train-test splitting approaches.
Related papers
  • [1] A benchmark dataset for machine learning in ecotoxicology
    Schuer, Christoph
    Gasser, Lilian
    Perez-Cruz, Fernando
    Schirmer, Kristin
    Baity-Jesi, Marco
    [J]. SCIENTIFIC DATA, 2023, 10 (01)