Data leakage jeopardizes ecological applications of machine learning

Cited by: 9
Authors
Stock, Andy [1]
Gregr, Edward J. [1, 2]
Chan, Kai M. A. [1]
Affiliations
[1] Univ British Columbia, Inst Resources Environm & Sustainabil, Vancouver, BC, Canada
[2] SciTech Environm Consulting, Vancouver, BC, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
VALIDATION;
DOI
10.1038/s41559-023-02162-1
Chinese Library Classification number
Q14 [Ecology (Bioecology)];
Discipline classification code
071012; 0713;
Abstract
Machine learning is a popular tool in ecology but many scientific applications suffer from data leakage, causing misleading results. We highlight common pitfalls in ecological machine-learning methods and argue that discipline-specific model info sheets must be developed to aid in model evaluations.
Pages: 1743-1745
Page count: 3