Bugs in the Data: How ImageNet Misrepresents Biodiversity

Authors
Luccioni, Alexandra Sasha [1]
Rolnick, David [2]
Affiliations
[1] Hugging Face, Paris, France
[2] McGill Univ, Mila, Montreal, PQ, Canada
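Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract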
ImageNet-1k is a dataset often used for benchmarking machine learning (ML) models and evaluating tasks such as image recognition and object detection. Wild animals make up 27% of ImageNet-1k but, unlike classes representing people and objects, these data have not been closely scrutinized. In the current paper, we analyze the 13,450 images from 269 classes that represent wild animals in the ImageNet-1k validation set, with the participation of expert ecologists. We find that many of the classes are ill-defined or overlapping, and that 12% of the images are incorrectly labeled, with some classes having >90% of images incorrect. We also find that both the wildlife-related labels and images included in ImageNet-1k present significant geographical and cultural biases, as well as ambiguities such as artificial animals, multiple species in the same image, or the presence of humans. Our findings highlight serious issues with the extensive use of this dataset for evaluating ML systems, the use of such algorithms in wildlife-related tasks, and more broadly the ways in which ML datasets are commonly created and curated.
Pages: 14382-14390
Page count: 9