Machines Learn Better with Better Data Ontology: Lessons from Philosophy of Induction and Machine Learning Practice

Cited by: 3
Author
Li, Dan [1]
Affiliation
[1] CUNY, Baruch Coll, Philosophy Dept, New York, NY 10031 USA
Keywords
Induction; Machine learning; Data ontology; No Free Lunch theorem; Goodman's riddle of induction; CLIMATE; MODELS;
DOI
10.1007/s11023-023-09639-9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As scientists start to adopt machine learning (ML) as one research tool, the security of ML and the knowledge generated become a concern. In this paper, I explain how supervised ML can be improved with better data ontology, or the way we make categories and turn information into data. More specifically, we should design data ontology in such a way that is consistent with the knowledge that we have about the target phenomenon so that such ontology can help us make the inductive leap. I do so by thinking through a thought experiment, Goodman's New Riddle of Induction (Fact, fiction, and forecast, Harvard University Press, 1955). Goodman's riddle helps flesh out three problems of induction: (1) the problem of equal goodies, that there are often too many equally good inductive results given the same data; (2) the problem of diverging performance, that these equally good results can give opposite predictions in the future; and (3) the problem of mediocrity, that when averaged across all equally possible datasets and tasks, no inductive algorithm outperforms any other. I show that all these three problems are manifested as real obstacles in ML practice, namely, the Rashomon effect (Breiman in Stat Sci 16(3):199-231, 2001), the problem of underspecification (D'Amour et al. in J Mach Learn Res, 2020, https://doi.org/10.48550/arXiv.2011.03395), and the No Free Lunch theorem (Wolpert in Neural Comput 8(7):1341-90, 1996, https://doi.org/10.1162/neco.1996.8.7.1341). Lastly, I argue that proper data ontology can help mitigate these problems and I demonstrate how using concrete examples from climate science. This research highlights the links between philosophers' discussions of induction and implications in ML practice.
Pages: 429-450
Page count: 22
Related papers
50 items in total
  • [31] Machine learning of kinetic energy densities with target and feature smoothing: Better results with fewer training data
    Manzhos, Sergei
    Luder, Johann
    Ihara, Manabu
    JOURNAL OF CHEMICAL PHYSICS, 2023, 159 (23):
  • [32] Combining field potential data and in silico simulated data to improve machine learning approaches and better assess drug cardiotoxicity
    Raphel, Fabien
    de Korte, Tessa
    Lombardi, Damiano
    Braam, Stefan
    Bleunven, Christophe
    Bernasconi, Sylvain
    Gerbeau, Jean-Frederic
    JOURNAL OF PHARMACOLOGICAL AND TOXICOLOGICAL METHODS, 2020, 105
  • [33] How to be a better scientist: Lessons from scientific philosophy, the historical development of science, and past errors within exercise physiology
    Robergs, Robert A.
    Opeyemi, Olumide
    Torrens, Samuel
    SPORTS MEDICINE AND HEALTH SCIENCE, 2022, 4 (02) : 140 - 146
  • [34] Learning decision making: Some ideas on how novices better can learn from skilled response personnel
    Sommer, M.
    ADVANCES IN SAFETY, RELIABILITY AND RISK MANAGEMENT, 2012, : 156 - 164
  • [35] Two machines are better than one: Using multi-agent reinforcement in machine learning to simulate a clinical second opinion
    Xiao, Cao
    Gao, Junyi
    Glass, Lucas
    Sun, Jimeng
    Mack, Christina
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2020, 29 : 90 - 90
  • [36] Sports and machine learning: How young people can use data from their own bodies to learn about machine learning
    Zimmermann-Niefield, Abigail
    Shapiro, R. Benjamin
    Kane, Shaun
    XRDS: Crossroads, 2019, 25 (04): : 44 - 49
  • [37] Could the influence of monitor farm programmes on practice change be BETTER? Lessons from sheep farmers and advisors in Ireland
    Mulkerrins, M. J.
    Gottstein, M.
    Gorman, M.
    Russell, T.
    Ryan, M.
    Lynch, M. B.
    JOURNAL OF AGRICULTURAL EDUCATION & EXTENSION, 2023, 29 (05): : 653 - 678
  • [38] Integrating research and system-wide practice in public health: lessons learnt from Better Start Bradford
    Josie Dickerson
    Philippa K. Bird
    Maria Bryant
    Nimarta Dharni
    Sally Bridges
    Kathryn Willan
    Sara Ahern
    Abigail Dunn
    Dea Nielsen
    Eleonora P. Uphoff
    Tracey Bywater
    Claudine Bowyer-Crane
    Pinki Sahota
    Neil Small
    Michaela Howell
    Gill Thornton
    Kate E. Pickett
    Rosemary R. C. McEachan
    John Wright
    BMC Public Health, 19
  • [39] Integrating research and system-wide practice in public health: lessons learnt from Better Start Bradford
    Dickerson, Josie
    Bird, Philippa K.
    Bryant, Maria
    Dharni, Nimarta
    Bridges, Sally
    Willan, Kathryn
    Ahern, Sara
    Dunn, Abigail
    Nielsen, Dea
    Uphoff, Eleonora P.
    Bywater, Tracey
    Bowyer-Crane, Claudine
    Sahota, Pinki
    Small, Neil
    Howell, Michaela
    Thornton, Gill
    Pickett, Kate E.
    McEachan, Rosemary R. C.
    Wright, John
    BMC PUBLIC HEALTH, 2019, 19 (1)
  • [40] Citizen science for better management: Lessons learned from three Norwegian beach litter data sets
    Falk-Andersson, Jannike
    Berkhout, Boris Woody
    Abate, Tenaw Gedefaw
    MARINE POLLUTION BULLETIN, 2019, 138 : 364 - 375