Data mining;
Big data;
Holdout data;
Knowledge discovery;
Replicability;
Science;
DOI:
10.1007/s42452-020-2862-5
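Chinese Library Classification (CLC) numbers:
O [Mathematical sciences and chemistry];
P [Astronomy, earth sciences];
Q [Biological sciences];
N [Natural sciences (general)];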
Discipline classification codes:
07;
0710;
09;
Abstract:
Background: The data deluge seemingly makes it more likely that data mining will discover new, heretofore unknown relationships.
Findings: Monte Carlo simulations demonstrate the paradox of big data: the data deluge makes it more likely that the patterns and relationships discovered by data mining are spurious.
Conclusion: Models are more likely to be reliable if expert opinion is used in their specification; human expertise should not be viewed as an unhelpful constraint on knowledge discovery.
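The abstract does not describe the simulation design, so the following is only an illustrative Monte Carlo sketch of the same idea, not the authors' method. It assumes a hypothetical setup: an outcome y that truly depends on 5 variables (coefficient 1 each, plus standard-normal error), a sample of 100 observations, and a growing pool of candidate variables that are pure noise. Each candidate is screened with an approximate two-sided 5% correlation test (cutoff 1.96, a normal approximation to the t distribution). The function names `pearson_r`, `is_significant` and `spurious_share` are made up for this example.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_significant(r, n, critical_t=1.96):
    """Approximate two-sided 5% test of r != 0 (assumed normal approximation)."""
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return abs(t) > critical_t


def spurious_share(n_noise, n_true=5, n_obs=100, trials=3, seed=0):
    """Average share of 'significant' variables that are pure noise.

    Hypothetical setup: y = sum of n_true real variables + N(0, 1) error;
    n_noise extra candidate variables are generated independently of y.
    """
    rng = random.Random(seed)
    shares = []
    for _ in range(trials):
        true_x = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_true)]
        y = [sum(col[i] for col in true_x) + rng.gauss(0, 1) for i in range(n_obs)]
        # Real relationships that pass the screen.
        hits_true = sum(is_significant(pearson_r(x, y), n_obs) for x in true_x)
        # Noise variables that pass the screen purely by chance.
        hits_noise = 0
        for _ in range(n_noise):
            x = [rng.gauss(0, 1) for _ in range(n_obs)]
            hits_noise += is_significant(pearson_r(x, y), n_obs)
        found = hits_true + hits_noise
        shares.append(hits_noise / found if found else 0.0)
    return statistics.fmean(shares)


if __name__ == "__main__":
    for k in (10, 100, 1000):
        print(k, round(spurious_share(k), 2))
```

Because roughly 5% of pure-noise candidates pass a 5% test by chance, the number of spurious "discoveries" grows with the size of the candidate pool while the number of real relationships stays fixed, so the share of findings that are spurious rises as more variables are mined.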