Statistics Evolution and Revolution to Meet Data Science Challenges

Cited by: 0
Author
Wu, Hulin [1]
Affiliation
[1] Univ Texas Hlth Sci Ctr Houston, Sch Publ Hlth, Dept Biostat & Data Sci, 1200 Pressler St,Suite E 833, Houston, TX 77030 USA
Keywords
Data curation; Pre-analysis tasks; Post-analysis tasks (PAT); Data preprocessing and preparation (DPP); Third modeling culture; Ordinary differential equations; Models
DOI
10.1007/s12561-024-09454-5
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
The advent of the Big Data era has necessitated a transformational shift in statistical research, responding to the novel demands of data science. Despite extensive discourse within statistical communities on confronting these emerging challenges, we offer our unique perspectives, underscoring the extended responsibilities of statisticians in pre-analysis and post-analysis tasks. Moreover, we propose a new definition and classification of Big Data based on data sources: Type I Big Data, which is the result of aggregating a large number of small datasets via data sharing and curation, and Type II Big Data, which is the Real-World Data (RWD) amassed from business operations and practices. Each category necessitates distinct data preprocessing and preparation (DPP) methods, and the objectives of analysis as well as the interpretation of results can significantly diverge between these two types of Big Data. We further suggest that the statistical communities should consider adopting and rapidly incorporating new paradigms and cultures by learning from other disciplines. Particularly, beyond Breiman's (Stat Sci 16(3):199-231, 2001) two modeling cultures, statisticians may need to pay more attention to a newly emerging third culture: the integration of algorithmic modeling with multi-scale dynamic modeling based on fundamental physics laws or mechanisms that generate the data. We draw from our experience in numerous related research projects to elucidate these novel concepts and perspectives.
Pages: 19