Small values in big data: The continuing need for appropriate metadata

Cited by: 14
Authors
Stow, Craig A. [1]
Webster, Katherine E. [2]
Wagner, Tyler [3]
Lottig, Noah [4]
Soranno, Patricia A. [2]
Cha, YoonKyung [5]
Affiliations
[1] NOAA, Great Lakes Environm Res Lab, Ann Arbor, MI 48176 USA
[2] Michigan State Univ, Dept Fisheries & Wildlife, E Lansing, MI 48824 USA
[3] Penn State Univ, US Geol Survey, Penn Cooperat Fish & Wildlife Unit, 402 Forest Resources Bldg, University Pk, PA 16802 USA
[4] Univ Wisconsin, Ctr Limnol, Boulder Jct, WI USA
[5] Univ Seoul, Sch Environm Engn, Seoul, South Korea
Funding
National Institute of Food and Agriculture (USA); National Science Foundation (USA)
Keywords
WATER-QUALITY DATA; DISTRIBUTIONAL PARAMETERS; STATISTICAL TREATMENTS; CENSORED-DATA; NONDETECTS; SYSTEMS; SCIENCE; MODEL; US;
DOI
10.1016/j.ecoinf.2018.03.002
Chinese Library Classification number
Q14 [Ecology (bioecology)]
Discipline classification code
071012 ; 0713 ;
Abstract
Compiling data from disparate sources to address pressing ecological issues is increasingly common. Many ecological datasets contain left-censored data: observations below an analytical detection limit. Studies from single and typically small datasets show that common approaches for handling censored data (e.g., deletion or substituting fixed values) result in systematic biases. However, no studies have explored the degree to which the documentation and presence of censored data influence outcomes from large, multi-sourced datasets. We describe left-censored data in a lake water quality database assembled from 74 sources and illustrate the challenges of dealing with small values in big data, including detection limits that are absent, range widely, and show trends over time. We show that substitutions of censored data can also bias analyses using 'big data' datasets, that censored data can be effectively handled with modern quantitative approaches, but that such approaches rely on accurate metadata that describe treatment of censored data from each source.
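The contrast the abstract draws between deletion, fixed-value substitution, and likelihood-based handling of left-censored data can be sketched on simulated data. The example below is illustrative only and is not the authors' analysis: the lognormal model, the single detection limit, the simple pattern-search optimizer, and all function names (`censored_loglik`, `fit_censored_lognormal`, `compare_estimators`) are assumptions made for this sketch.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def censored_loglik(mu, sigma, logs, n_censored, log_dl):
    """Log-likelihood of a normal model on log values, left-censored at log_dl."""
    # Detected observations contribute the normal log-density.
    ll = sum(-0.5 * math.log(2 * math.pi) - math.log(sigma)
             - 0.5 * ((x - mu) / sigma) ** 2 for x in logs)
    # Each censored observation contributes log P(X < log_dl).
    p_below = max(NormalDist(mu, sigma).cdf(log_dl), 1e-300)
    return ll + n_censored * math.log(p_below)


def fit_censored_lognormal(detected, n_censored, dl, iters=200):
    """Maximize the censored likelihood with a basic pattern search (hypothetical helper)."""
    logs = [math.log(v) for v in detected]
    log_dl = math.log(dl)
    mu, sigma = mean(logs), max(stdev(logs), 1e-3)
    best = censored_loglik(mu, sigma, logs, n_censored, log_dl)
    step = 1.0
    for _ in range(iters):
        improved = False
        for dmu, dsig in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            m, s = mu + dmu, sigma + dsig
            if s <= 1e-6:
                continue  # keep the scale parameter positive
            ll = censored_loglik(m, s, logs, n_censored, log_dl)
            if ll > best:
                mu, sigma, best, improved = m, s, ll, True
        if not improved:
            step /= 2  # shrink the search radius when no neighbor is better
    return mu, sigma


def compare_estimators(n=2000, dl=math.exp(-0.5), seed=1):
    """Estimate the log-scale mean (true value 0) three ways from simulated data."""
    rng = random.Random(seed)
    values = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(n)]
    detected = [v for v in values if v >= dl]
    n_cens = n - len(detected)
    det_logs = [math.log(v) for v in detected]
    mle_mu, _ = fit_censored_lognormal(detected, n_cens, dl)
    return {
        "deletion": mean(det_logs),                               # drop censored values
        "substitution": mean(det_logs + [math.log(dl)] * n_cens),  # replace with the limit
        "mle": mle_mu,                                            # censored likelihood
    }


results = compare_estimators()
for name, estimate in results.items():
    print(f"{name:>12}: {estimate:+.3f}")
```

In this sketch, deletion and substitution at the detection limit both pull the estimated log-scale mean above the simulated true value of 0, while the censored-likelihood estimate stays close to it. The likelihood approach needs the detection limit for every censored value, which is the kind of metadata the abstract argues must accompany each source.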
Pages: 26-30
Number of pages: 5