Comparison of imputation methods for missing production data of dairy cattle

Times cited: 3
Authors
You, J. [1]
Ellis, J. L. [1]
Adams, S. [1]
Sahar, M. [1]
Jacobs, M. [2,3]
Tulpan, D. [1]
Affiliations
[1] Univ Guelph, Dept Anim Biosci, Guelph, ON, Canada
[2] Trouw Nutr Innovat Dept, Amersfoort, Netherlands
[3] FR Analyt, Wierden, Overijssel, Netherlands
Keywords
Big data; Dairy cow; Interpolation; Machine learning; Unavailable values; Models; Curve
DOI
10.1016/j.animal.2023.100921
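Chinese Library Classification (CLC) number
S8 [Animal husbandry, veterinary medicine, hunting, sericulture, apiculture]
Subject classification code
0905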
Abstract
Today, vast amounts of data on the feed intake, growth, and environmental impact of individual animals are recorded in on-farm settings. Despite their apparent usefulness, data collected in real-world applications often have missing values in one or more variables. Causes include human error, machine error, and misaligned sampling frequencies across variables. Because incomplete datasets are less valuable for downstream data analysis, it is important to address the missing-value problem properly. One option is to reduce the dataset to a subset that contains only complete data, but this process may discard a considerable amount of data. The current study aimed to compare imputation methods for estimating missing values in a raw dairy cattle dataset of 454 553 records collected from 629 cows between 2009 and 2020. A cleaning process reduced the dataset to 437 075 observations from 512 cows. Missing values were present in four variables: concentrate DM intake (CDMI, 2.30% missing), forage DM intake (FDMI, 8.05%), milk yield (MY, 15.12%), and BW (64.33%). After all missing values were removed, the resulting complete dataset (n = 129 353) was randomly sampled five times to create five independent subsets with the same missing-data percentages as the cleaned dataset. Four univariate and nine multivariate imputation methods (eight machine learning methods and the MissForest method) were applied to and evaluated on the five repeats, and average imputation performance was reported for each repeat. Overall, Random Forest was the best imputation method for this type of data. It had a lower mean squared prediction error and a higher concordance correlation coefficient than the other imputation methods for all imputed variables. Random Forest performed particularly well when imputing CDMI, MY, and BW, compared with FDMI.
© 2023 The Author(s). Published by Elsevier B.V. on behalf of The Animal Consortium. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
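The abstract describes an evaluation design without implementation details. The sketch below illustrates that design under several assumptions:

- Values are masked completely at random (MCAR) in a complete dataset at the per-variable rates reported above.
- Mean imputation stands in as a placeholder univariate method, because the abstract does not name its four univariate methods.
- Imputations are scored only at the masked positions, using mean squared prediction error (MSPE) and Lin's concordance correlation coefficient (CCC).
- Each repeat re-masks one synthetic dataset with a new seed, and scores are averaged over repeats. This simplifies the study's five independently sampled subsets.

All function names (mask_at_rates, mean_impute, mspe, ccc, evaluate) and the synthetic data are hypothetical. The sketch does not implement Random Forest or MissForest.

```python
import random
from statistics import fmean

# Per-variable missing shares reported in the abstract for the cleaned dataset.
MISSING_RATES = {"CDMI": 0.0230, "FDMI": 0.0805, "MY": 0.1512, "BW": 0.6433}


def mask_at_rates(rows, rates, seed=0):
    """Copy `rows` (a list of dicts) and set a random share of each variable to None.

    ASSUMPTION: positions are chosen completely at random (MCAR) and
    independently per variable; the study's sampling scheme may differ.
    """
    rng = random.Random(seed)
    masked = [dict(row) for row in rows]
    n = len(rows)
    for var, rate in rates.items():
        for i in rng.sample(range(n), round(rate * n)):
            masked[i][var] = None
    return masked


def mean_impute(rows, var):
    """Placeholder univariate method: fill missing `var` values with the observed mean."""
    fill = fmean(row[var] for row in rows if row[var] is not None)
    return [fill if row[var] is None else row[var] for row in rows]


def mspe(observed, predicted):
    """Mean squared prediction error: mean of (observed - predicted)^2."""
    return fmean((o - p) ** 2 for o, p in zip(observed, predicted))


def ccc(observed, predicted):
    """Lin's concordance correlation coefficient with population (1/n) moments:
    2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2)."""
    mx, my = fmean(observed), fmean(predicted)
    vx = fmean((x - mx) ** 2 for x in observed)
    vy = fmean((y - my) ** 2 for y in predicted)
    sxy = fmean((x - mx) * (y - my) for x, y in zip(observed, predicted))
    return 2 * sxy / (vx + vy + (mx - my) ** 2)


def evaluate(complete, imputed, masked, var):
    """Score imputed values only at positions that were artificially masked."""
    idx = [i for i, row in enumerate(masked) if row[var] is None]
    observed = [complete[i][var] for i in idx]
    predicted = [imputed[i] for i in idx]
    return mspe(observed, predicted), ccc(observed, predicted)


# Hypothetical synthetic "complete" dataset standing in for the real records.
data_rng = random.Random(1)
complete = [
    {"CDMI": data_rng.gauss(10, 2), "FDMI": data_rng.gauss(12, 3),
     "MY": data_rng.gauss(30, 6), "BW": data_rng.gauss(650, 50)}
    for _ in range(1000)
]

# Five repeats with different random masks; averaging over repeats is an assumption.
for var in MISSING_RATES:
    scores = []
    for seed in range(5):
        masked = mask_at_rates(complete, MISSING_RATES, seed=seed)
        scores.append(evaluate(complete, mean_impute(masked, var), masked, var))
    print(f"{var}: MSPE={fmean(s[0] for s in scores):.3f}, "
          f"CCC={fmean(s[1] for s in scores):.3f}")
```

Mean imputation predicts a single constant, so its CCC comes out near zero in this sketch. A multivariate imputer could be compared against it by swapping in place of mean_impute. The abstract reports that Random Forest gave the lowest MSPE and highest CCC for all four variables.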
Pages: 10