Class imbalance: A crucial factor affecting the performance of tea plantations mapping by machine learning

Authors
Xiao, Yuanjun [1, 2, 3]
Huang, Jingfeng [1, 2, 3]
Weng, Wei [1, 2, 3]
Huang, Ran [4]
Shao, Qi [1, 2, 3]
Zhou, Chang [1, 2, 3]
Li, Shengcheng [1, 2, 3]
Affiliations
[1] Zhejiang Univ, Key Lab Environm Remediat & Ecol Hlth, Minist Educ, Hangzhou 310058, Peoples R China
[2] Zhejiang Univ, Inst Appl Remote Sensing & Informat Technol, Hangzhou 310058, Peoples R China
[3] Key Lab Agr Remote Sensing & Informat Syst, Hangzhou 310058, Peoples R China
[4] Hangzhou Dianzi Univ, Sch Automat, Hangzhou 310058, Peoples R China
Keywords
Class imbalance; Tea plantations; XGBoost; Machine learning; Sentinel-2; HJ-2; SMOTE;
DOI
10.1016/j.jag.2024.103849
Chinese Library Classification number
TP7 [Remote sensing technology];
Discipline classification codes
081102; 0816; 081602; 083002; 1404;
Abstract
Because land cover types differ in area, class imbalance has always been present in crop mapping research, introducing uncertainty when extracting minority classes, which occupy smaller areas. In this paper, taking tea plantation mapping in Hangzhou city as an example, we created a series of training datasets with different imbalance ratios (IRs), compared the accuracy of the extraction models trained on these datasets, and analyzed the impact of class imbalance on four machine learning algorithms (Artificial Neural Network, Decision Tree, Random Forest and XGBoost), aiming to provide a feasible approach to improving the mapping accuracy of minority classes. The leave-one-out cross-validation results showed that in most cases, as the IR increased, the model's F2-score first increased and then decreased, with gains in F2-score ranging from 0.2% to 29.2%, suggesting that moderately increasing the number of other-class samples in the training dataset can improve tea plantation extraction accuracy. Consistent results were also obtained when the whole city's samples were used for modeling with random sampling validation. XGBoost performed best among the four algorithms, yielding the optimal tea plantation map with a producer's accuracy (PA) of 97%, user's accuracy (UA) of 93% and F2-score of 96% when the IR of the training dataset was 6. This UA was 19% higher than that of the model using a balanced dataset (IR = 1) and 11% higher than that of the model using pseudo-balanced datasets created by the oversampling method. The conclusions of this study offer insights for the identification of minority classes, contributing to higher accuracy in remote sensing crop mapping.
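The two quantities the abstract relies on, the imbalance ratio and the F2-score, can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that IR means the number of other-class samples divided by the number of tea samples (consistent with IR = 1 being called balanced), and it treats UA as precision and PA as recall, as is conventional in remote sensing. The function names `f_beta` and `subsample_to_ir` are hypothetical.

```python
import random


def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta=2 weights recall (PA) more heavily than precision (UA)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def subsample_to_ir(tea, other, ir, seed=0):
    """Keep all tea samples and draw round(ir * len(tea)) other-class samples.

    Assumption: IR = (number of other samples) / (number of tea samples).
    """
    tea = list(tea)
    other = list(other)
    n_other = round(ir * len(tea))
    if n_other > len(other):
        raise ValueError("not enough other-class samples for the requested IR")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return tea, rng.sample(other, n_other)


# Check against the reported XGBoost result at IR = 6: PA = 97%, UA = 93%.
print(round(f_beta(precision=0.93, recall=0.97), 2))  # 0.96
```

With the reported PA and UA, the formula gives an F2-score of about 0.96, matching the 96% stated in the abstract. Sweeping `ir` over a range of values and retraining on each subsample is one way to reproduce the kind of IR-versus-accuracy comparison described above.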
Pages: 12