Class imbalance: A crucial factor affecting the performance of tea plantations mapping by machine learning

Cited by: 0
Authors
Xiao, Yuanjun [1 ,2 ,3 ]
Huang, Jingfeng [1 ,2 ,3 ]
Weng, Wei [1 ,2 ,3 ]
Huang, Ran [4 ]
Shao, Qi [1 ,2 ,3 ]
Zhou, Chang [1 ,2 ,3 ]
Li, Shengcheng [1 ,2 ,3 ]
Affiliations
[1] Zhejiang Univ, Key Lab Environm Remediat & Ecol Hlth, Minist Educ, Hangzhou 310058, Peoples R China
[2] Zhejiang Univ, Inst Appl Remote Sensing & Informat Technol, Hangzhou 310058, Peoples R China
[3] Key Lab Agr Remote Sensing & Informat Syst, Hangzhou 310058, Peoples R China
[4] Hangzhou Dianzi Univ, Sch Automat, Hangzhou 310058, Peoples R China
Keywords
Class imbalance; Tea plantations; XGBoost; Machine learning; Sentinel-2; HJ-2; SMOTE;
DOI
10.1016/j.jag.2024.103849
Chinese Library Classification (CLC) number
TP7 [Remote sensing technology]
Discipline classification codes
081102 ; 0816 ; 081602 ; 083002 ; 1404 ;
Abstract
Owing to differences in area among land cover types, class imbalance is pervasive in crop mapping research and introduces uncertainty into the extraction of minority classes that occupy smaller areas. Taking tea plantation mapping in Hangzhou as a case study, we created a series of training datasets with different imbalance ratios (IRs), compared the accuracy of extraction models trained on these datasets, and analyzed the impact of class imbalance on four machine learning algorithms (Artificial Neural Network, Decision Tree, Random Forest and XGBoost), aiming to provide a feasible approach for improving the mapping accuracy of minority classes. Leave-one-out cross-validation showed that, in most cases, the model's F2-score first increased and then decreased as the IR increased, with F2-score gains ranging from 0.2% to 29.2%. This suggests that moderately increasing the number of "other" (non-tea) samples in the training dataset can improve tea plantation extraction accuracy. Consistent results were also obtained when modeling with samples from the whole city and validating by random sampling. XGBoost performed best among the four algorithms, yielding the optimal tea plantation map, with a producer's accuracy (PA) of 97%, a user's accuracy (UA) of 93% and an F2-score of 96%, when the IR of the training dataset was 6. This UA was 19% higher than that of the model trained on a balanced dataset (IR = 1) and 11% higher than that of the model trained on pseudo-balanced datasets created by oversampling. These findings offer insights for identifying minority classes and can help achieve higher accuracy in remote sensing crop mapping.
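The two quantities the abstract relies on, the imbalance ratio of a training set and the F2-score built from PA and UA, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes IR is the number of "other" samples per tea sample (so IR = 1 is balanced, as in the abstract), builds an IR-specific training set by randomly subsampling the "other" class, and treats PA as the recall and UA as the precision of the tea class. The function names `sample_with_ir`, `f_beta` and `accuracy_metrics` are hypothetical, and no classifier (XGBoost or otherwise) is trained here.

```python
import random


def sample_with_ir(tea, other, ir, seed=0):
    """Return (tea, other_subset) with len(other_subset) == round(ir * len(tea)).

    Assumption (hypothetical helper): IR = number of "other" samples per tea
    sample, so ir=1 gives a balanced set. All tea samples are kept; "other"
    samples are drawn at random without replacement.
    """
    n_other = round(ir * len(tea))
    if n_other > len(other):
        raise ValueError("not enough 'other' samples for the requested IR")
    rng = random.Random(seed)  # fixed seed for reproducible draws
    return list(tea), rng.sample(other, n_other)


def f_beta(pa, ua, beta=2.0):
    """F-beta score from recall (PA) and precision (UA); beta=2 favours PA."""
    b2 = beta ** 2
    denom = b2 * ua + pa
    return (1 + b2) * ua * pa / denom if denom else 0.0


def accuracy_metrics(y_true, y_pred, positive="tea", beta=2.0):
    """Return (PA, UA, F-beta) for the positive class from paired label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    pa = tp / (tp + fn) if tp + fn else 0.0  # producer's accuracy (recall)
    ua = tp / (tp + fp) if tp + fp else 0.0  # user's accuracy (precision)
    return pa, ua, f_beta(pa, ua, beta)


if __name__ == "__main__":
    # Hypothetical toy data: 10 tea samples, 100 candidate "other" samples.
    tea, other = sample_with_ir(list(range(10)), list(range(100, 200)), ir=6)
    print(len(tea), len(other))  # 10 60
    y_true = ["tea"] * 4 + ["other"] * 4
    y_pred = ["tea", "tea", "tea", "other", "tea", "other", "other", "other"]
    print(accuracy_metrics(y_true, y_pred))  # (0.75, 0.75, 0.75)
    print(round(f_beta(0.97, 0.93), 3))  # 0.962
```

As a consistency check, `f_beta(0.97, 0.93)` returns about 0.962, in line with the reported F2-score of 96% for PA = 97% and UA = 93%. With β = 2, the score weights PA (recall) more heavily than UA (precision).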
Pages: 12