Class imbalance: A crucial factor affecting the performance of tea plantations mapping by machine learning

Cited by: 0

Authors:

Xiao, Yuanjun [1,2,3]

Huang, Jingfeng [1,2,3]

Weng, Wei [1,2,3]

Huang, Ran [4]

Shao, Qi [1,2,3]

Zhou, Chang [1,2,3]

Li, Shengcheng [1,2,3]

Affiliations:

[1] Zhejiang Univ, Key Lab Environm Remediat & Ecol Hlth, Minist Educ, Hangzhou 310058, Peoples R China

[2] Zhejiang Univ, Inst Appl Remote Sensing & Informat Technol, Hangzhou 310058, Peoples R China

[3] Key Lab Agr Remote Sensing & Informat Syst, Hangzhou 310058, Peoples R China

[4] Hangzhou Dianzi Univ, Sch Automat, Hangzhou 310058, Peoples R China

Source:

INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION | 2024 / Vol. 129

Keywords:

Class imbalance; Tea plantations; XGBoost; Machine learning; Sentinel-2; HJ-2; SMOTE

DOI:

10.1016/j.jag.2024.103849

Chinese Library Classification (CLC) number:

TP7 [Remote sensing technology]

Discipline classification codes:

081102; 0816; 081602; 083002; 1404

Abstract:

Due to disparities in area among land cover types, class imbalance has always existed in crop mapping research, creating uncertainty in extracting minority classes that occupy smaller areas. In this paper, taking tea plantation mapping in Hangzhou city as an example, we created a series of training datasets with different imbalance ratios (IRs), compared the accuracy of the extraction models trained on these datasets, and analyzed the impact of class imbalance on four machine learning algorithms (Artificial Neural Network, Decision Tree, Random Forest and XGBoost), aiming to provide a feasible approach to improving the mapping accuracy of minority classes. The leave-one-out cross-validation results showed that, in most cases, the model's F2-score first increased and then decreased as the IR increased, with F2-score gains ranging from 0.2% to 29.2%, suggesting that moderately increasing the number of other samples in the training dataset can improve tea plantation extraction accuracy. Consistent results were also obtained when the whole city's samples were used for modeling with random sampling validation. XGBoost performed best among the four algorithms, yielding the optimal tea plantation map with a PA of 97%, UA of 93% and F2-score of 96% when the IR of the training dataset was 6. The UA was 19% higher than that of the model using a balanced dataset (IR=1) and 11% higher than that of the model using pseudo-balanced datasets created by the oversampling method. The conclusions of this study offer insights for the identification of minority classes, contributing to higher accuracy in remote sensing crop mapping.
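
The reported figures can be cross-checked with a short sketch. It is not the authors' code: it assumes PA (producer's accuracy) plays the role of recall and UA (user's accuracy) the role of precision for the tea class, and that the IR is the number of other-class samples divided by the number of tea samples. The helper names `f_beta` and `subsample_to_ir`, and the sample counts, are hypothetical.

```python
import random


def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta=2 weights recall more heavily than precision."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return 0.0 if denom == 0 else (1 + b2) * precision * recall / denom


def subsample_to_ir(minority, majority, ir, seed=0):
    """Keep every minority sample and draw round(ir * len(minority))
    majority samples without replacement (assumed IR definition)."""
    n_major = round(ir * len(minority))
    if n_major > len(majority):
        raise ValueError("not enough majority samples for the requested IR")
    rng = random.Random(seed)
    return list(minority), rng.sample(list(majority), n_major)


# Cross-check the abstract: UA = 93% (precision), PA = 97% (recall).
f2 = f_beta(precision=0.93, recall=0.97)
print(f"F2-score: {f2:.3f}")

# Hypothetical sample pools: 50 tea samples, 1000 samples of other classes.
tea = [f"tea_{i}" for i in range(50)]
other = [f"other_{i}" for i in range(1000)]
tea_train, other_train = subsample_to_ir(tea, other, ir=6)
print(len(tea_train), len(other_train))
```

Under these assumptions the computed F2-score rounds to 96%, in line with the value stated for the XGBoost model at IR = 6.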


Pages: 12
