Comparative analysis of the performance of selected machine learning algorithms depending on the size of the training sample

Cited by: 0
Authors
Kupidura, Przemyslaw [1]
Kepa, Agnieszka [1]
Krawczyk, Piotr [2]
Affiliations
[1] Warsaw Univ Technol, Fac Geodesy & Cartog, Pl Politechniki 1, PL-00661 Warsaw, Poland
[2] Orbitile Ltd, Potulkaly 6B-4, Warsaw, Poland
Keywords
efficiency; classification; machine learning; remote sensing; satellite imagery; training sample size; SATELLITE IMAGERY; LAND-COVER; CLASSIFICATION; VARIABLES; LIDAR;
DOI
10.2478/rgg-2024-0015
Chinese Library Classification (CLC) number
TP7 [Remote sensing technology];
Discipline classification codes
081102 ; 0816 ; 081602 ; 083002 ; 1404 ;
Abstract
The article analyzes the effectiveness of three machine learning methods, Random Forest (RF), Extreme Gradient Boosting (XGB), and Support Vector Machine (SVM), in classifying land use and land cover in satellite images. Several variants of each algorithm were tested, using different parameter settings typical for each. Each variant was run 20 times, using training samples of different sizes: from 100 pixels to 200,000 pixels. The tests were conducted independently on three Sentinel-2 satellite images, distinguishing five basic land cover classes: built-up areas, soil, forest, water, and low vegetation. Standard metrics were used for the accuracy assessment: Cohen's kappa coefficient and overall accuracy (for whole images), and F1 score, precision, and recall (for individual classes). The results for the different images were consistent and clearly showed that classification accuracy increases with the size of the training sample. They also showed that, among the tested algorithms, XGB is the most sensitive to the size of the training sample, while SVM is the least sensitive, achieving relatively good results even with the smallest training samples. At the same time, while the differences between the tested variants of RF and XGB were slight, the effectiveness of SVM depended strongly on the gamma parameter: when gamma was set too high, the model tended to overfit and did not achieve satisfactory results.
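The accuracy metrics named in the abstract can all be derived from a confusion matrix. The following is a minimal sketch in plain Python, not the authors' implementation. It assumes rows hold reference classes and columns hold predicted classes; the function names are hypothetical, and the pixel counts are invented for illustration and are not results from the article.

```python
from typing import Dict, Sequence

Matrix = Sequence[Sequence[int]]  # rows: reference class, columns: predicted class


def overall_accuracy(cm: Matrix) -> float:
    """Share of all samples that lie on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def cohen_kappa(cm: Matrix) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    if total == 0:
        return 0.0
    row_totals = [sum(cm[i]) for i in range(k)]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    p_observed = sum(cm[i][i] for i in range(k)) / total
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    if p_expected == 1.0:
        return 0.0  # degenerate case: kappa is undefined, report 0 by convention here
    return (p_observed - p_expected) / (1.0 - p_expected)


def per_class_scores(cm: Matrix, labels: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Precision, recall and F1 score for every class."""
    scores: Dict[str, Dict[str, float]] = {}
    for i, label in enumerate(labels):
        true_pos = cm[i][i]
        predicted = sum(row[i] for row in cm)  # column total
        actual = sum(cm[i])                    # row total
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Hypothetical counts for the five land cover classes listed in the abstract.
LABELS = ["built-up", "soil", "forest", "water", "low vegetation"]
EXAMPLE_CM = [
    [80, 12, 1, 0, 7],
    [10, 75, 2, 0, 13],
    [0, 1, 92, 0, 7],
    [1, 0, 0, 98, 1],
    [5, 9, 8, 0, 78],
]

print(f"overall accuracy: {overall_accuracy(EXAMPLE_CM):.3f}")
print(f"Cohen's kappa:    {cohen_kappa(EXAMPLE_CM):.3f}")
for name, s in per_class_scores(EXAMPLE_CM, LABELS).items():
    print(f"{name:15s} P={s['precision']:.3f} R={s['recall']:.3f} F1={s['f1']:.3f}")
```

Overall accuracy and kappa summarize a whole image, while the per-class scores show which land cover classes are confused with each other, which matches how the abstract splits the metrics.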
Pages: 53-69
Number of pages: 17