Comparative analysis of the performance of selected machine learning algorithms depending on the size of the training sample

Authors
Kupidura, Przemyslaw [1]
Kepa, Agnieszka [1]
Krawczyk, Piotr [2]
Affiliations
[1] Warsaw Univ Technol, Fac Geodesy & Cartog, Pl Politechniki 1, PL-00661 Warsaw, Poland
[2] Orbitile Ltd, Potulkaly 6B-4, Warsaw, Poland
Keywords
efficiency; classification; machine learning; remote sensing; satellite imagery; training sample size; land cover; variables; LiDAR
DOI
10.2478/rgg-2024-0015
Chinese Library Classification number
TP7 [Remote sensing technology];
Discipline classification codes
081102; 0816; 081602; 083002; 1404
Abstract
The article analyzes the effectiveness of three machine learning methods, Random Forest (RF), Extreme Gradient Boosting (XGB), and Support Vector Machine (SVM), in classifying land use and land cover in satellite images. Several variants of each algorithm were tested, using different values of the parameters typical for each. Classification with each variant was repeated 20 times, using training samples of different sizes: from 100 pixels to 200,000 pixels. The tests were conducted independently on 3 Sentinel-2 satellite images, identifying 5 basic land cover classes: built-up areas, soil, forest, water, and low vegetation. Typical metrics were used for the accuracy assessment: Cohen's kappa coefficient and overall accuracy (for whole images), as well as F1 score, precision, and recall (for individual classes). The results obtained for the different images were consistent and clearly indicated that classification accuracy increases with the size of the training sample. They also showed that, among the tested algorithms, XGB is the most sensitive to the size of the training sample, while SVM is the least sensitive, achieving relatively good results even with the smallest training samples. At the same time, while the differences between the tested variants of RF and XGB were slight, the effectiveness of SVM depended strongly on the gamma parameter: when gamma was too high, the model tended to overfit, which prevented satisfactory results.
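The accuracy metrics named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names and the ten toy labels below are hypothetical, chosen only to show how overall accuracy, Cohen's kappa, and per-class precision, recall, and F1 score are computed from reference and predicted class labels.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the reference label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_o = overall_accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: sum over classes of the product of marginal proportions.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Degenerate case: a single class in both label sets.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def class_scores(y_true, y_pred, label):
    """Return (precision, recall, F1) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels using the five classes listed in the abstract.
y_true = ["water", "water", "forest", "forest", "soil",
          "soil", "built-up", "low vegetation", "low vegetation", "forest"]
y_pred = ["water", "water", "forest", "soil", "soil",
          "soil", "built-up", "low vegetation", "forest", "forest"]

if __name__ == "__main__":
    print("overall accuracy:", overall_accuracy(y_true, y_pred))
    print("Cohen's kappa:", cohen_kappa(y_true, y_pred))
    print("forest (precision, recall, F1):", class_scores(y_true, y_pred, "forest"))
```

Kappa is lower than overall accuracy on the same labels because it discounts the agreement expected by chance from the class frequencies, which is one reason the two are commonly reported together for whole-image assessment.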
Pages: 53-69
Number of pages: 17