The Prediction of Diatom Abundance by Comparison of Various Machine Learning Methods

Cited by: 3
Authors
Shin, Yuna [1]
Lee, Heesuk [2]
Lee, Young-Joo [2]
Seo, Dae Keun [2]
Jeong, Bomi [3]
Hong, Seoksu [3]
Kim, Jaehoon [3]
Kim, Taekgeun [3]
Lee, Jae-Kyeong [4]
Heo, Tae-Young [3]
Affiliations
[1] Natl Inst Environm Res, Dept Water Environm Res, Incheon 22689, South Korea
[2] K Water, Daejeon 34045, South Korea
[3] Chungbuk Natl Univ, Dept Informat & Stat, Chungbuk 28644, South Korea
[4] Korea Inst Sci Technol Informat, Idea Commercializat Ctr, Seoul 02456, South Korea
Keywords
ARTIFICIAL NEURAL-NETWORK; WATER; VARIABLES; MODELS; BLOOMS; LAKE
DOI
10.1155/2019/5749746
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
This study adopts two approaches to analyze the occurrence of algae at Haman Weir on the Nakdong River: a traditional statistical method (logistic regression) and machine learning techniques (kNN, ANN, RF, Bagging, Boosting, and SVM). To compare model performance, the study measured accuracy, specificity, sensitivity, and AUC, which are representative model evaluation measures. The ROC curve is created by plotting sensitivity against (1 - specificity). The AUC, the area under the ROC curve, summarizes sensitivity and specificity together. This measure has two advantages over other evaluation tools. First, it is scale-invariant: it measures how well the predictions are ranked rather than their absolute values. Second, it is classification-threshold-invariant: because the ROC curve plots sensitivity against (1 - specificity) over all thresholds, the AUC does not depend on any single threshold. Because of these two advantages, we chose AUC as the final model evaluation measure. Variable selection was conducted using the Boruta algorithm, and models with and without variable selection were compared to identify the better model. The Boruta algorithm suggested PO4-P, DO, BOD, NH3-N, Susp, pH, TOC, Temp, TN, and TP as significant explanatory variables. Among the models without variable selection, RF had the highest accuracy and ANN had the highest AUC. Overall, ANN with variable selection showed the best performance among the models with and without variable selection.
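The ROC and AUC description in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the labels and scores are made-up toy values, and the function names `roc_points` and `auc` are hypothetical. It sweeps the threshold over every score, plots sensitivity against (1 - specificity), integrates with the trapezoidal rule, and checks that rescaling the scores leaves the AUC unchanged.

```python
def roc_points(labels, scores):
    """Return (1 - specificity, sensitivity) points, one per threshold.

    Assumes labels are 0/1 and both classes are present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(labels, scores):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_points(labels, scores)
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Toy data (hypothetical, not from the study): 1 = bloom, 0 = no bloom.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

# A monotone rescaling keeps the ranking, so the AUC stays the same.
rescaled = [10 * s + 3 for s in scores]

print(auc(labels, scores))
print(auc(labels, rescaled))
```

Because the curve is built from every threshold at once, no single cutoff has to be chosen before the models are compared, which is the threshold-invariance property the abstract refers to.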
Pages: 13
Related papers
50 in total
  • [1] Protein Abundance Prediction Through Machine Learning Methods
    Ferreira, Mauricio
    Ventorim, Rafaela
    Almeida, Eduardo
    Silveira, Sabrina
    Silveira, Wendel
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 2021, 433 (22)
  • [2] A Comparison of Machine Learning Methods for the Prediction of Breast Cancer
    Silva, Sara
    Anunciacao, Orlando
    Lotz, Marco
    [J]. EVOLUTIONARY COMPUTATION, MACHINE LEARNING AND DATA MINING IN BIOINFORMATICS, 2011, 6623 : 159 - +
  • [3] Comparison of machine learning methods for multiphase flowrate prediction
    Jiang, Zhenyu
    Wang, Haokun
    Yang, Yunjie
    Li, Yi
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON IMAGING SYSTEMS & TECHNIQUES (IST 2019), 2019,
  • [4] A comparison of machine learning methods for ozone pollution prediction
    Pan, Qilong
    Harrou, Fouzi
    Sun, Ying
    [J]. JOURNAL OF BIG DATA, 2023, 10 (01)
  • [5] A comparison of machine learning methods for ozone pollution prediction
    Qilong Pan
    Fouzi Harrou
    Ying Sun
    [J]. Journal of Big Data, 10
  • [6] A Comparison of Two Modern Machine Learning Methods for Mortality Prediction
    Odden, M.
    Peralta, C.
    Snowden, J.
    [J]. JOURNAL OF THE AMERICAN GERIATRICS SOCIETY, 2016, 64 : S99 - S99
  • [7] Monthly streamflow prediction and performance comparison of machine learning and deep learning methods
    Ayana, Omer
    Kanbak, Deniz Furkan
    Keles, Muemine Kaya
    Turhan, Evren
    [J]. ACTA GEOPHYSICA, 2023, 71 (06) : 2905 - 2922
  • [8] Monthly streamflow prediction and performance comparison of machine learning and deep learning methods
    Ömer Ayana
    Deniz Furkan Kanbak
    Mümine Kaya Keleş
    Evren Turhan
    [J]. Acta Geophysica, 2023, 71 : 2905 - 2922
  • [9] Diabetes Induced Factors Prediction Based on Various Improved Machine Learning Methods
    Wu, Jun
    Qu, Lulu
    Yang, Guoping
    Han, Nan
    [J]. CURRENT BIOINFORMATICS, 2022, 17 (03) : 254 - 262
  • [10] A comparative study of various machine learning methods for performance prediction of an evaporative condenser
    Behnam, Pooria
    Faegh, Meysam
    Shafii, Mohammad Behshad
    Khiadani, Mehdi
    [J]. INTERNATIONAL JOURNAL OF REFRIGERATION, 2021, 126 : 280 - 290