An Optimized Approach for Predicting Water Quality Features Based on Machine Learning

Cited by: 14
Authors
Suwadi, Nur Afyfah [1]
Derbali, Morched [2]
Sani, Nor Samsiah [3]
Lam, Meng Chun [1]
Arshad, Haslina [4]
Khan, Imran [5]
Kim, Ki-Il [6]
Affiliations
[1] Univ Kebangsaan Malaysia, Fac Informat Sci & Technol, Ctr Artificial Intelligence Technol, Mixed Real & Pervas Comp Lab, Bangi 43600, Malaysia
[2] King Abdulaziz Univ KAU, Fac Comp & Informat Technol FCIT, Jeddah, Saudi Arabia
[3] Univ Kebangsaan Malaysia, Fac Informat Sci & Technol, Ctr Artificial Intelligence Technol, Bangi 43600, Malaysia
[4] Univ Kebangsaan Malaysia, Inst IR4.0 (IIR4.0), Bangi 43600, Malaysia
[5] Univ Engn & Technol Peshawar, Dept Elect Engn, Peshawar, Pakistan
[6] Chungnam Natl Univ, Dept Comp Sci & Engn, Daejeon 34134, South Korea
Keywords
FEATURE-SELECTION; RIVER; ALGORITHM; POLLUTION
DOI
10.1155/2022/3397972
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Traditionally, water quality is assessed using costly laboratory and statistical methods, which makes real-time monitoring impractical. Poor water quality calls for a more practical and cost-effective solution. Machine learning classification appears promising for rapid detection and prediction of water quality, and it has already been used successfully for that purpose. However, research on machine learning for water quality index (WQI) prediction is generally lacking. This research therefore aims to identify the important features for the WQI, which requires classifying numerous indicators. The study develops four machine learning models (Artificial Neural Network, Support Vector Machine, Random Forest, and Naive Bayes) based on the WQI and chemical parameters. Each model is trained and validated on the Langat Basin (Selangor) dataset from the Department of Environment of Malaysia. Several data preprocessing tasks, such as data cleaning and feature selection, were applied to the raw dataset to ensure the quality of the training data. The performance of the algorithms is further improved using feature sets chosen by several feature selection strategies, such as information gain, correlation, and symmetrical uncertainty. Each classifier is then optimized with different tuning parameters before the classifiers' outputs are compared against each other. The experimental results show that the optimized Random Forest classifier, using the WQI parameters selected by the information gain method, achieved the highest performance. They also show that the WQI parameters are more relevant in predicting the WQI than the other variables, and in particular that dissolved oxygen (DO) and biochemical oxygen demand (BOD) are important features for predicting WQI. The proposed model achieved reasonable accuracy with minimal parameters, indicating that it could be used in real-time water quality detection systems.
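The abstract names information gain and symmetrical uncertainty as feature selection strategies but does not show how they rank features. The following is a minimal sketch using the standard textbook definitions, assuming the features have already been discretised into categories. The toy readings, the class labels, and the function names are hypothetical and chosen only for illustration; this is not the authors' implementation or data.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a discrete feature X and class labels Y."""
    total = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # Weighted average of the label entropy inside each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def symmetrical_uncertainty(feature, labels):
    """SU(X, Y) = 2 * IG(Y; X) / (H(X) + H(Y)), a normalised score in [0, 1]."""
    denom = entropy(feature) + entropy(labels)
    return 0.0 if denom == 0 else 2 * information_gain(feature, labels) / denom


# Hypothetical toy data: discretised readings and an invented quality class.
samples = {
    "DO":  ["high", "high", "low", "low", "high", "low", "high", "low"],
    "BOD": ["low", "low", "high", "high", "low", "high", "high", "low"],
    "pH":  ["mid", "high", "mid", "high", "mid", "mid", "high", "high"],
}
wqi_class = ["clean", "clean", "polluted", "polluted",
             "clean", "polluted", "clean", "polluted"]

# Rank features from most to least informative about the class.
ranking = sorted(
    samples,
    key=lambda name: information_gain(samples[name], wqi_class),
    reverse=True,
)

for name in ranking:
    ig = information_gain(samples[name], wqi_class)
    su = symmetrical_uncertainty(samples[name], wqi_class)
    print(f"{name}: IG={ig:.3f} SU={su:.3f}")
```

In this invented example the DO column separates the two classes perfectly, so it ranks first, while the pH column carries no information about the class and ranks last. A classifier would then be trained on only the top-ranked features.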
Pages: 20