Comparison of regression-based and machine learning techniques to explain alpha diversity of fish communities in streams of central and eastern India

Cited by: 5
Authors
Mondal, Rubina [1]
Bhat, Anuradha [1]
Affiliation
[1] Indian Inst Sci Educ & Res Kolkata, Dept Biol Sci, Mohanpur 741246, W Bengal, India
Keywords
Freshwater fish; Artificial neural network; Linear mixed models; Multivariate adaptive regression splines; Generalized additive models; ARTIFICIAL NEURAL-NETWORK; FRESH-WATER FISH; GENERALIZED ADDITIVE-MODELS; SPECIES DISTRIBUTION; WEST-BENGAL; CONSERVATION STRATEGIES; FUNCTIONAL DIVERSITY; CLIMATE-CHANGE; FLOW REGIMES; RIVER;
DOI
10.1016/j.ecolind.2021.107922
Chinese Library Classification
X176 [Biodiversity conservation];
Discipline classification code
090705;
Abstract
Over the past several decades, ecologists have been striving to develop models that accurately describe species-habitat relationships across ecological communities. Statistical models that explain ecological dynamics need to consider the nuances of the complex interactions between communities and ecological factors. Here, we used multiple linear mixed models (LMM), generalized additive models (GAM), multivariate adaptive regression splines (MARS), and artificial neural networks (ANN) to model species richness and diversity of freshwater fishes in eastern and central India. The models were based on fish abundance and associated ecological data collected over three years across the study regions. We developed global models using all predictors after removing highly correlated variables (Pearson's r > 0.7). Results revealed conductivity, water temperature, and water velocity as the most important predictive factors of both species richness and diversity. We then built two subsets of selected factors for predictive models of diversity and richness: the first containing the significant factors common to the four modeling methods used, and the second chosen by an automatic feature selection technique. Among the modeling methods used in our study, ANN was found to create the best-fit models for explaining nonlinearities between response variables and predictors. The importance of variable selection is highlighted, given that subset 1 (common consensual factors) yielded more homogeneous predictions than subset 2 (automated feature selection). Contrary to similar studies in recent years, which show machine learning (ML) methods typically outperforming conventional methods, our results revealed that ANN performed on par with the other methods in terms of predictive power. Our findings underline the need for a judicious choice of modeling techniques based on the availability of data and the ecological communities being studied.
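The correlation screening mentioned in the abstract (dropping predictors with Pearson's r > 0.7) can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not say which member of a correlated pair was dropped or whether the absolute value of r was used, so the greedy "keep the first, drop later ones" rule, the use of |r|, the function names `pearson_r` and `drop_correlated`, and the synthetic predictor values below are all assumptions made for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(columns, threshold=0.7):
    """Greedy filter (an assumed rule): keep a predictor only if its |r|
    with every previously kept predictor is at or below the threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson_r(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Synthetic data for illustration only; these are not the study's measurements.
random.seed(0)
n = 200
conductivity = [random.gauss(0, 1) for _ in range(n)]
# A hypothetical predictor built to be almost collinear with conductivity.
dissolved_solids = [0.95 * c + random.gauss(0, 0.1) for c in conductivity]
velocity = [random.gauss(0, 1) for _ in range(n)]

selected = drop_correlated({
    "conductivity": conductivity,
    "dissolved_solids": dissolved_solids,
    "velocity": velocity,
})
print(selected)
```

Because the rule is order-dependent, listing the predictors in a different order can retain a different member of a correlated pair; in practice that choice is usually made on ecological grounds rather than by position.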
Pages: 12