Prediction enhancement for surface water sodium adsorption ratio using limited inputs: Implementation of hybridized stacked ensemble model with feature selection algorithm
Physics and Chemistry of the Earth, Parts A/B/C (IF 3.7) Pub Date: 2024-01-11, DOI: 10.1016/j.pce.2024.103561
Meysam Salarijazi, Iman Ahmadianfar, Zaher Mundher Yaseen

The sodium adsorption ratio (SAR) is a widely used variable in water quality research, particularly in agricultural and environmental studies. In many cases, the key variables required to calculate SAR, namely Na⁺, Mg²⁺, and Ca²⁺, are not available. Consequently, the ability to estimate SAR from a limited number of other water quality variables becomes critically important. The study implemented the Multilayer Perceptron Neural Network (MLPNN), Support Vector Regression (SVR), and K-Nearest Neighbors (KNN) models at level-0 for prediction, along with the Boruta feature selection algorithm for variable selection. A stacked ensemble learning model at level-1 was then used to enhance the prediction accuracy. The discharge and water quality dataset from the Zarrin-Gol River in northern Iran was used to implement the modeling procedure. The Boruta variable selection results showed that a limited number of water quality variables can effectively predict SAR even without the principal variables. Further investigation of the input combinations for the level-0 models showed that the MLPNN, KNN, and SVR models yielded their optimal predictions with 4, 3, and 1 input variables, respectively. Among the level-0 models, the MLPNN model exhibited the highest accuracy, with RMSE = 0.54, MBE = 0.26, MAE = 0.44, R = 0.84, IA = 0.67, and KGE = 0.79. Implementing the stacked ensemble learning model at level-1 significantly improved the SAR prediction compared to the level-0 models. The Ensemble-NN model yielded the best performance in estimating SAR within the range of recorded data, with RMSE = 0.53, MBE = 0.29, MAE = 0.41, R = 0.87, IA = 0.70, and KGE = 0.82. Residual analysis further confirmed the superior predictive capability of the level-1 models compared to the level-0 models. The generalized-logistic probability distribution function was used to estimate the extreme-value data. The Ensemble-KNN model best predicted the extreme-value data, with RMSE = 0.69, MBE = −0.61, MAE = 0.61, R = 0.61, IA = 0.26, and KGE = 0.37. The findings underscore the substantial advancements achieved through stacked ensemble methods in enhancing the modeling of SAR across several aspects, including the total data, extreme values, and model residuals.
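For readers who want to see how the quantities reported above are typically computed, the following is a minimal sketch rather than the authors' code. It assumes the standard SAR definition with concentrations in meq/L, MBE defined as predicted minus observed, Willmott's original index of agreement for IA, and the 2009 form of the Kling-Gupta efficiency for KGE. The abstract does not state which variants the paper uses, and the function names `sar` and `evaluate` are hypothetical.

```python
import math
from statistics import fmean, pstdev


def sar(na, ca, mg):
    """Sodium adsorption ratio from Na, Ca, Mg concentrations in meq/L."""
    return na / math.sqrt((ca + mg) / 2.0)


def evaluate(obs, pred):
    """Return RMSE, MBE, MAE, R, IA and KGE for paired observations/predictions.

    Assumed conventions (not confirmed by the abstract):
    MBE = mean(pred - obs); IA = Willmott's original index of agreement;
    KGE = 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2).
    """
    err = [p - o for o, p in zip(obs, pred)]
    mean_obs, mean_pred = fmean(obs), fmean(pred)

    # Pearson correlation coefficient R
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)
    ss_pred = sum((p - mean_pred) ** 2 for p in pred)
    r = cov / math.sqrt(ss_obs * ss_pred)

    # Willmott index of agreement: 1 - SSE / potential error
    potential = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred)
    )
    ia = 1.0 - sum(e * e for e in err) / potential

    # Kling-Gupta efficiency: correlation, variability ratio, bias ratio
    alpha = pstdev(pred) / pstdev(obs)
    beta = mean_pred / mean_obs
    kge = 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

    return {
        "RMSE": math.sqrt(fmean([e * e for e in err])),
        "MBE": fmean(err),
        "MAE": fmean([abs(e) for e in err]),
        "R": r,
        "IA": ia,
        "KGE": kge,
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only, not data from the study.
    observed = [1.0, 2.0, 3.0]
    predicted = [2.0, 3.0, 4.0]
    print(sar(2.0, 1.0, 1.0))
    print(evaluate(observed, predicted))
```

Reporting several metrics together is useful because they respond to different failure modes: a constant offset leaves R at 1 but shows up in MBE and in the bias term of KGE, which is consistent with the abstract reporting a strongly negative MBE alongside low IA and KGE for the extreme values.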




Updated: 2024-01-11