Reconstructing missing data by comparing interpolation techniques: Applications for long-term water quality data,Limnology and Oceanography: Methods


Limnology and Oceanography: Methods (IF 2.7), Pub Date: 2023-05-30, DOI: 10.1002/lom3.10556
Danelle M. Larson₁, Wako Bungula₂, Amber Lee₃, Alaina Stockdill₃, Casey McKean₃, Frederick "Forrest" Miller₃, Killian Davis₃, Richard A. Erickson₁, Enrika Hlavacek₁

Missing data are typical yet must be addressed for proper inferences or expanding datasets to guide our limnological understanding and management of aquatic systems. Interpolation methods (i.e., estimating missing values using known values within the dataset) can alleviate data gaps and common problems. We compared seven popular interpolation methods for predicting substantial missingness in a long-term water quality dataset from the Upper Mississippi River, U.S.A. The dataset included 80,000 sampling sites collected over 30 yr that had substantial missingness for total nitrogen (TN), total phosphorus (TP), and water velocity. For all three interpolated water quality variables, random forests had very high prediction accuracy and outperformed the methods of ordinary kriging, polynomial regressions, regression trees, and inverse distance weighting. TP had a mean absolute error (MAE) of 0.03 mg (L-TP)⁻¹, TN had a MAE of 0.39 mg (L-TN)⁻¹, and water velocity had a MAE of 0.10 m s⁻¹. The random forests' error rates were mapped and showed low spatiotemporal variability across the riverscape, indicating high model performance across many habitat types and large spatial scales. In the current era of “big data,” interpolation becomes an imperative step prior to ecological analyses yet remains unfamiliar and underutilized. Our research briefly describes the importance of addressing missingness and provides a roadmap to conduct model intercomparisons of other big datasets. We also share adaptable data analysis scripts, which allow others to readily conduct interpolation comparisons for many limnology applications and contexts.
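To make the comparison idea in the abstract concrete, the following is a minimal Python sketch of how a hold-out MAE can be computed for one of the named methods, inverse distance weighting, against a naive mean baseline. It is not the authors' shared scripts: the grid, the values, and the function names (`idw_predict`, `holdout_mae`, `mean_predictor`) are all hypothetical and chosen only for illustration, and inverse distance weighting is used here because it is simple to write out, not because it performed best in the study.

```python
import math


def idw_predict(known_xy, known_values, query_xy, power=2.0):
    """Inverse distance weighting: weighted mean of known values,
    with weights proportional to 1 / distance**power."""
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in zip(known_xy, known_values):
        distance = math.hypot(query_xy[0] - x, query_xy[1] - y)
        if distance == 0.0:
            return value  # query coincides with a sampled site
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def mean_predictor(known_xy, known_values, query_xy):
    """Naive baseline: predict the mean of all known values."""
    return sum(known_values) / len(known_values)


def mean_absolute_error(observed, predicted):
    """MAE: average absolute difference between observed and predicted."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def holdout_mae(sites, values, test_idx, predictor):
    """Hide the values at test_idx, predict them from the rest, return MAE."""
    held_out = set(test_idx)
    train_xy = [s for i, s in enumerate(sites) if i not in held_out]
    train_values = [v for i, v in enumerate(values) if i not in held_out]
    observed = [values[i] for i in test_idx]
    predicted = [predictor(train_xy, train_values, sites[i]) for i in test_idx]
    return mean_absolute_error(observed, predicted)


# Hypothetical synthetic data: a 5 x 5 grid of sites with a smooth gradient.
sites = [(float(x), float(y)) for x in range(5) for y in range(5)]
values = [0.1 * x + 0.05 * y for x, y in sites]
test_idx = [6, 12, 18]  # three interior sites treated as "missing"

idw_mae = holdout_mae(sites, values, test_idx, idw_predict)
baseline_mae = holdout_mae(sites, values, test_idx, mean_predictor)
print(f"IDW MAE: {idw_mae:.3f}, mean-baseline MAE: {baseline_mae:.3f}")
```

The same `holdout_mae` helper accepts any predictor with the same three-argument signature, so other methods could be slotted in and compared on identical held-out sites, which is the general shape of a model intercomparison.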


Updated: 2023-05-30
