Exploring the Efficacy of Statistical and Deep Learning Methods for Large Spatial Datasets: A Case Study
Journal of Agricultural, Biological and Environmental Statistics (IF 1.4), Pub Date: 2024-02-08, DOI: 10.1007/s13253-024-00602-4
Arnab Hazra, Pratik Nag, Rishikesh Yadav, Ying Sun

Increasingly large and complex spatial datasets pose massive inferential challenges due to high computational and storage costs. Our study is motivated by the KAUST Competition on Large Spatial Datasets 2023, which tasked participants with estimating spatial covariance-related parameters and predicting values at testing sites, along with uncertainty estimates. We compared various statistical and deep learning approaches through cross-validation and ultimately selected the Vecchia approximation technique for model fitting. The R package GpGp lacked support for fitting zero-mean Gaussian processes and for direct uncertainty estimation, both of which the competition required, so we developed additional R functions to overcome these constraints. In addition, we implemented certain subsampling-based approximations and parametric smoothing for skewed sampling distributions of the estimators. Our team, DesiBoys, secured first position in two of the four sub-competitions and second position in the other two, validating the effectiveness of our proposed strategies. Moreover, we extended our evaluation to a large real satellite-derived spatial dataset on total precipitable water, where we compared the predictive performance of different models using multiple diagnostics.




Updated: 2024-02-09