Comparing the performance of global, geographically weighted and ecologically weighted species distribution models for Scottish wildcats using GLM and Random Forest predictive modeling
Ecological Modelling (IF 3.1), Pub Date: 2024-04-08, DOI: 10.1016/j.ecolmodel.2024.110691
S.A. Cushman, K. Kilshaw, R.D. Campbell, Z. Kaszta, M. Gaywood, D.W. Macdonald

Species distribution modeling has emerged as a foundational method to predict occurrence and suitability of species in relation to environmental variables to advance ecological understanding and guide conservation planning. Recent research, however, has shown that species-environment relationships and habitat model predictions are often nonstationary in space, time and ecological context. This calls into question modeling approaches that assume a global, stationary ecological realized niche and use predictive modeling to describe it. This paper explores this issue by comparing the performance of predictive models for wildcat hybrid occurrence based on (1) global pooled data across individuals, (2) geographically weighted aggregation of individual models, (3) ecologically weighted aggregation of individual models, and (4) combinations of global, geographical and ecological weighting. Our study system included GPS telemetry data from 14 individual wildcat hybrids across Scotland. We developed predictive models using both Generalized Linear Models (GLM) and Random Forest machine learning to compare the performance of these differing algorithms in stationary and nonstationary analyses. We validated the models in four different ways. First, we used independent hold-out data from the 14 collared wildcat hybrids. Second, we used data from 8 additional GPS-collared wildcat hybrids from a previous study that were not included in the training sample. Third, we used sightings data sent in by the public and researchers and validated by expert opinion. Fourth, we used data collected by camera trap surveys between 2012 and 2021 from various sources to produce a combined camera trap dataset showing where wildcats and wildcat hybrids had been detected.
Our results show that validation using hold-out data from the individuals used to train the model provides a highly biased assessment of true model performance in other locations, with Random Forest in particular appearing to perform exceptionally (and inaccurately) well when validated by data from the same individuals used to train the models. Very different results were obtained when the models were validated using independent data from the three other sources. Each of these three independent validation datasets identified a different best overall model. The average of independent validation across these three datasets suggested that the best overall model for potential wildcat occurrence and habitat suitability was an ensemble average of the global Generalized Linear Model (GLM) and Random Forest models with the ecologically weighted GLM and Random Forest models. This suggests that the debate over whether GLM or machine learning approaches are superior, or whether global or aggregated nonstationary modeling is superior, may present a false choice. The results presented here show that the best prediction applies a combination of all of these approaches in an ensemble modeling framework.
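
The aggregation and ensemble steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the geographic weights are computed, so the inverse-distance scheme, the function names, and the numeric values below are all assumptions chosen for illustration only.

```python
import math

def inverse_distance_weights(point, centers, power=2.0, eps=1e-9):
    """Normalized weights that shrink with distance from each individual's
    home-range center. ASSUMPTION: the abstract does not state the actual
    weighting function used in the paper."""
    raw = [1.0 / (math.dist(point, c) ** power + eps) for c in centers]
    total = sum(raw)
    return [w / total for w in raw]

def weighted_aggregate(predictions, weights):
    """Weighted sum of per-individual model predictions at one location."""
    return sum(p * w for p, w in zip(predictions, weights))

def ensemble_mean(*predictions):
    """Unweighted average of several model predictions at one location."""
    return sum(predictions) / len(predictions)

# Hypothetical home-range centers for two collared individuals (map units).
centers = [(0.0, 0.0), (10.0, 0.0)]
# Hypothetical suitability predicted at one location by each individual's model.
individual_predictions = [0.8, 0.2]

location = (1.0, 0.0)
weights = inverse_distance_weights(location, centers)
geo_weighted = weighted_aggregate(individual_predictions, weights)

# Hypothetical outputs of four models at the same location: global GLM,
# global Random Forest, ecologically weighted GLM, ecologically weighted RF.
ensemble = ensemble_mean(0.55, 0.70, 0.60, 0.65)

print(f"geographically weighted prediction: {geo_weighted:.3f}")
print(f"four-model ensemble mean: {ensemble:.3f}")
```

Because the location lies much closer to the first individual's center, the aggregated prediction is dominated by that individual's model. An ecologically weighted variant would follow the same pattern, with distance measured in environmental-covariate space instead of geographic space (again an assumption about the general approach, not a description of the paper's method).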

Updated: 2024-04-08