Statistical inference methods for n-dimensional hypervolumes: Applications to niches and functional diversity



Methods in Ecology and Evolution (IF 6.6), Pub Date: 2024-03-01, DOI: 10.1111/2041-210x.14310
Daniel Chen¹,², Alex Laini³, Benjamin Wong Blonder²


1 INTRODUCTION

Hutchinson (1957) introduced the hypervolume concept, which describes the requirements of species along multiple axes (that is, their niche) or the functional diversity of an assemblage. The concept, reviewed in Blonder (2018), has seen wide use. An n-dimensional hypervolume is a shape defined within multiple continuously valued dimensions for which a distance metric exists. Hypervolumes allow geometric interpretations of high-dimensional datasets and can be used to produce statistics for dataset shapes, volumes and overlaps. Numerous approaches have been proposed to estimate hypervolumes and their properties, for example, the R packages hypervolume (Blonder et al., 2014, 2018), TPD (Carmona et al., 2019), nicheROVER (Swanson et al., 2015), dynRB (Junker et al., 2016), BAT (Mammola & Cardoso, 2020) and hyperoverlap (Brown et al., 2020).

Most of these approaches are unified by their production of descriptive statistics without considering sampling processes. This has often precluded the calculation of inferential statistics. Moving to statistical inference could increase the reliability of conclusions derived from these methods and expand the scope of hypotheses tested.

Some existing algorithms do consider inference in limited ways. Gaussian kernel density estimates assume that the data are independent and identically distributed samples from random variables (Blonder et al., 2018; Carmona et al., 2019). Other approaches have developed statistics that are robust to outliers (Brown et al., 2020; Junker et al., 2016). Resampling has been used to determine whether hypervolumes differ from null expectations (Díaz et al., 2016; Lamanna et al., 2014). Hypothesis testing (Zhang, 2020) has been rarer. Moreover, many of these studies were implemented using non-public code, limiting standardization and re-use by other investigators.

Here we develop statistical inference methods for the hypervolume R package (Blonder et al., 2014, 2018). We (1) provide a set of R functions that allow calculation of confidence intervals for descriptive statistics under different sampling processes and sample sizes, estimation of bias, and hypothesis tests. We (2) show empirical analyses to illustrate usage on trait and niche datasets. We (3) describe other updates (Box 1) and provide validation simulations in the Supporting Information.
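To make the idea of a confidence interval for a descriptive statistic concrete, the following is a minimal Python sketch of a percentile bootstrap. It is not the hypervolume package's implementation: the function names `box_volume` and `bootstrap_ci` are hypothetical, and the bounding-box volume is only a crude stand-in for a hypervolume statistic, chosen so that the example is self-contained.

```python
import random


def box_volume(points):
    """Crude stand-in statistic (assumption): volume of the axis-aligned bounding box."""
    volume = 1.0
    for values in zip(*points):  # iterate over dimensions
        volume *= max(values) - min(values)
    return volume


def bootstrap_ci(points, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `statistic`, resampling rows with replacement."""
    rng = random.Random(seed)
    m = len(points)
    estimates = sorted(
        statistic([points[rng.randrange(m)] for _ in range(m)])
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Illustrative data only: 50 simulated two-dimensional observations.
data_rng = random.Random(1)
observations = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(50)]
ci_lower, ci_upper = bootstrap_ci(observations, box_volume, n_boot=200)
```

A resampled bounding box can never be larger than the full-sample box, so this particular stand-in statistic is biased downward under resampling. That is one simple illustration of why bias estimation accompanies confidence intervals.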

BOX 1. Additional updates to the R package

In estimate_bandwidth, the definition of the Silverman bandwidth estimator for box and Gaussian kernel density estimation has changed from $1.06 \times m^{-1/5} \times \sigma(X)$ to $\left(\frac{4}{n+2}\right)^{1/(n+4)} \times m^{-1/(n+4)} \times \sigma(X)$ (Silverman, 1986). This change aligns with the best multivariate definition of this estimator. The original univariate estimator is also still available. Additionally, for all bandwidth estimates, an attribute method is now set so that bandwidth vectors can be recalculated using the same methods when resampling.
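The two bandwidth definitions above can be compared numerically with a short Python sketch. It assumes that m is the number of observations, n is the number of dimensions, and σ(X) is the per-dimension sample standard deviation (n − 1 denominator). The function names `silverman_1d` and `silverman_nd` are hypothetical and are not part of the R package.

```python
import math
import statistics


def silverman_1d(columns):
    """Original univariate rule: 1.06 * m^(-1/5) * sd, applied to each dimension."""
    m = len(columns[0])  # number of observations (assumed meaning of m)
    return [1.06 * m ** (-1 / 5) * statistics.stdev(col) for col in columns]


def silverman_nd(columns):
    """Multivariate rule: (4/(n+2))^(1/(n+4)) * m^(-1/(n+4)) * sd, per dimension."""
    n = len(columns)     # number of dimensions (assumed meaning of n)
    m = len(columns[0])  # number of observations
    factor = (4 / (n + 2)) ** (1 / (n + 4)) * m ** (-1 / (n + 4))
    return [factor * statistics.stdev(col) for col in columns]


# Illustrative data only: two dimensions, five observations each.
example = [[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0]]
bandwidth_nd = silverman_nd(example)
```

For n = 1 the multivariate constant is (4/3)^(1/5) ≈ 1.059, so the new definition is very close to the original 1.06 rule in one dimension. The two diverge as n grows, because the exponent on m becomes −1/(n + 4) rather than −1/5.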

The package examples now use the penguins morphology data from Antarctica (Horst et al., 2022). We no longer use the iris dataset, because it was first published in the Annals of Eugenics (Fisher, 1936), and its primary proponent, the statistician R.A. Fisher, was a eugenicist (though it was collected for unrelated reasons by Edgar Anderson).
