Comments on Divergence vs. Decision P-values
Scandinavian Journal of Statistics (IF 1), Pub Date: 2023-04-12, DOI: 10.1111/sjos.12647
Paul W. Vos

The distinction between the two uses of p-values described by Professor Greenland is related to two distinct interpretations of frequentist probability—that is, probability used to describe a random event. I will illustrate with a simple example.

In the North Carolina Pick-4 lottery, 10 ping pong balls labeled with distinct digits from $$ {I}_9=\left\{0,1,\dots, 9\right\} $$ are mixed in a clear container, and opening a door allows a single ball to be selected. Prior to opening the door, blown air mixes the balls, making equally likely selection of each ball plausible. This is repeated with three identical containers to obtain the remaining three digits. If a winning ticket is defined as one where the sum of the four digits exceeds 28, the state can charge $5 for a ticket with a $100 prize and expect a profit. There are 330 of $$ 1{0}^4 $$ possible outcomes where the sum exceeds 28, so the expected value is $$ 0.033\times \$100=\$3.30 $$. This calculation requires no repeated sampling, but it is natural for the state to interpret this value in the long run. For an individual ticket holder, all that is required is that each ball is given an equal chance to be selected for the drawing associated with his ticket. The ticket holder does not need to imagine a long sequence of draws, just as a cancer patient does not need to consider a long sequence of 5-year periods to understand a 30% 5-year survival. Using terminology from Vos and Holbert (2022), the scope for the ticket holder is specific while that of the state is generic.
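The count of 330 and the $3.30 expected value can be checked by brute-force enumeration. The following is a minimal sketch, not part of the original comment; the variable names are illustrative, and it assumes only what the text states (four containers, digits 0 to 9, every 4-tuple weighted equally).

```python
from fractions import Fraction
from itertools import product

DIGITS = range(10)   # labels on the ten balls, I_9 = {0, 1, ..., 9}
THRESHOLD = 28       # a ticket wins when the digit sum exceeds this value
PRIZE = 100          # prize in dollars
PRICE = 5            # ticket price in dollars

# Enumerate all 10^4 equally weighted 4-tuples and count the winning ones.
outcomes = list(product(DIGITS, repeat=4))
winning = sum(1 for draw in outcomes if sum(draw) > THRESHOLD)

win_probability = Fraction(winning, len(outcomes))
expected_payout = win_probability * PRIZE
expected_profit_per_ticket = PRICE - expected_payout

print(winning, len(outcomes))              # 330 10000
print(float(expected_payout))              # 3.3
print(float(expected_profit_per_ticket))   # 1.7
```

No draws are simulated here: the calculation is a count over the outcome space, which is the sense in which it "requires no repeated sampling".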

The uniform distribution on 4-tuples $$ {I}_9^4={I}_9\times {I}_9\times {I}_9\times {I}_9 $$ provides a model for repeated draws of the Pick-4 lottery, that is, of the data generation process. For most inference applications, the distribution of an unknown population can be modeled rather than the process that generated the data. We modify this example to consider inference.

We are told the sum of a single lottery draw and we are to infer whether the draw came from the NC lottery or from lottery A, which also has four containers but each contains 8 balls with labels from $$ {I}_7=\left\{0,1,\dots, 7\right\} $$. The sum of the digits is 29 but no other information is given. A reduction-to-contradiction argument establishes that the result came from the NC lottery. Premise: lottery A produced our data; every possible sum from lottery A belongs to the set $$ \left\{0,1,\dots, 28\right\} $$; 29 is not in this set; conclusion: the contradiction means it is impossible that the premise is true.

The deductive argument used for a sum of 29 does not work if the sum is 28. Logical certainty is no longer possible, but sums of 28 or less still provide evidence, to varying degrees, regarding which lottery was used. A reduction-to-incredibility argument modeled on the above deduction can be used. Premise: lottery A produced the sum of 28; of the $$ {8}^4 $$ possible 4-tuples only one produces a sum as large as 28; each 4-tuple had an equal chance of being selected; the probability of a sum of 28 is $$ 1/{8}^4<0.00025 $$; conclusion: the unlikely observation makes it doubtful that the premise is true.
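Both arguments can be traced numerically. The sketch below is an illustration rather than the author's code: the helper names `sum_distribution` and `upper_tail` are hypothetical, and the probabilities rest on the stated premise that every 4-tuple is equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sum_distribution(n_labels, n_containers=4):
    """Exact distribution of the digit sum when every tuple is equally likely."""
    counts = Counter(
        sum(draw) for draw in product(range(n_labels), repeat=n_containers)
    )
    total = n_labels ** n_containers
    return {s: Fraction(c, total) for s, c in counts.items()}

def upper_tail(dist, observed):
    """Mass the distribution places on sums at least as large as `observed`."""
    return sum((p for s, p in dist.items() if s >= observed), Fraction(0))

lottery_a = sum_distribution(8)     # labels 0..7
nc_lottery = sum_distribution(10)   # labels 0..9

# Reduction to contradiction: 29 is outside the support of lottery A.
print(max(lottery_a), 29 in lottery_a)   # 28 False

# Reduction to incredibility: 28 is possible under lottery A but rare.
print(upper_tail(lottery_a, 28))         # 1/4096
```

The first check uses only the support of the distribution, so it would hold even without equal weighting; the second depends on the equal-chance premise to turn the tail area into a probability.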

An important distinction from the deductive argument is the second step regarding all possible outcomes being equally likely. Without this we can say 28 is in the upper 0.025 percentile of the sampling distribution of 4-tuples ordered by their sum, but we cannot say the probability is less than 0.00025. In contrast, the deductive argument is valid even if the balls are hand-picked; randomization plays no role. In the conclusion of the inductive argument, the word “unlikely” refers to the stochastic probability of obtaining a sum of 28 while “doubtful” describes a degree-of-belief regarding the lottery that was used. While these are related quantities (observations that are less likely to have occurred would create greater doubt), failure to understand these as distinct can lead to confusion, especially when the numeric value of the stochastic probability, $$ 1/{8}^4 $$, is used to assign a numeric measure of one's doubt in the absence of any other information regarding the two lotteries.

The p-value, $$ 1/{8}^4 $$, is obtained from a measurable function and so, by definition, is a random variable. All p-values are measurable functions and so all p-values are random variables. However, the adjective random describes only one use for this measurable function, namely to model a random process. Random variables also provide distributions that are relevant to the inference question. Although randomization plays no role in the definition of these distributions, their relevance to inference does depend on how the observed sample was obtained from the population. As a random process, $$ 1/{8}^4 $$ is the limiting relative frequency of draws from lottery A that result in a sum of 28. Using the random process interpretation means we have to create a hypothetical process by imagining repeated draws from lottery A when, in fact, the actual sample may have come from the NC lottery. That is, the hypothetical samples do not come from the population, as the actual sample did, but from a model for the population. This distinction between population and model for the population is especially important when the model is infinite.

A more realistic example is inference for a dichotomous attribute of a population, say, high blood pressure (BP). The population distribution is the ordered pair of relative frequencies associated with the two attributes, $$ \left(1-{p}_{\mathrm{pop}},{p}_{\mathrm{pop}}\right) $$, where $$ {p}_{\mathrm{pop}} $$ is the unknown proportion with high BP. The Bernoulli family of distributions, $$ \left\{\left(1-p,p\right),0<p<1\right\} $$, provides models for the population distribution. If the support for the Bernoulli family is $$ \left\{0,1\right\} $$, then the $$ n $$-fold convolution of $$ \left(1-p,p\right) $$ is the binomial distribution $$ B\left(n,p\right) $$ placing mass $$ \binom{n}{y}{p}^y{\left(1-p\right)}^{n-y} $$ on sum $$ y $$. For rational $$ p $$ the binomial distribution can be obtained by considering all possible samples (with replacement) of size $$ n $$ and calculating the relative frequency for each sum. When each sample is equally likely, these relative frequencies are probabilities. Convolution extends this relationship between Bernoulli and binomial distributions to the case where $$ p $$ is any real number in the unit interval.
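For rational $$ p $$, the enumeration described above can be carried out directly. The sketch below is an illustration under assumed values: the five-unit population and the sample size are hypothetical, chosen so that $$ p=2/5 $$, and the function names are not from the original.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb

def binomial_by_enumeration(population, n):
    """Relative frequency of each sum over all ordered samples of size n
    drawn with replacement from a 0/1-coded population."""
    counts = Counter(sum(sample) for sample in product(population, repeat=n))
    total = len(population) ** n
    return {y: Fraction(counts.get(y, 0), total) for y in range(n + 1)}

def binomial_pmf(n, p):
    """Mass that B(n, p) places on each sum y = 0, ..., n."""
    return {y: comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(n + 1)}

# Hypothetical population: 2 of 5 units have high BP, so p = 2/5.
population = [1, 1, 0, 0, 0]
n = 3

enumerated = binomial_by_enumeration(population, n)
formula = binomial_pmf(n, Fraction(2, 5))

print(enumerated == formula)   # True
```

The agreement is a counting identity and involves no random draws; the relative frequencies become probabilities only under the added premise that each of the $$ 5^3 $$ samples is equally likely.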

The key here is that Bernoulli distributions and their relationship with the binomial family are part of mathematics and don't involve randomization. If $$ B\left(1,p\right) $$ provides a good model for the population, then so does $$ B\left(n,p\right) $$ for the sampling distribution of the sum. However, if the observation is in the extreme tail of $$ B\left(n,p\right) $$, this does not necessarily mean the observation should be considered unlikely. As we saw in the lottery example, a sum of 28 is in the extreme tail of the sampling distribution but it is the fact that every 4-tuple had an equal chance of selection that makes the tail area equal to the probability. The same is true for the binomial example, and for inference in general.

Now that two models are replaced with a family of models, inference involves a continuum of null hypotheses and so, a continuum of reduction-to-incredibility arguments. The p-value associated with each argument is a tail area that describes how extreme the sample is as a point in the sampling distribution obtained from the premise. The conclusion requires a probability, and a single random sample from the population justifies that each of the tail areas in this continuum is a probability.
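One way to picture this continuum is to compute the tail area for a single observed sum across a grid of values of $$ p $$. The sketch below is an illustration only: the sample size, the observed sum, the grid, and the choice of a one-sided upper tail are all assumptions made for the example, not details from the original.

```python
from fractions import Fraction
from math import comb

def upper_tail_area(n, y_obs, p):
    """Mass that B(n, p) places on sums at least as large as y_obs."""
    return sum(
        (comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(y_obs, n + 1)),
        Fraction(0),
    )

# Hypothetical sample: 15 of n = 20 units have high BP.
n, y_obs = 20, 15

# A coarse grid standing in for the continuum 0 < p < 1.
grid = [Fraction(k, 10) for k in range(1, 10)]
tail_areas = {p: upper_tail_area(n, y_obs, p) for p in grid}

for p, area in tail_areas.items():
    print(f"p = {float(p):.1f}  upper tail area = {float(area):.6f}")
```

Each line of output corresponds to one premise in the continuum. The numbers are tail areas of the model $$ B\left(n,p\right) $$; following the argument above, it is the random sampling of the observed data that justifies reading them as probabilities.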

Interpreting a p-value as describing a random process, as one is inclined to do when it is labeled a random variable, is problematic when we have a continuum of hypotheses. The hypothetical samples come from a model for the population, and when this model is infinite it is not clear what it means to give every sample an equal chance of being selected. I recognize there are other interpretations for the hypothetical random process associated with the p-value. In fact, it is the plethora of such interpretations that can make p-values confusing.

My final comment concerns geometry. Generalized estimators, which are related to Godambe's (1960) estimating functions, are useful for studying inferential properties of different methods for defining p-values. These generalized estimators form a vector bundle over a statistical manifold and their statistical properties are described by the relationship with the tangent bundle. Details are in Vos (2022).



Updated: 2023-04-12