On the Asymptotic Power of a Method for Testing Hypotheses on the Equality of Distributions

  • MATHEMATICS
  • Published in: Vestnik St. Petersburg University, Mathematics

Abstract

In this paper, the asymptotic power of a method for testing the hypothesis of equality of two distributions is investigated; the method can be regarded as a generalization of the Wilcoxon–Mann–Whitney test. We consider the class of distributions for which the mathematical expectation of the square of a certain auxiliary function is finite. For the case where the alternative distribution differs from the null distribution only by a shift, the asymptotic distribution of the test statistic and the asymptotic power of the test are found explicitly. Until now, the power of this test had been studied only by statistical modeling.


Notes

  1. The examples were constructed by Anna Belkova, a Master's graduate of the Faculty of Mathematics and Mechanics, St. Petersburg State University.

REFERENCES

  1. G. Zech and B. Aslan, “New test for the multivariate two-sample problem based on the concept of minimum energy,” J. Stat. Comput. Simul. 75, 109–119 (2005).


  2. V. Melas and D. Salnikov, “On asymptotic power of the new test for equality of two distributions,” in Recent Developments in Stochastic Methods and Applications, Ed. by A. N. Shiryaev, K. E. Samouylov, and D. V. Kozyrev (Springer-Verlag, Cham, 2021), in Ser.: Springer Proceedings in Mathematics and Statistics, Vol. 371, pp. 204–214.

  3. E. L. Lehmann, Testing Statistical Hypotheses (Wiley, New York, 1959; Nauka, Moscow, 1979).

  4. H. Buening, “Kolmogorov–Smirnov and Cramer–von Mises type two-sample tests with various weight functions,” Commun. Stat. - Simul. Comput. 30, 847–865 (2001).


  5. T. W. Anderson and D. A. Darling, “A test of goodness-of-fit,” J. Am. Stat. Assoc. 49, 765–769 (1954).


  6. W. Hoeffding, “A class of statistics with asymptotically normal distribution,” Ann. Math. Stat. 19, 293–325 (1948).



Funding

The work was carried out with financial support of the Russian Foundation for Basic Research (grant no. 20-01-00096-a).

Author information

Corresponding author

Correspondence to V. B. Melas.

Additional information

Translated by L. Kartvelishvili

ANNEX

Proof of Lemma 3.1. We introduce the notation

$$Z = (X,Y) = ({{X}_{1}}, \ldots ,{{X}_{n}},{{Y}_{1}}, \ldots ,{{Y}_{n}}),\quad V(Z) = \frac{1}{2}\sum\limits_{i = 1}^{2n} {\sum\limits_{j = 1}^{2n} {{{{({{Z}_{i}} - {{Z}_{j}})}}^{2}}.} } $$

The proof follows from the well-known formula (see, e.g., [6], p. 296)

$$\frac{1}{{n(n - 1)}}\sum\limits_{1 \leqslant i < j \leqslant n} {{{{({{X}_{i}} - {{X}_{j}})}}^{2}}} = \frac{1}{{(n - 1)}}\sum\limits_{i = 1}^n {{{{({{X}_{i}} - \bar {x})}}^{2}}} $$
(10)

and the obvious identity

$$\sum\limits_{i = 1}^{2n} {\sum\limits_{j = 1}^{2n} {{{{({{Z}_{i}} - {{Z}_{j}})}}^{2}}} } = \sum\limits_{i,j = 1}^n {{{{({{X}_{i}} - {{X}_{j}})}}^{2}}} + \sum\limits_{i,j = 1}^n {{{{({{Y}_{i}} - {{Y}_{j}})}}^{2}}} + 2\sum\limits_{i = 1}^n {\sum\limits_{j = 1}^n {{{{({{X}_{i}} - {{Y}_{j}})}}^{2}}} } $$
(11)

by direct, but nontrivial calculations.
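Identities (10) and (11) are easy to confirm numerically. The following sketch (ours, using NumPy with arbitrary simulated data) checks both directly from the definitions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 7
X = rng.normal(size=n)
Y = rng.normal(size=n)
Z = np.concatenate([X, Y])

# Identity (10): (1/(n(n-1))) * sum over pairs i < j of squared differences
# equals (1/(n-1)) * sum of squared deviations from the mean.
lhs10 = sum((X[i] - X[j]) ** 2 for i in range(n) for j in range(i + 1, n)) / (n * (n - 1))
rhs10 = ((X - X.mean()) ** 2).sum() / (n - 1)

# Identity (11): the full double sum over Z splits into the X-block,
# the Y-block, and twice the cross term.
full = sum((Z[i] - Z[j]) ** 2 for i in range(2 * n) for j in range(2 * n))
xx = sum((X[i] - X[j]) ** 2 for i in range(n) for j in range(n))
yy = sum((Y[i] - Y[j]) ** 2 for i in range(n) for j in range(n))
xy = sum((X[i] - Y[j]) ** 2 for i in range(n) for j in range(n))
```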

Indeed, we use the standard notation

$$S_{x}^{2} = \frac{1}{{(n - 1)}}\sum\limits_{i = 1}^n {{{{({{X}_{i}} - \bar {x})}}^{2}};} $$

the quantities \(S_{y}^{2}\) and \(S_{z}^{2}\) are defined analogously. We also denote

$${{S}_{{xy}}} = \frac{1}{{{{n}^{2}}}}\sum\limits_{i = 1}^n {\sum\limits_{j = 1}^n {{{{({{X}_{i}} - {{Y}_{j}})}}^{2}}.} } $$

Using formula (10), we obtain

$$V(Z) = 2n\left[ {\sum\limits_{i = 1}^n {{{{({{X}_{i}} - (\bar {x} + \bar {y}){\text{/}}2)}}^{2}}} + \sum\limits_{j = 1}^n {{{{({{Y}_{j}} - (\bar {x} + \bar {y}){\text{/}}2)}}^{2}}} } \right] = 2n(n - 1)(S_{x}^{2} + S_{y}^{2}) + {{n}^{2}}{{(\bar {x} - \bar {y})}^{2}}.$$
(12)

From (10) and (11) we get

$${{n}^{2}}{{S}_{{xy}}} = V(Z) - n(n - 1)(S_{x}^{2} + S_{y}^{2}).$$
(13)

Hence,

$${{S}_{{xy}}} = \frac{1}{n}(n - 1)(S_{x}^{2} + S_{y}^{2}) + {{(\bar {x} - \bar {y})}^{2}}$$

and we get

$${{\Phi }_{{nn}}} = {{S}_{{xy}}} - \frac{1}{n}(n - 1)(S_{x}^{2} + S_{y}^{2}) = {{(\bar {x} - \bar {y})}^{2}}.$$
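The chain of identities above, and in particular the final formula for Φnn, can be checked numerically; a small sketch (ours) with simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
X = rng.normal(size=n)
Y = rng.normal(loc=0.5, size=n)

Sx2 = X.var(ddof=1)  # S_x^2, unbiased sample variance
Sy2 = Y.var(ddof=1)  # S_y^2
Sxy = np.mean((X[:, None] - Y[None, :]) ** 2)  # (1/n^2) sum_{i,j} (X_i - Y_j)^2

# Phi_nn = S_xy - ((n-1)/n)(S_x^2 + S_y^2) should equal (x_bar - y_bar)^2
phi_nn = Sxy - (n - 1) / n * (Sx2 + Sy2)
target = (X.mean() - Y.mean()) ** 2
```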

Now assume that hypothesis H0 holds. By the classical central limit theorem, \(\sqrt {{{\Phi }_{{nn}}}} \) has a distribution that converges, as n → ∞, to the normal distribution with zero expectation and variance J1. The last assertion of the lemma is verified by direct computation. Thus, Lemma 3.1 is proven. It follows from the lemma that, in this case, the test based on Φnn is equivalent to the test based on (\(\bar {x}\) – \(\bar {y}\))2.

\(\square \)

Proof of Lemma 3.2. We introduce the notation

$${{U}_{n}}(u,g) = {{\left( \begin{gathered} n \\ 2 \\ \end{gathered} \right)}^{{ - 1}}}\sum\limits_{1 \leqslant i < j \leqslant n} {g({{u}_{i}} - {{u}_{j}})} ,\quad u = ({{u}_{1}}, \ldots ,{{u}_{n}}).$$
(14)
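Definition (14) can be sketched directly. For the even function g(t) = t2, identity (10) implies Un(X, g) = 2S2x, which gives a convenient numerical check (the helper name `U` is ours):

```python
import numpy as np
from itertools import combinations
from math import comb

def U(u, g):
    """U-statistic (14): average of g(u_i - u_j) over all pairs i < j."""
    n = len(u)
    return sum(g(u[i] - u[j]) for i, j in combinations(range(n), 2)) / comb(n, 2)

rng = np.random.default_rng(2)
x = rng.normal(size=8)
val = U(x, lambda t: t ** 2)  # equals 2 * sample variance, by identity (10)
```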

By definition, the function Un(u, g) is a U-statistic (see [6]). We recall that we put m = n and

$${{\Phi }_{{AB}}} = {{\Phi }_{{AB}}}(X,Y,g) = - \frac{1}{{{{n}^{2}}}}\sum\limits_{i,j = 1}^n {g({{X}_{i}} - {{Y}_{j}}),} $$
$${{\Phi }_{A}}(X,g) + {{\Phi }_{B}}(Y,g) = - \frac{1}{{{{n}^{2}}}}\sum\limits_{1 \leqslant i < j \leqslant n} {g({{X}_{i}} - {{X}_{j}})} - \frac{1}{{{{n}^{2}}}}\sum\limits_{1 \leqslant i < j \leqslant n} {g({{Y}_{i}} - {{Y}_{j}}).} $$

Consequently,

$${{\Phi }_{A}}(X,g) = - \frac{1}{2}\frac{{n - 1}}{n}{{U}_{n}}(X,g),\quad {{\Phi }_{B}}(Y,g) = - \frac{1}{2}\frac{{n - 1}}{n}{{U}_{n}}(Y,g),$$
(15)
$$ - {{\Phi }_{{AB}}}(X,Y,g) = \frac{1}{{{{n}^{2}}}}\left( \begin{gathered} 2n \\ 2 \\ \end{gathered} \right){{U}_{{2n}}}(Z,g) - \frac{1}{2}\frac{{n - 1}}{n}{{U}_{n}}(X,g) - \frac{1}{2}\frac{{n - 1}}{n}{{U}_{n}}(Y,g),$$
(16)

where Z = (Z1, …, Z2n) = (X1, …, Xn, Y1, …, Yn). We apply the limit theorem (see Theorem 7.1 [6]) to each of the expressions ΦA(X, g), ΦB(Y, g), and ΦAB(X, Y, g). A direct calculation shows that the nonsingularity condition holds if condition (4) is met. First, suppose that condition (4) is met for g(x) = g*(x) = x2.
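Decompositions (15) and (16) can be verified numerically for the even function g(t) = t2, using the sign conventions of the definitions of ΦA, ΦB, and ΦAB above; a sketch (ours) under these assumptions:

```python
import numpy as np
from itertools import combinations
from math import comb

def U(u, g):
    """U-statistic (14): average of g(u_i - u_j) over all pairs i < j."""
    n = len(u)
    return sum(g(u[i] - u[j]) for i, j in combinations(range(n), 2)) / comb(n, 2)

g = lambda t: t ** 2  # g*(x) = x^2, an even function
rng = np.random.default_rng(3)
n = 5
X = rng.normal(size=n)
Y = rng.normal(size=n)
Z = np.concatenate([X, Y])

# (15): Phi_A(X, g) = -(1/n^2) sum_{i<j} g(X_i - X_j) = -(1/2)((n-1)/n) U_n(X, g)
phi_A = -sum(g(X[i] - X[j]) for i, j in combinations(range(n), 2)) / n ** 2
rhs15 = -0.5 * (n - 1) / n * U(X, g)

# (16): the cross sum (1/n^2) sum_{i,j} g(X_i - Y_j) equals the U-statistic
# over the pooled sample minus the two within-sample parts.
cross = np.mean([[g(xi - yj) for yj in Y] for xi in X])
rhs16 = comb(2 * n, 2) * U(Z, g) / n ** 2 - 0.5 * (n - 1) / n * (U(X, g) + U(Y, g))
```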

According to the limit theorem in [6], nΦA(X, g) and nΦA(X, g*) have normal limit distributions. Since a normal distribution is completely determined by its location and scale parameters, we have the equality

$$\frac{1}{n}\sum\limits_{1 \leqslant i < j \leqslant n} {g({{X}_{i}} - {{X}_{j}})} = {{a}^{2}}\frac{1}{n}\sum\limits_{1 \leqslant i < j \leqslant n} {g{\kern 1pt} ^*{\kern 1pt} ({{X}_{i}} - {{X}_{j}})} + \tilde {c} + {{\tilde {\eta }}_{n}},$$
(17)

where a and \(\tilde {c}\) are some numbers and \({{\tilde {\eta }}_{n}}\) is a random variable converging in probability to zero. Since Xi − Xj and Yi − Yj, 1 ≤ i < j ≤ n, have the same distribution, we get

$$\frac{1}{n}\sum\limits_{1 \leqslant i < j \leqslant n} {g({{Y}_{i}} - {{Y}_{j}})} = {{a}^{2}}\frac{1}{n}\sum\limits_{1 \leqslant i < j \leqslant n} {g{\kern 1pt} ^*{\kern 1pt} ({{Y}_{i}} - {{Y}_{j}})} + \tilde {c} + {{\tilde {\eta }}_{n}}$$
(18)

with the same constants a and \(\tilde {c}\) as in equality (17).

For the same reason and taking into account equality (15), for ΦAB we obtain the formula

$$\frac{1}{{{{n}^{2}}}}\sum\limits_{i,j = 1}^n {g({{X}_{i}} - {{Y}_{j}})} = {{a}^{2}}\frac{1}{{{{n}^{2}}}}\sum\limits_{i,j = 1}^n {g{\kern 1pt} ^*{\kern 1pt} ({{X}_{i}} - {{Y}_{j}})} + {{\bar {\eta }}_{n}} + \bar {c},$$

where the constant a is the same as in (17); however, \(\bar {c} \ne \tilde {c}\) and \({{\bar {\eta }}_{n}} \ne {{\tilde {\eta }}_{n}}\). We obtain

$$n{{T}_{n}}(X,Y,g) = {{a}^{2}}n{{T}_{n}}(X,Y,g^*) + c + {{\eta }_{n}},$$

where ηn converges in probability to zero, a and c are constants, a is the same as in formula (17), and g*(x) = x2. By Lemma 3.1, \(\frac{1}{{{{J}_{1}}}}n{{T}_{n}}\)(X, Y, g*) converges in distribution to L2. Thus, the limit distribution of nTn(X, Y, g) has the form a2L2 + c.

We now consider the case where condition (4) does not hold for g*(x) = x2.

Let K be an arbitrary positive number and

$$\tilde {X} = ({{\tilde {X}}_{1}}, \ldots ,{{\tilde {X}}_{n}}),\quad \tilde {Y} = ({{\tilde {Y}}_{1}}, \ldots ,{{\tilde {Y}}_{n}}),$$

where \({{\tilde {X}}_{i}}\) = Xi if \(\left| {{{X}_{i}}} \right| \leqslant K\), \({{\tilde {X}}_{i}}\) = K if Xi > K, and \({{\tilde {X}}_{i}}\) = –K if Xi < –K. The \({{\tilde {Y}}_{i}}\) are defined in a similar way. Now condition (4) holds both for the given function g(x) and for g(x) = x2 (due to the finite variance of the truncated variables).

We consider the quantity

$$n\left\{ {\frac{1}{{{{n}^{2}}}}\sum\limits_{i,j = 1}^n {g({{{\tilde {X}}}_{i}} - {{{\tilde {Y}}}_{j}})} - \frac{1}{{{{n}^{2}}}}\sum\limits_{i < j} {g({{{\tilde {X}}}_{i}} - {{{\tilde {X}}}_{j}})} - \frac{1}{{{{n}^{2}}}}\sum\limits_{i < j} {g({{{\tilde {Y}}}_{i}} - {{{\tilde {Y}}}_{j}})} } \right\}.$$

By the above arguments, the limit distribution of this quantity has the form R(L, a, c). As K → ∞, the limit distribution still exists (by Theorem 7.1 in [6]) and has the same form. Thus, Lemma 3.2 is proven.

\(\square \)

4. CONCLUSIONS

In this paper, we have obtained the asymptotic distribution of the considered test statistic and derived a formula for its asymptotic power. Statistical modeling shows that the derived formula yields theoretical power values that differ statistically insignificantly from the empirical powers found by simulation. The results can be used to determine a rational sample size, i.e., to design an experiment for testing hypotheses. The derived formulas are also useful for further investigation of the test in question, for example, for the optimal choice of the auxiliary function.
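As an illustration of the kind of statistical modeling mentioned above, empirical power against a shift alternative can be estimated by simulation. The sketch below (ours) uses the statistic (x̄ − ȳ)2, to which the test reduces for g(x) = x2 by Lemma 3.1; the sample size, shift, and significance level are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps, alpha = 50, 2000, 0.05

def stat(x, y):
    # For g(x) = x^2 the test reduces to (x_bar - y_bar)^2 (Lemma 3.1)
    return (x.mean() - y.mean()) ** 2

# Critical value: (1 - alpha)-quantile of the simulated null distribution
null = np.array([stat(rng.normal(size=n), rng.normal(size=n)) for _ in range(reps)])
crit = np.quantile(null, 1 - alpha)

# Empirical power against a shift alternative (shift 0.5, illustrative)
alt = np.array([stat(rng.normal(size=n), rng.normal(loc=0.5, size=n)) for _ in range(reps)])
power = (alt > crit).mean()
```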

Cite this article

Melas, V.B. On the Asymptotic Power of a Method for Testing Hypotheses on the Equality of Distributions. Vestnik St.Petersb. Univ.Math. 56, 182–189 (2023). https://doi.org/10.1134/S1063454123020115
