1 Introduction

In many data sets the first non-zero digit d is not uniformly distributed but obeys a logarithmic law. This fact was observed by Newcomb (1881) and Benford (1938). Compliance officers of large companies use the Benford law to uncover data manipulations, mostly by applying the \(\chi ^2\) goodness-of-fit test. Such manipulations may consist of inserting fraudulent figures or changing digits. These and further applications may be found, for example, in the books of Nigrini (2012) and Berger and Hill (2015), see also Kössler et al. (2024). However, not every real or artificial data set follows the Benford law, so the question arises how this can be tested in practice. There is a vast literature on applications and testing of Benford’s law; we refer to the website of Berger et al. (accessed 3.1.2024) and to Nigrini (2012). The latter author applies Pearson’s \(\chi ^2\)-test, the Kolmogorov–Smirnov test and the MAD-test to the 1st digit, the 2nd digit, the 1st and 2nd digits together, and the 1st, 2nd and 3rd digits together. His MAD-test assigns the numerical values of the MAD statistic to the linguistic terms close conformity, acceptable conformity, marginally acceptable conformity and nonconformity, cf. Table 7.1 (p. 160) of Nigrini (2012).

Berger and Hill (2011) as well as Nigrini (1992) analyzed scale-, base- and sum-invariance. Sum-invariance means in particular that the expected sum of all significands with leading digit 1 equals the expected sum of the significands with leading digit 2, ..., 9, respectively.

In the present article we apply the sum-invariance property of Benford’s law to construct several further tests of significance. Our test statistics are suitable linear combinations of squares of suitably chosen statistics, and under the null hypothesis they are asymptotically or approximately \(\chi ^2\)-distributed.

Particular emphasis is placed on the second significant digit. A \(\chi ^2\) goodness-of-fit test for the second digit has already been suggested, cf. e.g. Diekmann (2007). We suggest some further tests based on properties of the second significant digit. In Sect. 2.1 we present some basic properties of Benford’s law and some statistical tests that are applied later on. In Sect. 2.2 we recall the \(\chi ^2\) goodness-of-fit test and the Kolmogorov–Smirnov test, and apply them to the first and second significant digit. Moreover, the MAD-test is modified so that its critical values do not depend on the sample size. In Sect. 2.3 we introduce four variants of the invariant sum test, three of them new, and in Sect. 3 we illustrate the considered tests on some chosen data sets. In Sect. 4 we summarize and discuss the results. All mathematical derivations are deferred to the appendices.

2 Methodology

2.1 Some basics of the Benford law

Benford’s law makes claims about the leading digits of a number regardless of its scale. Closely connected to the leading digits are the notions of significands and significant digits, whose formal definition is given in Definition 1.

Definition 1

(Significant digits and the significand, Berger and Hill (2015)) Let \(x\in \mathbb {R}\). The first significant digit \(D_1(x)=d\) of x is given by the unique integer \(d\in \{1,2,\ldots ,9\}\) such that \(10^kd \le |x |< 10^k(d+1)\) for some integer k.

The m-th significant digit \(D_m(x)=d\) with \(m\ge 2\) can recursively be determined by

\(10^k\left( \sum _{i=1}^{m-1} D_i(x)10^{m-i}+d \right) \le |x|< 10^k\left( \sum _{i=1}^{m-1} D_i(x)10^{m-i}+d+1 \right)\)

where \(d\in \{0,1,\ldots ,9\}\) and \(k\in \mathbb {Z}\).

The significand function \(S: \mathbb {R} \rightarrow [1,10)\) is defined as follows: If \(x\ne 0\) then \(S(x)=t\), where t is the unique number \(t \in [1,10)\) with \(|x|=10^kt\) for some unique \(k\in \mathbb {Z}\). For \(x=0\) we set, for convenience, \(S(0):=0\).
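For readers who wish to follow along computationally, here is a minimal Python sketch of the significand function of Definition 1 (the helper name significand is ours, not part of the paper):

```python
import math

def significand(x: float) -> float:
    """S(x): the unique t in [1, 10) with |x| = 10**k * t for an integer k; S(0) = 0."""
    if x == 0:
        return 0.0
    ax = abs(x)
    return ax / 10 ** math.floor(math.log10(ax))

# Example: significand(-0.0314) is 3.14 up to floating point; D_1 = 3, D_2 = 1.
```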

Next, we state the strong and weak form of Benford’s law.

Definition 2

(Benford’s law for the significand, strong form of Benford’s law) The significand S(X) follows Benford’s law if

$$\begin{aligned} P(S(X)\le t)=\log _{10} t \text { for all } t \in [1,10). \end{aligned}$$
(1)

Definition 3

(Benford’s law for the first significant digit, weak form of Benford’s law) The probability of the first significant digit \(d\in \{1,2,\ldots ,9\}\) is \(P(D_1(X)=d)=\log _{10}(1+d^{-1})\).

In Table 1, we give the distribution of the leading digit \(D_1\).

Table 1 Probabilities \(P(D_1(X)=d_1)\) according to Benford’s law

In the following we call a random variable X Benford distributed iff (1) is satisfied, and we write \(X\sim\) Benford. Benford distributed random variables possess some remarkable properties. In the present article we focus on the sum-invariance property. Sum-invariance means that if we sum all significands with first digit 1, we expect the same sum as when summing all significands with first digit 2, 3, etc., i.e. their expectations coincide. For further explanations and proofs we refer to Berger and Hill (2011, 2015), Pinkham (1961) and Nigrini (1992).
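To make this property concrete, the following simulation sketch (our own illustration, with an arbitrary seed and sample size) draws Benford distributed significands via \(S=10^U\) with \(U\sim \text{Uniform}[0,1)\) and reports the average contribution per observation for each leading digit; all nine averages come out close to \(1/\ln 10 \approx 0.4343\):

```python
import numpy as np

rng = np.random.default_rng(1)

# If U ~ Uniform[0, 1), then S = 10**U satisfies P(S <= t) = log10(t) on [1, 10).
n = 1_000_000
s = 10 ** rng.random(n)

d1 = s.astype(int)  # first significant digit of each significand
for d in range(1, 10):
    # average contribution per observation: close to 1/ln(10) ~ 0.4343 for every d
    print(d, round(s[d1 == d].sum() / n, 4))
```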

2.2 Classical tests against Benford and their modifications

Our test problem is in general

$$\begin{aligned} H_0: X \sim \text{ Benford } \qquad \text{ against }\qquad H_1: X \not \sim \text{ Benford }. \end{aligned}$$

Note that \(H_1\) is a very large class of alternatives.

The \(\chi ^2\)-test is one of the most popular goodness-of-fit tests; it goes back to Pearson (1900). The \(\chi ^2\)-test statistic measures the relative distance between the relative frequencies \(n_j/n\) and the probabilities \(p_j=P(D_1=d_j)\), \(j=1,2,\ldots ,9\), under the Benford law, and it is defined by

$$\begin{aligned} \chi ^2=n\sum _{j=1}^9\frac{(n_j/n-p_j)^2}{p_j}=\sum _{j=1}^9 \frac{(n_j-np_j)^2}{np_j}. \end{aligned}$$
(2)

The \(\chi ^2\)-test rejects the null hypothesis \(H_0\) if \(\chi ^2>\chi ^2_{1-\alpha ,8}\), where \(\chi ^2_{1-\alpha ,8}\) is the \((1-\alpha )\)-quantile of the \(\chi ^2\)-distribution with eight degrees of freedom. Note that the \(\chi ^2\) goodness-of-fit test is an approximate test: the statistic (2) is asymptotically \(\chi ^2\)-distributed with eight degrees of freedom.
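A minimal sketch of GoF1 in Python, assuming SciPy for the \(\chi ^2\) quantile and tail probability (the function name gof1 and the digit-extraction shortcut are ours):

```python
import numpy as np
from scipy.stats import chi2

def gof1(x, alpha=0.01):
    """Chi-square goodness-of-fit test (GoF1) for the first significant digit."""
    s = np.abs(np.asarray(x, dtype=float))
    s = s[s != 0]
    d1 = (s / 10.0 ** np.floor(np.log10(s))).astype(int)  # first significant digits
    n = d1.size
    nj = np.bincount(d1, minlength=10)[1:]                # counts for digits 1..9
    pj = np.log10(1 + 1 / np.arange(1, 10))               # Benford probabilities
    stat = np.sum((nj - n * pj) ** 2 / (n * pj))          # statistic (2)
    return stat, chi2.sf(stat, df=8), stat > chi2.ppf(1 - alpha, df=8)
```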

Since some data fraudsters may know Benford’s law for the first significant digit, some authors suggest using the second significant digit instead of the first and applying a goodness-of-fit test to it, cf. e.g. Diekmann (2007) or Hein et al. (2012) for scientific fraud or Mebane (2010) for election fraud.

The probability of the second significant digit \(d\in \{0,1,2,\ldots ,9\}\) is \(P(D_2(X)=d)=\sum _{j=1}^9\log _{10} (1+\frac{1}{10j+d})\), and it is presented in Table 2. Note that, due to rounding, the tabulated probabilities do not sum exactly to one. We abbreviate the two variants of the \(\chi ^2\) goodness-of-fit test by GoF1 and GoF2, respectively.
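The entries of Table 2 are straightforward to reproduce; a short sketch:

```python
import numpy as np

p2 = np.array([sum(np.log10(1 + 1 / (10 * j + d)) for j in range(1, 10))
               for d in range(10)])
print(np.round(p2, 4))  # the rounded entries need not sum exactly to one
```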

Table 2 Probabilities \(P(D_2(X)=d_2)\) according to Benford’s law

An alternative goodness-of-fit test is the Kolmogorov–Smirnov (KS) test, cf. Kolmogorov (1933), Smirnov (1948) and Darling (1957). The idea of this test is to compare the empirical cumulative distribution function (cdf) \(F_n(x)\) with a fully specified theoretical one, \(F_0(x)\). The KS-test uses the supremum distance

$$\begin{aligned} d_{\max }= \sup _{x\in \mathbb {R}}\vert F_n(x)-F_0(x)\vert . \end{aligned}$$
(3)

Since we investigate tests based on the first or second significant digit, respectively, we first apply the KS-test to the first significant digit according to the weak form of Benford’s law (cf. Definition 3). Alternatively, we apply the KS-test to the second significant digit.

The critical values of the KS test were completely tabulated by Miller (1956) for underlying continuous distributions. Morrow (2014) computed tighter bounds by Monte Carlo simulation for the discrete Benford distribution of the first digit, cf. Table 1.

For the KS-test applied to the second significant digit, cf. Table 2, the (asymptotic) critical values are approximated by simulation. To do this we simulate from a (continuous) Benford distribution (cf. Definition 2). Then we put the observations into bins \(0,\ldots , 9\) according to the definition of the second significant digit. Taking a large sample size of \(n=10{,}000\) and repeating this \(M=10{,}000\) times, we get a sufficiently accurate estimate of the asymptotic critical values. Some critical values are presented in Table 3. We abbreviate the KS-tests based on the first or second significant digit by KS1 and KS2, respectively.
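The following sketch implements this simulation. We scale the supremum distance by \(\sqrt{n}\), as Morrow (2014) does for the first digit, so that the simulated quantiles stabilize as n grows; whether Table 3 uses exactly this scaling is an assumption on our part:

```python
import numpy as np

rng = np.random.default_rng(0)

# Benford cdf of the second digit 0..9
p2 = np.array([sum(np.log10(1 + 1 / (10 * j + d)) for j in range(1, 10))
               for d in range(10)])
F0 = np.cumsum(p2)

n, M = 10_000, 10_000
stats = np.empty(M)
for m in range(M):
    s = 10 ** rng.random(n)             # continuous Benford significands
    d2 = (10 * s).astype(int) % 10      # second significant digit
    Fn = np.cumsum(np.bincount(d2, minlength=10)) / n
    stats[m] = np.sqrt(n) * np.max(np.abs(Fn - F0))

print(np.quantile(stats, [0.90, 0.95, 0.99]))  # simulated critical values
```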

Table 3 Asymptotic critical values \(c_{KS2,1-\alpha }\) and \(c_{MAD2,1-\alpha }\) of the KS2 and MAD2 test

As another alternative goodness-of-fit test we suggest a kind of MAD-test that is based on the statistic

$$\begin{aligned} MAD = \sqrt{n} \sum _{j=1}^{k}\vert n_j/n-p_j \vert , \end{aligned}$$
(4)

where MAD stands for Mean Absolute Deviation. Though no means appear here, our proposal is derived from an idea due to Nigrini (2012), who used the mean \(MAD_N=\sum _{j=1}^{k}\frac{\vert n_j/n-p_j \vert }{k}\), where the index N stands for Nigrini. Our proposal uses a suitably scaled sum of the absolute deviations between the relative frequencies and the Benford probabilities for the first digit. In our new version we introduce the factor \(\sqrt{n}\) so that the critical values of the test are rather independent of n. This property is illustrated in Table 4. The motivation for the factor \(\sqrt{n}\) is that the relative frequencies converge to the true probabilities at the rate \(n^{-1/2}\). Recently, Cerqueti and Lupi (2021) obtained the asymptotic distribution of the MAD statistic (4). From that we computed the asymptotic critical values, cf. Table 4. The convergence of the finite-sample critical values to the asymptotic critical value is rather fast. A minimal sketch of the statistic (4) is given below.
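The sketch (function name mad1 is ours) computes the statistic (4) for the first digit:

```python
import numpy as np

def mad1(x):
    """Statistic (4) with the sqrt(n) factor, applied to the first digit."""
    s = np.abs(np.asarray(x, dtype=float))
    s = s[s != 0]
    d1 = (s / 10.0 ** np.floor(np.log10(s))).astype(int)
    n = d1.size
    nj = np.bincount(d1, minlength=10)[1:]
    pj = np.log10(1 + 1 / np.arange(1, 10))
    return np.sqrt(n) * np.sum(np.abs(nj / n - pj))

# Reject H0 at alpha = 0.01 if mad1(x) > 3.60, cf. Table 4.
```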

Table 4 Critical values \(c_{MAD,1-\alpha }\) of the MAD test (1st digit)

Evidently, the critical values are not very sensitive to the sample size. For simplicity, we use in our study the critical value \(c_{MAD,1-\alpha }=3.60\) for \(\alpha =0.01\).

The MAD-test may also be applied to the second significant digit. Some (asymptotic) critical values are presented in Table 3. They are obtained in a way analogous to that for the tests KS1 and KS2. We abbreviate the two variants, for the first and second significant digit, by MAD1 and MAD2.

One may ask why we do not use the first two digits together. This idea was suggested by Diekmann (2007), cf. also Nigrini (2012). However, we consider data sets with moderate sample sizes, roughly between n = 200 and n = 4000. If we used the first two digits together we would have 90 bins altogether and therefore very many bins with very few or even no observations. Hence this idea is not applicable here to the class of invariant sum tests. Some kind of KS-test or MAD-test for discrete distributions might still be applied, but since our interest here lies in invariant sum tests, they are not considered.

Of course, there are other possibilities to test against Benford besides the invariant sum tests that we introduce in the next section. We mention only two recently published ideas. Kazemitabar and Kazemitabar (2022) make use of the alternative definition of Benford’s law saying that the logarithms of the significands are uniformly distributed. Cerqueti and Maggi (2021) discuss some distance measures, especially the sum of squared deviations and the MAD.

2.3 Invariant sum tests

In this section we apply the invariant-sum property of Benford’s law, cf. Nigrini (1992), Allaart (1997) and Berger and Hill (2015, Theorem 5.18). To do this we define the sets \(C(d_1,\ldots ,d_m)=\{x\in [1,10): D_j(x)=d_j~ \text{ for }~ j=1,\ldots ,m\}\), \(C_1(d_1)=\{x\in [1,10): D_1(x)=d_1\}=[d_1,d_1+1)\) and \(C_2(d_2)=\{x\in [1,10): D_2(x)=d_2\}=\bigcup _{j=1}^9[j+\frac{d_2}{10},j+\frac{d_2+1}{10})\). \(C(d_1,\ldots ,d_m)\) is the set of all significands with first m digits \(d_1,\ldots ,d_m\), \(C_1(d_1)\) is the set of all significands with first digit \(d_1\), and \(C_2(d_2)\) is the set of all significands with second digit \(d_2\).

Proposition 1

(Sum invariance, Berger and Hill (2015), Nigrini (2012), Allaart (1997)) A random variable X is Benford if and only if X has sum-invariant significant digits, i.e. for every fixed \(m\in \mathbb {N}\) the expectations \(\mathbb {E}(S(X)\mathbbm {1}_{C(d_1,\ldots , d_m)}(S(X)))\) are the same for all tuples \((d_1,\ldots ,d_m)\), \(d_1\ne 0\), of digits.

Therefore one necessary condition for X to be Benford is that the expected sum of all significands with first digit \(1,2,\ldots ,9\) is the same for each digit. The same holds for the expected sum of all significands with second digit \(0,1,\ldots ,9\).

Let us start with the first significant digit. Denote by \(\theta =\mathbb {E}\bigl (S(X)\mathbbm {1}_{C_1(j)}(S(X))\bigr )=\frac{1}{\ln 10}\) the expectation of the random variable \(S(X)\mathbbm {1}_{C_1(j)}(S(X))\) if Benford is true; note that it does not depend on j. Let \(\theta _j\) be the true expectation of \(S(X)\mathbbm {1}_{C_1(j)}(S(X))\) for the underlying distribution.

Then our first test problem is

$$\begin{aligned} H_{0,1}: \theta _1=\ldots =\theta _9=\theta \quad \text{ against }\quad H_{1,1}:~ \exists j\in \{1,\ldots ,9\}: \theta _j\ne \theta \end{aligned}$$

Denote the sum of the significands of the observations \(X_i\) falling in the interval \([j,j+1)\) by

$$\begin{aligned} \text{ Sum}_{1,j}=\sum _{i=1}^n S(X_i)\mathbbm {1}_{C_1(j)}(S(X_i)). \end{aligned}$$

Since \(\text{Sum}_{1,j}\) is a sum of n independent identically distributed random variables \(S(X_i)\mathbbm {1}_{C_1(j)}(S(X_i))\), \(i=1,\ldots ,n\), with finite variance, the central limit theorem implies that it is approximately normally distributed, and the standardized sums

$$\begin{aligned} R_{1,j}=\frac{\text{Sum}_{1,j}-\mathbb {E}(\text{Sum}_{1,j})}{\sqrt{\text{var}(\text{Sum}_{1,j})}} \end{aligned}$$

are (approximately) standard normal. The expectations \(\mathbb {E} (\text{Sum}_{1,j})=\frac{n}{\ln 10}\), the variances \(\text{var}(\text{Sum}_{1,j})\) and the covariances are derived in Appendix A.

Let \(\textbf{R}_1=(R_{1,1},\ldots ,R_{1,9})\) and let \({\varvec{{\Sigma }}}_{R_1}\) be the correlation matrix of the vector \(\textbf{R}_1\) of standardized sums under the null. We consider the following two types of test statistics

$$\begin{aligned} IS_{1,E} = \textbf{R}_1'\textbf{R}_1 \qquad \text{ and } \qquad IS_{1,M}= \textbf{R}_1'{\varvec{{\Sigma }}}^{-1}_{R_1}\textbf{R}_1 \end{aligned}$$

where IS stands for Invariant Sum. The statistic \(IS_{1,E}\) is the squared Euclidean distance of the vector \(\textbf{R}_1\) of standardized sums from zero, and \(IS_{1,M}\) is the corresponding squared Mahalanobis distance.

One may ask why we use both distance measures, Euclidean and Mahalanobis. The two distances are generally different, and so are the corresponding test statistics. Hence there may be alternative directions for which the Euclidean distance performs better than the Mahalanobis distance and vice versa.
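For concreteness, the following Python sketch computes \(\textbf{R}_1\), \(IS_{1,E}\) and \(IS_{1,M}\). It is independent of the algorithm in Appendix F; the moments used here follow directly from the Benford density \(f(t)=1/(t\ln 10)\) on \([1,10)\) and are meant to agree with the derivations in Appendix A, which we do not reproduce:

```python
import numpy as np

LN10 = np.log(10)
digits = np.arange(1, 10)
theta = 1 / LN10                        # E[S * 1_{C1(j)}(S)] under Benford
m2 = (2 * digits + 1) / (2 * LN10)      # E[S^2 * 1_{C1(j)}(S)] = int_j^{j+1} t/ln10 dt
var_j = m2 - theta ** 2                 # per-observation variances
# Disjoint indicators give per-observation covariance -theta**2, hence:
Sigma = -theta ** 2 / np.sqrt(np.outer(var_j, var_j))  # correlation matrix of R_1
np.fill_diagonal(Sigma, 1.0)

def invariant_sum_1(x):
    s = np.abs(np.asarray(x, dtype=float))
    s = s[s != 0]
    s = s / 10.0 ** np.floor(np.log10(s))          # significands in [1, 10)
    n = s.size
    d1 = s.astype(int)
    sums = np.array([s[d1 == d].sum() for d in digits])   # Sum_{1,j}
    R1 = (sums - n * theta) / np.sqrt(n * var_j)   # standardized sums
    return R1 @ R1, R1 @ np.linalg.solve(Sigma, R1)       # IS_1E, IS_1M
```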

Theorem 2

Under \(H_{0,1}\) the statistic \(IS_{1,M}\) is asymptotically \(\chi ^2\)-distributed with nine degrees of freedom, and the null distribution of \(IS_{1,E}\) is approximated by a weighted sum of independent \(\chi ^2\)-distributed random variables, each with one degree of freedom.

The proof of the theorem can be found in Appendix B.

The null hypothesis \(H_{0,1}\) is rejected in favour of \(H_{1,1}\) if \(IS_{1,M}>\chi ^2_{1-\alpha ,9}\) or if \(IS_{1,E}>c_{IS_{1,E},1-\alpha }\), respectively, where \(\chi ^2_{1-\alpha ,9}\) is the \((1-\alpha )\)-quantile of the \(\chi ^2\)-distribution with nine degrees of freedom and \(c_{IS_{1,E},1-\alpha }\) is the corresponding quantile of the null distribution of \(IS_{1,E}\). The latter quantile is determined by approximating the null distribution of \(IS_{1,E}\) by a suitably scaled and shifted \(\chi ^2\)-distribution, see Appendix C. Table 5 gives simulated levels of significance of the two tests. Even for small sample sizes they are close to the nominal value of \(\alpha =0.01\).
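As an illustration of the idea behind such an approximation (this sketch may differ in detail from Appendix C): a standard way to fit \(a+b\chi ^2_{\nu }\) to a weighted sum \(\sum _i \lambda _i Z_i^2\) with \(Z_i\) iid standard normal is to match the first three cumulants, where the weights \(\lambda _i\) are, as is standard for quadratic forms in normal vectors, the eigenvalues of \({\varvec{{\Sigma }}}_{R_1}\):

```python
import numpy as np
from scipy.stats import chi2

def scaled_shifted_chi2_quantile(Sigma, q=0.99):
    """Approximate the q-quantile of sum_i lam_i * Z_i**2 (Z_i iid N(0,1)) by
    a + b * chi2(nu), matching the cumulants kappa_r = 2**(r-1)*(r-1)! * sum(lam**r)."""
    lam = np.linalg.eigvalsh(Sigma)
    k1, k2, k3 = lam.sum(), 2 * (lam ** 2).sum(), 8 * (lam ** 3).sum()
    b = k3 / (4 * k2)              # scale
    nu = 8 * k2 ** 3 / k3 ** 2     # degrees of freedom
    a = k1 - b * nu                # shift
    return a + b * chi2.ppf(q, df=nu)

# e.g. c_IS1E ~ scaled_shifted_chi2_quantile(Sigma, 0.99) with Sigma as above
```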

Table 5 Simulated levels of significance under \(H_{0,1}\) and \(H_{0,2}\), respectively, of the invariant sum tests, for various sample sizes, nominal level of significance \(\alpha =0.01\)

Note that statistic \(IS_{1,M}\) was independently introduced by Barabesi et al. (2021).

Now consider the second significant digit. Denote by \(\vartheta =\mathbb {E}\bigl (S(X)\mathbbm {1}_{C_2(j)}(S(X))\bigr ) =\frac{9}{10\ln 10}\) the expectation of \(S(X)\mathbbm {1}_{C_2(j)}(S(X))\) if Benford is true. Let \(\vartheta _j\) be the true expectation of \(S(X)\mathbbm {1}_{C_2(j)}(S(X))\) for the underlying distribution.

Then our second test problem is

$$\begin{aligned} H_{0,2}: \vartheta _0=\ldots =\vartheta _9=\vartheta \quad \text{ against }\quad H_{1,2}:~ \exists j\in \{0,\ldots ,9\}: \vartheta _j\ne \vartheta \end{aligned}$$

Denote the sum of the significands of the observations \(X_i\) falling in \(C_2(j)\) by

$$\begin{aligned} \text{ Sum}_{2,j}=\sum _{i=1}^n S(X_i)\mathbbm {1}_{C_2(j)}(S(X_i)). \end{aligned}$$

Again, \(\text{Sum}_{2,j}\) is a sum of n independent identically distributed random variables \(S(X_i)\mathbbm {1}_{C_2(j)}(S(X_i))\), \(i=1,\ldots ,n\), with finite variance, so it is approximately normally distributed, and the standardized sums

$$\begin{aligned} R_{2,j}=\frac{\text{Sum}_{2,j}-\mathbb {E}(\text{Sum}_{2,j})}{\sqrt{\text{var}(\text{Sum}_{2,j})}} \end{aligned}$$

are (approximately) standard normal. The expectations \(\mathbb {E} (\text{Sum}_{2,j})=\frac{9 n}{10 \ln 10}\), the variances \(\text{var}(\text{Sum}_{2,j})\) and the covariances are derived in Appendix A.

Let \(\textbf{R}_2=(R_{2,0},\ldots ,R_{2,9})\) and let \({\varvec{{\Sigma }}}_{R_2}\) be the correlation matrix of the vector \(\textbf{R}_{2}\) of standardized sums under the null. Similarly as above, we consider the following two types of test statistics

$$\begin{aligned} IS_{2,E} = \textbf{R}_2'\textbf{R}_2 \qquad \text{ and } \qquad IS_{2,M}= \textbf{R}_2'{\varvec{{\Sigma }}}^{-1}_{R_2}\textbf{R}_2 \end{aligned}$$
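The second-digit statistics can be computed along the same lines as the first-digit sketch above; only the bins and the moments change. Again the moments below are computed from the Benford density and are meant to agree with Appendix A:

```python
import numpy as np

LN10 = np.log(10)
d = np.arange(10)
j = np.arange(1, 10)[:, None]
vartheta = 9 / (10 * LN10)                 # E[S * 1_{C2(d)}(S)] under Benford
# E[S^2 * 1_{C2(d)}(S)] = sum_j ((j+(d+1)/10)^2 - (j+d/10)^2) / (2 ln 10)
m2 = (((j + (d + 1) / 10) ** 2 - (j + d / 10) ** 2) / (2 * LN10)).sum(axis=0)
var_d = m2 - vartheta ** 2                 # per-observation variances

def R2(x):
    s = np.abs(np.asarray(x, dtype=float))
    s = s[s != 0]
    s = s / 10.0 ** np.floor(np.log10(s))  # significands
    n = s.size
    d2 = (10 * s).astype(int) % 10         # second significant digits
    sums = np.array([s[d2 == k].sum() for k in d])   # Sum_{2,j}
    return (sums - n * vartheta) / np.sqrt(n * var_d)
```

\(IS_{2,E}\) and \(IS_{2,M}\) are then formed exactly as in the first-digit case, with the correlation matrix built from var_d and the per-observation covariance \(-\vartheta ^2\) (the sets \(C_2(j)\) are disjoint).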

Theorem 3

Under \(H_{0,2}\) the statistic \(IS_{2,M}\) is asymptotically \(\chi ^2\)-distributed with ten degrees of freedom, and the null distribution of \(IS_{2,E}\) is approximated by a weighted sum of independent \(\chi ^2\)-distributed random variables, each with one degree of freedom.

The proof of the theorem can be found in Appendix B.

Table 5 gives simulated levels of significance of the two tests. Again, even for small sample sizes they are close to the nominal value of \(\alpha =0.01\).

In Appendix F our algorithm for the implementation of the invariant sum tests is provided.

3 Illustration

We illustrate our methods on four carefully selected data sets. The first two are chosen to show that our tests indeed yield results conforming to number theory. The other two are empirical data sets.

  1. #1:

    Fibonacci (\(n=1000\)) The Fibonacci numbers have been proved to be Benford distributed, cf. e.g. Berger and Hill (2015).

  2. #2:

     Prime Numbers (\(n=1000\)) In contrast to the Fibonacci numbers, the prime numbers are known not to be Benford, cf. e.g. Berger and Hill (2015).

  3. #3:

    Population (\(n=3998\)) This data set consists of the numbers of inhabitants of cities worldwide with more than 100,000 people. It illustrates that data from certain truncated distributions are not Benford, cf. Appendix D.

  4. #4:

    Share Prices (\(n=369\)) The data comprise share prices from a mixture of international stock market indices. Such data sets are expected to behave like Benford, according to the mixture theorem of Berger and Hill (2015, Section 8.3).

First, we study the behaviour of each of the four invariant sum tests. The level of significance is \(\alpha =0.01\). The nine values of the statistics \(R_{1,j}\), \(j=1,\ldots ,9\), as well as the ten values of the statistics \(R_{2,j}\), \(j=0,\ldots ,9\), are summarized in Fig. 1. We see that the values of \(R_{1,j}\) for the Share Prices and for the Fibonacci numbers are very close to zero, which provides some evidence of the Benford property. For the data sets Population and Prime Numbers the boxes are large and far from zero, which gives some evidence against Benford. For the second significant digit the picture is similar but sometimes less clear. However, for the Share Prices most of the values \(R_{2,j}\) are less than one in absolute value, resulting in small values of \(IS_{2,E}\) and \(IS_{2,M}\). Table 6 contains the p-values for the tests \(IS_{1,E}\), \(IS_{1,M}\), \(IS_{2,E}\), and \(IS_{2,M}\).

Fig. 1 Plots summarizing the values of the statistics \(R_{1,j}\) and \(R_{2,j}\), respectively

Table 6 p-Values for the tests \(IS_{1,E}\), \(IS_{1,M}\), \(IS_{2,E}\), and \(IS_{2,M}\) applied on our illustrative data sets

Note that the values are rounded. In this way, the entries, especially the p-values, may become 1.00 or 0.00. The p-value of (nearly) 1.00 for the Fibonacci numbers signals evidence for the fact, well known from number theory, that they are nearly perfectly Benford. The (rounded) p-value of 0.00 gives very strong evidence of the equally well-known fact that prime numbers are not Benford. For the notions of evidence and (very) strong evidence we refer to Wasserman (2004). The two data sets, Fibonacci and prime numbers, were selected to illustrate that all the considered tests yield a decision confirming the mathematical theory. Note that the Fibonacci and prime number data contain a few entries with only one digit. Since a second digit does not exist for these entries, they are removed when testing the second significant digit.

The tests conform to the underlying theories, i.e. number theory, Berger and Hill’s theorem on mixtures, and the conjecture on bounded domains in Appendix D. The data set #1 (Fibonacci) is clearly Benford. Furthermore, data set #3 (Population) is clearly not Benford. For an explanation based on trimming of values or bounded domains we refer to Appendix D. Prime numbers (data set #2) are known not to be Benford, which is clearly illustrated by the three tests \(IS_{1,E}, IS_{1,M}\) and \(IS_{2,M}\); the test \(IS_{2,E}\) does not reject Benford at the \(\alpha =0.01\) level, which might be caused by the lower power of \(IS_{2,E}\) at sample size n = 1000. The data set #4 (Share Prices) is in accordance with the mixture theorem of Berger and Hill (2015) and thus leads to Benford’s law.

The results for all tests considered, KS1, KS2, GoF1, GoF2, MAD1, MAD2, \(IS_{1,E}\), \(IS_{1,M}\), \(IS_{2,E}\) and \(IS_{2,M}\), are presented in Table 7. Bold values mean ’rejection’ at \(\alpha =0.01\). Note that each decision is based on one single test, so that the problem of multiple testing is not relevant here. The classical tests GoF1, KS1, and MAD1 confirm the results of the invariant sum tests.

Note that when testing primes, most of the tests based on the second significant digit do not reject Benford due to low power. However, if we increase the sample size and take all prime numbers between 11 and 100,000, then Benford is rejected by all the tests based on the second significant digit, too.

Table 7 Critical values \((\alpha =0.01\)) and observed values of the various Goodness of Fit tests

4 Summary

We consider several statistical tests of the Benford law; a few are known, most are new. The tests based on the second significant digit are completely new, except for GoF2. The various variants of the invariant sum tests are appealing as they use the significands themselves and therefore exploit the full information in the data.

We have shown that almost all the tests give confirmatory results for data sets for which there is a theory on whether the Benford property holds, the exception being primes with the second significant digit, cf. Tables 7 and 8. The last line in Table 8 presents the Bonferroni-adjusted p-values and is intended only to give the reader a quick impression. We see that data sets #1 and, quite clearly, #4 are Benford, while the other two are not.

In future research we intend to investigate which of the considered tests performs well against various alternative directions. Moreover, various sample sizes are to be considered. Furthermore, we intend to construct tests that are based on sum invariance and on other invariance principles.

Table 8 p-Values of the various test statistics