1 Introduction

In many data sets the first non-zero digit d is not uniformly distributed but obeys a logarithmic law. This fact was observed by Newcomb (1881) and Benford (1938). Compliance officers of large companies use the Benford law to uncover data manipulations, mostly by applying the \(\chi ^2\) goodness-of-fit test. Such manipulations may consist of inserting fraudulent figures or changing digits. These and further applications may be found, for example, in the books of Nigrini (2012) and Berger and Hill (2015), see also Kössler et al. (2024). However, not every real or artificial data set follows the Benford law, so the question arises how this can be tested in practice. There is a vast literature on applications and testing of Benford’s law; we refer to the website of Berger et al. (accessed 3.1.2024) and to Nigrini (2012). The latter author applies Pearson’s \(\chi ^2\)-test, the Kolmogorov–Smirnov test and the MAD-test to the 1st digit, the 2nd digit, the 1st and 2nd digits together, and the 1st, 2nd and 3rd digits together. His MAD-test assigns the numerical values of the MAD statistic to the linguistic terms close conformity, acceptable conformity, marginally acceptable conformity and nonconformity, cf. Table 7.1 (p. 160) of Nigrini (2012).

Berger and Hill (2011) as well as Nigrini (1992) analyzed scale-, base- and sum-invariance. Sum-invariance means in particular that the expected sum of all significands with leading digit 1 equals the expected sum of the significands with leading digit 2, ..., 9, respectively.

In the present article we apply the sum-invariance property of Benford’s law to construct several further tests of significance. Our test statistics are suitable linear combinations of squares of suitably chosen statistics, and under the null hypothesis they are asymptotically or approximately \(\chi ^2\)-distributed.

Particular emphasis is placed on the second significant digit. A \(\chi ^2\) goodness-of-fit test for the second digit has already been suggested, cf. e.g. Diekmann (2007). We suggest some further tests based on properties of the second significant digit. In Sect. 2.1 we present some basic properties of Benford’s law and some statistical tests that are applied later on. In Sect. 2.2 we recall the \(\chi ^2\) goodness-of-fit test and the Kolmogorov–Smirnov test, and apply them to the first and second significant digit. Moreover, the MAD-test is modified so that its critical values do not depend on the sample size. In Sect. 2.3 we introduce four variants of the invariant sum test, three of them new, and in Sect. 3 we illustrate the considered tests on some chosen data sets. In Sect. 4 we summarize and discuss the results. All mathematical derivations are deferred to the appendices.

2 Methodology

2.1 Some basics of the Benford law

Benford’s law makes claims about the leading digits of a number regardless of its scale. Closely connected to the leading digits are the notions of significands and significant digits, whose formal definition is given in Definition 1.

Definition 1

(Significant digits and the significand, Berger and Hill (2015)) Let \(x\in \mathbb {R}\). The first significant digit \(D_1(x)=d\) of x is given by the unique integer \(d\in \{1,2,\ldots ,9\}\) such that \(10^kd \le |x |< 10^k(d+1)\) for some integer k.

The m-th significant digit \(D_m(x)=d\) with \(m\ge 2\) can recursively be determined by

\(10^k\left( \sum _{i=1}^{m-1} D_i(x)10^{m-i}+d \right) \le |x|< 10^k\left( \sum _{i=1}^{m-1} D_i(x)10^{m-i}+d+1 \right)\)

where \(d\in \{0,1,\ldots ,9\}\) and \(k\in \mathbb {Z}\).

The significand function \(S: \mathbb {R} \rightarrow [1,10)\) is defined as follows: If \(x\ne 0\) then \(S(x)=t\), where t is the unique number \(t \in [1,10)\) with \(|x|=10^kt\) for some unique \(k\in \mathbb {Z}\). For \(x=0\) we set, for convenience, \(S(0):=0\).
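For readers who wish to follow along computationally, here is a minimal Python sketch of the significand function of Definition 1 (the helper name significand is ours, not part of the paper):

```python
import math

def significand(x: float) -> float:
    """S(x): the unique t in [1, 10) with |x| = 10**k * t for an integer k; S(0) = 0."""
    if x == 0:
        return 0.0
    ax = abs(x)
    return ax / 10 ** math.floor(math.log10(ax))

# Example: significand(-0.0314) is 3.14 up to floating point; D_1 = 3, D_2 = 1.
```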

Next, we state the strong and weak form of Benford’s law.

Definition 2

(Benford’s law for the significand, strong form of Benford’s law) The significand S(X) follows Benford’s law if

$$\begin{aligned} P(S(X)\le t)=\log _{10} t \text { for all } t \in [1,10). \end{aligned}$$
(1)

Definition 3

(Benford’s law for the first significant digit, weak form of Benford’s law) The probability of the first significant digit \(d\in \{1,2,\ldots ,9\}\) is \(P(D_1(X)=d)=\log _{10}(1+d^{-1})\).

In Table 1, we give the distribution of the leading digit \(D_1\).

Table 1 Probabilities \(P(D_1(X)=d_1)\) according to Benford’s law

In the following we call a random variable X Benford distributed iff (1) is satisfied, and we write \(X\sim\) Benford. Benford distributed random variables possess some remarkable properties. In the present article we focus on the sum-invariance property. Sum-invariance means that if we sum all significands with first digit 1, we expect the same sum as when summing all significands with first digit 2, 3, etc., i.e. their expectations coincide. For further explanations and proofs we refer to Berger and Hill (2011, 2015), Pinkham (1961) and Nigrini (1992).
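To make this property concrete, the following simulation sketch (our own illustration, with an arbitrary seed and sample size) draws Benford distributed significands via \(S=10^U\) with \(U\sim \text{Uniform}[0,1)\) and reports the average contribution per observation for each leading digit; all nine averages come out close to \(1/\ln 10 \approx 0.4343\):

```python
import numpy as np

rng = np.random.default_rng(1)

# If U ~ Uniform[0, 1), then S = 10**U satisfies P(S <= t) = log10(t) on [1, 10).
n = 1_000_000
s = 10 ** rng.random(n)

d1 = s.astype(int)  # first significant digit of each significand
for d in range(1, 10):
    # average contribution per observation: close to 1/ln(10) ~ 0.4343 for every d
    print(d, round(s[d1 == d].sum() / n, 4))
```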

2.2 Classical tests against Benford and their modifications

Our test problem is in general

$$\begin{aligned} H_0: X \sim \text{ Benford } \qquad \text{ against }\qquad H_1: X \not \sim \text{ Benford }. \end{aligned}$$

Note that \(H_1\) is a very large class of alternatives.

The \(\chi ^2\)-test is one of the most popular goodness-of-fit tests; it goes back to Pearson (1900). The \(\chi ^2\)-test statistic measures the relative distance between the relative frequencies \(n_j/n\) and the probabilities \(p_j=P(D_1=d_j)\), \(j=1,2,\ldots ,9\), under the Benford law, and it is defined by

$$\begin{aligned} \chi ^2=n\sum _{j=1}^9\frac{(n_j/n-p_j)^2}{p_j}=\sum _{j=1}^9 \frac{(n_j-np_j)^2}{np_j}. \end{aligned}$$
(2)

The \(\chi ^2\)-test rejects the null hypothesis \(H_0\) if \(\chi ^2>\chi ^2_{1-\alpha ,8}\), where \(\chi ^2_{1-\alpha ,8}\) is the \((1-\alpha )\)-quantile of the \(\chi ^2\)-distribution with eight degrees of freedom. Note that the \(\chi ^2\) goodness-of-fit test is an approximate test: the statistic (2) is asymptotically \(\chi ^2\)-distributed with eight degrees of freedom.
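A minimal sketch of GoF1 in Python, assuming SciPy for the \(\chi ^2\) quantile and tail probability (the function name gof1 and the digit-extraction shortcut are ours):

```python
import numpy as np
from scipy.stats import chi2

def gof1(x, alpha=0.01):
    """Chi-square goodness-of-fit test (GoF1) for the first significant digit."""
    s = np.abs(np.asarray(x, dtype=float))
    s = s[s != 0]
    d1 = (s / 10.0 ** np.floor(np.log10(s))).astype(int)  # first significant digits
    n = d1.size
    nj = np.bincount(d1, minlength=10)[1:]                # counts for digits 1..9
    pj = np.log10(1 + 1 / np.arange(1, 10))               # Benford probabilities
    stat = np.sum((nj - n * pj) ** 2 / (n * pj))          # statistic (2)
    return stat, chi2.sf(stat, df=8), stat > chi2.ppf(1 - alpha, df=8)
```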

Since some data fraudsters may know Benford’s law for the first significant digit, some authors suggest using the second significant digit instead of the first and applying a goodness-of-fit test to it, cf. e.g. Diekmann (2007) or Hein et al. (2012) for scientific fraud or Mebane (2010) for election fraud.

The probability of the second significant digit \(d\in \{0,1,2,\ldots ,9\}\) is \(P(D_2(X)=d)=\sum _{j=1}^9\log _{10} (1+\frac{1}{10j+d})\), and it is presented in Table 2. Note that, due to rounding, the tabulated probabilities do not sum exactly to one. We abbreviate the two variants of the \(\chi ^2\) goodness-of-fit test by GoF1 and GoF2, respectively.
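The entries of Table 2 are straightforward to reproduce; a short sketch:

```python
import numpy as np

p2 = np.array([sum(np.log10(1 + 1 / (10 * j + d)) for j in range(1, 10))
               for d in range(10)])
print(np.round(p2, 4))  # the rounded entries need not sum exactly to one
```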

Table 2 Probabilities \(P(D_2(X)=d_2)\) according to Benford’s law

An alternative goodness-of-fit test is the Kolmogorov–Smirnov (KS) test, cf. Kolmogorov (1933), Smirnov (1948) and Darling (1957). The idea of this test is to compare the empirical cumulative distribution function (cdf) \(F_n(x)\) with a fully specified theoretical one, \(F_0(x)\). The KS-test uses the supremum distance

$$\begin{aligned} d_{\max }= \sup _{x\in \mathbb {R}}\vert F_n(x)-F_0(x)\vert . \end{aligned}$$
(3)

Since we investigate tests based on the first or second significant digit, respectively, we first apply the KS-test to the first significant digit according to the weak form of Benford’s law (cf. Definition 3). Alternatively, we apply the KS-test to the second significant digit.

The critical values of the KS test were completely tabulated by Miller (1956) for underlying continuous distributions. Morrow (2014) computed tighter bounds by Monte Carlo simulation for the discrete Benford distribution of the first digit, cf. Table 1.

For the KS-test applied to the second significant digit, cf. Table 2, the (asymptotic) critical values are approximated by simulation. To do this we simulate from a (continuous) Benford distribution (cf. Definition 2). Then we put the observations into bins \(0,\ldots , 9\) according to the definition of the second significant digit. Taking a large sample size of \(n=10{,}000\) and repeating this \(M=10{,}000\) times, we get a sufficiently accurate estimate of the asymptotic critical values. Some critical values are presented in Table 3. We abbreviate the KS-tests based on the first or second significant digit by KS1 and KS2, respectively.
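The following sketch implements this simulation. We scale the supremum distance by \(\sqrt{n}\), as Morrow (2014) does for the first digit, so that the simulated quantiles stabilize as n grows; whether Table 3 uses exactly this scaling is an assumption on our part:

```python
import numpy as np

rng = np.random.default_rng(0)

# Benford cdf of the second digit 0..9
p2 = np.array([sum(np.log10(1 + 1 / (10 * j + d)) for j in range(1, 10))
               for d in range(10)])
F0 = np.cumsum(p2)

n, M = 10_000, 10_000
stats = np.empty(M)
for m in range(M):
    s = 10 ** rng.random(n)             # continuous Benford significands
    d2 = (10 * s).astype(int) % 10      # second significant digit
    Fn = np.cumsum(np.bincount(d2, minlength=10)) / n
    stats[m] = np.sqrt(n) * np.max(np.abs(Fn - F0))

print(np.quantile(stats, [0.90, 0.95, 0.99]))  # simulated critical values
```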

Table 3 Asymptotic critical values \(c_{KS2,1-\alpha }\) and \(c_{MAD2,1-\alpha }\) of the KS2 and MAD2 test

As another alternative goodness-of-fit test we suggest a kind of MAD-test that is based on the statistic

$$\begin{aligned} MAD = \sqrt{n} \sum _{j=1}^{k}\vert n_j/n-p_j \vert , \end{aligned}$$
(4)

where MAD stands for Mean Absolute Deviation. Though no means appear here, our proposal is derived from an idea due to Nigrini (2012), who used the mean \(MAD_N=\sum _{j=1}^{k}\frac{\vert n_j/n-p_j \vert }{k}\), where the index N stands for Nigrini. Our proposal uses a suitably scaled sum of the absolute deviations between the relative frequencies and the Benford probabilities for the first digit. In our new version we introduce the factor \(\sqrt{n}\) so that the critical values of the test are rather independent of n. This property is illustrated in Table 4. The motivation for the factor \(\sqrt{n}\) is that the relative frequencies converge to the true probabilities at the rate \(n^{-1/2}\). Recently, Cerqueti and Lupi (2021) obtained the asymptotic distribution of the MAD statistic (4). From that we computed the asymptotic critical values, cf. Table 4. The convergence of the finite-sample critical values to the asymptotic critical value is rather fast. A minimal sketch of the statistic (4) is given below.
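The sketch (function name mad1 is ours) computes the statistic (4) for the first digit:

```python
import numpy as np

def mad1(x):
    """Statistic (4) with the sqrt(n) factor, applied to the first digit."""
    s = np.abs(np.asarray(x, dtype=float))
    s = s[s != 0]
    d1 = (s / 10.0 ** np.floor(np.log10(s))).astype(int)
    n = d1.size
    nj = np.bincount(d1, minlength=10)[1:]
    pj = np.log10(1 + 1 / np.arange(1, 10))
    return np.sqrt(n) * np.sum(np.abs(nj / n - pj))

# Reject H0 at alpha = 0.01 if mad1(x) > 3.60, cf. Table 4.
```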

Table 4 Critical values \(c_{MAD,1-\alpha }\) of the MAD test (1st digit)

Evidently, the critical values are not very sensitive to the sample size. For simplicity, we use in our study the critical value \(c_{MAD,1-\alpha }=3.60\) for \(\alpha =0.01\).

The MAD-test may also be applied to the second significant digit. Some (asymptotic) critical values are presented in Table 3. They are obtained in a way analogous to that for the tests KS1 and KS2. We abbreviate the two variants, for the first and second significant digit, by MAD1 and MAD2.

One may ask why we do not use the first two digits together. This idea was suggested by Diekmann (2007), cf. also Nigrini (2012). However, we consider data sets with moderate sample sizes, roughly between n = 200 and n = 4000. If we used the first two digits together we would have 90 bins altogether and therefore very many bins with very few or even no observations. Hence this idea is not applicable here to the class of invariant sum tests. Some kind of KS-test or MAD-test for discrete distributions might still be applied, but since our interest here lies in invariant sum tests, they are not considered.

Of course, there are other possibilities to test against Benford besides the invariant sum tests that we introduce in the next section. We mention only two recently published ideas. Kazemitabar and Kazemitabar (2022) make use of the alternative definition of Benford’s law saying that the logarithms of the significands are uniformly distributed. Cerqueti and Maggi (2021) discuss some distance measures, especially the sum of squared deviations and the MAD.

2.3 Invariant sum tests

In this section we apply the invariant-sum property of Benford’s law, cf. Nigrini (1992), Allaart (1997) and Berger and Hill (2015, Theorem 5.18). To do this we define the sets \(C(d_1,\ldots ,d_m)=\{x\in [1,10): D_j(x)=d_j~ \text{ for }~ j=1,\ldots ,m\}\), \(C_1(d_1)=\{x\in [1,10): D_1(x)=d_1\}=[d_1,d_1+1)\) and \(C_2(d_2)=\{x\in [1,10): D_2(x)=d_2\}=\bigcup _{j=1}^9[j+\frac{d_2}{10},j+\frac{d_2+1}{10})\). \(C(d_1,\ldots ,d_m)\) is the set of all significands with first m digits \(d_1,\ldots ,d_m\), \(C_1(d_1)\) is the set of all significands with first digit \(d_1\), and \(C_2(d_2)\) is the set of all significands with second digit \(d_2\).

Proposition 1

(Sum invariance, Berger and Hill (2015), Nigrini (2012), Allaart (1997)) A random variable X is Benford if and only if X has sum-invariant significant digits, i.e. for every fixed \(m\in \mathbb {N}\) the expectations \(\mathbb {E}(S(X)\mathbbm {1}_{C(d_1,\ldots , d_m)}(S(X)))\) are the same for all tuples \((d_1,\ldots ,d_m)\), \(d_1\ne 0\), of digits.

Therefore one necessary condition for X to be Benford is that the expected sum of all significands with first digit \(1,2,\ldots ,9\) is the same for each digit. The same holds for the expected sum of all significands with second digit \(0,1,\ldots ,9\).

Let us start with the first significant digit. Denote by \(\theta =\mathbb {E}\bigl (S(X)\mathbbm {1}_{C_1(j)}(S(X))\bigr )=\frac{1}{\ln 10}\) the expectation of the random variable \(S(X)\mathbbm {1}_{C_1(j)}(S(X))\) if Benford is true; note that it does not depend on j. Let \(\theta _j\) be the true expectation of \(S(X)\mathbbm {1}_{C_1(j)}(S(X))\) for the underlying distribution.

Then our first test problem is

$$\begin{aligned} H_{0,1}: \theta _1=\ldots =\theta _9=\theta \quad \text{ against }\quad H_{1,1}:~ \exists j\in \{1,\ldots ,9\}: \theta _j\ne \theta \end{aligned}$$

Denote the sum of the significands of the observations \(X_i\) falling in the interval \([j,j+1)\) by

$$\begin{aligned} \text{ Sum}_{1,j}=\sum _{i=1}^n S(X_i)\mathbbm {1}_{C_1(j)}(S(X_i)). \end{aligned}$$

Since \(\text{Sum}_{1,j}\) is a sum of n independent identically distributed random variables \(S(X_i)\mathbbm {1}_{C_1(j)}(S(X_i))\), \(i=1,\ldots ,n\), with finite variance, the central limit theorem implies that it is approximately normally distributed, and the standardized sums

$$\begin{aligned} R_{1,j}=\frac{\text{Sum}_{1,j}-\mathbb {E}(\text{Sum}_{1,j})}{\sqrt{\text{var}(\text{Sum}_{1,j})}} \end{aligned}$$

are (approximately) standard normal. The expectations \(\mathbb {E} (\text{Sum}_{1,j})=\frac{n}{\ln 10}\), the variances \(\text{var}(\text{Sum}_{1,j})\) and the covariances are derived in Appendix A.

Let \(\textbf{R}_1=(R_{1,1},\ldots ,R_{1,9})\) and let \({\varvec{{\Sigma }}}_{R_1}\) be the correlation matrix of the vector \(\textbf{R}_1\) of standardized sums under the null. We consider the following two types of test statistics

$$\begin{aligned} IS_{1,E} = \textbf{R}_1'\textbf{R}_1 \qquad \text{ and } \qquad IS_{1,M}= \textbf{R}_1'{\varvec{{\Sigma }}}^{-1}_{R_1}\textbf{R}_1 \end{aligned}$$

where IS stands for Invariant Sum. The statistic \(IS_{1,E}\) is the squared Euclidean distance of the vector \(\textbf{R}_1\) of standardized sums from zero, and \(IS_{1,M}\) is the corresponding squared Mahalanobis distance.

One may ask why we use both distance measures, Euclidean and Mahalanobis. The two distances are generally different, and so are the corresponding test statistics. Hence there may be alternative directions for which the Euclidean distance performs better than the Mahalanobis distance and vice versa.
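For concreteness, the following Python sketch computes \(\textbf{R}_1\), \(IS_{1,E}\) and \(IS_{1,M}\). It is independent of the algorithm in Appendix F; the moments used here follow directly from the Benford density \(f(t)=1/(t\ln 10)\) on \([1,10)\) and are meant to agree with the derivations in Appendix A, which we do not reproduce:

```python
import numpy as np

LN10 = np.log(10)
digits = np.arange(1, 10)
theta = 1 / LN10                        # E[S * 1_{C1(j)}(S)] under Benford
m2 = (2 * digits + 1) / (2 * LN10)      # E[S^2 * 1_{C1(j)}(S)] = int_j^{j+1} t/ln10 dt
var_j = m2 - theta ** 2                 # per-observation variances
# Disjoint indicators give per-observation covariance -theta**2, hence:
Sigma = -theta ** 2 / np.sqrt(np.outer(var_j, var_j))  # correlation matrix of R_1
np.fill_diagonal(Sigma, 1.0)

def invariant_sum_1(x):
    s = np.abs(np.asarray(x, dtype=float))
    s = s[s != 0]
    s = s / 10.0 ** np.floor(np.log10(s))          # significands in [1, 10)
    n = s.size
    d1 = s.astype(int)
    sums = np.array([s[d1 == d].sum() for d in digits])   # Sum_{1,j}
    R1 = (sums - n * theta) / np.sqrt(n * var_j)   # standardized sums
    return R1 @ R1, R1 @ np.linalg.solve(Sigma, R1)       # IS_1E, IS_1M
```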

Theorem 2

Under \(H_{0,1}\) the statistic \(IS_{1,M}\) is asymptotically \(\chi ^2\)-distributed with nine degrees of freedom, and the null distribution of \(IS_{1,E}\) is approximated by a weighted sum of independent \(\chi ^2\)-distributed random variables, each with one degree of freedom.

The proof of the theorem can be found in Appendix B.

The null hypothesis \(H_{0,1}\) is rejected in favour of \(H_{1,1}\) if \(IS_{1,M}>\chi ^2_{1-\alpha ,9}\) or if \(IS_{1,E}>c_{IS_{1,E},1-\alpha }\), respectively, where \(\chi ^2_{1-\alpha ,9}\) is the \((1-\alpha )\)-quantile of the \(\chi ^2\)-distribution with nine degrees of freedom and \(c_{IS_{1,E},1-\alpha }\) is the corresponding quantile of the null distribution of \(IS_{1,E}\). The latter quantile is determined by approximating the null distribution of \(IS_{1,E}\) by a suitably scaled and shifted \(\chi ^2\)-distribution, see Appendix C. Table 5 gives simulated levels of significance of the two tests. Even for small sample sizes they are close to the nominal value of \(\alpha =0.01\).
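As an illustration of the idea behind such an approximation (this sketch may differ in detail from Appendix C): a standard way to fit \(a+b\chi ^2_{\nu }\) to a weighted sum \(\sum _i \lambda _i Z_i^2\) with \(Z_i\) iid standard normal is to match the first three cumulants, where the weights \(\lambda _i\) are, as is standard for quadratic forms in normal vectors, the eigenvalues of \({\varvec{{\Sigma }}}_{R_1}\):

```python
import numpy as np
from scipy.stats import chi2

def scaled_shifted_chi2_quantile(Sigma, q=0.99):
    """Approximate the q-quantile of sum_i lam_i * Z_i**2 (Z_i iid N(0,1)) by
    a + b * chi2(nu), matching the cumulants kappa_r = 2**(r-1)*(r-1)! * sum(lam**r)."""
    lam = np.linalg.eigvalsh(Sigma)
    k1, k2, k3 = lam.sum(), 2 * (lam ** 2).sum(), 8 * (lam ** 3).sum()
    b = k3 / (4 * k2)              # scale
    nu = 8 * k2 ** 3 / k3 ** 2     # degrees of freedom
    a = k1 - b * nu                # shift
    return a + b * chi2.ppf(q, df=nu)

# e.g. c_IS1E ~ scaled_shifted_chi2_quantile(Sigma, 0.99) with Sigma as above
```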

Table 5 Simulated levels of significance under \(H_{0,1}\) and \(H_{0,2}\), respectively, of the invariant sum tests, for various sample sizes, nominal level of significance \(\alpha =0.01\)

Note that statistic \(IS_{1,M}\) was independently introduced by Barabesi et al. (2021).

Now consider the second significant digit. Denote by \(\vartheta =\mathbb {E}\bigl (S(X)\mathbbm {1}_{C_2(j)}(S(X))\bigr ) =\frac{9}{10\ln 10}\) the expectation of \(S(X)\mathbbm {1}_{C_2(j)}(S(X))\) if Benford is true. Let \(\vartheta _j\) be the true expectation of \(S(X)\mathbbm {1}_{C_2(j)}(S(X))\) for the underlying distribution.

Then our second test problem is

$$\begin{aligned} H_{0,2}: \vartheta _0=\ldots =\vartheta _9=\vartheta \quad \text{ against }\quad H_{1,2}:~ \exists j\in \{0,\ldots ,9\}: \vartheta _j\ne \vartheta \end{aligned}$$

Denote the sum of the significands of the observations \(X_i\) falling in \(C_2(j)\) by

$$\begin{aligned} \text{ Sum}_{2,j}=\sum _{i=1}^n S(X_i)\mathbbm {1}_{C_2(j)}(S(X_i)). \end{aligned}$$

Again, \(\text{Sum}_{2,j}\) is a sum of n independent identically distributed random variables \(S(X_i)\mathbbm {1}_{C_2(j)}(S(X_i))\), \(i=1,\ldots ,n\), with finite variance, so it is approximately normally distributed, and the standardized sums

$$\begin{aligned} R_{2,j}=\frac{\text{Sum}_{2,j}-\mathbb {E}(\text{Sum}_{2,j})}{\sqrt{\text{var}(\text{Sum}_{2,j})}} \end{aligned}$$

are (approximately) standard normal. The expectations \(\mathbb {E} (\text{Sum}_{2,j})=\frac{9 n}{10 \ln 10}\), the variances \(\text{var}(\text{Sum}_{2,j})\) and the covariances are derived in Appendix A.

Let \(\textbf{R}_2=(R_{2,0},\ldots ,R_{2,9})\) and let \({\varvec{{\Sigma }}}_{R_2}\) be the correlation matrix of the vector \(\textbf{R}_{2}\) of standardized sums under the null. Similarly as above, we consider the following two types of test statistics

$$\begin{aligned} IS_{2,E} = \textbf{R}_2'\textbf{R}_2 \qquad \text{ and } \qquad IS_{2,M}= \textbf{R}_2'{\varvec{{\Sigma }}}^{-1}_{R_2}\textbf{R}_2 \end{aligned}$$
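The second-digit statistics can be computed along the same lines as the first-digit sketch above; only the bins and the moments change. Again the moments below are computed from the Benford density and are meant to agree with Appendix A:

```python
import numpy as np

LN10 = np.log(10)
d = np.arange(10)
j = np.arange(1, 10)[:, None]
vartheta = 9 / (10 * LN10)                 # E[S * 1_{C2(d)}(S)] under Benford
# E[S^2 * 1_{C2(d)}(S)] = sum_j ((j+(d+1)/10)^2 - (j+d/10)^2) / (2 ln 10)
m2 = (((j + (d + 1) / 10) ** 2 - (j + d / 10) ** 2) / (2 * LN10)).sum(axis=0)
var_d = m2 - vartheta ** 2                 # per-observation variances

def R2(x):
    s = np.abs(np.asarray(x, dtype=float))
    s = s[s != 0]
    s = s / 10.0 ** np.floor(np.log10(s))  # significands
    n = s.size
    d2 = (10 * s).astype(int) % 10         # second significant digits
    sums = np.array([s[d2 == k].sum() for k in d])   # Sum_{2,j}
    return (sums - n * vartheta) / np.sqrt(n * var_d)
```

\(IS_{2,E}\) and \(IS_{2,M}\) are then formed exactly as in the first-digit case, with the correlation matrix built from var_d and the per-observation covariance \(-\vartheta ^2\) (the sets \(C_2(j)\) are disjoint).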

Theorem 3

Under \(H_{0,2}\) the statistic \(IS_{2,M}\) is asymptotically \(\chi ^2\)-distributed with ten degrees of freedom, and the null distribution of \(IS_{2,E}\) is approximated by a weighted sum of independent \(\chi ^2\)-distributed random variables, each with one degree of freedom.

The proof of the theorem can be found in Appendix B.

Table 5 gives simulated levels of significance of the two tests. Again, even for small sample sizes they are close to the nominal value of \(\alpha =0.01\).

In Appendix F our algorithm for the implementation of the invariant sum tests is provided.

3 Illustration

We illustrate our methods on four carefully selected data sets. The first two are chosen to show that our tests indeed yield results conforming to number theory. The other two are empirical data sets.

  1. #1:

    Fibonacci (\(n=1000\)) The Fibonacci numbers have been proved to be Benford distributed, cf. e.g. Berger and Hill (2015).

  2. #2:

     Prime Numbers (\(n=1000\)) In contrast to the Fibonacci numbers, the prime numbers are known not to be Benford, cf. e.g. Berger and Hill (2015).

  3. #3:

    Population (\(n=3998\)) This data set consists of the numbers of inhabitants of cities worldwide with more than 100,000 people. It illustrates that data from certain truncated distributions are not Benford, cf. Appendix D.

  4. #4:

    Share Prices (\(n=369\)) The data comprise share prices from a mixture of international stock market indices. Such data sets are expected to behave like Benford, according to the mixture theorem of Berger and Hill (2015, Section 8.3).

First, we study the behaviour of each of the four invariant sum tests. The level of significance is \(\alpha =0.01\). The nine values of the statistics \(R_{1,j}\), \(j=1,\ldots ,9\), as well as the ten values of the statistics \(R_{2,j}\), \(j=0,\ldots ,9\), are summarized in Fig. 1. We see that the values of \(R_{1,j}\) for the Share Prices and for the Fibonacci numbers are very close to zero, which provides some evidence of the Benford property. For the data sets Population and Prime Numbers the boxes are large and far from zero, which gives some evidence against Benford. For the second significant digit the picture is similar but sometimes less clear. However, for the Share Prices most of the values \(R_{2,j}\) are less than one in absolute value, resulting in small values of \(IS_{2,E}\) and \(IS_{2,M}\). Table 6 contains the p-values for the tests \(IS_{1,E}\), \(IS_{1,M}\), \(IS_{2,E}\), and \(IS_{2,M}\).

Fig. 1 Plots summarizing the values of the statistics \(R_{1,j}\) and \(R_{2,j}\), respectively

Table 6 p-Values for the tests \(IS_{1,E}\), \(IS_{1,M}\), \(IS_{2,E}\), and \(IS_{2,M}\) applied on our illustrative data sets

Note that the values are rounded. In this way, the entries, especially the p-values, may become 1.00 or 0.00. The p-value of (nearly) 1.00 for the Fibonacci numbers signals evidence for the fact, well known from number theory, that they are nearly perfectly Benford. The (rounded) p-value of 0.00 gives very strong evidence of the equally well-known fact that prime numbers are not Benford. For the notions of evidence and (very) strong evidence we refer to Wasserman (2004). The two data sets, Fibonacci and prime numbers, were selected to illustrate that all the considered tests yield a decision confirming the mathematical theory. Note that the Fibonacci and prime number data contain a few entries with only one digit. Since a second digit does not exist for these entries, they are removed when testing the second significant digit.

The tests conform to the underlying theories, i.e. number theory, Berger and Hill’s theorem on mixtures, and the conjecture on bounded domains in Appendix D. The data set #1 (Fibonacci) is clearly Benford. Furthermore, data set #3 (Population) is clearly not Benford. For an explanation based on trimming of values or bounded domains we refer to Appendix D. Prime numbers (data set #2) are known not to be Benford, which is clearly illustrated by the three tests \(IS_{1,E}, IS_{1,M}\) and \(IS_{2,M}\); the test \(IS_{2,E}\) does not reject Benford at the \(\alpha =0.01\) level, which might be caused by the lower power of \(IS_{2,E}\) at sample size n = 1000. The data set #4 (Share Prices) is in accordance with the mixture theorem of Berger and Hill (2015) and thus leads to Benford’s law.

The results for all tests considered, KS1, KS2, GoF1, GoF2, MAD1, MAD2, \(IS_{1,E}\), \(IS_{1,M}\), \(IS_{2,E}\) and \(IS_{2,M}\), are presented in Table 7. Bold values mean ’rejection’ at \(\alpha =0.01\). Note that each decision is based on one single test, so that the problem of multiple testing is not relevant here. The classical tests GoF1, KS1, and MAD1 confirm the results of the invariant sum tests.

Note that when testing primes, most of the tests based on the second significant digit do not reject Benford due to low power. However, if we increase the sample size and take all prime numbers between 11 and 100,000, then Benford is rejected by all the tests based on the second significant digit, too.

Table 7 Critical values \((\alpha =0.01\)) and observed values of the various Goodness of Fit tests

4 Summary

We consider several statistical tests of the Benford law; a few are known, most are new. The tests based on the second significant digit are completely new, except for GoF2. The various variants of the invariant sum tests are appealing as they use the significands themselves and therefore exploit the full information in the data.

We have shown that almost all the tests give confirmatory results for data sets for which there is a theory on whether the Benford property holds, the exception being primes with the second significant digit, cf. Tables 7 and 8. The last line in Table 8 presents the Bonferroni-adjusted p-values and is intended only to give the reader a quick impression. We see that data sets #1 and, quite clearly, #4 are Benford, while the other two are not.

In future research we intend to investigate which of the considered tests performs well against various alternative directions. Moreover, various sample sizes are to be considered. Furthermore, we intend to construct tests that are based on sum invariance and on other invariance principles.

Table 8 p-Values of the various test statistics