Abstract
The main aim of this paper is to propose a set of tools for assessing non-normality taking into consideration the class of multivariate t-distributions. Assuming second moment existence, we consider a reparameterized version of the usual t-distribution, so that the scale matrix coincides with the covariance matrix of the distribution. We use the local influence procedure and the Kullback–Leibler divergence measure to propose quantitative methods to evaluate deviations from the normality assumption. In addition, the possible non-normality due to the presence of both skewness and heavy tails is also explored. Our findings based on two real datasets are complemented by a simulation study to evaluate the performance of the proposed methodology on finite samples.
References
Abramowitz, M., Stegun, I.A.: Handbook of Mathematical Functions. Dover, New York (1970)
Anderson, T.W.: An Introduction to Multivariate Statistical Analysis. Wiley, New York (2003)
Arellano-Valle, R.B., Contreras-Reyes, J., Genton, M.: Shannon entropy and mutual information for multivariate skew-elliptical distributions. Scandinavian J. Stat. 40, 42–62 (2012)
Arellano-Valle, R.B., Ferreira, C.S., Genton, M.G.: Scale and shape mixtures of multivariate skew-normal distributions. J. Multivariate Anal. 166, 98–110 (2018)
Azzalini, A., Genton, M.G.: Robust likelihood methods based on the skew-\(t\) and related distributions. Int. Stat. Rev. 76, 106–129 (2008)
Bolfarine, H., Galea, M.: On structural comparative calibration under a \(t\)-model. Comput. Stat. 11, 63–85 (1996)
Bodnar, T., Gupta, A.K., Parolya, N.: On the strong convergence of the optimal linear shrinkage estimator for large dimensional covariance matrix. J. Multivariate Anal. 132, 215–228 (2014)
Contreras-Reyes, J., Arellano-Valle, R.: Kullback-Leibler divergence measure for multivariate skew-normal distributions. Entropy 14, 1606–1626 (2012)
Cook, R.D.: Assessment of local influence (with discussion). J. R. Stat. Soc. B 48, 133–169 (1986)
Dykstra, R.L.: Establishing the positive definiteness of the sample covariance matrix. Ann. Math. Stat. 41, 2153–2154 (1970)
Fang, K.T., Zhang, Y.T.: Generalized Multivariate Analysis. Springer, Berlin (1990)
Feng, D., Baumgartner, R., Svetnik, V.: A robust Bayesian estimate of the concordance correlation coefficient. J. Biopharm. Stat. 25, 490–507 (2015)
Fiorentini, G., Sentana, E., Calzolari, G.: Maximum likelihood estimation and inference in multivariate conditionally heteroscedastic dynamic regression models with Student \(t\) innovations. J. Bus. Econ. Stat. 21, 532–546 (2003)
Galea, M., Cademartori, D., Curci, R., Molina, A.: Robust inference in the capital asset pricing model using the multivariate \(t\)-distribution. J. Risk Financ. Manage. 13, 123 (2020)
Gao, J., Zhang, B.: Estimation of seismic wavelets based on the multivariate scale mixture of Gaussian model. Entropy 12, 14–33 (2010)
Gómez-Villegas, M.A., Gómez-Sánchez-Manzano, E., Maín, P., Navarro, H.: The effect of non-normality in the Power Exponential distribution. In: Pardo, L., Balakrishnan, N., Gil, M.A. (eds.) Modern Mathematical Tools and Techniques in Capturing Complexity, pp. 119–129. Springer-Verlag, Berlin (2011)
Gupta, A.K.: Multivariate skew \(t\)-distribution. Statistics 37, 359–363 (2003)
Gupta, A.K., Varga, T., Bodnar, T.: Elliptically Contoured Models in Statistics and Portfolio Theory, 2nd edn. Springer, New York (2013)
Härdle, W.K., Simar, L.: Applied Multivariate Statistical Analysis, 3rd edn. Springer, New York (2012)
Kent, J.T., Tyler, D.E., Vardi, Y.: A curious likelihood identity for the multivariate \(t\)-distribution. Commun. Stat. Simul. Comput. 23, 441–453 (1994)
Kent, J.T., Tyler, D.E.: Redescending \(M\)-estimates of multivariate location and scatter. Ann. Stat. 19, 2102–2119 (1991)
Kim, H.M., Mallick, B.K.: Moments of random vectors with skew \(t\) distribution and their quadratic forms. Stat. Prob. Lett. 63, 417–423 (2003)
Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22, 79–86 (1951)
Lange, K., Little, R.J.A., Taylor, J.M.G.: Robust statistical modeling using the \(t\) distribution. J. Am. Stat. Assoc. 84, 881–896 (1989)
Leal, C., Galea, M., Osorio, F.: Assessment of local influence for the analysis of agreement. Biometrical J. 61, 955–972 (2019)
Ledoit, O., Wolf, M.: A well-conditioned estimator for large-dimensional covariance matrices. J. Multivariate Anal. 88, 365–411 (2004)
Ledoit, O., Wolf, M.: Analytical nonlinear shrinkage of large-dimensional covariance matrices. Ann. Stat. 48, 3043–3065 (2020)
Magnus, J.R., Neudecker, H.: Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley, Chichester (1999)
Mardia, K.V.: Measures of multivariate skewness and kurtosis with applications. Biometrika 57, 519–530 (1970)
Mardia, K.V.: Applications of some measures of multivariate skewness and kurtosis in testing normality and robustness studies. Sankhyā Ser. B 36, 115–128 (1974)
Maronna, R.A.: Robust \(M\)-estimators of multivariate location and scatter. Ann. Stat. 4, 51–67 (1976)
Poon, W., Poon, Y.S.: Conformal normal curvature and assessment of local influence. J. R. Stat. Soc. B 61, 51–61 (1999)
Serfling, R.J.: Approximation Theorems of Mathematical Statistics. Wiley, New York (2009)
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948)
Song, P.X.K., Zhang, P., Qu, A.: Maximum likelihood inference in robust linear mixed-effects models using the multivariate \(t\) distributions. Stat. Sin. 17, 929–943 (2007)
Sutradhar, B.C.: Score test for the covariance matrix of elliptical \(t\)-distribution. J. Multivariate Anal. 46, 1–12 (1993)
Svetnik, V., Ma, J., Soper, K.A., Doran, S., Renger, J.J., Deacon, S., Koblan, K.S.: Evaluation of automated and semi-automated scoring of polysomnographic recordings from a clinical trial using zolpidem in the treatment of insomnia. Sleep 30, 1562–1574 (2007)
Wilson, E.B., Hilferty, M.M.: The distribution of chi-square. Proc. Nat. Acad. Sci. United States Am. 17, 684–688 (1931)
Zhu, H.T., Lee, S.Y.: Local influence for incomplete-data models. J. R. Stat. Soc. B 63, 111–126 (2001)
Zhu, H., Ibrahim, J.G., Lee, S., Zhang, H.: Perturbation selection and influence measures in local influence analysis. Ann. Stat. 35, 2565–2588 (2007)
Acknowledgements
This work was supported by Comisión Nacional de Investigación Científica y Tecnológica, FONDECYT Grants 1140580 and 1150325. The authors are grateful to the associate editor and the anonymous reviewers for their valuable comments and suggestions, and to Carla Leal and Ronny Vallejos, who carefully read the initial version of the manuscript. Their input led to an improved paper.
Ethics declarations
Conflict of interest
The authors have declared no conflict of interest.
Supplementary information
This material is subdivided into two sections. First, we present basic properties of the multivariate t-distribution introduced by Sutradhar (1993). Then, a detailed description of the maximum likelihood estimation procedure considering an EM algorithm is provided.
Appendices
Appendix A: The expected information matrix
We use the results shown in the Supplementary Material and note that the Fisher information matrix for \(\varvec{\theta }\) can be written as
where the score function \(\varvec{U}_i(\varvec{\theta }) =(\varvec{U}_i^\top (\varvec{\mu }),\varvec{U}_i^\top (\varvec{\phi }), U_i(\eta ))^\top\) associated with the ith component of the log-likelihood, for \(i=1,\dots ,n\), is defined in Equations (4)-(6), and the expected value \({{\,\mathrm{E}\,}}(\cdot )\) is taken with respect to the density function in (1). Next, we obtain each of the blocks of the Fisher information matrix reported in Equation (7).
From the score functions (4) and (5), it follows that
where \(c_\mu (\eta )=c_\phi (\eta )/(1-2\eta )\) and \(c_\phi (\eta )=(1+p\eta )/(1+(p+2)\eta )\). Note that \(c_\mu (\eta )\rightarrow 1\) and \(c_\phi (\eta )\rightarrow 1\) as \(\eta \rightarrow 0\). Moreover, \(\varvec{N}_p\varvec{D}_p=\varvec{D}_p\) (see Magnus and Neudecker 1999), so in this limit we recover the expressions corresponding to the normal case.
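The two facts used here can be checked informally with a short Python sketch. It assumes the usual definitions from Magnus and Neudecker (1999): \(\varvec{D}_p\) is the duplication matrix (vec of a symmetric matrix equals \(\varvec{D}_p\) times its vech) and \(\varvec{N}_p = (\varvec{I} + \varvec{K}_p)/2\), with \(\varvec{K}_p\) the commutation matrix. The function names are illustrative and are not taken from the paper.

```python
def c_phi(eta, p):
    """c_phi(eta) = (1 + p*eta) / (1 + (p + 2)*eta), for 0 <= eta < 1/2."""
    return (1.0 + p * eta) / (1.0 + (p + 2) * eta)


def c_mu(eta, p):
    """c_mu(eta) = c_phi(eta) / (1 - 2*eta), for 0 <= eta < 1/2."""
    return c_phi(eta, p) / (1.0 - 2.0 * eta)


def duplication_matrix(p):
    """D_p of size p^2 x p(p+1)/2, using column-major vec and vech."""
    cols = p * (p + 1) // 2
    D = [[0.0] * cols for _ in range(p * p)]
    k = 0
    for j in range(p):
        for i in range(j, p):
            D[j * p + i][k] = 1.0  # position of a_ij in vec(A)
            D[i * p + j][k] = 1.0  # position of a_ji in vec(A)
            k += 1
    return D


def symmetrizer_matrix(p):
    """N_p = (I + K_p) / 2, where K_p vec(A) = vec(A^T)."""
    n = p * p
    N = [[0.0] * n for _ in range(n)]
    for i in range(p):
        for j in range(p):
            N[j * p + i][j * p + i] += 0.5  # identity part
            N[j * p + i][i * p + j] += 0.5  # commutation part
    return N


def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


if __name__ == "__main__":
    p = 3
    for eta in (0.1, 0.01, 0.0001):
        print(eta, c_mu(eta, p), c_phi(eta, p))  # both approach 1 as eta -> 0
    D = duplication_matrix(p)
    print(matmul(symmetrizer_matrix(p), D) == D)
```

The matrix identity holds because the rows of \(\varvec{D}_p\) indexed by \((i,j)\) and \((j,i)\) coincide, so averaging them leaves \(\varvec{D}_p\) unchanged.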
Therefore, it is clear that
with \(q_i = 1+c(\eta )\delta _i^2\). Thus, we obtain
By applying Lemmas 3 and 5 from Supplementary Material, it follows that
for \(i=1,\dots ,n\). The score function for \(\eta\) can be written as,
where \(Q_{i\eta } = c(\eta ) \delta _i^2\sim \chi _p^2/\chi _{1/\eta }^2\), \({{\,\mathrm{E}\,}}\{Q_{i\eta } (1+Q_{i\eta })^{-1}\}=\frac{p\eta }{1+p\eta }\) and \({{\,\mathrm{E}\,}}\{\log (1 + Q_{i\eta })\}=\psi \left( \frac{1+p\eta }{2\eta }\right) -\psi \left( \frac{1}{2\eta }\right)\). Let \(U_1 =\log (1+Q_{i\eta })\), \(U_2 =\frac{1+p\eta }{1 - 2\eta }\frac{Q_{i\eta }}{1+Q_{i\eta }}\), \(\overline{U}_1 = U_1 - {{\,\mathrm{E}\,}}(U_1)\) and \(\overline{U}_2 = U_2 -{{\,\mathrm{E}\,}}(U_2)\). Then,
Using the fact that \(\psi (x+1) = \psi (x)+1/x\) we have
Finally, we have that the expected information in relation to \(\eta\) is given by,
From the expansion (Abramowitz and Stegun 1970, Sec. 6.4.12),
we find as \(\eta \rightarrow 0\) that,
Similarly,
Hence,
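As a numerical sanity check of the two expectations of \(Q_{i\eta }\) quoted earlier in this appendix, the following Python sketch simulates \(Q = \chi _p^2/\chi _{1/\eta }^2\) with independent numerator and denominator and compares Monte Carlo averages with \(p\eta /(1+p\eta )\) and \(\psi ((1+p\eta )/(2\eta )) - \psi (1/(2\eta ))\). The digamma routine is a hand-written approximation (the recurrence \(\psi (x+1)=\psi (x)+1/x\) plus a truncated asymptotic series), and the values of p, \(\eta\) and the sample size are arbitrary choices for illustration only.

```python
import math
import random


def digamma(x):
    """Approximate psi(x) for x > 0: shift x upwards with psi(x) = psi(x + 1) - 1/x,
    then apply a truncated asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ log x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def simulate_q(p, eta, n, rng):
    """Draw n values of Q = chi2_p / chi2_{1/eta} (independent chi-squares)."""
    draws = []
    for _ in range(n):
        num = rng.gammavariate(p / 2.0, 2.0)            # chi-square with p d.f.
        den = rng.gammavariate(1.0 / (2.0 * eta), 2.0)  # chi-square with 1/eta d.f.
        draws.append(num / den)
    return draws


def theoretical_means(p, eta):
    """Return (E{Q/(1+Q)}, E{log(1+Q)}) as stated in the text."""
    ratio = p * eta / (1.0 + p * eta)
    log_term = digamma((1.0 + p * eta) / (2.0 * eta)) - digamma(1.0 / (2.0 * eta))
    return ratio, log_term


if __name__ == "__main__":
    rng = random.Random(2023)
    p, eta = 3, 0.2
    q = simulate_q(p, eta, 100000, rng)
    mc_ratio = sum(x / (1.0 + x) for x in q) / len(q)
    mc_log = sum(math.log1p(x) for x in q) / len(q)
    print("Monte Carlo:", mc_ratio, mc_log)
    print("Theory:     ", theoretical_means(p, eta))
```

Both identities follow from the fact that \(Q/(1+Q)\) has a Beta distribution with parameters \(p/2\) and \(1/(2\eta )\).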
Appendix B: Non-normality due to asymmetry
Another source of non-normality is the possible asymmetry present in the observations. Shannon entropy, Kullback–Leibler divergence and mutual information for multivariate skew-elliptical distributions have been considered in the literature, see for instance Arellano-Valle et al. (2012) and Contreras-Reyes and Arellano-Valle (2012). We summarize some of these results for the multivariate skew normal and skew t distributions below.
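Following Arellano-Valle et al. (2012), we say that a random vector \(\varvec{Z}\in {\mathbb {R}}^p\) has a skew-normal distribution with location vector \(\varvec{\xi }\in {\mathbb {R}}^p\), dispersion matrix \(\varvec{\Omega } > 0\) and shape/skewness parameter \(\varvec{\gamma }\in {\mathbb {R}}^p\), denoted by \(\varvec{Z}\sim \mathsf {SN}_p(\varvec{\xi },\varvec{\Omega },\varvec{\gamma })\), if its probability density function is
\[f(\varvec{z}) = 2\,\phi _p(\varvec{z};\varvec{\xi },\varvec{\Omega })\,\Phi \{\varvec{\gamma }^\top (\varvec{z}-\varvec{\xi })\},\qquad \varvec{z}\in {\mathbb {R}}^p,\]
where
\[\phi _p(\varvec{z};\varvec{\xi },\varvec{\Omega }) = (2\pi )^{-p/2}\vert \varvec{\Omega }\vert ^{-1/2}\exp \left( -\tfrac{1}{2}\delta _{\mathsf{skew}}^2\right)\]
is the probability density function of the p-variate \(\mathsf {N}_p(\varvec{\xi },\varvec{\Omega })\) distribution, \(\Phi (\cdot )\) is the univariate \(\mathsf {N}(0,1)\) cumulative distribution function and \(\delta _{\mathsf{skew}}^2 =(\varvec{z}-\varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{z}-\varvec{\xi }) \sim \chi ^2(p)\). The vector of means and the covariance matrix of \(\varvec{Z}\) are given, respectively, by
\[{{\,\mathrm{E}\,}}(\varvec{Z}) = \varvec{\xi } + \sqrt{\frac{2}{\pi }}\,\varvec{\delta },\qquad {{\,\mathrm{Cov}\,}}(\varvec{Z}) = \varvec{\Omega } - \frac{2}{\pi }\,\varvec{\delta }\varvec{\delta }^\top ,\]
where \(\varvec{\delta } = \varvec{\Omega \gamma }/\sqrt{1 + \tau ^2}\) and \(\tau ^2=\varvec{\gamma }^\top \varvec{\Omega \gamma }\).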
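A minimal simulation sketch for the univariate case (p = 1) may help fix ideas. It assumes the standard skew-normal moments \({{\,\mathrm{E}\,}}(Z) = \xi + \sqrt{2/\pi }\,\delta\) and \(\mathrm{var}(Z) = \Omega - (2/\pi )\delta ^2\), which agree with the limit \(\alpha (\nu )\rightarrow \sqrt{2/\pi }\) noted for the skew-t case, and it samples through the well-known representation \(Z = \xi + \delta \vert U_0\vert + (\Omega - \delta ^2)^{1/2} U_1\) with independent standard normal \(U_0, U_1\). The function names and parameter values are hypothetical.

```python
import math
import random


def sn_delta(omega2, gamma):
    """delta = Omega*gamma / sqrt(1 + tau^2), tau^2 = gamma*Omega*gamma (case p = 1)."""
    tau2 = gamma * omega2 * gamma
    return omega2 * gamma / math.sqrt(1.0 + tau2)


def sn_moments(xi, omega2, gamma):
    """Mean and variance of SN(xi, omega2, gamma) under the assumed moment formulas."""
    d = sn_delta(omega2, gamma)
    mean = xi + math.sqrt(2.0 / math.pi) * d
    var = omega2 - (2.0 / math.pi) * d * d
    return mean, var


def sn_sample(xi, omega2, gamma, n, rng):
    """Draw n values via Z = xi + delta*|U0| + sqrt(omega2 - delta^2)*U1."""
    d = sn_delta(omega2, gamma)
    s = math.sqrt(omega2 - d * d)  # omega2 - delta^2 = omega2 / (1 + tau^2) > 0
    return [xi + d * abs(rng.gauss(0.0, 1.0)) + s * rng.gauss(0.0, 1.0)
            for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    xi, omega2, gamma = 1.0, 4.0, 1.5
    z = sn_sample(xi, omega2, gamma, 200000, rng)
    m = sum(z) / len(z)
    v = sum((x - m) ** 2 for x in z) / len(z)
    print("sample mean/variance:", m, v)
    print("theoretical values:  ", sn_moments(xi, omega2, gamma))
```

Setting gamma to zero gives \(\delta = 0\), so the sketch reduces to sampling from \(\mathsf {N}(\xi ,\Omega )\).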
We say that a random vector \(\varvec{Z}\in {\mathbb {R}}^p\) has a skew-t distribution with location vector \(\varvec{\xi }\in {\mathbb {R}}^p\), dispersion matrix \(\varvec{\Omega }\in {\mathbb {R}}^{p \times p}\), shape/skewness parameter \(\varvec{\gamma }\in {\mathbb {R}}^p\) and \(\nu > 0\) degrees of freedom, denoted by \(\varvec{Z} \sim \mathsf {St}_p(\varvec{\xi },\varvec{\Omega },\varvec{\gamma },\nu )\), if its probability density function is given by
where
is the probability density function of the p-variate \(t_p(\varvec{\xi },\varvec{\Omega },\nu )\) distribution, \(\delta _{\mathsf{skew}}^2 = (\varvec{z} - \varvec{\xi })^\top \varvec{\Omega }^{-1}(\varvec{z} - \varvec{\xi })/p \sim F(p,\nu )\) and \(T(x;\nu + p)\) is the \(T_1(0,1,\nu +p)\) cumulative distribution function (see, for instance Azzalini and Genton 2008; Arellano-Valle et al. 2012, for details).
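If \(\varvec{Z}\sim \mathsf {St}_p(\varvec{\xi },\varvec{\Omega },\varvec{\gamma },\nu )\), then the vector of means and the covariance matrix of \(\varvec{Z}\) are given by
\[{{\,\mathrm{E}\,}}(\varvec{Z}) = \varvec{\xi } + \alpha (\nu )\,\varvec{\delta }\quad (\nu > 1),\qquad {{\,\mathrm{Cov}\,}}(\varvec{Z}) = \frac{\nu }{\nu - 2}\,\varvec{\Omega } - \alpha ^2(\nu )\,\varvec{\delta }\varvec{\delta }^\top \quad (\nu > 2),\]
where \(\alpha (\nu ) =\{\Gamma ((\nu -1)/2)/\Gamma (\nu /2)\} \sqrt{\nu /\pi }\). Note that \(\alpha (\nu )\rightarrow \sqrt{2/\pi }\) as \(\nu \rightarrow \infty\), and we obtain the results for the skew-normal distribution given above.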
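The limit of \(\alpha (\nu )\) can be inspected numerically. The sketch below evaluates \(\alpha (\nu )\) on the log scale with `math.lgamma` to avoid overflow of the gamma function for large \(\nu\); the function name `alpha` is just an illustrative choice.

```python
import math


def alpha(nu):
    """alpha(nu) = sqrt(nu/pi) * Gamma((nu - 1)/2) / Gamma(nu/2), defined for nu > 1."""
    if nu <= 1:
        raise ValueError("alpha(nu) requires nu > 1")
    log_ratio = math.lgamma((nu - 1.0) / 2.0) - math.lgamma(nu / 2.0)
    return math.sqrt(nu / math.pi) * math.exp(log_ratio)


if __name__ == "__main__":
    limit = math.sqrt(2.0 / math.pi)
    for nu in (3, 10, 100, 10000):
        # The difference alpha(nu) - limit shrinks towards zero as nu grows.
        print(nu, alpha(nu), alpha(nu) - limit)
```

For small \(\nu\) the factor \(\alpha (\nu )\) is noticeably larger than \(\sqrt{2/\pi }\), so heavy tails amplify the shift of the mean away from \(\varvec{\xi }\).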
Arellano-Valle et al. (2012) show that, for the skew-normal and skew-t distributions, the Shannon entropy has an explicit form, which is given in the following lemmas.
Lemma 4
If \(\varvec{X}\sim \mathsf {SN}_p(\varvec{\xi },\varvec{\Omega },\varvec{\gamma })\) and \(\varvec{Y}\sim \mathsf {St}_p(\varvec{\xi }, \varvec{\Omega },\varvec{\gamma },\nu )\), then the Shannon entropy is given by
(i) \(H(\varvec{X}) = \frac{1}{2}\log \vert \varvec{\Omega }\vert +\frac{p}{2}(1+ \log 2\pi ) - {{\,\mathrm{E}\,}}[\log \{2\Phi (\tau W)\}]\),
(ii) \(H(\varvec{Y}) = \frac{1}{2}\log \vert \varvec{\Omega }\vert -\log \Gamma \left( \frac{\nu + p}{2}\right) + \log \Gamma \left( \frac{\nu }{2}\right) + \frac{p}{2}\log (\nu \pi ) +\frac{\nu + p}{2}\left\{ \psi \left( \frac{\nu + p}{2}\right) -\psi \left( \frac{\nu }{2}\right) \right\} - {{\,\mathrm{E}\,}}[\log \{2T(\tau W^*; \nu +p)\}]\),
with \(W \sim \mathsf {SN}(0,1,\tau )\), \(\tau ^2 =\varvec{\gamma }^\top \varvec{\Omega \gamma }\); \(W^* = \sqrt{\nu + p} \,W_{\mathsf{St}}/\sqrt{\nu + p-1 + W_{\mathsf{St}}^2}\) where \(W_{\mathsf{St}} \sim \mathsf {St}(0,1,\tau ,\nu +p-1)\).
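Part (i) of the lemma lends itself to a simple Monte Carlo evaluation, sketched below for p = 1. The sketch assumes that \(W \sim \mathsf {SN}(0,1,\tau )\) can be generated as \(W = d\vert U_0\vert + (1-d^2)^{1/2}U_1\) with \(d = \tau /\sqrt{1+\tau ^2}\) and independent standard normal \(U_0, U_1\); the helper names and the sample size are hypothetical.

```python
import math
import random


def std_normal_cdf(x):
    """Phi(x), written with erfc so that the lower tail stays accurate."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def sn_entropy_mc(omega2, gamma, n, rng):
    """Monte Carlo version of Lemma 4(i) for p = 1:
    H(X) = 0.5*log(omega2) + 0.5*(1 + log(2*pi)) - E[log{2*Phi(tau*W)}]."""
    tau = math.sqrt(gamma * omega2 * gamma)
    d = tau / math.sqrt(1.0 + tau * tau)  # delta of the standardized variable W
    s = math.sqrt(1.0 - d * d)
    acc = 0.0
    for _ in range(n):
        w = d * abs(rng.gauss(0.0, 1.0)) + s * rng.gauss(0.0, 1.0)  # W ~ SN(0, 1, tau)
        acc += math.log(2.0 * std_normal_cdf(tau * w))
    normal_part = 0.5 * math.log(omega2) + 0.5 * (1.0 + math.log(2.0 * math.pi))
    return normal_part - acc / n


if __name__ == "__main__":
    rng = random.Random(11)
    print("gamma = 0:", sn_entropy_mc(1.0, 0.0, 1000, rng))    # normal entropy
    print("gamma = 1:", sn_entropy_mc(1.0, 1.0, 100000, rng))  # smaller than the normal case
```

When \(\tau = 1\) the expectation has the closed form \(\log 2 - 1/2\), obtained with the substitution \(u = \Phi (w)\), which gives a convenient reference value for checking the simulation.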
Lemma 5
Let \(\varvec{Z}\sim \mathsf {N}_p(\varvec{\mu },\varvec{\Sigma })\). If \(\varvec{X}\sim \mathsf {SN}_p(\varvec{\xi }, \varvec{\Omega },\varvec{\gamma })\) and \(\varvec{Y}\sim \mathsf {St}_p(\varvec{\xi },\varvec{\Omega },\varvec{\gamma },\nu )\), then the negentropies of \(\varvec{X}\) and \(\varvec{Y}\) are given, respectively, by
(i) \(H_N(\varvec{X}) = \frac{1}{2}\log \vert \varvec{\Sigma }\vert -\frac{1}{2}\log \vert \varvec{\Omega }\vert +{{\,\mathrm{E}\,}}[\log \{2\Phi (\tau W)\}]\), and
(ii) \(H_N(\varvec{Y}) = \frac{1}{2}\log \vert \varvec{\Sigma }\vert +\frac{p}{2}(1+\log 2\pi ) - \frac{1}{2}\log \vert \varvec{\Omega }\vert +\log \Gamma \left( \frac{\nu + p}{2}\right) -\log \Gamma \left( \frac{\nu }{2}\right) - \frac{p}{2}\log (\nu \pi )- \frac{\nu + p}{2} \left\{ \psi \left( \frac{\nu + p}{2}\right) - \psi \left( \frac{\nu }{2}\right) \right\} + {{\,\mathrm{E}\,}}[\log \{2T(\tau W^*; \nu +p)\}]\).
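To illustrate part (ii), the sketch below evaluates the negentropy in the symmetric case \(\varvec{\gamma } = \varvec{0}\), where \(\tau = 0\) and the last expectation vanishes because \(2T(0;\nu +p) = 1\). It assumes that \(\varvec{\Sigma }\) is the covariance matrix of \(\varvec{Y}\), which for \(\varvec{\gamma } = \varvec{0}\) and \(\nu > 2\) equals \(\{\nu /(\nu -2)\}\varvec{\Omega }\), so that \(\frac{1}{2}\log \vert \varvec{\Sigma }\vert - \frac{1}{2}\log \vert \varvec{\Omega }\vert = \frac{p}{2}\log \{\nu /(\nu -2)\}\). The digamma routine is a hand-written approximation and all names are illustrative.

```python
import math


def digamma(x):
    """Approximate psi(x) for x > 0 via psi(x) = psi(x + 1) - 1/x and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def t_negentropy(p, nu):
    """Lemma 5(ii) with gamma = 0 and Sigma = nu/(nu - 2) * Omega (requires nu > 2)."""
    if nu <= 2:
        raise ValueError("the covariance matrix exists only for nu > 2")
    a, b = (nu + p) / 2.0, nu / 2.0
    return (0.5 * p * math.log(nu / (nu - 2.0))  # 0.5*log|Sigma| - 0.5*log|Omega|
            + 0.5 * p * (1.0 + math.log(2.0 * math.pi))
            + math.lgamma(a) - math.lgamma(b)
            - 0.5 * p * math.log(nu * math.pi)
            - a * (digamma(a) - digamma(b)))


if __name__ == "__main__":
    for nu in (3, 5, 10, 50):
        # The negentropy decreases towards zero as the tails become lighter.
        print(nu, t_negentropy(1, nu), t_negentropy(3, nu))
```

In this symmetric setting the negentropy isolates the contribution of heavy tails, while the term \({{\,\mathrm{E}\,}}[\log \{2T(\tau W^*; \nu +p)\}]\) adds the contribution of skewness when \(\varvec{\gamma } \ne \varvec{0}\).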
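Mardia (1970) introduced one of the most widely used measures of multivariate skewness for an arbitrary p-dimensional random vector \(\varvec{Z}\) with mean vector \(\varvec{\mu }\) and covariance matrix \(\varvec{\Sigma }\). Mardia’s skewness coefficient is defined as
\[\beta _{1,p} = {{\,\mathrm{E}\,}}\big [\{(\varvec{Z}-\varvec{\mu })^\top \varvec{\Sigma }^{-1}(\widetilde{\varvec{Z}}-\varvec{\mu })\}^3\big ],\]
where \(\widetilde{\varvec{Z}}\) is an independent copy of \(\varvec{Z}\),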
which can be expressed as \(\beta _{1,p} ={{\,\mathrm{tr}\,}}\{\varvec{S}^\top (\varvec{Y})\varvec{S}(\varvec{Y})\}\), where \(\varvec{S}(\varvec{Y}) ={{\,\mathrm{E}\,}}(\varvec{Y}\otimes \varvec{Y}^\top \otimes \varvec{Y})\), with \(\varvec{Y} =\varvec{\Sigma }^{-1/2} (\varvec{Z} - \varvec{\mu })\) and \(\otimes\) denotes the Kronecker product. The following lemmas, extracted from Kim and Mallick (2003), allow us to obtain explicit formulas for \(\varvec{S}(\varvec{Y})\). In particular, Fig. 11 leads us to note the interaction between the degrees of freedom and the skewness parameter on the coefficient \(\beta _{1,p}\) proposed by Mardia (1970). In fact, as \(\nu\) grows, it has less impact on \(\beta _{1,p}\).
Lemma 6
If \(\varvec{Y} \sim \mathsf {SN}_p(\varvec{0},\varvec{\Omega },\varvec{\gamma })\), then
where \(\varvec{\delta } = \varvec{\Omega \gamma }/\sqrt{1 + \tau ^2}\). In addition, if \(\varvec{\gamma } = \varvec{0}\), that is \(\varvec{Y} \sim \mathsf {N}_p(\varvec{0},\varvec{\Omega })\), then \(\beta _{1,p} = 0\).
Lemma 7
If \(\varvec{Y} \sim \mathsf {St}_p(\varvec{0},\varvec{\Omega },\varvec{\gamma },\nu )\), then
where \(\alpha (\nu ) = \sqrt{\nu /\pi }\Gamma ((\nu -1)/2)/\Gamma (\nu /2)\) and \(\varvec{\delta } = \varvec{\Omega \gamma }/\sqrt{1 + \tau ^2}\). In addition, if \(\varvec{\gamma } = \varvec{0}\), that is \(\varvec{Y} \sim t_p(\varvec{0},\varvec{\Omega },\nu )\), then \(\beta _{1,p} = 0\).
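For data analysis, the trace form \(\beta _{1,p} ={{\,\mathrm{tr}\,}}\{\varvec{S}^\top (\varvec{Y})\varvec{S}(\varvec{Y})\}\) suggests a plug-in estimate: whiten the observations and sum the squared sample third moments over all index triples. The sketch below does this for p = 2 only, using the sample covariance with divisor n and a Cholesky factor for whitening (the measure is affine invariant, so any square root of the covariance matrix gives the same value). It is an illustrative estimator, not the authors' implementation.

```python
import math
import random


def mardia_skewness_2d(data):
    """Plug-in estimate of beta_{1,p} for p = 2 from a list of (x0, x1) pairs."""
    n = len(data)
    m0 = sum(x[0] for x in data) / n
    m1 = sum(x[1] for x in data) / n
    # Sample covariance (divisor n) and its Cholesky factor L, with S = L L^T.
    s00 = sum((x[0] - m0) ** 2 for x in data) / n
    s01 = sum((x[0] - m0) * (x[1] - m1) for x in data) / n
    s11 = sum((x[1] - m1) ** 2 for x in data) / n
    l00 = math.sqrt(s00)
    l10 = s01 / l00
    l11 = math.sqrt(s11 - l10 * l10)
    # Whitened observations y = L^{-1}(x - mean), solved by forward substitution.
    ys = []
    for x in data:
        y0 = (x[0] - m0) / l00
        y1 = (x[1] - m1 - l10 * y0) / l11
        ys.append((y0, y1))
    # Sum of squared third moments E(Y_i Y_j Y_k) over all index triples.
    total = 0.0
    for i in range(2):
        for j in range(2):
            for k in range(2):
                m = sum(y[i] * y[j] * y[k] for y in ys) / n
                total += m * m
    return total


if __name__ == "__main__":
    rng = random.Random(5)
    symmetric = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(20000)]
    skewed = [(rng.expovariate(1.0), rng.expovariate(1.0)) for _ in range(20000)]
    print("normal sample:     ", mardia_skewness_2d(symmetric))  # close to zero
    print("exponential sample:", mardia_skewness_2d(skewed))     # clearly positive
```

For symmetric distributions such as the normal or the t-distribution the estimate should be near zero, in line with the statements of Lemmas 6 and 7 for \(\varvec{\gamma } = \varvec{0}\).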
About this article
Cite this article
Osorio, F., Galea, M., Henríquez, C. et al. Addressing non-normality in multivariate analysis using the t-distribution. AStA Adv Stat Anal 107, 785–813 (2023). https://doi.org/10.1007/s10182-022-00468-2
DOI: https://doi.org/10.1007/s10182-022-00468-2