Addressing non-normality in multivariate analysis using the t-distribution

Osorio, Felipe; Galea, Manuel; Henríquez, Claudio; Arellano-Valle, Reinaldo

doi:10.1007/s10182-022-00468-2

Original Paper
Published: 21 January 2023

Volume 107, pages 785–813, (2023)

AStA Advances in Statistical Analysis

Abstract

The main aim of this paper is to propose a set of tools for assessing non-normality, taking into consideration the class of multivariate t-distributions. Assuming that second moments exist, we consider a reparameterized version of the usual t-distribution, so that the scale matrix coincides with the covariance matrix of the distribution. We use the local influence procedure and the Kullback–Leibler divergence measure to propose quantitative methods to evaluate deviations from the normality assumption. In addition, the possible non-normality due to the presence of both skewness and heavy tails is also explored. Our findings based on two real datasets are complemented by a simulation study to evaluate the performance of the proposed methodology on finite samples.

References

Abramowitz, M., Stegun, I.A.: Handbook of Mathematical Functions. Dover, New York (1970)
Anderson, T.W.: An Introduction to Multivariate Statistical Analysis. Wiley, New York (2003)
Arellano-Valle, R.B., Contreras-Reyes, J., Genton, M.: Shannon entropy and mutual information for multivariate skew-elliptical distributions. Scandinavian J. Stat. 40, 42–62 (2012)
Arellano-Valle, R.B., Ferreira, C.S., Genton, M.G.: Scale and shape mixtures of multivariate skew-normal distributions. J. Multivariate Anal. 166, 98–110 (2018)
Azzalini, A., Genton, M.G.: Robust likelihood methods based on the skew-$t$ and related distributions. Int. Stat. Rev. 76, 106–129 (2008)
Bolfarine, H., Galea, M.: On structural comparative calibration under a $t$-model. Comput. Stat. 11, 63–85 (1996)
Bodnar, T., Gupta, A.K., Parolya, N.: On the strong convergence of the optimal linear shrinkage estimator for large dimensional covariance matrix. J. Multivariate Anal. 132, 215–228 (2014)
Contreras-Reyes, J., Arellano-Valle, R.: Kullback-Leibler divergence measure for multivariate skew-normal distributions. Entropy 14, 1606–1626 (2012)
Cook, R.D.: Assessment of local influence (with discussion). J. R. Stat. Soc. B 48, 133–169 (1986)
Dykstra, R.L.: Establishing the positive definiteness of the sample covariance matrix. Ann. Math. Stat. 41, 2153–2154 (1970)
Fang, K.T., Zhang, Y.T.: Generalized Multivariate Analysis. Springer, Berlin (1990)
Feng, D., Baumgartner, R., Svetnik, V.: A robust Bayesian estimate of the concordance correlation coefficient. J. Biopharm. Stat. 25, 490–507 (2015)
Fiorentini, G., Sentana, E., Calzolari, G.: Maximum likelihood estimation and inference in multivariate conditionally heteroscedastic dynamic regression models with Student $t$ innovations. J. Bus. Econ. Stat. 21, 532–546 (2003)
Galea, M., Cademartori, D., Curci, R., Molina, A.: Robust inference in the capital asset pricing model using the multivariate $t$-distribution. J. Risk Financ. Manage. 13, 123 (2020)
Gao, J., Zhang, B.: Estimation of seismic wavelets based on the multivariate scale mixture of Gaussian model. Entropy 12, 14–33 (2010)
Gómez-Villegas, M.A., Gómez-Sánchez-Manzano, E., Maín, P., Navarro, H.: The effect of non-normality in the Power Exponential distribution. In: Pardo, L., Balakrishnan, N., Gil, M.A. (eds.) Modern Mathematical Tools and Techniques in Capturing Complexity, pp. 119–129. Springer-Verlag, Berlin (2011)
Gupta, A.K.: Multivariate skew $t$-distribution. Statistics 37, 359–363 (2003)
Gupta, A.K., Varga, T., Bodnar, T.: Elliptically Contoured Models in Statistics and Portfolio Theory, 2nd edn. Springer, New York (2013)
Härdle, W.K., Simar, L.: Applied Multivariate Statistical Analysis, 3rd edn. Springer, New York (2012)
Kent, J.T., Tyler, D.E., Vardi, Y.: A curious likelihood identity for the multivariate $t$-distribution. Commun. Stat. Simul. Comput. 23, 441–453 (1994)
Kent, J.T., Tyler, D.E.: Redescending $M$-estimates of multivariate location and scatter. Ann. Stat. 19, 2102–2119 (1991)
Kim, H.M., Mallick, B.K.: Moments of random vectors with skew $t$ distribution and their quadratic forms. Stat. Prob. Lett. 63, 417–423 (2003)
Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22, 79–86 (1951)
Lange, K., Little, R.J.A., Taylor, J.M.G.: Robust statistical modeling using the $t$ distribution. J. Am. Stat. Assoc. 84, 881–896 (1989)
Leal, C., Galea, M., Osorio, F.: Assessment of local influence for the analysis of agreement. Biometrical J. 61, 955–972 (2019)
Ledoit, O., Wolf, M.: A well-conditioned estimator for large-dimensional covariance matrices. J. Multivariate Anal. 88, 365–411 (2004)
Ledoit, O., Wolf, M.: Analytical nonlinear shrinkage of large-dimensional covariance matrices. Ann. Stat. 48, 3043–3065 (2020)
Magnus, J.R., Neudecker, H.: Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley, Chichester (1999)
Mardia, K.V.: Measures of multivariate skewness and kurtosis with applications. Biometrika 57, 519–530 (1970)
Mardia, K.V.: Applications of some measures of multivariate skewness and kurtosis in testing normality and robustness studies. Sankhyā Ser. B 36, 115–128 (1974)
Maronna, R.A.: Robust $M$-estimators of multivariate location and scatter. Ann. Stat. 4, 51–67 (1976)
Poon, W., Poon, Y.S.: Conformal normal curvature and assessment of local influence. J. R. Stat. Soc. B 61, 51–61 (1999)
Serfling, R.J.: Approximation Theorems of Mathematical Statistics. Wiley, New York (2009)
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948)
Song, P.X.K., Zhang, P., Qu, A.: Maximum likelihood inference in robust linear mixed-effects models using the multivariate $t$ distributions. Stat. Sin. 17, 929–943 (2007)
Sutradhar, B.C.: Score test for the covariance matrix of elliptical $t$-distribution. J. Multivariate Anal. 46, 1–12 (1993)
Svetnik, V., Ma, J., Soper, K.A., Doran, S., Renger, J.J., Deacon, S., Koblan, K.S.: Evaluation of automated and semi-automated scoring of polysomnographic recordings from a clinical trial using zolpidem in the treatment of insomnia. Sleep 30, 1562–1574 (2007)
Wilson, E.B., Hilferty, M.M.: The distribution of chi-square. Proc. Nat. Acad. Sci. United States Am. 17, 684–688 (1931)
Zhu, H.T., Lee, S.Y.: Local influence for incomplete-data models. J. R. Stat. Soc. B 63, 111–126 (2001)
Zhu, H., Ibrahim, J.G., Lee, S., Zhang, H.: Perturbation selection and influence measures in local influence analysis. Ann. Stat. 35, 2565–2588 (2007)

Acknowledgements

This work was supported by Comisión Nacional de Investigación Científica y Tecnológica, FONDECYT Grants 1140580 and 1150325. The authors are grateful to the associate editor and the anonymous reviewers for their valuable comments and suggestions, and to Carla Leal and Ronny Vallejos, who carefully read the initial version of the manuscript. Their input led to improvements in the paper.

Author information

Authors and Affiliations

Departamento de Matemática, Universidad Técnica Federico Santa María, Avenida España 1680, Valparaíso, Chile
Felipe Osorio
Departamento de Estadística, Pontificia Universidad Católica de Chile, Avenida Vicuña Mackenna 4860, Santiago, Chile
Manuel Galea, Claudio Henríquez & Reinaldo Arellano-Valle

Corresponding author

Correspondence to Felipe Osorio.

Ethics declarations

Conflict of interest

The authors have declared no conflict of interest.

Supplementary information

This material is subdivided into two sections. First, we present basic properties of the multivariate t-distribution introduced by Sutradhar (1993). Then, a detailed description of the maximum likelihood estimation procedure considering an EM algorithm is provided.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: The expected information matrix

We use the results shown in the Supplementary Material and note that the Fisher information matrix for $\varvec{\theta }$ can be written as

$$\begin{aligned} \varvec{{\mathcal {I}}}(\varvec{\theta }) = \frac{1}{n}\sum _{i=1}^n {{\,\mathrm{E}\,}}\{\varvec{U}_i (\varvec{\theta }) \varvec{U}_i^\top (\varvec{\theta })\}, \end{aligned}$$

where the score function $\varvec{U}_i(\varvec{\theta }) =(\varvec{U}_i^\top (\varvec{\mu }),\varvec{U}_i^\top (\varvec{\phi }), U_i(\eta ))^\top$ associated to the ith component of the log-likelihood, with $i=1,\dots ,n$, is defined in Equations (4)-(6), and the expected value ${{\,\mathrm{E}\,}}(\cdot )$ is taken with respect to the density function in (1). Next, we obtain each of the blocks of the Fisher information matrix reported in Equation (7).

From the score functions (4) and (5), it follows that

$$\begin{aligned} {{\,\mathrm{E}\,}}\{\varvec{U}_i(\varvec{\mu })\varvec{U}_i^\top (\varvec{\mu })\}&= c_\mu (\eta )\varvec{\Sigma }^{-1},\\ {{\,\mathrm{E}\,}}\{\varvec{U}_i(\varvec{\phi })\varvec{U}_i^\top (\varvec{\phi })\}&= \frac{1}{4}\varvec{D}_p^\top \left\{ 2c_\phi (\eta ) (\varvec{\Sigma }^{-1}\otimes \varvec{\Sigma }^{-1})\varvec{N}_p\right. \\&\quad \left. + (c_\phi (\eta ) - 1)({{\,\mathrm{vec}\,}}\varvec{\Sigma }^{-1}) ({{\,\mathrm{vec}\,}}\varvec{\Sigma }^{-1})^\top \right\} \varvec{D}_p, \end{aligned}$$

where $c_\mu (\eta )=c_\phi (\eta )/(1-2\eta )$, $c_\phi (\eta )=(1+p\eta )/(1+(p+2)\eta )$. Note that $c_\mu (\eta )$ and $c_\phi (\eta )\rightarrow 1$ when $\eta \rightarrow 0$. On the other hand, we have that $\varvec{N}_p\varvec{D}_p=\varvec{D}_p$ (see Magnus and Neudecker 1999), which produces the expressions corresponding to the normal case.
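The two blocks above can be assembled numerically. The following is a minimal sketch, assuming that $\varvec{\phi }$ collects the distinct elements of $\varvec{\Sigma }$ (so that $\varvec{D}_p$ is the duplication matrix), that ${{\,\mathrm{vec}\,}}$ stacks columns, and that $\varvec{N}_p = (\varvec{I} + \varvec{K}_p)/2$ with $\varvec{K}_p$ the commutation matrix. All function names are hypothetical and not part of the paper.

```python
import numpy as np

def c_phi(eta, p):
    """c_phi(eta) = (1 + p*eta) / (1 + (p + 2)*eta)."""
    return (1.0 + p * eta) / (1.0 + (p + 2) * eta)

def c_mu(eta, p):
    """c_mu(eta) = c_phi(eta) / (1 - 2*eta)."""
    return c_phi(eta, p) / (1.0 - 2.0 * eta)

def duplication_matrix(p):
    """D_p such that D_p @ vech(A) = vec(A) for symmetric A (column-major vec)."""
    pairs = [(i, j) for j in range(p) for i in range(j, p)]  # vech ordering
    D = np.zeros((p * p, len(pairs)))
    for k, (i, j) in enumerate(pairs):
        D[j * p + i, k] = 1.0  # position of a_ij in vec(A)
        D[i * p + j, k] = 1.0  # position of a_ji in vec(A)
    return D

def commutation_matrix(p):
    """K_p such that K_p @ vec(A) = vec(A.T)."""
    K = np.zeros((p * p, p * p))
    for i in range(p):
        for j in range(p):
            K[i * p + j, j * p + i] = 1.0
    return K

def info_blocks(Sigma, eta):
    """Return the (mu, mu) and (phi, phi) blocks displayed above."""
    Sigma = np.asarray(Sigma, dtype=float)
    p = Sigma.shape[0]
    Sinv = np.linalg.inv(Sigma)
    D = duplication_matrix(p)
    N = 0.5 * (np.eye(p * p) + commutation_matrix(p))
    v = Sinv.reshape(-1, 1, order="F")  # vec(Sigma^{-1})
    cphi = c_phi(eta, p)
    I_mu = c_mu(eta, p) * Sinv
    inner = 2.0 * cphi * np.kron(Sinv, Sinv) @ N + (cphi - 1.0) * (v @ v.T)
    I_phi = 0.25 * D.T @ inner @ D
    return I_mu, I_phi

if __name__ == "__main__":
    Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
    for eta in (0.0, 0.05, 0.2):
        I_mu, I_phi = info_blocks(Sigma, eta)
        print(eta, np.round(I_mu, 4).tolist(), np.round(I_phi, 4).tolist())
```

With `eta = 0.0` the sketch returns $\varvec{\Sigma }^{-1}$ and $\frac{1}{2}\varvec{D}_p^\top (\varvec{\Sigma }^{-1}\otimes \varvec{\Sigma }^{-1})\varvec{D}_p$, which is how the identity $\varvec{N}_p\varvec{D}_p=\varvec{D}_p$ enters the reduction to the normal case.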

Therefore, it is clear that

$$\begin{aligned} \frac{\partial v_i}{\partial \eta } = (1-2\eta )^{-3}\left\{ (p+2) (1-2\eta )q_i^{-1} - (1 + \eta p)q_i^{-2}\delta _i^2\right\} , \end{aligned}$$

with $q_i = 1+c(\eta )\delta _i^2$. Thus, we obtain

$$\begin{aligned} \frac{\partial \varvec{U}_i(\varvec{\mu })}{\partial \eta }&= (1-2\eta )^{-3}\varvec{\Sigma }^{-1} \left\{ (p+2)(1 - 2\eta ) q_i^{-1}\varvec{Z}_i - (1 + \eta p)q_i^{-2}\delta _i^2\varvec{Z}_i\right\} , \\ \frac{\partial \varvec{U}_i(\varvec{\phi })}{\partial \eta }&=\frac{1}{2}(1 - 2\eta )^{-3}\varvec{D}_p^\top {{\,\mathrm{vec}\,}}\left\{ \varvec{\Sigma }^{-1}\left( (p+2)(1 - 2\eta )q_i^{-1}\varvec{Z}_i\varvec{Z}_i^{\top }\right. \right. \\&\quad \left. \left. - (1 + \eta p)q_i^{-2}\delta _i^2\varvec{Z}_i\varvec{Z}_i^{\top }\right) \varvec{\Sigma }^{-1}\right\} . \end{aligned}$$

By applying Lemmas 3 and 5 from Supplementary Material, it follows that

$$\begin{aligned} {{\,\mathrm{E}\,}}\left\{ \frac{\partial \varvec{U}_i(\varvec{\mu })}{\partial \eta }\right\}&= \varvec{0}, \\ {{\,\mathrm{E}\,}}\left\{ \frac{\partial \varvec{U}_i(\varvec{\phi })}{\partial \eta }\right\}&= \frac{c(\eta )(p+2)}{(1 + \eta p)(1 + (p+2)\eta )}\varvec{D}_p^\top {{\,\mathrm{vec}\,}}(\varvec{\Sigma }^{-1}), \end{aligned}$$

for $i=1,\dots ,n$. The score function for $\eta$ can be written as,

$$\begin{aligned} U_i(\eta )&= \frac{1}{2\eta ^2}\left\{ pc(\eta ) + \psi \left( \frac{1}{2\eta }\right) -\psi \left( \frac{1+p\eta }{2\eta }\right) - \frac{1+p\eta }{1-2\eta } \frac{c(\eta )\delta _i^2}{1 + c(\eta )\delta _i^2} +\log (1+c(\eta )\delta _i^2)\right\} , \\&= \frac{1}{2\eta ^2}\left\{ \log (1+ Q_{i\eta }) -\left( \psi \left( \frac{1+p\eta }{2\eta }\right) - \psi \left( \frac{1}{2\eta }\right) \right) -\left( \frac{1+p\eta }{1-2\eta }\frac{Q_{i\eta }}{1 + Q_{i\eta }} - pc(\eta )\right) \right\} , \end{aligned}$$

where $Q_{i\eta } = c(\eta ) \delta _i^2\sim \chi _p^2/\chi _{1/\eta }^2$, ${{\,\mathrm{E}\,}}\{Q_{i\eta } (1+Q_{i\eta })^{-1}\}=\frac{p\eta }{1+p\eta }$ and ${{\,\mathrm{E}\,}}\{\log (1 + Q_{i\eta })\}=\psi \left( \frac{1+p\eta }{2\eta }\right) -\psi \left( \frac{1}{2\eta }\right)$. Let $U_1 =\log (1+Q_{i\eta })$, $U_2 =\frac{1+p\eta }{1 - 2\eta }\frac{Q_{i\eta }}{1+Q_{i\eta }}$, $\overline{U}_1 = U_1 - {{\,\mathrm{E}\,}}(U_1)$ and $\overline{U}_2 = U_2 -{{\,\mathrm{E}\,}}(U_2)$. Then,

$$\begin{aligned} {{\,\mathrm{E}\,}}\{U_i(\eta )\}&= {{\,\mathrm{E}\,}}\{(U_1 - {{\,\mathrm{E}\,}}(U_1)) - (U_2 - {{\,\mathrm{E}\,}}(U_2))\} = 0,\\ {{\,\mathrm{var}\,}}\{U_i(\eta )\}&= \frac{1}{4\eta ^4}\{{{\,\mathrm{E}\,}}(\overline{U}_1^2) -2{{\,\mathrm{E}\,}}(\overline{U}_1 \overline{U}_2) + {{\,\mathrm{E}\,}}(\overline{U}_2^2)\} \\&= \frac{1}{4\eta ^4}\{{{\,\mathrm{var}\,}}(\overline{U}_1) -2{{\,\mathrm{Cov}\,}}(\overline{U}_1,\overline{U}_2) + {{\,\mathrm{var}\,}}(\bar{U}_2)\} \\&= \frac{1}{4\eta ^4}\{{{\,\mathrm{E}\,}}(U_1^2) - {{\,\mathrm{E}\,}}^2(U_1) - 2({{\,\mathrm{E}\,}}(U_1 U_2) -{{\,\mathrm{E}\,}}(U_1){{\,\mathrm{E}\,}}(U_2)) + {{\,\mathrm{E}\,}}(U_2^2)-{{\,\mathrm{E}\,}}^2(U_2)\}. \end{aligned}$$

Using the fact that $\psi (x+1) = \psi (x)+1/x$ we have

$$\begin{aligned} {{\,\mathrm{E}\,}}(U_1^2)&= {{\,\mathrm{E}\,}}\{(\log (1+Q_{i\eta }))^2\} = {{\,\mathrm{E}\,}}^2(U_1) - \psi '\left( \frac{1+p\eta }{2\eta }\right) + \psi '\left( \frac{1}{2\eta }\right) , \\ {{\,\mathrm{E}\,}}(U_1 U_2)&= \frac{1+p\eta }{1-2\eta }{{\,\mathrm{E}\,}}\{Q_{i\eta } (1 + Q_{i\eta })^{-1}\log (1 + Q_{i\eta })\} \\&= \left\{ {{\,\mathrm{E}\,}}(U_1) + \frac{2\eta }{1+p\eta }\right\} {{\,\mathrm{E}\,}}(U_2), \\ {{\,\mathrm{E}\,}}(U_2^2)&= \left( \frac{1+p\eta }{1-2\eta }\right) ^2 {{\,\mathrm{E}\,}}\{Q_{i\eta }^2(1+Q_{i\eta })^{-2}\} \\&= \frac{p+2}{p}\frac{1+p\eta }{1+(p+2)\eta }{{\,\mathrm{E}\,}}(U_2)^2. \end{aligned}$$
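These moment identities can be checked by simulation. The sketch below assumes that the numerator and denominator chi-square variables in $Q_{i\eta }\sim \chi _p^2/\chi _{1/\eta }^2$ are independent, and it uses ${{\,\mathrm{E}\,}}(U_2)=pc(\eta )$ with $c(\eta )=\eta /(1-2\eta )$, which follows from ${{\,\mathrm{E}\,}}\{Q_{i\eta }(1+Q_{i\eta })^{-1}\}$ given above. The function names and the chosen values of $p$ and $\eta$ are illustrative only.

```python
import numpy as np
from scipy.special import digamma

def simulate_Q(p, eta, size, rng):
    """Draw Q = chi2_p / chi2_{1/eta}, assuming independent numerator and denominator."""
    return rng.chisquare(p, size) / rng.chisquare(1.0 / eta, size)

def moment_check(p, eta, size=400_000, seed=0):
    """Return {name: (Monte Carlo estimate, closed form)} for the moments of U1 and U2."""
    rng = np.random.default_rng(seed)
    Q = simulate_Q(p, eta, size, rng)
    U1 = np.log1p(Q)
    U2 = (1.0 + p * eta) / (1.0 - 2.0 * eta) * Q / (1.0 + Q)

    # Closed forms taken from the displayed equations.
    EU1 = digamma((1.0 + p * eta) / (2.0 * eta)) - digamma(1.0 / (2.0 * eta))
    EU2 = p * eta / (1.0 - 2.0 * eta)
    EU1U2 = (EU1 + 2.0 * eta / (1.0 + p * eta)) * EU2
    EU2sq = (p + 2) / p * (1.0 + p * eta) / (1.0 + (p + 2) * eta) * EU2 ** 2

    return {
        "E(U1)": (float(U1.mean()), float(EU1)),
        "E(U2)": (float(U2.mean()), float(EU2)),
        "E(U1 U2)": (float((U1 * U2).mean()), float(EU1U2)),
        "E(U2^2)": (float((U2 ** 2).mean()), float(EU2sq)),
    }

if __name__ == "__main__":
    for name, (mc, exact) in moment_check(p=3, eta=0.1).items():
        print(f"{name:10s} simulated={mc:.4f} closed form={exact:.4f}")
```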

Finally, we have that the expected information in relation to $\eta$ is given by,

$$\begin{aligned} {{\,\mathrm{var}\,}}\{U_i(\eta )\}&= \frac{1}{4\eta ^4}\left\{ -\psi '\left( \frac{1+p\eta }{2\eta }\right) + \psi '\left( \dfrac{1}{2\eta }\right) - \frac{4\eta }{1+p\eta }{{\,\mathrm{E}\,}}(U_2) \right. \\&\quad \left. + \frac{p+2}{p}\frac{1+p\eta }{1+(p+2)\eta }{{\,\mathrm{E}\,}}(U_2)^2 - {{\,\mathrm{E}\,}}(U_2)^2\right\} \\&= \frac{1}{4\eta ^4}\left\{ \psi '\left( \frac{1}{2\eta }\right) - \psi ' \left( \frac{1+p\eta }{2\eta }\right) + 2pc(\eta )^2\left( \frac{4(p+2) \eta ^2 - p\eta -1}{(1+p\eta )(1+(p+2)\eta )}\right) \right\} . \end{aligned}$$
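As a numerical companion to this expression, the sketch below evaluates both lines of the display and confirms that they agree. It assumes $c(\eta )=\eta /(1-2\eta )$ and ${{\,\mathrm{E}\,}}(U_2)=pc(\eta )$, as implied by the surrounding formulas; the function names are hypothetical. A convenient exact check is $p=2$, $\eta =1/4$: the trigamma difference is $\psi '(2)-\psi '(3)=1/4$, the rational term is $-1/6$, and the variance equals $16/3$.

```python
import numpy as np
from scipy.special import polygamma

def c(eta):
    """c(eta) = eta / (1 - 2*eta), valid for 0 < eta < 1/2."""
    return eta / (1.0 - 2.0 * eta)

def var_score_eta(p, eta):
    """Second line of the display: closed-form variance of U_i(eta)."""
    a = 1.0 / (2.0 * eta)
    ab = (1.0 + p * eta) / (2.0 * eta)
    trigamma_diff = polygamma(1, a) - polygamma(1, ab)
    rational = (2.0 * p * c(eta) ** 2
                * (4.0 * (p + 2) * eta ** 2 - p * eta - 1.0)
                / ((1.0 + p * eta) * (1.0 + (p + 2) * eta)))
    return float((trigamma_diff + rational) / (4.0 * eta ** 4))

def var_score_eta_first_line(p, eta):
    """First line of the display, written in terms of E(U2) = p * c(eta)."""
    a = 1.0 / (2.0 * eta)
    ab = (1.0 + p * eta) / (2.0 * eta)
    EU2 = p * c(eta)
    ratio = (p + 2) / p * (1.0 + p * eta) / (1.0 + (p + 2) * eta)
    inner = (polygamma(1, a) - polygamma(1, ab)
             - 4.0 * eta / (1.0 + p * eta) * EU2
             + ratio * EU2 ** 2 - EU2 ** 2)
    return float(inner / (4.0 * eta ** 4))

if __name__ == "__main__":
    for p, eta in [(2, 0.25), (3, 0.1), (5, 0.05)]:
        print(p, eta, var_score_eta(p, eta), var_score_eta_first_line(p, eta))
```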

From the expansion (Abramowitz and Stegun 1970, Sec. 6.4.12),

$$\begin{aligned} \psi '(x)&= \frac{1}{x} + \frac{1}{2x^2} + \frac{1}{6x^3} + O\left( \frac{1}{x^5}\right) \quad \text {as} \quad x\rightarrow \infty , \\ (1+ax)^{-k}&= 1-k a x + \frac{k(k+1)}{2}a^2x^2 -\frac{k(k+1)(k+2)}{6}a^3x^3 + O(x^4) \quad \text {as} \quad x\rightarrow 0, \end{aligned}$$

we find as $\eta \rightarrow 0$ that,

$$\begin{aligned} \psi '\left( \frac{1}{2\eta }\right) - \psi '\left( \frac{1+p\eta }{2\eta }\right)&= 2\eta +2\eta ^2 + \frac{4}{3}\eta ^3 - 2\eta (1+p\eta )^{-1} - 2\eta ^2(1+p\eta )^{-2} \\&\quad - \frac{4}{3}\eta ^3(1+p\eta )^{-3} + O(\eta ^5) \\&= 2p\eta ^2 - (2p^2-4p)\eta ^3 + (2p^3-6p^2+4p)\eta ^4 + O(\eta ^5). \end{aligned}$$

Similarly,

$$\begin{aligned} 2pc(\eta )^2&\left( \frac{4(p+2)\eta ^2 - p\eta -1}{(1+p\eta )(1 + (p + 2)\eta )}\right) = \frac{2p\eta ^2(4(p+2)\eta ^2 - p\eta - 1)}{(1 - 2\eta )^2(1 + p\eta )(1 + (p+2)\eta )} \\&= (8p(p+2)\eta ^4 - 2p^2\eta ^3 - 2p\eta ^2)(1 + 4\eta + 12\eta ^2 + O(\eta ^3)) \\&\quad \times (1 - p\eta + p^2\eta ^2 + O(\eta ^3))(1 - (p+2)\eta + (p+2)^2\eta ^2 + O(\eta ^3)) \\&= -2p\eta ^2 + (2p^2 - 4p)\eta ^3 - (2p^3 - 8p^2)\eta ^4 + O(\eta ^5). \end{aligned}$$

Hence, adding the two expansions, the terms of order $\eta ^2$ and $\eta ^3$ cancel and

$$\begin{aligned} {{\,\mathrm{var}\,}}\{U_i(\eta )\} = \frac{1}{4}(2p^2+4p) + O(\eta ) =\frac{p(p+2)}{2} + O(\eta ). \end{aligned}$$

Appendix B: Non-normality due to asymmetry

Another source of non-normality is the possible asymmetry present in the observations. Shannon entropy, Kullback–Leibler divergence and mutual information for multivariate skew-elliptical distributions have been considered in the literature, see for instance Arellano-Valle et al. (2012) and Contreras-Reyes and Arellano-Valle (2012). We summarize some of these results for the multivariate skew normal and skew t distributions below.

Following Arellano-Valle et al. (2012), we say that a random vector $\varvec{Z}\in {\mathbb {R}}^p$ has a skew-normal distribution with location vector $\varvec{\xi }\in {\mathbb {R}}^p$, dispersion matrix $\varvec{\Omega } > 0$ and shape/skewness parameter $\varvec{\gamma }\in {\mathbb {R}}^p$, denoted by $\varvec{Z}\sim \mathsf {SN}_p(\varvec{\xi },\varvec{\Omega },\varvec{\gamma })$, if its probability density function is

$$\begin{aligned} f(\varvec{z}) = 2\phi _p(\varvec{z};\varvec{\xi },\varvec{\Omega })\,\Phi \{\varvec{\gamma }^\top (\varvec{z}-\varvec{\xi })\}, \qquad \varvec{z} \in {\mathbb {R}}^p, \end{aligned} \tag{B.1}$$

where

$$\begin{aligned} \phi _p(\varvec{z};\varvec{\xi },\varvec{\Omega }) = (2\pi )^{-p/2}\vert \varvec{\Omega }\vert ^{-1/2}\exp (-\delta _{\mathsf{skew}}^2/2), \end{aligned}$$

is the probability density function of the p-variate $\mathsf {N}_p(\varvec{\xi },\varvec{\Omega })$ distribution, $\Phi (\cdot )$ is the univariate $\mathsf {N}(0,1)$ cumulative distribution function and $\delta _{\mathsf{skew}}^2 =(\varvec{z}-\varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{z}-\varvec{\xi }) \sim \chi ^2(p)$. The vector of means and the covariance matrix of $\varvec{Z}$ are given, respectively, by

$$\begin{aligned} \varvec{\mu }_{\mathsf{SN}} = \varvec{\xi } + \sqrt{\frac{2}{\pi }}\varvec{\delta }, \qquad \text {and} \qquad \varvec{\Sigma }_{\mathsf{SN}} = \varvec{\Omega } - \frac{2}{\pi }\varvec{\delta }\varvec{\delta }^\top , \end{aligned}$$

where $\varvec{\delta } = \varvec{\Omega \gamma }/\sqrt{1 + \tau ^2}$ and $\tau ^2=\varvec{\gamma }^\top \varvec{\Omega \gamma }$.
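The moment formulas for the skew-normal case can be illustrated with a short simulation. This is a minimal sketch: the sampler uses the standard sign-flip representation (draw $\varvec{X}\sim \mathsf {N}_p(\varvec{0},\varvec{\Omega })$ and an independent $U\sim \mathsf {N}(0,1)$, then keep $\varvec{X}$ if $U\le \varvec{\gamma }^\top \varvec{X}$ and use $-\varvec{X}$ otherwise), which is an assumption introduced here and not a construction taken from the paper. Function names and parameter values are illustrative.

```python
import numpy as np

def sn_delta(Omega, gamma):
    """delta = Omega gamma / sqrt(1 + tau^2), with tau^2 = gamma' Omega gamma."""
    tau2 = gamma @ Omega @ gamma
    return Omega @ gamma / np.sqrt(1.0 + tau2)

def sn_mean_cov(xi, Omega, gamma):
    """Mean vector and covariance matrix of SN_p(xi, Omega, gamma)."""
    d = sn_delta(Omega, gamma)
    mean = xi + np.sqrt(2.0 / np.pi) * d
    cov = Omega - (2.0 / np.pi) * np.outer(d, d)
    return mean, cov

def sn_sample(xi, Omega, gamma, size, rng):
    """Draw from the density 2 phi_p(z; xi, Omega) Phi{gamma'(z - xi)} by sign flipping."""
    X = rng.multivariate_normal(np.zeros(len(xi)), Omega, size)
    U = rng.standard_normal(size)
    sign = np.where(U <= X @ gamma, 1.0, -1.0)
    return xi + sign[:, None] * X

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    xi = np.array([0.5, -1.0])
    Omega = np.array([[1.0, 0.3], [0.3, 1.0]])
    gamma = np.array([1.0, -0.5])
    Z = sn_sample(xi, Omega, gamma, 200_000, rng)
    mean, cov = sn_mean_cov(xi, Omega, gamma)
    print("mean: formula", mean, "simulated", Z.mean(axis=0))
    print("cov: formula", cov.ravel(), "simulated", np.cov(Z.T).ravel())
```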

We say that a random vector $\varvec{Z}\in {\mathbb {R}}^p$ has a skew-t distribution with location vector $\varvec{\xi }\in {\mathbb {R}}^p$, dispersion matrix $\varvec{\Omega }\in {\mathbb {R}}^{p \times p}$, shape/skewness parameter $\varvec{\gamma }\in {\mathbb {R}}^p$ and $\nu > 0$ degrees of freedom, denoted by $\varvec{Z} \sim \mathsf {St}_p(\varvec{\xi },\varvec{\Omega },\varvec{\gamma },\nu )$, if its probability density function is given by

$$\begin{aligned} f(\varvec{z}) = 2t_p(\varvec{z};\varvec{\xi },\varvec{\Omega },\nu )\,T \left( \sqrt{\frac{\nu + p}{\nu + \delta _{\mathsf{skew}}^2}} \, \varvec{\gamma }^\top (\varvec{z} - \varvec{\xi }); \nu + p\right) , \end{aligned} \tag{B.2}$$

where

$$\begin{aligned} t_p(\varvec{z};\varvec{\xi },\varvec{\Omega },\nu ) = \frac{\Gamma \left( \frac{\nu + p}{2}\right) }{\Gamma \left( \frac{\nu }{2}\right) (\nu \pi )^{p/2}} \vert \varvec{\Omega }\vert ^{-1/2}\left( 1 + \frac{1}{\nu } \delta _{\mathsf{skew}}^2\right) ^{-(\nu +p)/2},\qquad \varvec{z}\in {\mathbb {R}}^p, \end{aligned}$$

is the probability density function of the p-variate $t_p(\varvec{\xi },\varvec{\Omega },\nu )$ distribution, $\delta _{\mathsf{skew}}^2 = (\varvec{z} - \varvec{\xi })^\top \varvec{\Omega }^{-1}(\varvec{z} - \varvec{\xi })$ with $\delta _{\mathsf{skew}}^2/p \sim F(p,\nu )$, and $T(x;\nu + p)$ is the $T_1(0,1,\nu +p)$ cumulative distribution function (see, for instance, Azzalini and Genton 2008; Arellano-Valle et al. 2012, for details).

If $\varvec{Z}\sim \mathsf {St}_p(\varvec{\xi },\varvec{\Omega },\varvec{\gamma },\nu )$, then the vector of means and the covariance matrix of $\varvec{Z}$ are given by

$$\begin{aligned} \varvec{\mu }_{\mathsf{St}}&= \varvec{\xi } + \alpha (\nu )\varvec{\delta }, \quad \nu> 1 \\ \varvec{\Sigma }_{\mathsf{St}}&= \frac{\nu }{\nu -2}\varvec{\Omega } - \{\alpha (\nu )\}^2 \varvec{\delta }\varvec{\delta }^\top , \quad \nu > 2, \end{aligned}$$

where $\alpha (\nu ) =\{\Gamma ((\nu -1)/2)/\Gamma (\nu /2)\} \sqrt{\nu /\pi }$. Note that $\alpha (\nu )\rightarrow \sqrt{2/\pi }$ as $\nu \rightarrow \infty$, and we obtain the results for the skew-normal distribution given above.
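The limit $\alpha (\nu )\rightarrow \sqrt{2/\pi }$ and the skew-t moment formulas can be evaluated directly. The following minimal sketch computes $\alpha (\nu )$ through log-gamma functions to avoid overflow for large $\nu$; the function names are hypothetical. As an exact reference value, $\alpha (3)=2\sqrt{3}/\pi$, since $\Gamma (1)=1$ and $\Gamma (3/2)=\sqrt{\pi }/2$.

```python
import numpy as np
from scipy.special import gammaln

def alpha(nu):
    """alpha(nu) = Gamma((nu - 1)/2) / Gamma(nu/2) * sqrt(nu / pi), for nu > 1."""
    return float(np.exp(gammaln((nu - 1.0) / 2.0) - gammaln(nu / 2.0))
                 * np.sqrt(nu / np.pi))

def st_mean_cov(xi, Omega, gamma, nu):
    """Mean vector and covariance matrix of St_p(xi, Omega, gamma, nu); requires nu > 2."""
    if nu <= 2:
        raise ValueError("the covariance matrix requires nu > 2")
    tau2 = gamma @ Omega @ gamma
    delta = Omega @ gamma / np.sqrt(1.0 + tau2)
    a = alpha(nu)
    mean = xi + a * delta
    cov = nu / (nu - 2.0) * Omega - a ** 2 * np.outer(delta, delta)
    return mean, cov

if __name__ == "__main__":
    for nu in (3, 10, 100, 1e4, 1e6):
        print(nu, alpha(nu), np.sqrt(2.0 / np.pi))
```

For large `nu` the output of `st_mean_cov` approaches the skew-normal mean and covariance given earlier in this appendix.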

Arellano-Valle et al. (2012) show that, for the skew-normal and skew-t distributions, the Shannon entropy has an explicit form, as given in the following lemmas.

Lemma 4

If $\varvec{X}\sim \mathsf {SN}_p(\varvec{\xi },\varvec{\Omega },\varvec{\gamma })$ and $\varvec{Y}\sim \mathsf {St}_p(\varvec{\xi }, \varvec{\Omega },\varvec{\gamma },\nu )$, then their Shannon entropies are given by

(i) $H(\varvec{X}) = \frac{1}{2}\log \vert \varvec{\Omega }\vert +\frac{p}{2}(1+ \log 2\pi ) - {{\,\mathrm{E}\,}}[\log \{2\Phi (\tau W)\}]$,
(ii) $H(\varvec{Y}) = \frac{1}{2}\log \vert \varvec{\Omega }\vert -\log \Gamma \left( \frac{\nu + p}{2}\right) + \log \Gamma \left( \frac{\nu }{2}\right) + \frac{p}{2}\log (\nu \pi ) +\frac{\nu + p}{2}\left\{ \psi \left( \frac{\nu + p}{2}\right) -\psi \left( \frac{\nu }{2}\right) \right\} - {{\,\mathrm{E}\,}}[\log \{2T(\tau W^*; \nu +p)\}]$,

with $W \sim \mathsf {SN}(0,1,\tau )$, $\tau ^2 =\varvec{\gamma }^\top \varvec{\Omega \gamma }$; $W^* = \sqrt{\nu + p} \,W_{\mathsf{St}}/\sqrt{\nu + p-1 + W_{\mathsf{St}}^2}$ where $W_{\mathsf{St}} \sim \mathsf {St}(0,1,\tau ,\nu +p-1)$.

Lemma 5

Let $\varvec{Z}\sim \mathsf {N}_p(\varvec{\mu },\varvec{\Sigma })$. If $\varvec{X}\sim \mathsf {SN}_p(\varvec{\xi }, \varvec{\Omega },\varvec{\gamma })$ and $\varvec{Y}\sim \mathsf {St}_p(\varvec{\xi },\varvec{\Omega },\varvec{\gamma },\nu )$, then the negentropies of $\varvec{X}$ and $\varvec{Y}$ are given, respectively, by

(i) $H_N(\varvec{X}) = \frac{1}{2}\log \vert \varvec{\Sigma }\vert -\frac{1}{2}\log \vert \varvec{\Omega }\vert +{{\,\mathrm{E}\,}}[\log \{2\Phi (\tau W)\}]$, and
(ii) $H_N(\varvec{Y}) = \frac{1}{2}\log \vert \varvec{\Sigma }\vert +\frac{p}{2}(1+\log 2\pi ) - \frac{1}{2}\log \vert \varvec{\Omega }\vert +\log \Gamma \left( \frac{\nu + p}{2}\right) -\log \Gamma \left( \frac{\nu }{2}\right) - \frac{p}{2}\log (\nu \pi )- \frac{\nu + p}{2} \left\{ \psi \left( \frac{\nu + p}{2}\right) - \psi \left( \frac{\nu }{2}\right) \right\} + {{\,\mathrm{E}\,}}[\log \{2T(\tau W^*; \nu +p)\}]$.
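Part (i) of Lemmas 4 and 5 reduces the skew-normal entropy and negentropy to a one-dimensional expectation, which can be evaluated by quadrature. The sketch below makes two assumptions that are not stated in the lemmas themselves: $\varvec{\Sigma }$ in Lemma 5 is taken to be the skew-normal covariance $\varvec{\Sigma }_{\mathsf{SN}}$, and the integral is truncated to $[-12, 12]$, outside of which the $\mathsf {SN}(0,1,\tau )$ density is negligible. Function names are hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def expected_log_2Phi(tau, bound=12.0):
    """E[log{2 Phi(tau W)}] for W ~ SN(0, 1, tau), by quadrature on [-bound, bound]."""
    def integrand(w):
        log_cdf = norm.logcdf(tau * w)
        density = 2.0 * norm.pdf(w) * np.exp(log_cdf)  # SN(0, 1, tau) density
        return density * (np.log(2.0) + log_cdf)
    value, _ = quad(integrand, -bound, bound, limit=200)
    return value

def sn_entropy(Omega, gamma):
    """Lemma 4(i): Shannon entropy of SN_p(xi, Omega, gamma)."""
    p = Omega.shape[0]
    tau = np.sqrt(gamma @ Omega @ gamma)
    logdet_Omega = np.linalg.slogdet(Omega)[1]
    return (0.5 * logdet_Omega + 0.5 * p * (1.0 + np.log(2.0 * np.pi))
            - expected_log_2Phi(tau))

def sn_negentropy(Omega, gamma):
    """Lemma 5(i), assuming Sigma is the covariance matrix of the skew-normal vector."""
    tau2 = gamma @ Omega @ gamma
    delta = Omega @ gamma / np.sqrt(1.0 + tau2)
    Sigma = Omega - (2.0 / np.pi) * np.outer(delta, delta)
    logdet_Sigma = np.linalg.slogdet(Sigma)[1]
    logdet_Omega = np.linalg.slogdet(Omega)[1]
    return (0.5 * logdet_Sigma - 0.5 * logdet_Omega
            + expected_log_2Phi(np.sqrt(tau2)))

if __name__ == "__main__":
    Omega = np.array([[1.0, 0.3], [0.3, 1.0]])
    for scale in (0.0, 0.5, 1.0, 2.0, 4.0):
        gamma = scale * np.array([1.0, -0.5])
        print(scale, sn_entropy(Omega, gamma), sn_negentropy(Omega, gamma))
```

With $\varvec{\gamma }=\varvec{0}$ the expectation vanishes, so the entropy equals that of $\mathsf {N}_p(\varvec{\xi },\varvec{\Omega })$ and the negentropy is zero; the negentropy grows as the skewness parameter moves away from zero.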

Mardia (1970) introduced one of the most popular and commonly used measures of multivariate skewness for an arbitrary p-dimensional random vector $\varvec{Z}$ with mean vector $\varvec{\mu }$ and covariance matrix $\varvec{\Sigma }$. Mardia’s skewness coefficient is defined, for independent copies $\varvec{Z}_1$ and $\varvec{Z}_2$ of $\varvec{Z}$, as

$$\begin{aligned} \beta _{1,p} = {{\,\mathrm{E}\,}}[\{(\varvec{Z}_1 - \varvec{\mu })^\top \varvec{\Sigma }^{-1}(\varvec{Z}_2 - \varvec{\mu })\}^3], \end{aligned}$$

which can be expressed as $\beta _{1,p} ={{\,\mathrm{tr}\,}}\{\varvec{S}^\top (\varvec{Y})\varvec{S}(\varvec{Y})\}$, where $\varvec{S}(\varvec{Y}) ={{\,\mathrm{E}\,}}(\varvec{Y}\otimes \varvec{Y}^\top \otimes \varvec{Y})$, with $\varvec{Y} =\varvec{\Sigma }^{-1/2} (\varvec{Z} - \varvec{\mu })$ and $\otimes$ denotes the Kronecker product. The following lemmas, extracted from Kim and Mallick (2003), allow us to obtain explicit formulas for $\varvec{S}(\varvec{Y})$. In particular, Fig. 11 leads us to note the interaction between the degrees of freedom and the skewness parameter on the coefficient $\beta _{1,p}$ proposed by Mardia (1970). In fact, as $\nu$ grows, it has less impact on $\beta _{1,p}$.
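The trace form $\beta _{1,p} ={{\,\mathrm{tr}\,}}\{\varvec{S}^\top (\varvec{Y})\varvec{S}(\varvec{Y})\}$ is the sum of squared third-order moments of the standardized vector, which suggests a simple plug-in estimate from data. The sketch below is illustrative only: it replaces expectations by sample averages, uses the covariance estimate with divisor $n$, and standardizes through a Cholesky factor (any square root gives the same value because the coefficient is invariant to rotations). Function names are hypothetical.

```python
import numpy as np

def mardia_skewness(Z):
    """Plug-in estimate of beta_{1,p}: sum of squared standardized third moments."""
    Z = np.asarray(Z, dtype=float)
    n, p = Z.shape
    centered = Z - Z.mean(axis=0)
    S = centered.T @ centered / n             # covariance estimate with divisor n
    L = np.linalg.cholesky(S)
    Y = np.linalg.solve(L, centered.T).T      # rows y_i = L^{-1}(z_i - zbar)
    M3 = np.einsum("ni,nj,nk->ijk", Y, Y, Y) / n
    return float(np.sum(M3 ** 2))

def mardia_skewness_pairwise(Z):
    """Equivalent form: average of cubed Mahalanobis cross-products over all pairs."""
    Z = np.asarray(Z, dtype=float)
    n = Z.shape[0]
    centered = Z - Z.mean(axis=0)
    S = centered.T @ centered / n
    G = centered @ np.linalg.solve(S, centered.T)
    return float(np.sum(G ** 3) / n ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    normal_sample = rng.standard_normal((2000, 3))
    skewed_sample = rng.exponential(size=(2000, 3))
    print("normal data:", mardia_skewness(normal_sample))
    print("exponential data:", mardia_skewness(skewed_sample))
```

In the univariate case the estimate reduces to the squared sample skewness; for the data 0, 0, 0, 4 this gives $6^2/3^3 = 4/3$.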

Lemma 6

If $\varvec{Y} \sim \mathsf {SN}_p(\varvec{0},\varvec{\Omega },\varvec{\gamma })$, then

$$\begin{aligned} \varvec{S}(\varvec{Y}) = \sqrt{2/\pi }[\varvec{\delta }\otimes \varvec{\Omega } +{{\,\mathrm{vec}\,}}(\varvec{\Omega })\varvec{\delta }^\top + (\varvec{I}_p\otimes \varvec{\delta })\varvec{\Omega } - \varvec{\delta }\otimes \varvec{\delta \delta }^\top ], \end{aligned}$$

where $\varvec{\delta } = \varvec{\Omega \gamma }/\sqrt{1 + \tau ^2}$. In addition, if $\varvec{\gamma } = \varvec{0}$, that is $\varvec{Y} \sim \mathsf {N}_p(\varvec{0},\varvec{\Omega })$, then $\beta _{1,p} = 0$.

Lemma 7

If $\varvec{Y} \sim \mathsf {St}_p(\varvec{0},\varvec{\Omega },\varvec{\gamma },\nu )$, then

$$\begin{aligned} \varvec{S}(\varvec{Y}) = \frac{\alpha (\nu )\nu }{\nu - 3}[\varvec{\delta }\otimes \varvec{\Omega } + {{\,\mathrm{vec}\,}}(\varvec{\Omega })\varvec{\delta }^\top + (\varvec{I}_p\otimes \varvec{\delta })\varvec{\Omega } - \varvec{\delta }\otimes \varvec{\delta \delta }^\top ], \end{aligned}$$

where $\alpha (\nu ) = \sqrt{\nu /\pi }\Gamma ((\nu -1)/2)/\Gamma (\nu /2)$ and $\varvec{\delta } = \varvec{\Omega \gamma }/\sqrt{1 + \tau ^2}$. In addition, if $\varvec{\gamma } = \varvec{0}$, that is $\varvec{Y} \sim \mathsf {St}_p(\varvec{0},\varvec{\Omega },\nu )$, then $\beta _{1,p} = 0$.
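The matrices in Lemmas 6 and 7 can be built with Kronecker products. The following is a minimal sketch, assuming a column-stacking ${{\,\mathrm{vec}\,}}$ operator and $\nu > 3$ in the skew-t case (needed for the factor $\nu /(\nu -3)$ to be finite and positive); the function names are hypothetical. For $p=1$, $\varvec{\Omega }=1$ and $\varvec{\gamma }=1$ the skew-normal expression reduces to $\sqrt{2/\pi }\,(3\delta -\delta ^3)$ with $\delta =1/\sqrt{2}$, that is, $2.5/\sqrt{\pi }$.

```python
import numpy as np
from scipy.special import gammaln

def third_moment_core(Omega, gamma):
    """Bracketed term shared by Lemmas 6 and 7, returned as a p^2 x p matrix."""
    Omega = np.asarray(Omega, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    p = Omega.shape[0]
    tau2 = gamma @ Omega @ gamma
    d = (Omega @ gamma / np.sqrt(1.0 + tau2)).reshape(-1, 1)  # delta as a column
    vec_Omega = Omega.reshape(-1, 1, order="F")
    return (np.kron(d, Omega)
            + vec_Omega @ d.T
            + np.kron(np.eye(p), d) @ Omega
            - np.kron(d, d @ d.T))

def S_skew_normal(Omega, gamma):
    """Lemma 6: S(Y) for Y ~ SN_p(0, Omega, gamma)."""
    return np.sqrt(2.0 / np.pi) * third_moment_core(Omega, gamma)

def S_skew_t(Omega, gamma, nu):
    """Lemma 7: S(Y) for Y ~ St_p(0, Omega, gamma, nu), assuming nu > 3."""
    if nu <= 3:
        raise ValueError("this sketch assumes nu > 3")
    alpha = np.exp(gammaln((nu - 1.0) / 2.0) - gammaln(nu / 2.0)) * np.sqrt(nu / np.pi)
    return alpha * nu / (nu - 3.0) * third_moment_core(Omega, gamma)

if __name__ == "__main__":
    Omega = np.array([[1.0, 0.3], [0.3, 1.0]])
    gamma = np.array([1.0, -0.5])
    print(S_skew_normal(Omega, gamma))
    for nu in (4, 10, 100, 1e4):
        print(nu, np.max(np.abs(S_skew_t(Omega, gamma, nu) - S_skew_normal(Omega, gamma))))
```

Reshaping the result to a $p\times p\times p$ array gives a tensor that is symmetric in all three indices, as expected of third-order moments, and the skew-t matrix approaches the skew-normal one as $\nu$ grows.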

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Osorio, F., Galea, M., Henríquez, C. et al. Addressing non-normality in multivariate analysis using the t-distribution. AStA Adv Stat Anal 107, 785–813 (2023). https://doi.org/10.1007/s10182-022-00468-2

Received: 19 November 2021
Accepted: 22 December 2022
Published: 21 January 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s10182-022-00468-2
