1 Introduction

Bayesian methods, since their introduction in econometrics (Bernardo and Smith 1994; Poirier 1995; Zellner 1971), have offered a rigorous methodological alternative to classical estimation. However, the applicability of the Bayesian scheme is inherently a consequence of the computational advances of the last decades: the enhancement of computing machines on the one hand and the introduction of Markov Chain Monte Carlo (MCMC) simulators on the other have revolutionized the field, making the derivation of previously intractable formalizations possible. In this sense Bayesian econometrics, and more generally Bayesian statistics, are “computationally-driven” sciences.

Given this premise, the availability of efficient statistical and econometric software which brings Bayesian estimation to the user has been, and still is, a necessary ingredient for the success of the recipe: the key factor is the balance between accessibility to the user, flexibility and execution time.

In this paper I introduce a gretl package named BayTool which provides Bayesian estimation and post-estimation tools for widely used econometric models: the gretl ecosystem offers several frequentist methods, which are either natively supported or supplied by means of auxiliary function packages, while Bayesian methods are a minority and cover only specific estimation tasks, such as model averaging with the packages BMA (Błażejowski and Kwiatkowski 2015), BACE (Błazejowski and Kwiatkowski 2018), ParMA (Lucchetti and Pedini 2022), or more recently Vector Autoregressive specifications (the prototype package BVAR, Pedini and Schreiber 2023). A Bayesian counterpart for common specifications is missing, and BayTool aims to bridge the gap.

The underlying idea is to provide intuitive functions that mimic the classical gretl commands, guaranteeing a simple solution for different kinds of usage. BayTool provides a graphical interface (GUI), which stands out as an excellent medium for users who are new to the gretl software or to the Bayesian paradigm. On the other hand, using the package in scripting mode, which gives access to a higher degree of customization, is equally simple and requires only basic gretl knowledge: the implied workflow is transparent to the user and exploits a syntax which is shared by all included functions. These elements build an ideal workspace for teaching purposes: note that a major part of the gretl user base is academic, with the software employed as a learning-teaching instrument. In line with this, BayTool provides, via sample scripts, several examples from major Bayesian textbooks such as Chan et al. (2019) and Koop (2003).

Practitioners, however, may also find the package useful for more advanced tasks: regularization methods such as the Bayesian LASSO by Park and Casella (2008) are provided; efficient computational schemes such as the Held and Holmes (2006) method for probit simulation are included; post-estimation and MCMC diagnostics are available; finally, parallelization is possible and is handled via the native Message Passing Interface (MPI) support.

The rest of the article is organized as follows: Sect. 2 reviews the competing software and function packages; Sect. 3 sketches the preliminary background of the models here considered; Sect. 4 describes the package and the included functions; Sect. 5 provides some replication exercises together with computational analysis; finally Sect. 6 concludes.

2 Review of competing software

The archetype of the Bayesian algorithm is generally an MCMC sampler which simulates pseudo-random numbers from the distribution of interest, namely the posterior. Depending on the case, this is often accomplished via Gibbs sampling (Gelfand and Smith 1990) or Metropolis-Hastings (Hastings 1970) methods. The former is computationally more efficient, since posteriors are iteratively sampled from their conditional distributions, but less flexible to the extent that closed-form conditionals impose specific prior choices. The Metropolis-Hastings scheme, instead, removes the limitation: the simulation relies on draws from a candidate distribution, whose proposals are judged via an accept-reject mechanism and updated accordingly. The generality is higher, but the quality of the sampling depends heavily on the choice of appropriate candidates, which is often difficult in practice. More recently, the need for a universal sampler, which could simulate any distribution at hand without sacrificing computational efficiency or accuracy, has led to the introduction of new schemes, namely adaptive Metropolis-Hastings (Haario et al. 2001; Roberts and Rosenthal 2007, 2009), slice sampling (Neal 2003) and Hamiltonian or Hybrid Monte Carlo methods (Betancourt 2017; Duane et al. 1987; Girolami and Calderhead 2011; Neal 1994).

This duality between efficiency and flexibility is reflected in the software panorama, with so-called case-specific algorithms on the one hand and generic all-purpose ones on the other. As the name suggests, a case-specific algorithm is strictly dependent on the chosen specification (e.g., linear model or binary model), with priors following only specific distributional forms. This is done to provide sampling efficiency via the Gibbs method: an example could be the linear model with priors only from Normal-Inverse Gamma distributions (see Sect. 3.1). From a practical perspective, the major requirement for the user is the specification of the prior hyperparameters. Conversely, a generic sampler tries to automatically choose the ideal simulation scheme for each model in use and employs advanced methods to approximate any given distribution. Priors are not restrained by the specification choices, but the requirements in terms of scripting skills are much more pronounced.

The BUGS project (Lunn et al. 2012), with its variants WinBUGS (Spiegelhalter et al. 2003) and OpenBUGS (Spiegelhalter et al. 2007), has probably been the first contribution in the direction of proper Bayesian software and a universal sampler. Both WinBUGS and OpenBUGS are written in Component Pascal and allow the user to perform estimation via point-and-click windows or via scripting in the BUGS language. This hinges upon some fundamental statements such as the model declaration, which sketches a statistical model in terms of blocks and nodes. Priors can come in any shape, posing no restrictions on the user, while sampling is performed with a Gibbs sampler when available and with Metropolis-Hastings, adaptive schemes or slice sampling in the remaining cases. However, several limitations are in place: the impossibility of defining user functions and the operating-system dependency (WinBUGS is available only on Windows, OpenBUGS also on Linux) led to the introduction of cross-platform software such as JAGS (Plummer 2017) or Stan (Carpenter et al. 2017; Stan Development Team 2023). Both are written in C++ and overcome the BUGS limitations: JAGS integrates and innovates on its predecessor not only by making it possible to invoke BUGS commands and files directly, but also by introducing new functionalities in the form of internal modules for advanced or “exotic” applications. All the standard samplers, together with slice sampling, are implemented. Stan, on the other hand, relies on Hamiltonian Monte Carlo with the No-U-Turn Sampler (Hoffman et al. 2014): this scheme can approximate any continuous distribution at hand with the additional benefit of automatically tuning hyperparameters. Unlike JAGS, the Stan language is not borrowed from BUGS, even if many similarities are present.

Besides these dedicated statistical languages for Bayesian methods, many generic posterior simulators are provided as R packages: Bayesiantools (Hartig et al. 2023), LaplacesDemon (Hall et al. 2021), MCMCpack (Martin et al. 2022), mcmc (Geyer and Johnson 2020) and nimble (de Valpine et al. 2017, 2023) are probably the best-known examples. Special mention goes to the packages that port JAGS or Stan functionalities into R, such as greta (Golding et al. 2022), rjags (Plummer et al. 2023), R2jags (Su and Yajima 2021), runjags (Denwood and Plummer 2023) or brms (Bürkner et al. 2023) and rstan (Guo et al. 2023).

As concerns case-specific algorithms, this competing category has arisen from the needs of applied scientists, such as economists or applied econometricians, who do not require the complexity of a universal sampler but prefer to stick to simple and efficient routines. Proprietary software such as MATLAB or STATA mostly adheres to this philosophy. MATLAB, for example, provides some internal routines for Bayesian linear models and regularized regressions with specific priors; the same occurs with the downloadable Lesage’s Econometrics Toolbox (Lesage 1999) or Koop’s scripts (Chan et al. 2019; Koop 2003). STATA implements case-specific functions which take advantage of adaptive MCMC schemes. These, in principle, can fit several prior distributions, but the user can also resort to default choices with minimal scripting effort. On the open-source side, R has some dedicated packages such as arm (Gelman et al. 2022) for linear and generalized linear models, bayesm (Rossi 2022) for microeconometric models or monomvn (Gramacy et al. 2023) for Bayesian shrinkage methods. As previously mentioned, Bayesian methods are not natively included in gretl, but function packages have integrated some of these functionalities over time, addressing specific user needs: the packages BMA (Błażejowski and Kwiatkowski 2015), BACE (Błazejowski and Kwiatkowski 2018), ParMA (Lucchetti and Pedini 2022), for instance, add Bayesian model averaging and model selection tools.

BayTool, as a consequence, is the first attempt to provide Bayesian functions of wider applicability in gretl: the package proposes a case-specific approach for Bayesian sampling in common-use econometric models, with dedicated functions that pursue low execution time and simplicity of usage. Finally, to fully understand the benefits of BayTool, especially versus competing software, it should be noted that: (i) the sole fact that gretl belongs to the free software ecosystem induces per se a clear distinction with respect to MATLAB and STATA; (ii) the gretl community refers mainly to econometricians and economists who use the software for teaching but also for real-life applications, meaning a distinct audience to address in comparison with more general Bayesian inference software (e.g., BUGS, JAGS, Stan) or the R environment; (iii) the case-specific approach finds its intrinsic motivation in a user base which benefits from standardized and specialized routines with few coding prerequisites. Lastly, (iv) Bayesian econometrics novices may find the gretl accessibility appealing.

3 Preliminaries and notation

In this section, I will introduce the mathematical notation used throughout the paper and briefly report the statistical characterization of the analyzed models. For further details on the statistical framework I refer to specialized textbooks such as Chan et al. (2019), Geweke (2005), Koop (2003) and Lancaster (2004).

Start by considering a generic econometric model with \(\mathcal {D}\) as the data of a given phenomenon of interest and \(\theta \), the unknown parameter which links the data to the model. Through Bayes’ law,

$$\begin{aligned} P(\theta \vert \mathcal {D}) \propto p(\mathcal {D} \vert \theta ) P(\theta ), \end{aligned}$$

where \(p(\mathcal {D}\vert \theta )\) identifies the likelihood, i.e., the data contribution to \(\theta \); \(P(\theta )\) is the prior, which summarizes the non-data related information, while \(P(\theta \vert \mathcal {D})\) represents the posterior distribution, which embodies all available knowledge about \(\theta \) and therefore is the main object of investigation.

In order to keep the exposition concise, the specifications reported below are just a selection of the models included in BayTool. These are specifically intended to highlight different aspects of the package: the linear model as the universal benchmark for sampling accuracy; the Bayesian LASSO by Park and Casella (2008), due to the increasing interest in data regularization methods; the probit model for binary choices as a leading example of the latent variable approach. Posterior computation is executed via Gibbs sampling.

3.1 Linear model

Consider the following linear specification:

$$\begin{aligned} y = X\beta + \varepsilon , \quad \varepsilon \sim N_n(0, \sigma ^2 I_n) \end{aligned}$$
(1)

where y is the \(n \times 1\) vector of i.i.d. observations for the dependent variable; X is a \(n \times k\) matrix of covariates; \(\beta \) is the \(k \times 1\) vector of covariate coefficients and \(\varepsilon \) the idiosyncratic error term. The notation \(N_n(0, \sigma ^2I_n)\) identifies a multivariate Normal distribution of dimension n with mean 0 and variance-covariance matrix equal to the product of the parameter \(\sigma ^2\) and the \(n \times n\) identity matrix. Recalling the previous notation, \(\theta = (\beta , \sigma ^2)\) and \(\mathcal {D} = (y, X)\), where under the canonical linear model assumptions the dependency on X can be dropped.

Analytical solutions for the posteriors can be retrieved by specifying the so-called Normal-Inverse Gamma (NIG) conjugate prior, with \(P(\beta , \sigma ^2) = P(\beta \vert \sigma ^2) P(\sigma ^2)\), \(\beta \vert \sigma ^2\) Normal and \(\sigma ^2\) Inverse Gamma. In this exposition, however, an independent NIG will be used,

$$\begin{aligned} P(\beta , \sigma ^2)&= P(\beta ) P(\sigma ^2),\nonumber \\ \beta&\sim N_k(\beta _0, V_0), \end{aligned}$$
(2)
$$\begin{aligned} \sigma ^2 \sim IG(a_0, b_0) \end{aligned}$$
(3)

where \(\beta _0\), \(V_0\), \(a_0\), \(b_0\) are hyperparameters left to the user and \(IG(a_0, b_0)\) indicates an Inverse Gamma with shape \(a_0\) and scale \(b_0\). In the context of linear models, such parameters admit a quite natural interpretation, with the j-th element of \(\beta _0\), namely \(\beta _{0,j}\), and its related variance entry \(V_{0,jj}\) being, respectively, the expected marginal effect of variable \(x_j\) on y and its variability, according to the researcher’s a priori view. The matrix \(V_0\) is commonly assumed diagonal due to the practical difficulties in eliciting covariances. The hyperparameters \(a_0\) and \(b_0\), instead, are indirectly derived from the moments of \(\sigma ^2\): the expected variability of the residuals should guide this choice, as reported in Sect. 5.1. Remember that prior information is “external” to the data at hand and may derive from economic theory, common sense, experience or previous analyses of similar experiments.

Coming back to the framework adopted, the independent setup is not only more flexible than the conjugate one, but also one of the most used in practice. The resulting posterior can be handled via a Gibbs sampler which loops through the conditional posteriors: a Normal distribution for \(\beta \vert \sigma ^2,y\) and an Inverse Gamma for \(\sigma ^2\vert \beta ,y\).
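For reference, under the independent NIG prior these conditionals take the standard textbook form (see, e.g., Koop 2003); in the notation above, a sketch of the two updating steps is

$$\begin{aligned} \beta \vert \sigma ^2, y&\sim N_k(\bar{\beta }, \bar{V}), \quad \bar{V} = \left( V_0^{-1} + \sigma ^{-2} X^T X \right) ^{-1}, \quad \bar{\beta } = \bar{V}\left( V_0^{-1}\beta _0 + \sigma ^{-2} X^T y \right) ,\\ \sigma ^2 \vert \beta , y&\sim IG\left( a_0 + \frac{n}{2},\; b_0 + \frac{1}{2}(y - X\beta )^T (y - X\beta ) \right) . \end{aligned}$$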

3.2 Bayesian LASSO

Reconsider the linear framework as in Eq. (1): in the context of statistical learning methods, a Bayesian alternative to the LASSO (Tibshirani 1996) is easily obtained using hierarchical priors. Following Park and Casella (2008),

$$\begin{aligned}&\beta \vert \sigma ^2, \tau _1^2, \tau _2^2, ..., \tau _k^2 \sim N_k(0, \sigma ^2D_\tau ), \end{aligned}$$
(4)
$$\begin{aligned}&P(\sigma ^2) = \frac{1}{\sigma ^2}, \end{aligned}$$
(5)
$$\begin{aligned}&P(\tau _j^2) = \frac{\lambda ^2}{2}\exp {\biggl [-\lambda ^2\frac{\tau _j^2}{2}\biggr ]} \end{aligned}$$
(6)

where \(D_\tau \) is a diagonal matrix with entries \(\tau _1^2, \tau _2^2,..., \tau _k^2\), \(\lambda \) is a parameter which acts as a shrinkage factor; \(\sigma ^2\) has the same role as in Sect. 3.1 but with a diffuse prior; finally, \(\tau _1^2, \tau _2^2,..., \tau _k^2 > 0\). Differently from Eqs. (2)–(3), the above prior specification imposes a specific structure on both \(\beta \) and \(\sigma ^2\) via the introduction of \(\tau \). In this way, the sole tunable hyperparameter becomes \(\lambda \). To mitigate the sensitivity of the results to this choice, it is customary to elicit an additional prior:

$$\begin{aligned} \lambda ^2 \sim G(r_0, \delta _0) \end{aligned}$$
(7)

where \(G(a, b)\) identifies a Gamma distribution with shape a and rate b. Equations (4)–(7) fully characterize the Bayesian LASSO and posterior derivation can be accomplished via Gibbs sampling, since the conditional posteriors are known: a Normal for the \(\beta \)-conditional, an Inverse Gamma for \(\sigma ^2\) and an Inverse Gaussian for \(\tau _j^2\).

3.3 Binary choice model

Binary choice models are usually described in microeconometrics in terms of latent utility where a given action \(y_i\) is undertaken (\(y_i = 1\)) by the i-th agent if and only if the resulting net utility \(z_i\) is positive.

In symbols,

$$\begin{aligned} z_i&= x_i^T\beta + \varepsilon _i, \quad \varepsilon _i \sim N(0,1) \end{aligned}$$
(8)
$$\begin{aligned} y_i&= {\left\{ \begin{array}{ll} 1&{} z_i>0\\ 0&{} z_i\le 0 \end{array}\right. } \end{aligned}$$
(9)

with the subscript \(i=1, \ldots , n\) identifying the i-th observation. Since \(\varepsilon \) is assumed to be Normal, Eqs. (8)–(9) give rise to a probit model where the parameter of interest is \(\beta \).

Assuming a Normal prior for \(\beta \) as in Eq. (2) leads to a posterior which can be easily retrieved via MCMC methods: the most popular one follows Albert and Chib (1993) and builds a Gibbs sampler from the latent variable z,

$$\begin{aligned} P(\beta , z \vert y) \propto p(y, z \vert \beta )P(\beta ) \end{aligned}$$

then marginalizing for \(\beta \). The result exploits \(\beta \vert z, y\) being Normal as a consequence of the linear model in Eq. (8) and \(z \vert \beta , y\) as a Truncated Normal. The simplicity, however, comes at the cost of heavily auto-correlated draws: BayTool, to counter this fact, proposes a supplementary sampler for probit specifications, i.e., the Held and Holmes (2006) scheme. This is a Gibbs sampler which jointly updates \(\beta \) and z by virtue of the following result:

$$\begin{aligned} P(\beta , z \vert y) = P(\beta \vert z, y)P(z \vert y) \end{aligned}$$

3.4 Diagnostics

Alongside the simulation algorithms, BayTool provides several diagnostic tools to measure the efficiency and the quality of MCMC samples: needless to say, monitoring the convergence to the stationary state and the mixing of the chain are key requirements for drawing reliable conclusions and inference.

In the literature, several indicators have been proposed, ranging from graphical inspections to rigorous statistics. The traceplot, i.e., the time series plot of the MCMC draws, the running mean graph and the visualization of the autocorrelation function (ACF) belong to the first category. Even though they are immediate to interpret and extremely effective for detecting issues, good practice generally requires coupling such measures with the second category: the widely used effective sample size (ESS) by Kass et al. (1998) is an example. The ESS indicates the number of independent Monte Carlo iterations required to achieve the same precision as the current MCMC sample. Roughly speaking, the closer the ESS is to the actual number of MCMC replicas, the better the chain is mixing.

More formally, following Vats et al. (2019), the ESS for a univariate sampled parameter \(\beta \) is defined as,

$$\begin{aligned} \textrm{ESS} = m \frac{\phi ^2}{\sigma ^2(\beta )} \end{aligned}$$

where m is the number of MCMC iterations after burn-in, \(\phi ^2\) and \(\sigma ^2(\beta )\) are respectively the sample variance and the asymptotic (long-run) variance of \(\beta \), with the latter commonly estimated via the batch-means method. The multivariate version (mESS) is instead defined as,

$$\begin{aligned} \textrm{mESS} = m \left( \frac{\vert \Phi \vert }{\vert \Sigma \vert }\right) ^{1/k} \end{aligned}$$

where the matrices \(\Phi \) and \(\Sigma \) are the sample and asymptotic (long-run) variance-covariance matrices of a \(k \times 1\) parameter vector \(\beta \); the operator \(\vert \cdot \vert \) identifies the determinant. Again, \(\Sigma \) is consistently estimated via the batch-means method.

Another popular procedure, used especially in the context of parallel chains, is the Brooks-Gelman convergence statistic (Brooks and Gelman 1998; Gelman and Rubin 1992): suppose we have c parallel MCMCs of the same model, each one with \(\overline{m} = m/c\) iterations after a constant burn-in. The sampled between-chain variance B of a univariate parameter \(\beta \) should approach 0 once convergence has taken place, so that the total variance is explained by the within-chain component (W) alone.

In symbols,

$$\begin{aligned} B = \frac{\overline{m}}{c - 1}\sum _{j=1}^c(\bar{\beta }_{j} - \bar{\beta })^2; \qquad W = \frac{1}{c(\overline{m} - 1)} \sum _{j=1}^c\sum _{l=1}^{\overline{m}} (\beta _{lj} - \bar{\beta }_{j})^2 \end{aligned}$$

with \(\beta _{lj}\) as the l-th draw of \(\beta \) in chain j, \(\bar{\beta }_{j}\) the related sample mean and \(\bar{\beta }\) the total sample mean. The Gelman–Rubin univariate statistic (Gelman and Rubin 1992) computes the ratio of total and within variance:

$$\begin{aligned} R= \frac{V}{W}, \qquad V = \frac{\overline{m}-1}{\overline{m}} W + \frac{c + 1}{c} \frac{B}{\overline{m}} \end{aligned}$$

where \(R < 1.2\) is commonly taken as the condition for convergence. The multivariate version is instead proposed by Brooks and Gelman (1998), as

$$\begin{aligned} \textbf{R} = \frac{\overline{m}-1}{\overline{m}} + \frac{c + 1}{c}\gamma \end{aligned}$$

where \(\gamma \) is the maximum eigenvalue of \(\textbf{W}^{-1}\textbf{B}/\overline{m}\), with \(\textbf{W}\) and \(\textbf{B}\) as the multivariate counterparts of W and B. The same rule of thumb \(\textbf{R} < 1.2\) is applied. Notice that both the Gelman–Rubin and Brooks–Gelman statistics signal convergence to the stationary state, but do not properly acknowledge the correlation of the chain; similarly, the ESS indicates the degree of auto-correlation but only indirectly ascertains convergence. Consequently, both statistics should be coupled or at least backed by auxiliary analyses.

Additional indicators included in BayTool are the Geweke stationarity test (Geweke 1992) and the Heidelberger-Welch test (Heidelberger and Welch 1983): both are robust variants of mean-difference statistics computed on particular windows of the sample. The Geweke statistic is obtained as

$$\begin{aligned} G = \frac{\bar{\beta }_{S_0} - \bar{\beta }_{S_1}}{\sqrt{(\tilde{\Omega }_{S_0}/0.1m) + (\tilde{\Omega }_{S_1}/0.5m) }} \sim N(0,1) \end{aligned}$$

where \(\beta \) is the previous univariate sampled parameter; the subscripts \(S_0\) and \(S_1\) denote respectively two different sub-samples from the m draws, with \(S_0\) as the first \(10\%\) and \(S_1\) the last \(50\%\). \(\tilde{\Omega }\) corresponds to the long-run variance of \(\beta \). When stationarity is achieved the test does not reject, signalling homogeneity between draws along the chain.

The Heidelberger-Welch test, instead, is given as:

$$\begin{aligned} HW = \frac{\sum _{j=1}^{\lfloor mt \rfloor } \beta _{j} - \lfloor mt \rfloor \bar{\beta }}{\sqrt{m\tilde{\Omega }}}, \qquad 0 \le t \le 1 \end{aligned}$$

where \(\lfloor \cdot \rfloor \) designates the floor operator. The statistic is asymptotically distributed as a Brownian Bridge and the Cramer-von Mises statistic is used to verify the stationarity of the sequence. The scalar t is initially set to 1 but, if a rejection occurs, its value is decreased to 0.9 and the first \(1-t\) proportion of the sample is discarded. This procedure is iterated until the test no longer rejects, or until \(t=0.5\) with HW still rejecting; non-stationarity is concluded only when this last scenario occurs.

4 The package

Function packages in gretl are mainly meant to add non-trivial statistical functionalities, in the form of estimation methods or testing procedures, to the native software (Cottrell and Lucchetti 2023a, c). The general package structure is that of a .zip folder containing the function code as a .gfn file, a PDF manual, data files and supplementary materials such as additional datasets and sample scripts. Official packages are accessible and downloadable from the online repository via the gretl main window, under the path File->Function packages->On server. Alternatively, in scripting mode, the command pkg install performs the task as shown below:

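in its minimal form (a sketch; the package is fetched from the official gretl server):

    pkg install BayTool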

Once installed, the package functions can be loaded into the current workspace with the command include (scripting mode),

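for instance:

    include BayTool.gfn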

On the other hand, if the package is used via the GUI, the include step can be skipped.

The BayTool package provides: (i) functions primarily devoted to the posterior simulation and inference of commonly used econometric models; (ii) auxiliary functions which address the “post-estimation” analysis, such as plotting marginal posterior distributions or computing MCMC diagnostics. In the next sections, I will review the main features of the included functions: particular emphasis will also be placed on computational aspects such as the use of parallelization for MCMC experiments.

4.1 Main functions

Table 1 reports the list of BayTool main functions for posterior distribution computation: MCMC methods are employed (especially the Gibbs sampler) whenever analytical solutions are not available. To summarize, BT_lm deals with homoskedastic linear models assuming NIG prior specifications; BT_lm_het, instead, deals with heteroskedastic linear models allowing for different prior choices; BT_lasso provides the Bayesian LASSO following the work by Park and Casella (2008). In the context of latent variable models, BT_probit includes the canonical Bayesian probit with data augmentation and the subsequent upgrade due to Held and Holmes (2006); BT_tobit is used for Bayesian analysis of censored data.

Table 1 Main BayTool functions and related details

All functions share a common syntax, with the same argument types, in order to establish notational consistency. I will consider BT_lm as an example, but any of the above functions could have been used. The signature is the following,

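reported here in sketch form (the opt bundle may be omitted, in which case the defaults are used):

    BT_lm(series y, list X, bundle prior, bundle opt)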

where

  • y, is the series representing the dependent variable;

  • X, is the list of covariates;

  • prior, a bundle containing the prior distribution information;

  • opt, a bundle for additional options. If omitted, the default choices are used.

While y and X are self-explanatory, the role of prior and opt deserves a clarification. The bundle prior governs the prior setup and plays a role similar to the model formula required by BUGS, JAGS or Stan files. However, a fundamental distinction occurs: the prior distributions are model-dependent and fixed to the functional forms shown in Table 1, with the possibility for the user to tune the related parameters via bundle keys. This choice preserves computational efficiency and consistency with standard gretl commands: gretl natively embraces the model-dependent philosophy for frequentist estimation methods with commands such as ols or probit. Along the same lines, BayTool proposes the same paradigm to foster the accessibility of the functions for long-standing gretl users (potentially new to the Bayesian world) but also for those who seek simplicity of usage.

Recalling the previous linear model example, suppose we use an independent NIG with \(\beta \sim N_5(0, I), \, \sigma ^2 \sim IG(5,100)\); the related gretl script goes like

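the following sketch, which uses the key names documented below:

    bundle prior = null           # line 1: create an empty bundle
    prior.type = "indep"          # line 2: independent NIG prior
    prior.beta_mean = zeros(5,1)  # line 3: prior mean of beta
    prior.beta_vcv = I(5)         # line 4: prior covariance of beta
    prior.sigma_a = 5             # line 5: Inverse Gamma shape a_0
    prior.sigma_b = 100           # line 6: Inverse Gamma scale b_0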

In the first line, an empty bundle named prior is created; in the second, the kind of prior specification is chosen via the key type: in the case of the linear model two strings are admitted, "conj" for the conjugate setup and "indep" for the independent prior case. Notice that type admits different values depending on the model specification used: the last column of Table 1 reports the alternatives.

Lines 3-6 show how to define prior parameters: the keys beta_mean and beta_vcv are universally valid for the prior mean and covariance matrix of \(\beta \) following a Normal distribution. Likewise sigma_a and sigma_b refer to the shape \(a_0\) and scale \(b_0\) of the Inverse Gamma for \(\sigma ^2\). The complete list of supported keys is reported in Table 2.

Table 2 Prior parameters admitted in BayTool

The bundle opt contains instead all the additional options; the associated keys are

  • iter, the number of MCMC iterations after the burn-in (default: 10000);

  • burn, the number of burn-in replications (default: 1000);

  • thin, an integer signalling the thinning interval (default = 0);

  • mpi, the number of cores used for parallelization (default: 1);

  • mpi_seed, the random number generator seed to use when MPI is active (default: none);

  • verbose, a Boolean for printing the output on screen (default: 1).

While the above is common to all main functions, some options are model-specific, such as

  • forecast, a matrix containing the covariate values used for prediction. This key is available only for BT_lm() and BT_lasso().

  • stdize, a Boolean flag which standardizes the covariates if set to 1 or simply centers them if 0. The dependent variable is always centered (default = 1). Available for BT_lasso();

  • method, a string used for BT_probit. If method = "std" the canonical Gibbs sampler à la Albert and Chib is used; if method = "HH", the Held and Holmes (2006) algorithm is used (default = "std");

  • part_eff, a Boolean for the probit model. If 1, partial effects are automatically computed and reported both as partial effects at the mean and as average partial effects (default = 1);

  • rlimit and llimit, scalars for the right and left censoring points in Tobit models (default: llimit = 0, rlimit = NA).

Each function returns a bundle with recurring keys regardless of the model used: most of these simply report basic information such as the dependent variable (dependent), the covariate list (covariates) and the prior used (prior_setup); the key sampled_coeff grants access, instead, to a bundle containing the sampled posterior coefficients in matrix form. This can be further used to inspect the posterior distribution directly, especially if coupled with auxiliary functions such as BT_paramplot. Deriving functions of the sampled parameters from sampled_coeff is immediate and, in the particular case of summary statistics, these can be directly retrieved via the utility bundle key post_summ.
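As a brief illustration of this mechanism (the inner key sampled_beta is the name used in the replication scripts of Sect. 5):

    bundle out = BT_lm(y, X, prior, opt)       # posterior simulation
    matrix B = out.sampled_coeff.sampled_beta  # matrix of posterior draws of beta
    eval out.post_summ                         # posterior summary statistics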

4.2 Auxiliary functions

The package provides several auxiliary functions that complement the posterior simulation part with utilities and “post-estimation” tools. BT_printres and BT_paramplot, for example, allow a quick visualization of the main posterior features: BT_printres prints on screen prior and posterior summary statistics. The only argument required is the bundle returned by one of the previous posterior calculators. Notice that BT_printres is automatically invoked by these functions when opt.verbose = 1. BT_paramplot produces, instead, the marginal posterior distribution plots via kernel density estimates: the main argument is again the output bundle from a posterior sampler. It optionally admits as extra inputs the list of parameters to plot, in the form of an array of strings, and a bundle for graphical options (for further details see Pedini 2023). If these are omitted, the function plots on screen all the posteriors contained in the output bundle sampled_coeff.

Besides the graphical and summary tools, auxiliary functions mostly pertain to MCMC diagnostics as in Sect. 3.4: five different functions allow the user to explore and monitor the convergence quality of the simulated results. In particular, the included functions are:

  • BT_check_plot;

  • BT_check_mESS;

  • BT_check_G;

  • BT_check_HW;

  • BT_check_BG.

Each one accepts as main input the matrix of sampled coefficients of interest: this can be easily retrieved by means of the sampled_coeff bundle; however, the functionality is wider, since users can invoke the diagnostics on any user-supplied matrix of sampled coefficients. In this sense the aim of BayTool is twofold: on the one hand, providing ad hoc Bayesian econometric models and, on the other hand, supplying procedures for monitoring simulation quality, which are suited to Bayesian methods but not necessarily limited to them.

Coming to the diagnostic functions, BT_check_plot produces the traceplot, the running mean plot and the ACF plot for the selected parameters. Its signature is BT_check_plot(matrix par_mat, strings names, bundle checkopt), where par_mat is the matrix of coefficients, names, an array of strings identifying a particular set of parameters to plot. The strings should correspond to the row names of par_mat or, alternatively, to the row numeric indices. The bundle checkopt contains some tunable options such as the number of lags for the ACF (lag).

BT_check_mESS allows the user to compute the numerical standard errors and the univariate and multivariate effective sample size using the batch method (Vats et al. 2019). The main input is par_mat; however, an optional bundle with key batch_size can be supplied to modify the batch size. Finally, BT_check_G, BT_check_HW and BT_check_BG perform, respectively, the Geweke mean difference test (Geweke 1992), the Heidelberger and Welch stationarity test (Heidelberger and Welch 1983) and the univariate and multivariate Brooks and Gelman convergence statistics (Brooks and Gelman 1998; Gelman and Rubin 1992). Both BT_check_G and BT_check_HW require the matrix par_mat as argument, while BT_check_BG introduces a slight change: the main input should now correspond to an array of matrices. The statistic is in fact often computed on different (possibly parallel) MCMC chains and the user has to explicitly provide their resulting coefficients. If the option mpi is enabled on any previous sampler, the matrix array can be easily obtained from sampled_coeff via the native function msplitby.

4.3 The GUI and the teaching support

When installed, the package offers the possibility of attaching a graphical interface to the gretl main window under the path Model->Bayesian models: this represents an outstanding way to access the package functionalities; however, it should be noted that the highest degree of flexibility and customization can only be achieved via scripting.

Figure 1 reports the dialog box: most of the entries recall the main function arguments, although some simplifications are in place. Firstly, the GUI explicitly imposes a workflow that starts with the model definition, continues through the simulation step and concludes with diagnostics checking and marginal distribution plots.

Fig. 1 BayTool dialog window

The role of the bundle prior is here greatly reduced: the “Prior form” entry refers to the bundle key prior.type. The user can opt (i) for the “Default” alternative, where the most efficient prior setup is automatically chosen depending on the model specification, or (ii) for autonomously defining the desired configuration. Posterior computation follows the default prior hyperparameter values, unless additional information is provided via the prior bundle. Notice that any bundle has to be initialized and filled with the appropriate quantities before the GUI invocation.

Some options, previously contained in the bundle opt, are now made explicit, such as the number of MCMC iterations or the number of processes to use under MPI; features not exposed in the dialog can be tuned via the additional option bundle, which is however optional.

The last dialog box entries refer to the diagnostics and the plotting parts: the user can choose to use a single diagnostic check, all the available batteries (see Sect. 4.2) or none. Similarly, the marginal posteriors for the covariate coefficients can be directly obtained for the whole list or a specific choice.

Before concluding, the true potential of the GUI should be stressed: after having loaded any kind of data in gretl, Bayesian analysis can be performed in just a few clicks and, optionally, a few console inputs. In this way, the user can entirely focus on the inferential process without devoting time and attention to complex scripting. This aspect is extremely valuable when performing preliminary analysis, especially on the sensitivity of prior choices. Students may especially benefit from this intuitiveness when approaching the Bayesian framework for the first time: the GUI represents an interactive tool which hugely simplifies the learning process, requiring almost no programming skills or knowledge.

At the same time, the BayTool interface may represent the starting point for a more advanced usage: the minimal scripting involved, for example, in prior or option customization is only slightly simpler than the full application via scripts (see Sect. 5); again, Bayesian novices may find this gradual learning path extremely attractive, since the step required from GUI usage to proper scripting is effortless.

4.4 Parallelization and MPI

Parallelization of MCMC experiments has become a standard routine supported by most Bayesian statistics software. The standard implementation, known as “embarrassingly parallel”, splits the number of iterations across multiple chains, which are executed simultaneously (one per core), and combines the resulting sampled parameters. The procedure clearly mimics the one used in independent Monte Carlo experiments, but non-negligible limitations apply. The sequential nature of MCMC experiments, in fact, poses a serious threat to the validity of parallelization, firstly because the burn-in period cannot be split and has to be kept fixed for each chain; secondly, because all chains have to converge to the stationary state, possibly with as little autocorrelation as possible. If these conditions are met, the computational benefit can be remarkable, even though inferior to the one achieved by independent Monte Carlo experiments.

The accessibility of parallel computing, however, is strictly dependent on the software architecture, with some solutions more intuitive and immediate than others. JAGS or Stan, when used in conjunction with R interfaces (consider rjags or rstan, for simplicity), require only the specification of the arguments n.chains or chains and cores to trigger the parallelization of MCMCs. Stan also grants additional granularity by allowing the use of ad-hoc functions for summation and “rectangular mapping” (Stan Development Team 2023) inside the .stan setup file. On the other hand, BUGS, despite the possibility of building multiple chains, did not originally provide parallel execution; the MultiBUGS project (Goudie et al. 2020) has addressed the task by introducing the modelDistribute command. As for BayTool “direct competitors” in econometric analysis, MATLAB does not include Bayesian samplers with tunable function arguments for multiple chains, but parallelization can be integrated manually via the Parallel Computing Toolbox (The Mathworks, Inc 2023). STATA proposes multiple-chain computation with a sequential execution: parallelization is in principle excluded, even though multithreaded alternatives in the form of unofficial functions are available. The R packages for case-specific Bayesian estimation do not natively encompass parallel functionalities but, similarly to MATLAB, the user can manually recover multithreaded solutions by means of the parallel (The R Core Team 2023) or snow (Tierney et al. 2021) packages.

In gretl, parallelization of MCMCs via “embarrassingly parallel” chains has been previously employed in the package ParMA (Lucchetti and Pedini 2022), which deals with Reversible Jump Markov Chain Monte Carlo (Green 1995; Lamnisos et al. 2009) for model averaging in Generalized Linear Models. In particular, ParMA uses the MPI apparatus natively provided by gretl: even though different modes of parallelization can be configured in gretl (OpenMP, for instance), MPI functionalities are directly integrated in scripting via built-in commands and functions, totally transparent to the user; the only requirement is a suitable MPI library such as MPICH or Open MPI for Unix systems or MS-MPI for Windows (Cottrell and Lucchetti 2023b, Section 9 - “Platform specifics”). This implies not only the possibility of exploiting all the benefits deriving from such infrastructure (e.g., parallelization across cores or across networked machines) but also an extreme flexibility of usage.

Similarly to ParMA, BayTool relies on MPI functionalities for “embarrassingly parallel” chains: MPI commands are accessed internally via the function package parallel_func (Schreiber 2023), which is automatically loaded during the BayTool declaration. The user simply has to choose the number of simultaneous chains via the bundle key opt.mpi, available in all BayTool samplers; for a GUI user, the MPI box is directly selectable in the dialog window. Remember, however, that to fully take advantage of parallelization, diagnostic checking is even more fundamental here than in single-threaded executions: stationarity and convergence have to be properly acknowledged. The aforementioned Brooks and Gelman statistic is a natural choice, but nothing prevents the use of graphical analysis or other tools such as the Geweke or Heidelberger-Welch tests for stationarity and the ESS for correlation.
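As a minimal sketch, with purely illustrative values for the number of chains and the seed, enabling the parallel execution amounts to setting the relevant keys before invoking a sampler:

    bundle opt = null
    opt.iter = 10000
    opt.burn = 1000
    opt.mpi = 4              # run four parallel chains via MPI
    opt.mpi_seed = 20231001  # illustrative RNG seed for the MPI run
    bundle out = BT_lm(y, X, prior, opt)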

5 Applications

In this section, I provide some empirical illustrations covering the specifications reported in Sect. 3. To emphasize the contribution that the package could bring to teaching purposes, I will replicate examples and exercises from well-known Bayesian textbooks or published articles.

Moreover, since a scripting approach is favored here over the GUI, the function invocations are reported alongside all preliminary setups: BayTool requires only a few necessary declarations to work, with the previously shown prior elicitation as the prominent one. The typical workflow can be summarized as follows:

  1. load BayTool;

  2. open a dataset;

  3. define the list of covariates to use;

  4. define and build the prior bundle: the user, in general, has to specify the type to use and, possibly, the prior hyperparameters;

  5. (optional) define and build the opt bundle for MCMC options;

  6. invoke the main function;

  7. (optional) post-estimation analysis.

Only basic gretl coding is involved and no particular model formalization is required other than the prior form and the choice of the posterior sampler function. In this sense, the procedure is highly “standardized” regardless of the main function used.

As for the presented applications, Sect. 5.1 deals with linear models and provides a preliminary benchmark for testing the package quality. Section 5.2 introduces a Bayesian LASSO example by comparing the function BT_lasso with competing software, while Sect. 5.3 considers binary choice models, shedding light on the computational scheme: the traditional Albert and Chib algorithm is contrasted with the Held and Holmes (2006) method, and MCMC diagnostic measures are introduced to highlight the convergence quality of the chains. Finally, Sect. 5.4 inspects the impact of parallelization on MCMCs.

5.1 Linear model example

Consider the Bayesian linear model example with independent priors proposed by Koop (2003), Chapter 4.2.7, on house prices. The dataset comes from Anglin and Gencay (1996) and contains 576 observations on house sales in Windsor, Canada, in 1987: the dependent variable y represents the sale price of a given house, while the covariates are the lot size in square feet (lot_size), the number of bedrooms (n_bedroom), the number of bathrooms (n_bathroom) and the number of storeys (n_storeys).

The prior for \((\beta , \sigma ^2)\) follows an independent NIG, with

$$\begin{aligned} \beta&\sim N_5(\beta _0, V_0),\\ \beta _0&= \begin{bmatrix} 0\\ 10\\ 5000\\ 10000\\ 10000 \end{bmatrix} \quad V_0 = \begin{bmatrix} 10000^2 & 0 & 0 & 0 & 0 \\ 0 & 5^2 & 0 & 0 & 0 \\ 0 & 0 & 2500^2 & 0 & 0 \\ 0 & 0 & 0 & 5000^2 & 0 \\ 0 & 0 & 0 & 0 & 5000^2 \end{bmatrix} \end{aligned}$$

where the first coefficient refers to the intercept while the others refer to the previously presented covariates (in order of appearance). Instead of working directly with \(\sigma ^2\), Koop (2003) reparametrizes in terms of the precision \(1/\sigma ^2\), which follows a Gamma distribution with mean \(4 \times 10^{-8}\) and 5 degrees of freedom. The interpretation of the prior hyperparameters is the following: consider \(\beta _{{\texttt {n\_bedroom}}}\); the expected effect of an additional bedroom on the house price (holding the other variables constant) is an increase of 5000 dollars, i.e., \(\beta _{0,{\texttt {n\_bedroom}}}\). Attaching a prior standard deviation of 2500 implies a marginal effect inside the region [100, 9900] with \(95\%\) probability. Such a large variance reflects a rather crude approximation of the regression coefficient impact, hinting at a cautious evaluation of the real estate market. As for \(\sigma ^2\), residuals from a well-fitting model will likely range in \([-10000, 10000]\). Assuming such an interval as the \(95\%\) probability region of the Normal error distribution leads to \(\sigma \approx 5000\) and consequently \(1/\sigma ^2 = 4 \times 10^{-8}\). The degrees of freedom are set to a small value in order to have relatively loose information on \(\sigma ^2\).

The Gibbs sampler is run for 10000 iterations with an initial burn-in of 1000. The gretl script is reported below

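in sketch form (the dataset file name, the name of the price series and the seed value are assumptions):

    include BayTool.gfn                                      # step 1: load the package
    set seed 20230405                                        # optional: fix the RNG seed
    open hprice.gdt                                          # step 2: open the dataset
    series y = price                                         # dependent variable: sale price
    list X = const lot_size n_bedroom n_bathroom n_storeys   # step 3: covariate list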

where in the first line the package functions are included in the workspace (step 1), while in the second the seed for pseudo-random number generation is set (optional). In the third line, the dataset is retrieved (step 2), and in the subsequent declarations the dependent and the independent variables are collected (step 3). Prior specification (step 4) is handled via the bundle prior in the following way

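sketched here with the documented keys for \(\beta \) and with hypothetical key names (sigma_prec_mean, sigma_df) for the precision-based elicitation of \(\sigma ^2\); the documented alternative is to supply the Inverse Gamma shape and scale via sigma_a and sigma_b:

    bundle prior = null
    prior.type = "indep"                           # independent NIG prior
    prior.beta_mean = {0; 10; 5000; 10000; 10000}  # prior mean beta_0
    matrix sd = {10000, 5, 2500, 5000, 5000}       # prior standard deviations
    matrix V0 = zeros(5,5)
    loop i=1..5
        V0[i,i] = sd[i]^2                          # diagonal prior covariance V_0
    endloop
    prior.beta_vcv = V0
    prior.sigma_prec_mean = 4e-8                   # hypothetical key: prior mean of 1/sigma^2
    prior.sigma_df = 5                             # hypothetical key: prior degrees of freedom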

while additional options (step 5) are passed through the bundle opt

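for instance (the covariate values in the forecast vector are purely illustrative):

    bundle opt = null
    opt.iter = 10000                   # MCMC iterations after burn-in
    opt.burn = 1000                    # burn-in replications
    opt.forecast = {1, 5000, 2, 2, 1}  # illustrative covariate values for a single prediction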

where opt.forecast represents a vector containing the covariate values used to build a single prediction. Notice that the prior parameters for \(\sigma ^2\) can be elicited directly in terms of shape and scale of the Inverse Gamma or, alternatively, in terms of precision mean and degrees of freedom as above.

The function invocation (step 6) is

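reported here in sketch form, following the signature of Sect. 4.1:

    bundle out = BT_lm(y, X, prior, opt)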

which automatically prints the posterior summary output via BT_printres, since opt.verbose defaults to 1.

In the printed table, the first two columns report the prior means and standard deviations of the coefficients, the third and fourth contain the non-informative counterparts (i.e., the frequentist estimates), while the last columns report the informative posterior statistics obtained using the prior information in the bundle.

Table 3 Posterior statistics comparison (mean and standard deviation): Koop (2003) textbook results versus BayTool

For comparison, Table 3 collects the main posterior summary statistics from the original estimates in Koop (2003), alongside those previously obtained with BayTool: as can be noted, the results are almost identical.

To further illustrate the package functionality, the marginal posterior distributions can be produced (step 7) using the dedicated function BT_paramplot,

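in its simplest form, plotting all the available marginal posteriors:

    BT_paramplot(out)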

which returns the graphs reported in Fig. 2.

Fig. 2 Marginal posteriors for main covariate coefficients

At this point, it is also possible to access additional information such as posterior predictions and to produce the related histogram in a few scripting commands,

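sketched below with a hypothetical key name (sampled_ypred) for the predictive draws, whose exact name is documented in the package manual, and with a kernel-density plot standing in for the histogram:

    matrix ypred = out.sampled_coeff.sampled_ypred   # hypothetical key: predictive draws
    matrix d = kdensity(ypred)                       # columns: abscissae, estimated density
    gnuplot 2 1 --matrix=d --with-lines --fit=none --output=display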

with the corresponding result plotted in Fig. 3.

Fig. 3 Posterior predictive distribution

5.2 Bayesian LASSO example

In the following, a replication of the diabetes example in Park and Casella (2008) is proposed: BayTool is here compared to the R function blasso from the package monomvn.

The dataset is due to Efron et al. (2004) and collects information on diabetes disease progression (y) in 443 patients. The regressor list contains: age (age), sex (sex), body mass index (bmi), average blood pressure (map) and six blood serum measurements (tc, ldl, hdl, tch, ltg, glu). The posterior derivation is based on 10000 iterations after a \(10\%\) burn-in and a thinning window of 10, with a Gamma hyperprior on \(\lambda ^2\) with shape and rate respectively equal to 1 and 1.78.

The gretl script is the following:

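reported here in sketch form (the dataset file name, the name of the dependent series and the keys governing the \(\lambda ^2\) hyperprior are assumptions):

    include BayTool.gfn
    open diabetes.gdt                 # dataset name assumed
    series y = prog                   # disease progression (series name assumed)
    list X = age sex bmi map tc ldl hdl tch ltg glu

    bundle prior = null
    prior.lambda_shape = 1            # hypothetical key: Gamma shape r_0 for lambda^2
    prior.lambda_rate = 1.78          # hypothetical key: Gamma rate delta_0 for lambda^2

    bundle opt = null
    opt.iter = 10000                  # iterations after burn-in
    opt.burn = 1000                   # 10% burn-in
    opt.thin = 10                     # thinning interval
    opt.stdize = 1                    # standardize the covariates (default)

    bundle out = BT_lasso(y, X, prior, opt)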

Table 4 reports the main quantiles of the parameter coefficients computed via the R function blasso and via BT_lasso: the results are extremely similar, therefore the example is successfully replicated.

Table 4 Posterior statistics comparison (main quantiles): R function blasso from monomvn versus BayTool - BT_lasso

Notice that not all quantiles are, by default, supplied as summary statistics in the bundle returned by BT_lasso (out.post_summ); however, obtaining these quantities from the sampled \(\beta \) in out.sampled_coeff.sampled_beta is trivial, as suggested in the following lines

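sketched here with illustrative quantile levels, assuming the draws are stored with one column per coefficient:

    matrix B = out.sampled_coeff.sampled_beta   # posterior draws of beta
    matrix p = {0.025; 0.25; 0.5; 0.75; 0.975}  # illustrative quantile levels
    matrix Q = {}
    loop i=1..rows(p)
        Q |= quantile(B, p[i])                  # column-wise quantiles at level p[i]
    endloop
    print Q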

5.3 Bayesian Probit example

Binary choice models are common in microeconometrics and this experiment shows how a Bayesian estimation can be performed using two different computational schemes. The example comes from Chan et al. (2019), Chapter 14, Exercise 14.3, and re-examines the application by Fair (1978) on extramarital affairs: the dataset contains 601 responses from the Psychology Today survey, where the dependent variable is the decision to have an extramarital affair (y) and the covariates are a male dummy (male), the number of years married (ys_married), a dummy for children (kids), a dummy classifying whether the respondent is religious, years of education (ed) and a dummy denoting a happy marriage (happy).

In the first part of the script, the common data augmentation scheme à la Albert and Chib for the Bayesian probit is applied using a Gibbs sampler with 2000 iterations, after a burn-in of 500 replications. The prior for the covariate coefficients is specified as

$$\begin{aligned} \beta \sim N_7(0, 100 \cdot I_7) \end{aligned}$$

where an intercept term is added to the variable list.

To further inspect the quality of the result (which is notoriously affected by high autocorrelation), some diagnostics are included, such as the ESS (both univariate and multivariate) and a graphical analysis of some selected coefficients. Remember that the draws are available in the matrix of sampled coefficients (out1.sampled_coeff.sampled_beta). The script is reported here

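in sketch form (the dataset file name and the name of the religiosity dummy, relig, are assumptions; the prior form is left at its model default):

    include BayTool.gfn
    open affairs.gdt                                      # dataset name assumed
    list X = const male ys_married kids relig ed happy    # "relig" is an assumed name

    bundle prior = null
    prior.beta_mean = zeros(7, 1)                         # beta ~ N_7(0, 100*I_7)
    prior.beta_vcv = 100 * I(7)

    bundle opt = null
    opt.iter = 2000
    opt.burn = 500
    opt.method = "std"                                    # Albert and Chib data augmentation

    bundle out1 = BT_probit(y, X, prior, opt)

    # diagnostics on the sampled coefficients
    matrix B1 = out1.sampled_coeff.sampled_beta
    BT_check_mESS(B1)                                     # univariate and multivariate ESS
    BT_check_plot(B1, defarray("ys_married", "happy"))    # traceplot, running mean, ACF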

Table 5 summarizes the posterior means and standard deviations of the sampled coefficients produced by the function BT_probit (columns 3–4) alongside the estimates reported in Chan et al. (2019) (columns 1–2).

Table 5 Posterior statistics comparison (mean and standard deviation): Chan et al. (2019) textbook results versus BayTool

Both posterior means and posterior standard deviations are replicated. However, as highlighted in Table 6, a high degree of correlation can be spotted in the coefficients: the ESS is considerably smaller than the total number of iterations. Further confirmation comes from Fig. 4, where the traceplots and the autocorrelation functions for the parameters of ys_married and happy are shown.

Table 6 Effective Sample Size from BT_probit, standard algorithm
Fig. 4 Traceplot and ACF for the sampled parameters of ys_married and happy - standard Gibbs sampler (à la Albert and Chib)

To counter this fact, in the second part of the script the Held and Holmes (2006) algorithm is implemented. The gretl script goes as follows:

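sketched here by simply switching the sampler and repeating the diagnostics:

    opt.method = "HH"                                     # Held and Holmes (2006) joint sampler
    bundle out2 = BT_probit(y, X, prior, opt)

    matrix B2 = out2.sampled_coeff.sampled_beta
    BT_check_mESS(B2)
    BT_check_plot(B2, defarray("ys_married", "happy"))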

As can be noted from columns 5–6 of Table 5, the posterior quantities are also replicated in this context; however, Table 7 and Fig. 5 reveal a remarkably different quality of the sample: the autocorrelation is now greatly reduced, with clear benefits for the mixing of the chain.

Table 7 Effective Sample Size from BT_probit, Held and Holmes (2006) algorithm
Fig. 5 Traceplot and ACF for the sampled parameters of ys_married and happy - Held and Holmes (2006) algorithm

5.4 Parallelization in action

In order to determine the computational advantage due to parallelization, I will replicate the previous examples on the linear model, the LASSO and the probit model with 100000 iterations after a \(10\%\) burn-in, using up to 20 parallel chains (opt.mpi). Following the results reported in Cottrell and Lucchetti (2023b) and in Lucchetti and Pedini (2022), hyper-threading, that is the use of logical cores, has been avoided due to the documented CPU time deterioration.

Table 8 CPU time performance (in seconds) of the three considered BayTool functions across different numbers of cores (MPI) employed

Table 8 collects the results: in all scenarios considered, parallelization leads to massive CPU time reductions. Simply switching from a single-threaded MCMC to two parallel chains offers a 50–60\(\%\) time saving, which reaches its peak of 85–90\(\%\) with 7 cores for the linear model, 16 for the LASSO and 20 for the probit. Surprisingly, exploiting all available cores yields the best performance only in the probit example. Before attempting any explanation, it should be noticed that the samplers show nearly identical results beyond given thresholds: 5–6 chains in the linear model, 10 in the LASSO case and 16 for the probit. From there on, the variation is mostly ascribable to computational noise.

As Fig. 6 clarifies, the CPU performance follows a hyperbolic pattern, declining steeply in the first part of the x-axis and then stabilizing to a steady path in the second one. The linear case, for example, can reach meaningful results with the smallest number of parallel chains, showing an almost constant trend after 6–7 simultaneous chains. Once this point is reached, the cost of invoking additional cores is not totally offset by the potential speed-up of having an extra chain. Similar conclusions apply to the other two functions. A plausible determinant of this sharp reduction in marginal benefits is the extremely quick convergence of the MCMCs, which require few iterations to reach reliable results.

Fig. 6 CPU elapsed time (seconds) in the three scenarios considered

In conclusion, the impact of parallelization is far from negligible, with an impressive time reduction in all scenarios considered. What is less intuitive is finding the ideal number of parallel chains to use. The solution is not straightforward because (i) exploiting all available power is not always the first-best choice, as previously shown, and (ii) the performance is inherently case-dependent. Practical guidance often comes from running some preliminary, smaller MCMCs to assess the CPU scalability of the simulation.

6 Conclusion

The BayTool package collects Bayesian estimation methods for commonly used econometric models. Computational efficiency is reached by combining a case-specific sampling approach, which ensures Gibbs sampling applicability, with the parallelization of MCMCs via MPI. As shown via the replication exercises, the package can reach remarkable results both in terms of accuracy and of execution time. Moreover, the simplicity of usage, fostered by a graphical interface, makes BayTool an extremely user-friendly solution for Bayesian econometrics which, on the one hand, encourages the established user base to experiment with the now-integrated Bayesian routines, previously absent, and, on the other hand, targets a new audience which may find the package profitable for learning the basics of Bayesian methods. Last but not least, the package is intended as a long-term project with recurrent updates that will enrich its functionalities: following the case-specific philosophy, BayTool will provide various specialized routines for micro- and macroeconometric models; at the same time, this will not exclude a more general applicability, with more prior choices per function. The current state of development includes the forthcoming release of ordinal and multinomial probit, logit and count models.