How to generalize from a hierarchical model?

Pachali, Max J.; Kurz, Peter; Otter, Thomas

doi:10.1007/s11129-020-09226-7

How to generalize from a hierarchical model?

Open access
Published: 17 May 2020

Volume 18, pages 343–380, (2020)
Cite this article

Download PDF

You have full access to this open access article

Quantitative Marketing and Economics Aims and scope Submit manuscript

How to generalize from a hierarchical model?

Download PDF

Max J. Pachali¹,
Peter Kurz² &
Thomas Otter³

4775 Accesses
11 Citations
3 Altmetric
Explore all metrics

Abstract

Models of consumer heterogeneity play a pivotal role in marketing and economics, specifically in random coefficient or mixed logit models for aggregate or individual data and in hierarchical Bayesian models of heterogeneity. In applications, the inferential target often pertains to a population beyond the sample of consumers providing the data. For example, optimal prices inferred from the model are expected to be optimal in the population and not just optimal in the observed, finite sample. The population model, random coefficients distribution, or heterogeneity distribution is the natural and correct basis for generalizations from the observed sample to the market. However, in many if not most applications standard heterogeneity models such as the multivariate normal, or its finite mixture generalization lack economic rationality because they support regions of the parameter space that contradict basic economic arguments. For example, such population distributions support positive price coefficients or preferences against fuel-efficiency in cars. Likely as a consequence, it is common practice in applied research to rely on the collection of individual level mean estimates of consumers as a representation of population preferences that often substantially reduce the support for parameters in violation of economic expectations. To overcome the choice between relying on a mis-specified heterogeneity distribution and the collection of individual level means that fail to measure heterogeneity consistently, we develop an approach that facilitates the formulation of more economically faithful heterogeneity distributions based on prior constraints. In the common situation where the heterogeneity distribution comprises both constrained and unconstrained coefficients (e.g., brand and price coefficients), the choice of subjective prior parameters is an unresolved challenge. As a solution to this problem, we propose a marginal-conditional decomposition that avoids the conflict between wanting to be more informative about constrained parameters and only weakly informative about unconstrained parameters. We show how to efficiently sample from the implied posterior and illustrate the merits of our prior as well as the drawbacks of relying on means of individual level preferences for decision-making in two illustrative case studies.

A new criterion for assessing discriminant validity in variance-based structural equation modeling

Article Open access 22 August 2014

RMSEA, CFI, and TLI in structural equation modeling with ordered categorical data: The story they tell depends on the estimation methods

Article 04 June 2018

Estimating power in (generalized) linear mixed models: An open introduction and tutorial in R

Article Open access 05 May 2021

1 Introduction

Models of consumer heterogeneity play a pivotal role in marketing and economics. Typical applications are random coefficients or mixed logit models for aggregate or panel data (e.g., Revelt and Train 1998 and Train 2009), and hierarchical Bayesian models. Influential applications of these models involve inference from household scanner panel data or from discrete choice experiments (e.g., Allenby and Lenk 1994, Rossi et al. 1996, Allenby et al. 1998, Dubé et al. 2010, and Sawtooth 2013). In most applications, the inferential target pertains to a population beyond the sample of consumers providing the data for model calibration. For example, pricing, product design, or product line decisions informed by the sample data through the model are expected to be optimal in the population and not just in the observed, finite sample. The population model, the heterogeneity or random coefficients distribution is the natural and correct basis for generalizations from the observed sample of consumers or respondents to the market. The fact that inferences about parameters of this distribution are consistent in the sample size (N), even if the number of observations contributed by each consumer (T) is very small, makes this approach attractive from a statistical perspective.

Unfortunately, standard population distributions often lack economic rationality. For example, Reiss and Wolak (2007) remark that the estimated distribution of marginal utility of fuel economy in Berry et al. (1995) suggests that about half of consumers in the car market dislike fuel economy. As another example, Dubé et al. (2008, 2010) find support for positive price coefficients in the inferred heterogeneity distribution. Such economically unreasonable characterizations of consumer heterogeneity prevent meaningful counterfactual predictions from the model. As an obvious example, models that support positive price coefficients in the inferred heterogeneity distribution preclude model based price optimization.

While a completely theory driven specification of heterogeneity distributions appears to be beyond reach, some authors argue in favor of theory driven constraints in the population distribution (e.g., Boatwright et al. 1999 and Allenby et al. 2014). The goal is a heterogeneity model that is maximally flexible regarding some aspects of the population distribution, but deterministically constrained by economic theory regarding other aspects of this distribution. This paper builds on this idea and develops it further.

In applications, a prior understanding of preferences in the population often suggest a large number of sign and order restrictions, for example: that the price parameter in an indirect utility function is negative or that consumers prefer a more fuel efficient to a less fuel efficient car, everything else equal. So called constrained parameter problems are relevant across academic fields and a body of literature dealt with this topic. Gelfand et al. (1992) provide an overview of how to impose sign and order constraints based on truncated distributions using Gibbs sampling. Allenby et al. (1995) introduce this approach into marketing in the context of individual level conjoint analysis. Boatwright et al. (1999) develop a sampler in the spirit of Gelfand et al. (1992), but for a hierarchical sales response regression model.

However, sign and order restrictions in models of heterogeneity still present unresolved challenges. In principle, one could adopt truncated normal distributions that implement prior constraints as outlined in Gelfand et al. (1992) for heterogeneity distributions. However, as we show below, any truncated distribution of heterogeneity leads to a so called “doubly intractable” inference problem. The log-normal prior avoids this difficulty. The basic idea of using log-normal distributions to implement sign and order constraints is not new. For example, Allenby et al. (2014) use the exponential transformation, $\beta _{p} = -\exp (\beta _{p}^{\ast })$ with $\beta _{p}^{\ast } \in \mathbb {R}$ distributed according to a hierarchical normal mixture prior, to enforce that the model has zero support for positive price coefficients. In this specification, the problem is that $\beta _{p}^{\ast }$ is measured on the log scale and standard diffuse subjective prior settings imply absurdly large and small values of transformed coefficients β_p (e.g., Allenby et al. 2014).^{Footnote 1} In the common situation where the heterogeneity distribution thus comprises both constrained and unconstrained coefficients, the choice of subjective prior parameters is an unresolved challenge.

As a solution to this problem we propose a marginal-conditional decomposition that avoids the conflict between wanting to be more subjectively informative about constrained parameters and only weakly informative about unconstrained parameters. We show that this decomposition is important whenever the heterogeneity distribution comprises a mix of constrained and unconstrained coefficients, e.g., brand and price coefficients. Our decomposition applies both to the fully parametric multivariate normal setting as well as to its semi-parametric generalizations. In addition, we show how to efficiently sample from the implied posterior building on the likelihood based pre-tuning of proposal densities in Rossi et al. (2005).

Finally, we contrast profit implications of relying on the inferred population distribution to an ad-hoc approach that approximates heterogeneity using means of individual level coefficients. This latter approach is still common in applied academic and industry research. It is ad-hoc because if fails to measure heterogeneity consistently, distorting inference towards the population mean. As a consequence, markets will misleadingly appear too homogeneous, translating into too little product differentiation and too much price competition in counterfactual calculations. A side-effect of this distortion is a reduction of sign and order violations in the approximated heterogeneity distribution that likely contributed to the popularity of this ad-hoc approach.

In a nutshell the goal of this paper is to facilitate the formulation of more economically faithful hierarchical prior distributions of heterogeneity for better market simulators and improved counterfactual calculations. We thereby hope to broaden the applicability of models of heterogeneity, and to convince applied academic and industry researchers to abandon market simulators built on means of individual level preferences. The remainder of the paper proceeds as follows: Section 2 formally introduces different ways of generalizations from hierarchical Bayesian models and discusses implications for market simulation. In Section 3 we develop the hierarchical prior formulation and in Section 4 we discuss efficient sampling of individual level coefficients. Section 5 then investigates the relative performance of the proposed approach using simulated data. Sections 6 and 7 report the results from two empirical illustrations based on household scanner panel data on purchases of fresh hen’s eggs (Kotschedoff and Pachali 2020) and data from a discrete-choice experiment on tablet PCs. Finally, we summarize and discuss results in Section 8.

2 Different ways of generalizations and market simulations

Different ways of generalizing from hierarchical models to consumer preferences, choices, and market shares in the target population are best illustrated in a decision theoretic framework. For this purpose, and without loss of generality, we abstract away from competition and fixed costs, and assume constant marginal prices and costs in the following. If the decision-maker knew the distribution of preferences in the population denoted as p(β|τ), he would choose the action a ∈ A that maximizes profits $ {\int \limits } \pi (a,\beta )\ p(\beta |\tau )\ d\beta = \mathbb {E}_{\beta \vert \tau }\left [\pi (a,\beta )\right ] = {\pi }(a)$ by solving the following maximization problem:

$$ \max_{a\in A} \left\{{\pi}(a) \propto \left( P(a)-C(a)\right) \int \text{MS}(a,\beta) p(\beta|\tau)\ d\beta \right\} $$

(1)

Here MS(a,β) is the market share from action a and preference β, as implied by a choice model, C(a) denotes marginal costs associated with action a, and P(a) the marginal price, which may itself constitute an action; thus (P(a) − C(a)) is the contribution margin. Finally, the proportionality results from ignoring the market size.

Because the preference distribution in the population is generally unknown, the decision-maker forms an expectation about profits based on data $Y=\left (\begin {array}{llllll} y_{1} & {\dots } & y_{i} & {\dots } & y_{N} \end {array}\right )$, where y_i is the T_i-vector of observations from individual i in the sample, and based on prior assumptions about the choice model underlying MS(a,β), the distribution of preferences in the population p(β|τ), and the parameters τ in this distribution. He then maximizes the posterior expected profit:

$$ \hat{\pi}(a) = \mathbb{E}_{\beta|Y}\left[\pi(a,\beta)\right] \propto \left( P(a)-C(a)\right) \int \text{MS}(a,\beta) p(\beta|\tau) p(\tau|Y)\ d\left( \beta,\tau \right) $$

(2)

This estimator of expected profits entirely relies on posterior knowledge of the hierarchical prior distribution. We thus refer to this approach as “generalizing based on the hierarchical prior”. It is easily computed to an arbitrary degree of precision based on MCMC draws from the posterior distribution p(τ|Y ) coupled with draws from the hierarchical prior distribution p(β|τ). However, because it entirely relies on the posterior of the hierarchical prior, all prior parametric assumptions will come to bear. If, for example, the hierarchical prior supports positive and negative price coefficients as in a normal distribution, the posterior of the hierarchical prior will necessarily—and may substantially—support positive price coefficients. The problem may persist even if the data reliably locate all individual specific posterior price coefficient distributions in the negative domain. The reason is that the best normal approximation matches the first and second moment of the distribution to be fitted, which may result in substantial support for positive coefficients even if all coefficients to be fitted are negative.

To mitigate the extrapolation of parametric assumptions in directions that violate economic theory, market simulators often rely on the collection of individual level posterior mean estimates $\{\hat {\beta }_{i}\}_{i=1}^{N}$ where $\hat {\beta }_{i}={\int \limits } \beta _{i} p(\beta _{i}|Y,y_{i})d\beta _{i}$ —the shrinkage of individual level posterior means to the population mean in general reduces the number of sign and order violations, albeit at the expense of severely inconsistent inferences about heterogeneity. Expected profits from action a are then estimated as:

$$ \hat{\pi}(a) \propto \left( P(a)-C(a)\right) \frac{1}{N}\sum\limits_{i=1}^{N} \text{MS}(a,\hat{\beta}_{i}) $$

(3)

However, as we illustrated in Appendix A.1, this estimator that aggregates optimal, in the sense of a bias-variance trade-off, individual level estimates, itself fails optimality criteria and is inconsistent no matter how large the sample of consumers N, as long as individual level likelihoods are not perfectly informative about individual level preferences. In practice, individual level likelihoods tend to be diffuse, which motivates hierarchical models in the first place.

A third estimator of expected profits from action a builds on the collection of individual level posterior distributions. We refer to this form of generalization as lower level model non smoothed (n.s.) because it relies on the lower, individual level models, but does not summarize individual level posteriors to estimates.

$$ \hat{\pi}(a) \propto \left( P(a)-C(a)\right) \frac{1}{N}\sum\limits_{i=1}^{N} \int \text{MS}(a,\beta_{h}) p(\beta_{h}|y_{i},\tau) p(\tau|Y)\ d\left( \beta_{h},\tau \right) $$

(4)

The difference between this estimator and that defined in Eq. 2 is that y_i is used both to inform the posterior p(τ|Y ) and the prediction to new consumers’ preferences in p(β_h|y_i,τ). When individual level posterior distributions essentially degenerate to a point because of highly informative individual level likelihoods, the estimator in Eq. 4 converges to that defined in Eq. 3. When individual level posterior distributions come from diffuse individual level likelihoods, as usual, the estimator in Eq. 4 will be very similar to that in Eq. 2. Thus, parametric assumptions in the hierarchical prior distributions will be similarly influential. Consistent with these assessments, we only find negligible differences between generalizations based on the posterior of the hierarchical prior and lower level model n.s. in the empirical applications discussed below.

What way of generalization should we use for market simulation in practice? Every trained Bayesian analyst will point out the inconsistency associated with relying on the collection of individual level posterior means. Such an analyst knows that posterior predictive preference distributions as defined in Eqs. 2 and 4 allow for consistent inference (in N), however conditional on functional form assumptions.

However, because standard parametric and semi-parametric assumptions such as multivariate normal or its finite mixture generalization violate basic economic intuition in many applications, consistency conditional on these assumptions is not too helpful. Thus, many applied researchers and practitioners opt for generalizations, i.e., market simulation based on the collection of individual level posterior means (Eq. 3) that often substantially reduce the share of sign and order violations. We aim to overcome the choice between relying on the posterior of a mis-specified hierarchical prior and the collection of individual level posterior means that fail to measure heterogeneity, by showing how to specify more economically faithful hierarchical prior distributions based on prior constraints. The goal is a hierarchical prior that both is maximally flexible regarding some aspects of the population distribution of preferences, and deterministically constrained by theory regarding other aspects of this distribution.

3 Sign and order constraints

Sign and order constraints dogmatically express prior knowledge about the support of a distribution, e.g., that the price parameter in an indirect utility function is negative or that a consumer prefers a more fuel efficient to a less fuel efficient car for sure, everything else equal. So called constrained parameter problems are relevant across academic fields and a body of literature dealt with this topic. Gelfand et al. (1992) provide an overview of how to impose sign and order constraints based on truncated distributions using Gibbs sampling. Allenby et al. (1995) introduce this approach into marketing in the context of individual level conjoint analysis. Boatwright et al. (1999) develop a sampler in the spirit of Gelfand et al. (1992), but for a hierarchical sales response regression model.

However, the implementation of sign and order restrictions in hierarchical Bayesian models is still without a generally accepted solution. In principle, one could adjust the sampler outlined by Gelfand et al. (1992) to hierarchical settings. However, as we show next, any truncation applied to the prior (and hence to the posterior) of individual level coefficients in a hierarchical setting leads to a so called “doubly intractable” inference problem in the hierarchical prior. Doubly intractable problems are characterized by a normalization constant that depends on target parameters (e.g., Möller et al. 2006 and Murray et al. 2006). Consider the following truncated normal hierarchical prior for consumers’ demand parameters:

$$ p(\beta | \bar{\beta},V_{\beta}) = \frac{\varphi(\beta |\bar{\beta},V_{\beta})}{\mathbb{Z}(\bar{\beta},V_{\beta})} \mathbf{1}(\beta \in \mathbb{R}^{k}_{c}), $$

(5)

where $\mathbb {R}^{k}_{c}$ denotes the truncation region of a k-dimensional demand parameter vector β, φ denotes the multivariate normal density and $\mathbb {Z}(\bar {\beta },V_{\beta })$ the corresponding normalizing constant:

$$ \mathbb{Z}(\bar{\beta},V_{\beta}) = {\int}_{\mathbb{R}^{k}_{c}} \varphi(\beta |\bar{\beta},V_{\beta}) d \beta $$

(6)

The conditional posterior distribution of parameters indexing the hierarchical prior then becomes:

$$ p(\bar{\beta},V_{\beta} | \{\beta_{i}\} ) \propto \prod\limits_{i=1}^{N}\frac{\varphi(\beta_{i} |\bar{\beta},V_{\beta})}{\mathbb{Z}(\bar{\beta},V_{\beta})} \mathbf{1}(\beta_{i} \in \mathbb{R}^{k}_{c}) p(\bar{\beta},V_{\beta}), $$

(7)

where $p(\bar {\beta },V_{\beta })$ denotes the subjective prior for hierarchical prior parameters. Equation 7 is an example of a doubly intractable inference problem because even after dropping the normalization constant $\int \limits \left ({\prod }_{i=1}^{N}\frac {\varphi (\beta _{i} |\bar {\beta },V_{\beta })}{\mathbb {Z}(\bar {\beta },V_{\beta })} \mathbf {1}(\beta _{i} \in \mathbb {R}^{k}_{c}) p(\bar {\beta },V_{\beta }) \right )d\{\bar {\beta },V_{\beta }\}$ of the posterior giving rise to the proportionality, we are left with the intractable expression ${\mathbb {Z}(\bar {\beta },V_{\beta })}$. This expression normalizes the multivariate normal density to the region of support defined by $\mathbb {R}^{k}_{c}$ and cannot be dropped because it depends on target parameters $\bar {\beta }$ and V_β.^{Footnote 2}

As a consequence of truncation, we loose the convenience of conditionally conjugate updates of hierarchical prior parameters $\bar {\beta }$ and V_β regardless of what subjective prior distributions we employ. More generally, all estimation and sampling techniques that require the evaluation of the conditional “likelihood” $p(\{\beta _{i}\} | \bar {\beta },V_{\beta }) ={\prod }_{i=1}^{N}\frac {\varphi (\beta _{i} |\bar {\beta },V_{\beta })}{\mathbb {Z}(\bar {\beta },V_{\beta })}$, including standard Metropolis-Hastings sampling, are hamstrung by the intractability of ${\mathbb {Z}(\bar {\beta },V_{\beta })}$.^{Footnote 3}Boatwright et al. (1999) propose to numerically approximate ${\mathbb {Z}(\bar {\beta },V_{\beta })}$ at each MCMC iteration using the GHK algorithm (Hajivassiliou et al. 1996). While this seems reasonable in their application that involves sign constraints on at most four parameters in a model with five parameters in total, numerical approximations will be problematic in the high-dimensional parameter spaces, potentially involving a multiplicity of constraints that have become common in applications more recently.

The log-normal hierarchical prior avoids this difficulty. The basic idea of using log-normal distributions to implement sign and order constraints is not new. For example, Allenby et al. (2014) use the exponential transformation, $\beta _{p} = -\exp (\beta _{p}^{\ast })$ with $\beta _{p}^{\ast } \in \mathbb {R}$ and distributed according to a hierarchical normal mixture prior, to enforce that the model has zero support for positive price coefficients. In this specification, the problem is that $\beta _{p}^{\ast }$ is measured on the log scale and standard diffuse subjective prior settings imply absurdly large and small values of transformed coefficients β_p (e.g., Allenby et al. 2014).

Thus, the problem is how to specify differentially informative subjective priors for constrained coefficients and unconstrained coefficients. The standard Normal-Inverse-Wishart (NIW) subjective prior for means and covariance matrices in the hierarchical prior distribution is limited in this regard—mostly because the prior concentration of the IW-prior is controlled by a single parameter (the prior degrees of freedom also known as the prior shape).

Next, we present a solution to this problem that re-parameterizes the hierarchical prior. Our contributions in this context are, first, a marginal-conditional decomposition of the hierarchical prior distribution that enables the analyst to be differentially informative about the distribution of constrained and unconstrained parameters in the population a priori^{Footnote 4}, and second, the generalization of the pre-tuning of proposal densities in Rossi et al. (2005) to this hierarchical prior.

The proposed marginal-conditional decomposition becomes essential whenever the hierarchical prior comprises both constrained and unconstrained parameters such as e.g., in simple hierarchical choice models that feature brand coefficients and a price coefficient. The proposed generalization of pre-tuned proposal densities (Rossi et al. 2005) is particularly important in high dimensional models that feature a multiplicity of constraints.

3.1 Marginal-conditional decomposition

Our hierarchical prior starts with a standard normal distribution.^{Footnote 5} Unconstrained coefficients have a normal hierarchical prior while sign and order constraints are imposed through exponential transformations of normal variates resulting in log-normally distributed coefficients. Vice versa, we can log-transform from sign and order constrained parameters that enter the likelihood to unconstrained, a priori conditionally normally distributed variates. We formulate subjective priors over this unconstrained space but use a marginal-conditional decomposition to implement vastly different subjective priors for parameters that are exponentiated and those that are not.

We denote $g:\mathbb {R}^{k} \rightarrow \mathbb {R}_{c}^{k}$ as the function that maps normally distributed variates $\beta _{i}^{\ast }$ to sign and order constrained coefficients β_i that enter multinomial likelihoods explaining individual choice data y_i. We distinguish k_c “constrained” coefficients $\beta _{i}^{\ast c}$, i.e., coefficients to be transformed to obey sign and order constraints, and k_uc unconstrained coefficients $\beta _{i}^{\ast uc}$ in the hierarchical prior.

$$ \begin{array}{@{}rcl@{}} y_{i} | g(\beta_{i}^{\ast}) &\sim& MNL\left( y_{i} | g(\beta_{i}^{\ast})\right)\\ \beta_{i}^{\ast} &\sim& N\left( \bar{\beta}^{\ast},V_{\beta^{\ast}}\right),\ \text{or}\\ \left( \begin{array}{llllll} \beta_{i}^{\ast c} \\ \beta_{i}^{\ast uc} \end{array}\right) &\sim& N\left( \left( \begin{array}{llllll} \mu_{c}^{\ast} \\ \mu_{uc}^{\ast} \end{array}\right), \left( \begin{array}{llllll} V_{\beta_{11}^{\ast}} & V_{\beta_{12}^{\ast}} \\ V_{\beta_{21}^{\ast}} & V_{\beta_{22}^{\ast}} \end{array}\right) \right) \end{array} $$

(8)

With the goal of formulating rather different subjective priors for the parameters governing the distribution of $\beta _{i}^{\ast c}$ and $\beta _{i}^{\ast uc}$, we re-express the multivariate normal distribution in Eq. 8 in the form of a multivariate regression model that regresses unconstrained coefficients $\beta _{i}^{\ast uc}$ on “constrained” coefficients $\beta _{i}^{\ast c}$:

$$ B^{\ast uc} = \left( \begin{array}{llllll} \iota & B^{\ast c} \end{array}\right) \left( \begin{array}{llllll} z^{\prime} \\ {\Gamma} \end{array}\right) + U \qquad vec(U^{\prime}) \sim N(0,I_{N} \otimes {\Sigma}) $$

(9)

Here, B^∗uc and B^∗c are matrices with k_uc and k_c columns, respectively, and N rows each, collecting unconstrained and “constrained” coefficients from individuals in the sample, and ι is a (N × 1)-vector of 1’s; Γ is a (k_c × k_uc) matrix of regression coefficients, z a column vector of intercept coefficients of length k_uc, and Σ is the (k_uc × k_uc) conditional variance-covariance of unconstrained coefficients in the population.

The first two moments of the distribution of “constrained” coefficients are obtained from yet another multivariate regression model that regresses “constrained” coefficients on a vector of constants:

$$ B^{\ast c} = \iota (\mu_{c}^{\ast})^{\prime} + U_{V^{\ast}} \qquad vec(U_{V^{\ast}}^{\prime}) \sim N(0,I_{N} \otimes V^{\ast}) $$

(10)

Here, ι is again a (N × 1)-vector of 1’s and V^∗ is the marginal variance-covariance matrix of constrained coefficients. The multivariate regression models in Eqs. 9 and 10 imply the following re-parameterization of the joint distribution of $\beta ^{\ast }_{i}$ from Eq.8:

$$ \beta^{\ast}_{i} \sim N\left( \left( \begin{array}{llllll} \mu_{c}^{\ast} \\ {\Gamma}^{\prime} \mu_{c}^{\ast} + z \end{array}\right), \left( \begin{array}{llllll} V^{\ast} & V^{\ast} {\Gamma} \\ {\Gamma}^{\prime} (V^{\ast})^{\prime} & {\Gamma}^{\prime} V^{\ast} {\Gamma} + {\Sigma} \end{array}\right) \right) $$

(11)

The advantage of the re-parameterization in Eq. 11 relative to the more standard parameterization in Eq. 8 is that we can now specify arbitrarily informative subjective priors for the hierarchical prior distribution of “constrained” coefficients, i.e., for the parameters $\mu _{c}^{\ast }$ and V^∗ without restricting the prior of unconstrained coefficients. That is, if we a priori set V^∗ to a “small” covariance matrix, we can nevertheless elect to be minimally informative about the distribution of unconstrained parameters through Σ. Coupled with weakly informative priors for Γ and z, neither the correlation between “constrained” and unconstrained nor the marginal mean of unconstrained coefficients is directly affected by informative prior specifications for $\mu ^{*}_{c}$ and V^∗.

However, the role of the prior on Γ in the implied prior for the covariance of unconstrained coefficients (see the lower right block of the covariance matrix in Eq. 11) requires additional discussion. A priori, an increasing number of constrained coefficients coupled with a diffuse prior on Γ implies a marginal prior for the variance of unconstrained coefficients that may appear as favoring larger variances. In this context, it is important to keep in mind that the variance contribution through Γ is through the covariance between “constrained” and unconstrained coefficients (see the upper right and lower left block of the covariance matrix in Eq. 11). Thus, the prior implication of large marginal variances of unconstrained coefficients stems from “mixing” over strong and qualitatively different (positive or negative) dependencies between constrained and unconstrained coefficients. However, strong dependence between “constrained” and unconstrained coefficients constitutes an extremely informative hierarchical prior. Hence, “mixing” over strong and qualitatively different (positive or negative) dependencies between constrained and unconstrained coefficients is not a possibility a posteriori, not even in small data sets. For example, even smallish data sets will enforce a choice between the two highly informative opposites of strong positive and strong negative dependence between a constrained and an unconstrained coefficient. In sum, large variances of unconstrained coefficients through Γ a posteriori result from strong dependence between “constrained” and unconstrained coefficients as per the likelihood.

Before going into more detail about suggested subjective choices, we illustrate the problem of formulating sensible priors for constrained coefficients in the smallest possible example where $\beta _{i} = -\exp (\beta _{i}^{\ast }), \beta _{i}^{\ast } \sim N({\bar \beta }^{\ast },V_{\beta ^{\ast }})$. Here, the subjective prior is on parameters ${\bar \beta }^{\ast }$ and $V_{\beta ^{\ast }}$ in the normal distribution that generates $\beta _{i}^{\ast }$. Under what is widely considered a weakly informative subjective prior setting^{Footnote 6} for ${\bar \beta }^{\ast }$ and $V_{\beta ^{\ast }}$, we obtain that a priori 25% of the constrained coefficients {β_i} are larger than − .001, i.e., very close to zero, and another 25% are smaller than − 1054 (see the right column in Table 1).

Table 1 Quantiles of marginal prior densities for a constrained coefficient with informative and standard weakly informative subjective priors

How to generalize from a hierarchical model?

Abstract

Similar content being viewed by others

A new criterion for assessing discriminant validity in variance-based structural equation modeling

RMSEA, CFI, and TLI in structural equation modeling with ordered categorical data: The story they tell depends on the estimation methods

Estimating power in (generalized) linear mixed models: An open introduction and tutorial in R

1 Introduction

2 Different ways of generalizations and market simulations

3 Sign and order constraints

3.1 Marginal-conditional decomposition

4 Efficient MH-sampling

5 Simulation study

5.1 Drawing from prior distributions

5.2 Population distribution and data generation

5.3 Estimates of heterogeneity

6 Preferences for fresh hen’s eggs

7 Tablet PC preferences

7.1 Predictive Performance and losses in profits

8 Discussion

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s note

Appendix A

Appendix A

1.1 A.1 Illustrating inconsistent inference for heterogeneity when based on posterior means of individual coefficients

1.2 A.2 Posterior distributions: log-normal prior

1.3 A.3 Exact Hessian of transformed variates

1.4 A.4 Illustrating the value of the proposed tuning

1.5 A.5 Numerical properties of marginal-conditional MCMC algorithm (Section 5)

1.6 A.6 Tablet PC preferences in an unconstrained model

Rights and permissions

About this article

Cite this article

Share this article

Keywords

JEL Classification

Search

Navigation