Introduction

Researchers applying partial least squares (PLS), a composite-based approach to structural equation modeling (SEM), frequently consider mediating effects in their model design and estimation (e.g., Ghasemy et al. 2020; Guenther et al. 2023; Magno et al. 2022). Mediating effects assume a sequence of relationships in which an antecedent construct impacts a mediating construct which, in turn, influences a dependent construct. Examining such sequences of relationships enables substantiating the mechanisms that underlie the assumed cause-effect relationships in the path model (Nitzl et al. 2016). As a recent example, Menidjel et al. (2023) analyze the impact of customers’ variety-seeking behavior on their service switching intention, finding that the positive effect is mediated by customer engagement.

To analyze mediating effects, researchers using PLS-SEM typically rely on Zhao et al.’s (2010) procedure, which involves contrasting indirect and direct effects in a sequence of steps to identify the existence and, if applicable, the type of the mediating effect. Several tutorial articles (e.g., Cheah et al. 2021; Nitzl et al. 2016; Sarstedt et al. 2020) and textbooks (e.g., Hair et al. 2022; Ramayah et al. 2018) document this procedure, which has become a standard in the field.

While Zhao et al.’s (2010) procedure proves useful for identifying and characterizing mediating effects, it does not offer any evidence as to whether the inclusion of the mediator improves the model’s quality in the first place. To answer this question, researchers in other fields have conceived mediation analysis as a type of model comparison (e.g., Ariyo et al. 2022; Crouse et al. 2022; Wiedermann & von Eye 2015) in which they compare one or more configurations of the mediation model with the more parsimonious baseline model that excludes the mediator. To do so, researchers draw on a rich array of information-theoretic model selection criteria (Lin et al. 2017) such as Akaike’s Information Criterion (AIC; Akaike 1973), which have become standard metrics in multivariate statistics. However, prior studies have pointed out that the decision for a specific model on the grounds of such criteria may yield false confidence in the results as models in the candidate set can at best be considered approximations of the data-generating process (e.g., Wagenmakers & Farrell 2004). Any model selection task comes with ambiguities in the design of the candidate set and the selection process, giving rise to model selection uncertainty (Preacher & Merkle 2012).

Addressing this concern, Rigdon et al. (2023) recently introduced a procedure to quantify this model selection uncertainty. Drawing on Akaike weights—metrics that normalize the information-theoretic model selection criteria’s values to approximate a model's posterior probability given the data (Burnham & Anderson 2002, 2004)—their procedure combines model-specific bootstrap samples to derive confidence intervals for model parameters that not only reflect sampling variance, but also the uncertainty induced by the model selection process. Researchers can draw on this approach to ascertain whether the consideration of different model configurations has the potential to decrease or bears the risk of increasing uncertainty in model estimates. Rigdon et al. (2023) evaluate and showcase their approach in standard model comparison settings where researchers explicitly hypothesize different model configurations. However, the approach’s relevance extends beyond such standard model comparisons—which researchers rarely document in their published research anyway—to much more visible modeling practices such as mediation. In addition, while Rigdon et al. (2023) introduced their approach in the context of factor-based SEM, the model selection procedure used as the basis for quantifying uncertainty generalizes to composite-based methods such as PLS-SEM (e.g., Danks et al. 2020; Sharma et al. 2019, 2021).

Based on this notion and extending Rigdon et al. (2023), this study uses a combination of Akaike weights and bootstrapping to quantify the uncertainty in parameter estimates induced by the inclusion of a mediator. The uncertainty perspective adds a new and important dimension to the evaluation of mediation models in that it offers support for the effects’ generalizability—or evidence against it (Rigdon et al. 2020, 2022). As such, the procedure may guide the decision whether or not to include a mediator in a PLS path model in situations where theory offers conflicting evidence in this regard. We document the procedure by extending a well-known model on the effects of corporate reputation (Eberl 2010), test different mediating relationships via a newly-introduced construct, and assess the effects on the uncertainty of model estimates.

Our results suggest that the inclusion of the mediator leads to a substantial decrease in the uncertainty in the corporate reputation model estimates, thereby increasing confidence in the effects. As such, our paper makes two important contributions to the literature. First, by showing how to quantify the uncertainty in applications of mediation analysis, we provide PLS-SEM researchers with a new tool to improve the rigor of their analyses. Rather than restricting their mediation analyses to the comparison of direct and indirect effects, researchers can now draw on an uncertainty-centric approach to offer support for their inclusion of a mediator—or evidence that speaks against this step. Second, by analyzing an extended version of Eberl’s (2010) corporate reputation model, we address previous calls to consider additional mediators to clarify the mechanism through which reputation’s affective dimension impacts customer satisfaction and loyalty.

Uncertainty in comparisons of mediation models

When estimating mediating effects in PLS path models, researchers typically draw on Zhao et al. (2010). Introduced as a response to conceptual concerns regarding Baron and Kenny’s (1986) approach, Zhao et al.’s (2010) procedure involves first assessing whether the indirect effect via a mediator is significant, followed by the assessment of the direct effect between the antecedent and target constructs (Nitzl et al. 2016). Depending on whether only the indirect or also the direct effect is significant, the authors distinguish between full mediation and partial mediation, the latter of which can be further differentiated into complementary and competitive mediation. A partial mediation indicates that the mediator does not account for the entire effect of the antecedent on the target construct, suggesting that other mediators may be missing in the model. Finally, if the indirect effect is not significant, there is no mediation.
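
To make this decision logic concrete, the following minimal Python sketch (our own illustration of Zhao et al.’s typology, not code from the original article; the example estimates are hypothetical) classifies the mediation type from the bootstrap significance of the indirect and direct effects and their estimated signs:

```python
def classify_mediation(indirect_sig: bool, direct_sig: bool,
                       indirect_est: float, direct_est: float) -> str:
    """Classify the mediation type following Zhao et al.'s (2010) decision logic."""
    if not indirect_sig:
        # No significant indirect effect: no mediation
        return "direct-only non-mediation" if direct_sig else "no-effect non-mediation"
    if not direct_sig:
        return "indirect-only (full) mediation"
    # Both effects significant: the sign of their product separates the two partial types
    if indirect_est * direct_est > 0:
        return "complementary (partial) mediation"
    return "competitive (partial) mediation"


# Example: significant indirect and direct effects pointing in the same direction
print(classify_mediation(True, True, 0.21, 0.34))  # complementary (partial) mediation
```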

While the contrasting of direct and indirect effects proves useful for identifying whether mediation is present and, if applicable, classifying the mediation type, the application of Zhao et al.’s (2010) procedure rests on the implicit assumption that the parameter estimates’ variance can be entirely attributed to random sampling variance—provided that the mediation model is correct in the population (Cohen et al. 2003). While this assumption may be tenable in a stand-alone analysis of a mediation model, this is not the case when conceiving the mediation analysis as a model selection task where researchers compare models with and without the mediator or different types of mediating effects (e.g., simple mediation vs. serial mediation)—see, for example, Ariyo et al. (2022), Crouse et al. (2022), and Wiedermann and von Eye (2015). In this case, the models may at best be nested, but they cannot be strictly correct at the same time. Rigdon et al. (2023) argue that the violation of this assumption introduces biases in model evaluation metrics such as standard errors in that they do not fully reflect the uncertainty that comes with the model selection task.

To measure the variance in parameter estimates that can be attributed to the uncertainty of the model selection process (Preacher & Merkle 2012), Rigdon et al. (2023) suggest a four-step procedure that draws on information-theoretic model selection criteria and bootstrapping to compute uncertainty-adjusted confidence intervals of model parameters across the candidate models. Information-theoretic model selection criteria seek to strike a balance between model fit and complexity in that they identify a model that generalizes beyond the particular sample. One of the first information-theoretic model selection criteria to be proposed was the AIC (Akaike 1973), which seeks to quantify the distance between a candidate model and the (unknown) true model. One of the most prominent alternatives to the AIC is Schwarz’s (1978) Bayesian Information Criterion (BIC), which provides an estimate of the posterior probability of a model being correct. Researchers have proposed a variety of model selection criteria designed for different data constellations (small sample sizes; Hurvich and Tsai 1989) and analysis tasks (e.g., mixture regression models; Naik et al. 2007). Sharma et al. (2019, 2021) have compared the relative efficacy of various criteria in the context of PLS-SEM on the grounds of large-scale simulation studies and found that the BIC is superior in that it selects the model with (1) the highest fit among a set of candidate models that (2) also performs well in terms of out-of-sample prediction. Researchers wanting to compare different PLS path models would compute model-specific BIC values for a specific key target construct and select the model that minimizes the metric’s value.
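
For illustration, the following Python sketch shows one way such a BIC value can be computed for a key target construct’s structural equation. It follows the SSE-based formulation used in this literature (e.g., Sharma et al. 2019; Danks et al. 2020); the parameter-count convention (number of predictors plus one) is our assumption for the sketch, not necessarily the exact convention of any particular software implementation:

```python
import numpy as np

def bic_for_target(target_scores: np.ndarray, predicted_scores: np.ndarray,
                   n_predictors: int) -> float:
    """Sketch of an SSE-based BIC for a key target construct's structural equation.

    The parameter count k = n_predictors + 1 is an assumption made for illustration.
    """
    n = target_scores.size
    sse = float(np.sum((target_scores - predicted_scores) ** 2))
    k = n_predictors + 1
    return n * np.log(sse / n) + k * np.log(n)

# The candidate model with the lowest BIC value for the target construct is preferred.
```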

While the BIC enables researchers to rank their models, the criterion does not actually measure the relative weights of evidence in favor of each candidate model. This drawback is especially important when BIC values differ only marginally for competing models—as is typically the case in empirical applications (Preacher & Merkle 2012). To address this issue, researchers can compute Akaike weights (Akaike 1983) using the BIC values for each candidate model as input. Akaike weights reflect each model’s relative strength of evidence as compared to the other competing models. Considering the values BICi of m candidate models (i = 1 to m), Akaike weights can be computed as follows (Danks et al. 2020), with a code sketch given after the list:

1. Compute Δi = BICi − min(BIC).
2. Compute the relative likelihood of each candidate model: L(mi) = exp(−Δi/2).
3. Transform the relative likelihoods into weights: wi = L(mi) / Σj L(mj), where the sum runs over all m candidate models.
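
These three steps translate into only a few lines of code. A minimal Python sketch (illustrative only; the BIC values would come from the estimated candidate models):

```python
import numpy as np

def akaike_weights(bic_values) -> np.ndarray:
    """Transform the candidate models' BIC values into Akaike weights (steps 1-3)."""
    bic = np.asarray(bic_values, dtype=float)
    delta = bic - bic.min()               # step 1: difference to the best (lowest) BIC
    likelihood = np.exp(-0.5 * delta)     # step 2: relative likelihood of each candidate
    return likelihood / likelihood.sum()  # step 3: normalize so the weights sum to 1
```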

Rigdon et al. (2023) use these Akaike weights to initialize the bootstrapping procedure, as commonly used for inference testing in applications of PLS-SEM (Sarstedt et al. 2022). Bootstrapping involves drawing a large number of samples from the original dataset with replacement and generating an empirical distribution for the parameter estimates, thus enabling researchers to calculate a variance that incorporates both random sampling variance and model selection uncertainty. To capture the uncertainty associated with estimating a given parameter across several competing models, Rigdon et al. (2023) propose the following procedure (a code sketch of these steps follows the list):

1. Calculate the Akaike weight wi for each candidate model mi using the BICi values as input (Sharma et al. 2019, 2021).
2. Settle on a total number of bootstrap samples R and, for each candidate model mi: (a) draw Ri = R·wi bootstrap samples; (b) estimate the parameter for each bootstrap sample; (c) calculate the 95% confidence interval using the percentile method (Aguirre-Urreta & Rönkkö 2018). The model estimation should consider at least R = 10,000 bootstrap samples (Streukens & Leroi-Werelds 2016).
3. Combine the Ri estimates of the parameter from each model mi into a single set of R estimates, and calculate an overall 95% uncertainty interval for the parameter estimate across all candidate models.
4. Compare the overall uncertainty interval with the confidence intervals from each model mi.
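
The following compact Python sketch illustrates these four steps. The callable `estimate_fn` is a hypothetical placeholder for re-estimating the focal parameter of one candidate model on one bootstrap sample (in the empirical illustration below, the estimation is done in SmartPLS 4, not in Python):

```python
import numpy as np

def uncertainty_interval(data, models, bic_values, estimate_fn,
                         R=10_000, alpha=0.05, seed=1):
    """Weighted-bootstrap uncertainty interval across candidate models (sketch).

    data        : (n x p) array of observations, resampled with replacement
    models      : list of candidate model specifications
    bic_values  : BIC value of each candidate model (same order as `models`)
    estimate_fn : hypothetical callable estimate_fn(model, sample) -> float that
                  re-estimates the focal parameter on one bootstrap sample
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]

    # Step 1: Akaike weights from the candidate models' BIC values
    bic = np.asarray(bic_values, dtype=float)
    weights = np.exp(-0.5 * (bic - bic.min()))
    weights /= weights.sum()

    pooled, per_model_ci = [], []
    for model, w in zip(models, weights):
        r_i = int(round(R * w))  # step 2a: Ri = R * wi (rounding may shift the total slightly)
        estimates = np.empty(r_i)
        for b in range(r_i):
            idx = rng.integers(0, n, size=n)              # resample cases with replacement
            estimates[b] = estimate_fn(model, data[idx])  # step 2b: re-estimate the parameter
        per_model_ci.append(np.percentile(estimates,      # step 2c: model-specific 95% CI
                                          [100 * alpha / 2, 100 * (1 - alpha / 2)]))
        pooled.append(estimates)

    pooled = np.concatenate(pooled)                       # step 3: pool the ~R estimates
    overall = np.percentile(pooled, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return overall, per_model_ci                          # step 4: compare interval widths
```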

The uncertainty interval computed following this procedure captures both model selection uncertainty and random sampling variance. An uncertainty interval that is wider than the individual models’ confidence intervals indicates uncertainty associated with the model selection.

Drawing on this procedure, researchers can establish different models with and without the mediator (potentially considering different types of mediation), compute BIC-based Akaike weights for each model, and contrast the effects’ confidence intervals in the individual models with the uncertainty interval derived for the overall candidate model set. In doing so, researchers should focus on the total and direct effects of the antecedent construct on the target construct, that is, the effects the mediator potentially transmits. A wider uncertainty interval would indicate that the introduction of the mediator adds uncertainty, suggesting that the mediating effect is more difficult to replicate (Rigdon et al. 2023). Conversely, a narrower uncertainty interval suggests that the inclusion of the mediator reduces the uncertainty, which increases confidence in the effects’ stability in future investigations (e.g., Rigdon et al. 2020).

Illustrative example

Our illustration of the approach draws on an extended version of Eberl’s (2010) model on the antecedents and consequences of corporate reputation. The model has frequently been used to showcase extensions of the PLS-SEM method, for example in the context of higher-order modeling (Sarstedt et al. 2019), necessary condition analysis (Hair et al. 2024; Chapter 4), and latent class analysis (Matthews et al. 2016). The original model considers the effects of corporate reputation—operationalized by a cognitive dimension (competence) and an affective dimension (likeability)—on customer satisfaction and loyalty (Fig. 1). Albeit of secondary concern for our study, the model also considers four drivers of corporate reputation: attractiveness, corporate social responsibility, performance, and quality. Hair et al. (2022; Chapter 7) use this model to explore the mediating role of customer satisfaction, showing that this construct partially mediates the effect of likeability on customer loyalty. This result suggests that there may be a missing mediator in this relationship that the direct effect of likeability on loyalty absorbs (Fig. 1).

Fig. 1 Corporate reputation model

Sarstedt et al. (2023) recently presented a data article, which considers trust as one potential mediator in the relationships between likeability, satisfaction, and customer loyalty. Following this notion, we test different configurations of the extended reputation model with (1) trust as a potential mediator in the relationship between likeability and customer satisfaction, (2) trust as a potential mediator in the relationship between likeability and customer loyalty, and (3) trust as a serial mediator in the relationship between likeability and customer loyalty via customer satisfaction (Fig. 2). Our analysis draws on Sarstedt et al.’s (2023) dataset of n = 308 responses from German consumers. We use the SmartPLS 4 software (Ringle et al. 2022) to estimate the models.

Fig. 2 Alternative models

Before comparing the models, we assess the measurement models following standard procedures (e.g., Hair et al. 2019, 2020, 2022). In the following, we focus our results reporting on Model #1. We find that all measures are reliable, as evidenced by, for example, ρA values well above 0.7. Similarly, all indicator loadings are high, yielding average variance extracted values larger than 0.5, thereby providing support for the measures’ convergent validity. Computing the 90% bootstrap-based confidence intervals (percentile method, 10,000 subsamples) shows that all HTMT values (Henseler et al. 2015) are significantly lower than 0.85, which supports discriminant validity (Franke & Sarstedt 2019) (Table 1).

Table 1 Measurement model assessment (model #1) 

Having established the measures’ reliability and validity, we run a mediation analysis, following the procedure outlined in Zhao et al. (2010)—see also Nitzl et al. (2016). The results from bootstrapping show that all simple and serial mediating effects via trust in Models 1–3 are significant (p < 0.05). Since the direct effects of likeability on customer satisfaction and loyalty are also significant, the mediations are partial in nature.

In the next step, we focus on the model comparison on the grounds of the BIC values of the models’ key target construct, customer loyalty. We find that Model 2 produces the lowest BIC value (−288.034), closely followed by Model 3 (−288) and Model 1 (−286.211). Compared to the original model (−286.138), all three mediation models have lower BIC values, thereby supporting the consideration of trust as a mediator (Table 2).

Table 2 Computation of Akaike weights and number of bootstrap samples

While the absolute differences in BIC values for the candidate models are not pronounced, using these values as input for computing Akaike weights produces more nuanced differences. Models 2 and 3 have similar weights of w2 = 0.419 and w3 = 0.412, respectively, whereas Model 1 has a much lower weight of w1 = 0.169 (Table 2).
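
These weights can be reproduced from the reported BIC values. The small calculation below is purely illustrative and uses −288.000 for Model 3, since only the rounded value is reported:

```python
import numpy as np

bic = np.array([-286.211, -288.034, -288.000])  # Models 1-3 as reported (Model 3 rounded)
weights = np.exp(-0.5 * (bic - bic.min()))
weights /= weights.sum()
print(weights.round(3))             # [0.169 0.419 0.412]
print((10_000 * weights).round())   # [1685. 4193. 4122.] bootstrap samples per model
```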

Next, we run bootstrapping for each model, allocating the total of 10,000 bootstrap samples in proportion to the weights wi, and use the combined bootstrap samples to construct combined confidence intervals. More precisely, we combine the 1685 parameter estimates resulting from bootstrapping Model #1 with the 4193 and 4122 parameter estimates from bootstrapping Models #2 and #3. Using the combined 10,000 parameter estimates, we compute 95% percentile confidence intervals for all effects related to the likeability construct. Table 3 shows the estimates for the original model, the three candidate models, and the combined effects resulting from the consolidation of the model-specific bootstrap estimates.

Table 3 Confidence and (combined) uncertainty intervals

Focusing on the total effect of likeability on customer loyalty, we find that the widths of the confidence intervals in the three candidate models are very similar, varying between 0.190 (Model 3) and 0.192 (Models 1 and 2). These values compare well with the original model, where the width of the total effect’s confidence interval is 0.195. In contrast, the combined interval, which also expresses the uncertainty induced by the model selection task, is much narrower (0.167): 12.72% narrower than the average interval produced by Models 1–3 and 14.36% narrower than the original model’s interval.
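
The reported reductions follow directly from these interval widths; a quick arithmetic check:

```python
candidate_widths = [0.192, 0.192, 0.190]  # Models 1-3, total effect of likeability on loyalty
original_width = 0.195                    # original model without the mediator
combined_width = 0.167                    # uncertainty interval across the candidate set

avg_width = sum(candidate_widths) / len(candidate_widths)
print(round(100 * (1 - combined_width / avg_width), 2))       # 12.72 (% narrower than Models 1-3)
print(round(100 * (1 - combined_width / original_width), 2))  # 14.36 (% narrower than original)
```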

The same holds for the direct effect of likeability on customer loyalty. The widths of the confidence intervals of the three candidate models vary around 0.2, which compares well with the original model (0.196). In contrast, the combined interval that also considers model selection uncertainty is much narrower (0.179).

These results suggest that considering the different specifications of the mediator model leads to a substantial decrease in uncertainty in the total and direct effects of likeability on customer loyalty. Hence, researchers can expect these effects to be more stable in future applications of the model. The additional insights generated by the uncertainty analysis speak in favor of including trust as a mediator—despite the marginal differences in BIC values compared to the original model without the mediator.

Discussion

Every step along the research process has the potential to contribute to the findings’ uncertainty. Researchers make countless decisions when working with theoretical frameworks, designing models, selecting measures, and collecting data—to name a few. The uncertainty introduced by such decisions may increase the variability of results, which goes well beyond mere sampling variance. Rigdon et al. (2020) argue that researchers’ disregard of uncertainty is to blame for much of the replication crisis that occupies behavioral research today. While Open Science initiatives like preregistrations, open data, and checklists (e.g., Simmons et al. 2021) may help to control for components of uncertainty, researchers need to proactively quantify uncertainty in order to successfully channel it (Rigdon & Sarstedt 2022).

This study takes a step in this direction by showcasing how to quantify uncertainty in PLS-SEM-based mediation analyses. In line with prior research (e.g., Ariyo et al. 2022; Crouse et al. 2022; Wiedermann & von Eye 2015), we conceive mediation analyses as a form of model selection, which makes them amenable to uncertainty analyses, as documented by Rigdon et al. (2023). Applying the procedure to an extended version of Eberl’s (2010) well-known corporate reputation model shows that the consideration of an additional mediator leads to a substantial decrease in uncertainty in the associated total and direct effects. These results thereby offer support for the inclusion of the mediator.

The procedure extends standard mediation analyses, which are restricted to contrasting direct and indirect effects and offer little guidance on whether the inclusion of the mediator adds to the model’s quality in the first place. Comparing the various models offers a basis for deriving an uncertainty interval for the mediating effects, which can readily be compared with the confidence intervals derived in each of the candidate models. This uncertainty perspective explicitly acknowledges the approximate nature of model comparison tasks—models should be seen as approximations of the data-generating process, rather than strictly “correct” or “wrong” (Burnham & Anderson 2002; Sweeten 2020). Or as Cudeck and Henly (1991, p. 512) note: “Yet no model is completely faithful to the behavior under study. Models usually are formalizations of processes that are extremely complex. It is a mistake to ignore either their limitations or their artificiality. The best one can hope for is that some aspect of a model may be useful for description, prediction, or synthesis. The extent to which this is ultimately successful, more often than one might wish, is a matter of judgment.” The procedure seeks to grasp and quantify this fuzziness in the context of mediation analyses. In doing so, the procedure is versatile in that it (1) allows for the inclusion of different types of mediating relationships (e.g., simple and serial mediations) and multiple mediators, and (2) is not restricted to PLS, but extends to other composite-based SEM methods like generalized structured component analysis (Hwang & Takane 2004) and its extensions (Hwang et al. 2021; see also Hwang et al. 2023).

Our empirical application showcases the usefulness of our approach in the context of Eberl’s (2010) widely-known corporate reputation model. The fact that the trust mediator reduces uncertainty in the likeability-related model estimates informs researchers and practitioners about its relevance for the model, thereby offering support for the construct’s inclusion. The results thereby suggest that efforts to improve corporate reputation’s likeability dimension—being the primary driver of customer satisfaction and loyalty—should also consider the trust-inducing effects of corresponding marketing activities. As such, our approach motivates a more holistic thinking and interpretation of the cause-effect relationships in nomological networks such as the corporate reputation model.

Our analysis relied on the BIC, which prior research has identified as the information-theoretic model selection criterion that achieves a sound tradeoff between model fit and predictive power in PLS-SEM (Sharma et al. 2019, 2021). Researchers could, however, also focus on the mediator’s predictive contribution to decide whether or not to include it in the model (Danks 2021). Following this logic, researchers would use the increase in the model’s predictive accuracy due to the addition of the mediator as a decision rule rather than BIC values. Future research should therefore identify means to substitute Akaike weights in a purely prediction-oriented mediation framework (e.g., Chin et al. 2020; Hair & Sarstedt 2021; Sharma et al. 2022). Further research should also seek ways to quantify the change in uncertainty in all relationships involved in the mediation simultaneously, for example, by bootstrapping model fit measures such as the SRMR (Schuberth et al. 2022). Such improvements would further increase the rigor of mediation analyses, model comparisons, and model evaluation per se in a PLS-SEM framework.