1 Introduction

Endogeneity is a common methodological problem in applied statistics, especially when regression analysis is relied upon for causal inference. Endogeneity occurs when (nonexogenous) explanatory variables are correlated with the error term in a regression model. The consequence of violating the exogeneity assumption in ordinary least squares (OLS) multivariate regression analysis is biased and inconsistent estimates (Mayston 2009). Thus, in the presence of endogeneity, hypothesis tests for regression coefficients obtained via OLS estimation can be severely misleading. Nevertheless, a substantial body of empirical research based on administrative data is plagued by endogeneity issues.

The traditional remedies for endogeneity are instrumental variables (Hausman, 1983; Angrist and Krueger, 2001) and proxy variables (Levinsohn and Petrin, 2003; Wooldridge, 2009), which are derived from additional exclusion restrictions or equations (see, e.g., Galvao and Montes-Rojas, 2010). Klein and Vella (2010) developed a control function estimator for the triangular simultaneous equations model to reduce the influence of endogeneity when the exclusion restrictions needed to construct appropriate instruments are not available. Chalak and White (2011) provided a comprehensive description of how to identify conditional exogeneity relationships and explained how structural relations determine exogeneity and the exclusion restrictions that produce the moment restrictions supporting identification; they also analyzed the necessary and sufficient moment restrictions for identifying effects using extended instrumental variables in a linear structural equation model. Lewbel (2012) proposed a method for identifying structural parameters in models with endogenous variables in the presence of heteroscedasticity. Another approach is to place a prior distribution on the endogeneity bias when additional information, such as that contained in proxy variables or instrumental variables, is not available (Galvao and Montes-Rojas, 2010). Altonji et al. (2005, 2008) used a similar strategy to extract information regarding the endogeneity bias from observables: an index of observables can be used to identify the parameter associated with the endogenous variable. Vansteelandt and Didelez (2018) proposed a double-robust estimation method to improve the efficiency of instrumental variables estimators. Song and Wang (2019) suggested generalized method of moments nonparametric correction approaches for the logit model. Birke et al. (2017) introduced an estimation method, formulated as the solution of an ill-posed inverse problem, for the parametric component of a model with endogenous explanatory variables. Ghosh et al. (2021) derived an estimator and test procedure based on a control function method for heterogeneous endogeneity. Thus, numerous methods have been proposed in the literature to improve the estimation of the coefficients in a system of simultaneous structural equations under different regularity conditions. This paper considers several estimators, including the classical two-stage least squares (TSLS), limited information maximum likelihood (LIML), and OLS estimators. TSLS is preferred to OLS because it provides consistent estimates in the presence of endogeneity, while OLS is inconsistent in such cases (Hansen, 2017). In addition, inference using TSLS is valid even when there are few instruments. However, in the presence of many instruments, LIML remains consistent (Anatolyev, 2013) and is preferred to TSLS and OLS (Hahn and Inoue, 2002).

In small samples, the TSLS estimator may have larger variance than the OLS estimator. To address this issue, Maasoumi (1978) and Hansen (2017) proposed Stein-type estimators for more efficient estimation of unknown parameter vectors in simultaneous equations. The literature shows the use of Stein-type estimators in linear, nonlinear, and panel data models; for example, Sawa (1973) introduced a combined estimator (CE) that is a linear combination of OLS and TSLS, where the coefficients depend only upon the number of included and excluded variables from the relevant equation and sample size. Kim and White (2001) investigated James–Stein-type estimators in the linear regression model by choosing the least absolute deviation estimator as the base estimator and OLS as the data-driven estimator in scenarios where the sample size approaches infinity. DiTraglia (2016) proposed a focused moment selection criterion (FMSC) for the generalized method of moments. The average estimator with minimum asymptotic mean squared error and the FMSC are developed for a scalar parameter of interest, which implies that Stein-type outcomes are not applicable (DiTraglia, 2016). Wang et al. (2016) combined random effect and fixed effect estimators for linear panel data models, and Huang et al. (2020) proposed a weighted estimator for a semiparametric panel data model based on the Hausman statistic. Recent studies have shown that Stein-type estimators have been widely used in various regression models, utilizing different types of estimators. This study examines the effectiveness of the Stein k-class shrinkage estimator in the linear instrumental variables (IV) model using asymptotic tools.

LIML has optimal properties when the numbers of instruments and observations are both large (Anderson et al. 2010), and it is consistent and more robust to weak instruments than TSLS (Chao and Swanson, 2005; Hansen, 2021). In this article, the LIML estimator is improved upon by proposing a weighted estimator that combines the OLS and LIML estimators and depends on the Wu-Hausman statistic \(\left({H}_{n}\right)\). This combination of OLS and LIML provides a risk-based rationalization for \({H}_{n}\) and leads to the Stein-weighted LIML (SWLIML) estimator. The asymptotic distribution and risk of SWLIML are derived, and a Monte Carlo simulation is conducted to show the empirical performance of SWLIML in the presence of endogeneity and many weak instruments. The SWLIML estimator is then applied to the green patent dataset from Maasoumi et al. (2021), where it shows a substantial improvement in predictive power compared to OLS and LIML.

The outline of this article is as follows. Section 2 defines the model, the notation and LIML. The SWLIML estimator is characterized in Sect. 3. The asymptotic properties of the proposed SWLIML estimator are established in Sect. 4, where its asymptotic distribution and asymptotic risk are derived. In Sect. 5, a Monte Carlo simulation is conducted to evaluate the performance of the estimators in the presence of endogeneity, multiple endogenous variables, and weak and/or many instruments. Section 6 presents empirical applications that illustrate the benefits of the proposed method. Concluding remarks are made in Sect. 7.

2 Model and notation

In this paper, a linear model is considered, and it is assumed that there is endogeneity in the model:

$${y}_{i} ={{\varvec{x}}}_{i}^{T}{\varvec{\delta}}+ {\mu }_{i}= {{\varvec{z}}}_{1i}^{T}\boldsymbol{\alpha } +{{\varvec{x}}}_{2i}^{T}{\varvec{\beta}}+ {\mu }_{i}$$
(2.1)
$${\mathbb{E}}\left[{{\varvec{z}}}_{1i}{\mu }_{i}\right]=0,{\mathbb{E}}\left[{{\varvec{x}}}_{2i}{\mu }_{i}\right]\ne 0$$

In matrix notation, model (2.1) can be written as

$${\varvec{y}} ={\varvec{X}}{\varvec{\delta}}\boldsymbol{ }+\boldsymbol{ }{\varvec{\mu}}= {{\varvec{Z}}}_{1}\boldsymbol{\alpha }\boldsymbol{ }+{{\varvec{X}}}_{2}{\varvec{\beta}}\boldsymbol{ }+\boldsymbol{ }{\varvec{\mu}},$$
(2.2)

where \({\varvec{X}}\) is the matrix of regressors, which is divided into two blocks: \({{\varvec{Z}}}_{1}\) is the \(n\times {K}_{1}\) matrix of predetermined exogenous variables, and \({{\varvec{X}}}_{2}\) is the \(n\times E\) matrix of endogenous variables. \({\varvec{\mu}}\) is the \(n\times 1\) vector of error terms, and the parameter vector \({\varvec{\delta}}\) is divided into two subvectors, \(\boldsymbol{\alpha }\) and \({\varvec{\beta}}.\) The parameter of interest is the endogenous regression parameter vector \({\varvec{\beta}}\). Two classical estimators for model (2.1) are OLS and LIML (Anderson and Rubin, 1949). Let \({\widehat{{\varvec{\beta}}}}_{L}\) (OLS) and \({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\) denote the two estimators of \({\varvec{\beta}}\). In the presence of endogeneity, the LIML estimator is generally preferred to OLS because OLS is inconsistent.

The asymptotic distribution of SWLIML may be attained by placing the model under a local-to-exogenous asymptotic assumption. Therefore, a reduced form equation for the endogenous variable \({{\varvec{x}}}_{2i}\) is defined as

$${{\varvec{x}}}_{2i}={{\varvec{\pi}}}_{1}^{T}{{\varvec{z}}}_{1i} + {{\varvec{\pi}}}_{2}^{T}{{\varvec{z}}}_{2i} + {{\varvec{v}}}_{i},$$
(2.3)

and the matrix form of (2.3) corresponds to

$${{\varvec{X}}}_{2}={{\varvec{Z}}}_{1}{{\varvec{\Pi}}}_{1} + {{\varvec{Z}}}_{2}{{\varvec{\Pi}}}_{2} + {\varvec{V}}$$
(2.4)

with \({{\varvec{z}}}_{i}=\left(\begin{array}{c}{{\varvec{z}}}_{1i}\\ {{\varvec{z}}}_{2i}\end{array}\right)\), where \({{\varvec{z}}}_{i}\) is a \(K\)-dimensional instrument with \(K>E\), \({{\varvec{z}}}_{2i}\) is \({K}_{2}\times 1\), and \({\mathbb{E}}\left[{{\varvec{z}}}_{i}{{\varvec{v}}}_{i}^{T}\right]=0\). Thus, we simply refer to \({{\varvec{z}}}_{1i}\) as the vector of \({K}_{1}\) exogenous variables in (2.1) and to \({{\varvec{z}}}_{2i}\) as the excluded valid instruments (Anderson et al. 2010).
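For concreteness, the data-generating process in (2.1)–(2.4) can be simulated as follows. This is a minimal sketch in Python; the dimensions \({K}_{1}\), \({K}_{2}\), the error correlation, and all coefficient values are arbitrary illustrative choices, with a single endogenous regressor \((E=1)\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, K1, K2 = 500, 2, 3                    # sample size, included exogenous, excluded instruments
Z1 = rng.normal(size=(n, K1))            # predetermined exogenous variables z_1i
Z2 = rng.normal(size=(n, K2))            # excluded instruments z_2i

# correlated errors (mu_i, v_i) induce endogeneity of x_2i (here E = 1)
corr = 0.8
err = rng.multivariate_normal([0.0, 0.0], [[1.0, corr], [corr, 1.0]], size=n)
mu, v = err[:, 0], err[:, 1]

pi1, pi2 = np.ones(K1), np.ones(K2)
x2 = Z1 @ pi1 + Z2 @ pi2 + v             # reduced form (2.3)

alpha, beta = np.ones(K1), 1.0
y = Z1 @ alpha + beta * x2 + mu          # structural equation (2.1)

X = np.column_stack([Z1, x2])            # X = [Z1  X2]
delta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_ols = delta_ols[-1]                 # OLS coefficient on the endogenous x2
```

Because \({\mathbb{E}}\left[{{\varvec{x}}}_{2i}{\mu }_{i}\right]\ne 0\) in this design, the OLS coefficient on \(x_2\) concentrates above the true value \(\beta = 1\), which is the inconsistency described above.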

Maximum likelihood estimation of (2.1), treated as part of a system together with (2.4), yields the LIML estimator. The LIML estimator was originally derived by Anderson and Rubin (1949), and these results may be found in Davidson and MacKinnon (1993). The LIML estimator for \({\varvec{\delta}}\) can be defined as

$${\widehat{{\varvec{\delta}}}}_{\mathrm{LIML}}\left(k\right)={\left({{\varvec{X}}}^{T}\left({{\varvec{I}}}_{n}-k{{\varvec{M}}}_{{\varvec{Z}}}\right){\varvec{X}}\right)}^{-1}{{\varvec{X}}}^{T}\left({{\varvec{I}}}_{n}-k{{\varvec{M}}}_{{\varvec{Z}}}\right){\varvec{y}},$$
(2.5)

where, for LIML, \(k\) is set to \(\widehat{\kappa }\), the smallest eigenvalue of the matrix

$${\left[{\mathbb{W}}^{T}{{\varvec{M}}}_{{\varvec{Z}}}{\mathbb{W}}\right]}^{-\frac{1}{2}} {\mathbb{W}}^{T}{{\varvec{M}}}_{{{\varvec{Z}}}_{1}}{\mathbb{W}} {\left[{\mathbb{W}}^{T}{{\varvec{M}}}_{{\varvec{Z}}}{\mathbb{W}}\right]}^{-\frac{1}{2}},$$

which depends only on observable data and not on any unknown parameters (Davidson and MacKinnon, 2004). Here, \({\mathbb{W}}=\left[\begin{array}{cc}{\varvec{y}}& {{\varvec{X}}}_{2}\end{array}\right]\), and \({{\varvec{M}}}_{{{\varvec{Z}}}_{1}} ={{\varvec{I}}}_{n}-{{\varvec{P}}}_{{{\varvec{Z}}}_{1}}\) annihilates the predetermined variables included in (2.1), with \({{\varvec{P}}}_{{{\varvec{Z}}}_{1}}= {{\varvec{Z}}}_{1}{\left({{\varvec{Z}}}_{1}^{T}{{\varvec{Z}}}_{1}\right)}^{-1} {{\varvec{Z}}}_{1}^{T}\). Similarly, \({{\varvec{M}}}_{{\varvec{Z}}}={{\varvec{I}}}_{n}-{{\varvec{P}}}_{{\varvec{Z}}}\) annihilates all the instruments, with \({{\varvec{P}}}_{{\varvec{Z}}}={\varvec{Z}}{\left({{\varvec{Z}}}^{T}{\varvec{Z}}\right)}^{-1}{{\varvec{Z}}}^{T}\). The matrix \({\varvec{Z}}=\left[\begin{array}{cc}{{\varvec{Z}}}_{1}& {{\varvec{Z}}}_{2}\end{array}\right]\) comprises all the instruments, and \(\widehat{\kappa }\ge 1\) holds because \({\left({\varvec{y}}-{{\varvec{X}}}_{2}{\varvec{\beta}}\right)}^{T}{{\varvec{M}}}_{{{\varvec{Z}}}_{1}}\left({\varvec{y}}-{{\varvec{X}}}_{2}{\varvec{\beta}}\right)\) cannot be smaller than \({\left({\varvec{y}}-{{\varvec{X}}}_{2}{\varvec{\beta}}\right)}^{T}{{\varvec{M}}}_{{\varvec{Z}}}\left({\varvec{y}}-{{\varvec{X}}}_{2}{\varvec{\beta}}\right)\). The estimator in (2.5) with arbitrary \(k\) is called a \(k\)-class estimator, which includes the following estimators as special cases:

\(\underset{k\to 0}{\mathrm{lim}}{\widehat{{\varvec{\delta}}}}_{\mathrm{LIML}}\left(k\right)={\widehat{{\varvec{\delta}}}}_{\mathrm{LIML}}\left(0\right)={\left({{\varvec{X}}}^{T}{\varvec{X}}\right)}^{-1}{{\varvec{X}}}^{T}{\varvec{y}}\), the OLS estimator.

\(\underset{k\to 1}{\mathrm{lim}}{\widehat{{\varvec{\delta}}}}_{\mathrm{LIML}}\left(k\right)={\widehat{{\varvec{\delta}}}}_{\mathrm{LIML}}\left(1\right)={\left({{\varvec{X}}}^{T}{{\varvec{P}}}_{{\varvec{Z}}}{\varvec{X}}\right)}^{-1}{{\varvec{X}}}^{T}{{\varvec{P}}}_{{\varvec{Z}}}{\varvec{y}}\), the TSLS estimator.

\(\underset{k\to \widehat{\kappa }}{\mathrm{lim}}{\widehat{{\varvec{\delta}}}}_{\mathrm{LIML}}\left(k\right)={\widehat{{\varvec{\delta}}}}_{\mathrm{LIML}}\left(\widehat{\kappa }\right)={\left({{\varvec{X}}}^{T}\left({{\varvec{I}}}_{n}-\widehat{\kappa }{{\varvec{M}}}_{{\varvec{Z}}}\right){\varvec{X}}\right)}^{-1}{{\varvec{X}}}^{T}\left({{\varvec{I}}}_{n}-\widehat{\kappa }{{\varvec{M}}}_{{\varvec{Z}}}\right){\varvec{y}}\), the LIML estimator.
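The three special cases can be verified numerically from a single implementation of the \(k\)-class formula (2.5). The sketch below is illustrative only: the simulated design (one endogenous regressor, an intercept as the sole included exogenous variable, three excluded instruments, and all parameter values) is an arbitrary choice, and \(\widehat{\kappa }\) is computed as the smallest eigenvalue of \({\left[{\mathbb{W}}^{T}{{\varvec{M}}}_{{\varvec{Z}}}{\mathbb{W}}\right]}^{-1}{\mathbb{W}}^{T}{{\varvec{M}}}_{{{\varvec{Z}}}_{1}}{\mathbb{W}}\), which has the same spectrum as the symmetrized form in the text:

```python
import numpy as np

def k_class(y, X, Z, k):
    """k-class estimator (X'(I_n - k*M_Z)X)^{-1} X'(I_n - k*M_Z)y, cf. (2.5)."""
    n = len(y)
    M_Z = np.eye(n) - Z @ np.linalg.solve(Z.T @ Z, Z.T)
    A = np.eye(n) - k * M_Z
    return np.linalg.solve(X.T @ A @ X, X.T @ A @ y)

def liml_kappa(y, X2, Z1, Z):
    """Smallest eigenvalue of [W'M_Z W]^{-1} W'M_{Z1} W, where W = [y X2]."""
    n = len(y)
    annihilate = lambda A: np.eye(n) - A @ np.linalg.solve(A.T @ A, A.T)
    W = np.column_stack([y, X2])
    B = W.T @ annihilate(Z1) @ W      # W' M_{Z1} W
    C = W.T @ annihilate(Z) @ W       # W' M_Z W
    return np.linalg.eigvals(np.linalg.solve(C, B)).real.min()

# illustrative data: one endogenous regressor, three excluded instruments
rng = np.random.default_rng(1)
n = 300
Z1 = np.ones((n, 1))                                  # K1 = 1 (intercept)
Z2 = rng.normal(size=(n, 3))                          # K2 = 3
e = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=n)
x2 = Z2.sum(axis=1) + e[:, 1]
y = 0.5 + x2 + e[:, 0]                                # true beta = 1
X = np.column_stack([Z1, x2])
Z = np.column_stack([Z1, Z2])

ols = k_class(y, X, Z, 0.0)                           # k = 0: OLS
tsls = k_class(y, X, Z, 1.0)                          # k = 1: TSLS
kappa = liml_kappa(y, x2[:, None], Z1, Z)             # kappa-hat >= 1
liml = k_class(y, X, Z, kappa)                        # k = kappa-hat: LIML
```

With \(k=0\) the formula reduces algebraically to OLS, while the OLS slope on the endogenous regressor is biased upward in this design and the LIML slope is approximately centered on the true value.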

Let the superscript “\(\perp \)” represent the residuals from the projection on \({{\varvec{Z}}}_{1}\), e.g., \({{\varvec{X}}}^{\perp }={{\varvec{M}}}_{{{\varvec{Z}}}_{1}}{{\varvec{X}}}_{2}\). By using this notation, the OLS estimator of \({\varvec{\beta}}\) is \({\left({{{\varvec{X}}}^{\perp }}^{T}{{\varvec{X}}}^{\perp }\right)}^{-1}{{{\varvec{X}}}^{\perp }}^{T}{\varvec{y}}\), and therefore, the desired expression for the LIML estimator for the vector of endogenous regression parameters \({\varvec{\beta}}\) can be defined as

$${\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}={\left({{{\varvec{X}}}^{\perp }}^{T}\left[{{\varvec{I}}}_{n}-\widehat{\kappa }{{\varvec{M}}}_{{{\varvec{Z}}}_{2}^{\perp }}\right]{{\varvec{X}}}^{\perp }\right)}^{-1}{{{\varvec{X}}}^{\perp }}^{T}\left[{{\varvec{I}}}_{n}-\widehat{\kappa }{{\varvec{M}}}_{{{\varvec{Z}}}_{2}^{\perp }}\right]{\varvec{y}}$$
(2.6)

Bekker (1994) proposed a covariance matrix estimator for k-class estimators with normal error terms that is more robust to weak instruments; under this approach, the variance of LIML is estimated as

$$ {\widehat{{\varvec{\Sigma }}}}_{{LIML}} = \mathbb{H}^{{ - 1}} {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{\Sigma } }} \mathbb{H}^{{ - 1}} , $$

where \({\mathbb{H}}={{\varvec{X}}}^{T}{{\varvec{P}}}_{{\varvec{Z}}}{\varvec{X}}-\gamma \left({{\varvec{X}}}^{T}{\varvec{X}}\right)\), \( {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{\Sigma } }} = \hat{\sigma }^{2} \left[ {\gamma ^{2} \left( {\user2{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{X} }^{T} \user2{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{X} }} \right) + \left( {1 - 2\gamma } \right)\user2{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{X} }^{T} \user2{P}_{\user2{Z}} \user2{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{X} }} \right] \), and \(\gamma \) is the smallest eigenvalue of the matrix \( \left( {\user2{\underset{\raise0.3em\hbox{$\smash{\scriptscriptstyle-}$}}{X} }^{T} \user2{\underset{\raise0.3em\hbox{$\smash{\scriptscriptstyle-}$}}{X} }} \right)^{{ - 1}} \user2{\underset{\raise0.3em\hbox{$\smash{\scriptscriptstyle-}$}}{X} }^{T} \user2{P}_{\user2{Z}} \user2{\underset{\raise0.3em\hbox{$\smash{\scriptscriptstyle-}$}}{X} } \), where \( \user2{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{X} } = \user2{X} - \user2{\hat{\mu }}\frac{{\user2{\hat{\mu }}^{T} \user2{X}}}{{\user2{\hat{\mu }}^{T} \user2{\hat{\mu }}}} \) and \( \user2{\underset{\raise0.3em\hbox{$\smash{\scriptscriptstyle-}$}}{X} } = \left[ {\begin{array}{*{20}c} \user2{y} & \user2{X} \\ \end{array} } \right] \).

3 Stein-weighted LIML estimator

The SWLIML estimator is considered by following the ideas of Maasoumi (1978) and Hansen (2016) to see if a combination of the OLS and the LIML estimators may improve the precision of estimates of the structural parameter \({\varvec{\beta}}\). The SWLIML estimator for \({\varvec{\beta}}\), which is a weighted combination of \({\widehat{{\varvec{\beta}}}}_{L}\) and \({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\) that has weights that are inversely proportional to the Hausman test statistic, is defined as follows:

$${\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}=\lambda {\widehat{{\varvec{\beta}}}}_{L}+\left(1-\lambda \right){\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}={\widehat{{\varvec{\beta}}}}_{L}+{\left(1-\frac{\mathcalligra{q}}{{H}_{n}}\right)}_{+}\left[{\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}-{\widehat{{\varvec{\beta}}}}_{L}\right],$$
(3.1)

where

$$ \lambda = \left\{ {\begin{array}{*{20}c} {1,\,{\text{if}}\,H_{n} < \mathcalligra{q}} & {} \\ {\frac{\mathcalligra{q}}{{H_{n} }},\,{\text{if}}\,H_{n} \ge \mathcalligra{q}},\end{array} } \right.~ $$
(3.2)

and

$${H}_{n}={\left({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}-{\widehat{{\varvec{\beta}}}}_{L}\right)}^{T}\widehat{{\varvec{G}}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}-{\widehat{{\varvec{\beta}}}}_{L}\right),$$
(3.3)

where \(\lambda \in \left[\mathrm{0,1}\right]\), \(\mathcalligra{q}\) is the shrinkage parameter, \({H}_{n}\) is the Wu-Hausman statistic, \(\widehat{{\varvec{G}}}={\left[var\left({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\right)-var\left({\widehat{{\varvec{\beta}}}}_{L}\right)\right]}^{-1}\), and \((\varkappa)_{+} \) is the positive-part trimming function such that \((\varkappa)_{+}= \varkappa \) if \(\varkappa >0\) and \(0\) if \(\varkappa \le 0\), as in the positive-part James–Stein estimator. Under the null hypothesis of \({H}_{n}\), both \({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\) and \({\widehat{{\varvec{\beta}}}}_{L}\) are consistent, but the latter is more efficient. Under the alternative hypothesis, \({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\) is consistent, while \({\widehat{{\varvec{\beta}}}}_{L}\) is not, since it ignores the endogeneity. Therefore, the discrepancy between \({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\) and \({\widehat{{\varvec{\beta}}}}_{L}\) provides a reliable measure of endogeneity. The restriction \(\lambda \in \left[\mathrm{0,1}\right]\) implicit in (3.2) is similar to the “positive part” restriction of the classic James–Stein estimator (Baranchik 1970). The degree of shrinkage depends on the ratio \(\frac{\mathcalligra{q}}{{H}_{n}}\). A high value of the test statistic \({H}_{n}\) indicates endogeneity of the regressor \({{\varvec{x}}}_{2i}\), whereas a low value suggests exogeneity. Therefore, \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}\) gives more weight to \({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\) when there is evidence of endogeneity; otherwise, it gives more weight to \({\widehat{{\varvec{\beta}}}}_{L}\). In particular, \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}={\widehat{{\varvec{\beta}}}}_{L}\) when \({H}_{n}\) is sufficiently small (less than \(\mathcalligra{q}\)), whereas when \({H}_{n}\ge \mathcalligra{q}\), \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}\) is a weighted average of \({\widehat{{\varvec{\beta}}}}_{L}\) and \({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\). Different values of \(k\) and \(\lambda \) in (3.1) produce different types of estimators, as the following special cases show:

  1. \(\mathop{{lim}}\limits_{\begin{subarray}{c}{k\to 0}\\{\lambda \to 1}\end{subarray}}{\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}={\widehat{{\varvec{\beta}}}}_{L}\), the OLS estimator.

  2. \(\mathop{{lim}}\limits_{\begin{subarray}{c}{k\to \widehat{\kappa }}\\{\lambda \to 0}\end{subarray}}{\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}={\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\), the LIML estimator.

  3. \(\underset{k\to 1}{\mathrm{lim}}{\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}={\widehat{{\varvec{\beta}}}}_{\mathrm{SWTSLS}}=\lambda {\widehat{{\varvec{\beta}}}}_{L}+\left(1-\lambda \right){\widehat{{\varvec{\beta}}}}_{\mathrm{TSLS}}\), the SWTSLS estimator.
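The shrinkage step in (3.1)–(3.3) amounts to only a few lines of code. The sketch below is illustrative: the inputs (the two estimates, \(\widehat{{\varvec{G}}}\), and the shrinkage parameter \(\mathcalligra{q}\)) are hypothetical numbers chosen to exhibit the positive-part weighting, not output from a fitted model:

```python
import numpy as np

def swliml(beta_ols, beta_liml, G_hat, q):
    """Stein-weighted combination (3.1):
    beta_ols + (1 - q/H_n)_+ * (beta_liml - beta_ols)."""
    d = beta_liml - beta_ols
    H_n = float(d @ G_hat @ d)                          # Wu-Hausman statistic (3.3)
    w = max(0.0, 1.0 - q / H_n) if H_n > 0 else 0.0     # positive-part trimming
    return beta_ols + w * d, H_n

beta_ols = np.array([1.20])
beta_liml = np.array([1.00])
G_hat = np.array([[100.0]])          # [var(LIML) - var(OLS)]^{-1}, hypothetical value

# large discrepancy: H_n = (0.2)^2 * 100 = 4 >= q = 1, so 3/4 of the
# LIML direction is applied: beta* = 1.20 + 0.75 * (-0.20) = 1.05
b_star, H_n = swliml(beta_ols, beta_liml, G_hat, q=1.0)

# small discrepancy: H_n = 0.01 < q, so the estimator collapses to OLS
b_small, H_small = swliml(beta_ols, np.array([1.19]), G_hat, q=1.0)
```

The example makes the continuity of the weighting visible: as \({H}_{n}\) grows, the weight on the LIML direction rises smoothly toward 1, whereas below \(\mathcalligra{q}\) the estimator equals OLS exactly.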

Based on Hausman’s statistic, a pretest estimator (PT) is defined as follows:

$${\widehat{{\varvec{\beta}}}}_{PT}=I{\left({H}_{n}\right)}_{\left[0,{c}_{\alpha }\right.)}{\widehat{{\varvec{\beta}}}}_{L}+I{\left({H}_{n}\right)}_{\left[{c}_{\alpha },\infty \right.)}{\widehat{{\varvec{\beta}}}}_{LIML},$$

where \({c}_{\alpha }\) is the critical value, at nominal level \(\alpha \), of the chi-square distribution \(\left({\chi }_{E}^{2}\right)\) with \(E\) degrees of freedom, and \(I{\left({H}_{n}\right)}_{\left[0,{c}_{\alpha }\right.)}\) is an indicator function that takes the value 1 when the test statistic falls within the indicated interval and 0 otherwise. This estimator replicates the standard procedure by which researchers select between models on the basis of a statistical test.
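The pretest rule can be sketched analogously. Here 3.841 is the familiar \({\chi }_{1}^{2}\) critical value at \(\alpha =0.05\) (so \(E=1\)), and the two estimates are hypothetical numbers:

```python
import numpy as np

def pretest(beta_ols, beta_liml, H_n, c_alpha):
    """Pretest estimator: keep OLS when H_n < c_alpha, otherwise switch to LIML."""
    return np.where(H_n < c_alpha, beta_ols, beta_liml)

c_05 = 3.841                                 # chi-square(1) critical value, alpha = 0.05
pt_reject = pretest(np.array([1.2]), np.array([1.0]), H_n=4.0, c_alpha=c_05)  # -> LIML
pt_keep = pretest(np.array([1.2]), np.array([1.0]), H_n=2.0, c_alpha=c_05)    # -> OLS
```

Unlike the smooth SWLIML weighting in (3.1), the pretest estimator switches discontinuously at \({c}_{\alpha }\), which is the behavior the Stein weighting is designed to avoid.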

The proposed estimator is a flexible Stein k-class IV estimator that can be expanded to incorporate other IV estimators, such as the generalized method of moments and modified LIML (Fuller 1977). Furthermore, incorporating a modification to account for heteroskedasticity and/or serial correlation would be advantageous. The implementation of these enhancements would hold significant value for future research.

4 Asymptotic properties of the SWLIML estimator

In this section, the asymptotic distribution and risk of the SWLIML estimator under a local asymptotic framework are derived by following Wang et al. (2016). To establish the asymptotic distribution of \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}\), consider the reduced form Eq. (2.3) for the endogenous variables \({{\varvec{x}}}_{2i}\). The instrumental variable assumption requires that \({\mathbb{E}}\left[{{\varvec{z}}}_{i}{{\varvec{v}}}_{i}^{T}\right]=0\). Therefore, the linear projection of the structural equation error \({\mu }_{i}\) onto \({{\varvec{v}}}_{i}\) is written as

$${\mu }_{i}={{\varvec{v}}}_{i}^{T}{\varvec{\rho}}+{\xi }_{i}$$
(4.1)
$${\mathbb{E}}\left[{{\varvec{v}}}_{i}{\xi }_{i}\right]=\varvec{0},$$

Equations (4.1) and (2.3) are written without loss of generality, as both may be characterized as projections. In particular, it is assumed that \({\varvec{\rho}}\) in (4.1) is local to zero:

$${\varvec{\rho}}=\frac{1}{\sqrt{n}}{\varvec{\eta}},$$
(4.2)

where \({\varvec{\eta}}\) is the \(E\times 1\) localizing parameter for the degree of correlation between \({\mu }_{i}\) and \({{\varvec{v}}}_{i}\). Thus, \({{\varvec{x}}}_{2i}\) is exogenous when \({\varvec{\eta}}=\varvec{0}\) and endogenous when \({\varvec{\eta}}\ne \varvec{0} \). The \(\frac{1}{\sqrt{n}}\) scaling is used in (4.2) because it provides the stability needed to derive a joint distributional approximation for OLS and LIML. It is helpful to begin by stating the regularity conditions for the asymptotic distribution of \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}\). Set \({\mathbb{E}}\left[{{\varvec{\nu}}}_{i}{{\varvec{\nu}}}_{i}^{T}\right]={{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}\), \({\mathbf{Q}}_{\mathbf{z}\mathbf{z}}={\mathbb{E}}\left[{\mathbf{z}}_{i}{\mathbf{z}}_{i}^{T}\right],\) \({\mathbf{Q}}_{{\mathbf{z}}_{1}{\mathbf{z}}_{1}}={\mathbb{E}}\left[{\mathbf{z}}_{1i}{\mathbf{z}}_{1i}^{T}\right],\) \({\mathbf{Q}}_{{\mathbf{x}}_{2}{\mathbf{x}}_{2}}={\mathbb{E}}\left[{{\varvec{x}}}_{2i}{{\varvec{x}}}_{2i}^{T}\right]\), \({\mathbf{Q}}_{{\mathbf{x}}_{2}\mathbf{z}}={\mathbb{E}}\left[{{\varvec{x}}}_{2i}{\mathbf{z}}_{i}^{T}\right]\), \({\mathbf{Q}}_{\mathbf{z}{\mathbf{x}}_{2}}={\mathbb{E}}\left[{{\varvec{z}}}_{i}{{\varvec{x}}}_{2i}^{T}\right]\), \({\mathbf{Q}}_{{\mathbf{x}}_{2}{\mathbf{z}}_{1}}={\mathbb{E}}\left[{{\varvec{x}}}_{2i}{\mathbf{z}}_{1i}^{T}\right]\), \({\mathbf{Q}}_{{\mathbf{z}}_{1}{\mathbf{x}}_{2}}={\mathbb{E}}\left[{{\varvec{z}}}_{1i}{{\varvec{x}}}_{2i}^{T}\right]\), and \({\sigma }^{2}={\mathbb{E}}\left[{\xi }_{i}^{2}\right]\) to simplify the expressions for the asymptotic variance–covariance matrices. Following Huang et al. (2020), we state the regularity conditions.
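The role of the \(1/\sqrt{n}\) scaling in (4.2) can also be checked numerically: under \({\varvec{\rho}}={\varvec{\eta}}/\sqrt{n}\), the \(\sqrt{n}\)-scaled OLS bias stabilizes as \(n\) grows instead of vanishing or diverging. The sketch below uses a scalar case with no included exogenous variables and arbitrary illustrative values \(\eta =2\) and \(\pi =1\), for which the limit of the scaled bias is \(\eta /(\pi ^{2}+1)=1\):

```python
import numpy as np

def scaled_ols_bias(n, reps, eta, pi=1.0, seed=0):
    """Monte Carlo mean of sqrt(n)*(beta_ols - beta) when rho = eta/sqrt(n)."""
    rng = np.random.default_rng(seed)
    rho = eta / np.sqrt(n)
    z = rng.normal(size=(reps, n))     # single instrument
    v = rng.normal(size=(reps, n))     # reduced-form error
    xi = rng.normal(size=(reps, n))    # projection error in (4.1)
    x2 = pi * z + v                    # reduced form (2.3), scalar case
    mu = rho * v + xi                  # structural error via projection (4.1)
    # true beta = 0, so beta_hat itself is the estimation error
    beta_hat = np.sum(x2 * mu, axis=1) / np.sum(x2 * x2, axis=1)
    return np.sqrt(n) * beta_hat.mean()

# scaled bias is approximately eta / (pi**2 + 1) = 1 at both sample sizes
b_400 = scaled_ols_bias(400, 2000, eta=2.0)
b_1600 = scaled_ols_bias(1600, 2000, eta=2.0)
```

This stability is exactly what makes the joint approximation in Theorem 4.1 possible: with a fixed \({\varvec{\rho}}\) the scaled OLS bias would diverge, while with \({\varvec{\rho}}=o\left({n}^{-1/2}\right)\) it would vanish.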

Assumption 1

  1. 1.

    \(\left({y}_{i},{{\varvec{x}}}_{i},{{\varvec{z}}}_{i},{\mu }_{i},{{\varvec{\nu}}}_{i}: i=1,...,n\right)\) are independently and identically distributed.

  2. 2.

    \({\mathbb{E}}{\Vert {\xi }_{i}\Vert }^{4}<\infty \), \({\mathbb{E}}{\Vert {{\varvec{\nu}}}_{i}\Vert }^{4}<\infty \) and \({\mathbb{E}}{\Vert {{\varvec{z}}}_{i}\Vert }^{4}<\infty \).

  3. 3.

    \(\mathrm{rank}\left[{{\varvec{\Pi}}}_{2}\right]=E\), \({{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}\) and \({\mathbf{Q}}_{\mathbf{z}\mathbf{z}}\) are positive definite.

  4. 4.

    \({\mathbb{E}}\left[{\xi }_{i}^{2}|{{\varvec{z}}}_{i},{{\varvec{x}}}_{2i}\right]={\sigma }^{2}\); \({\sigma }^{2}{\left(\mathrm{plim}\frac{{{\varvec{X}}}_{2}^{T}{{\varvec{M}}}_{{{\varvec{Z}}}_{1}}{{\varvec{X}}}_{2}}{n}\right)}^{-1}={{\varvec{V}}}_{{\varvec{\beta}}}^{L}\) and \({\sigma }^{2}{\left(\mathrm{plim}\frac{{{\varvec{X}}}_{2}^{T}\left({{\varvec{M}}}_{{{\varvec{Z}}}_{1}}-\widehat{\kappa }{{\varvec{M}}}_{{\varvec{Z}}}\right){{\varvec{X}}}_{2}}{n}\right)}^{-1}={{\varvec{V}}}_{{\varvec{\beta}}}^{LIML}\), where plim represents the probability limit operator as \(n\to \infty \).

Assumption 1.1 is a basic assumption in the literature specifying that the observations are i.i.d. Assumption 1.2 states that the error terms and instruments have finite fourth moments; this is needed to invoke the central limit theorem (CLT). Assumption 1.3 requires that the usual identification condition \(\left(\mathrm{i}.\mathrm{e}., \mathrm{rank}\left[{{\varvec{\Pi}}}_{2}\right]=E\right)\) is satisfied so that the coefficient on the endogenous variable is identified; the full rank condition on \({{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}\) is equivalent to supposing that \({{\varvec{x}}}_{2i}\) and \({{\varvec{z}}}_{i}\) have no common elements. Assumption 1.4 first imposes conditional homoscedasticity on the error \({\xi }_{i}\) given the instruments and regressors and, second, assumes that the asymptotic variances of the OLS and LIML estimators simplify under this conditional homoscedasticity condition.

Theorem 4.1

When assumptions 1.1–1.4 and Eq. (4.1) hold, the following convergence results hold jointly:

  1. (i)

    \({n}^{1/2}\left(\begin{array}{c}{\widehat{{\varvec{\beta}}}}_{L}-{\varvec{\beta}}\\ {\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}-{\varvec{\beta}}\end{array}\right)\stackrel{d}{\to }{\varvec{\psi}}+\zeta \), where \({\varvec{\psi}}=\left(\begin{array}{c}\frac{1}{{\sigma }^{2}}{{\varvec{C}}}_{0}{\boldsymbol{\Sigma }}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}\\ 0\end{array}\right)\) and \(\zeta \sim N\left(0,{\varvec{C}}\right)\) with \({\varvec{C}}=\left[\begin{array}{cc}{{\varvec{C}}}_{0}& {{\varvec{C}}}_{0}\\ {{\varvec{C}}}_{0}& {{\varvec{C}}}_{1}\end{array}\right]\), \({{\varvec{C}}}_{0}={\sigma }^{2}{\left[{\mathbf{Q}}_{{\mathbf{x}}_{2}{\mathbf{x}}_{2}}-{\mathbf{Q}}_{{\mathbf{x}}_{2}{\mathbf{z}}_{1}}{{\varvec{Q}}}_{{\mathbf{z}}_{1}{\mathbf{z}}_{1}}^{-1}{\mathbf{Q}}_{{\mathbf{z}}_{1}{\mathbf{x}}_{2}}\right]}^{-1}\) and \({{\varvec{C}}}_{1}={\sigma }^{2}{\left[\left(1-\widehat{\kappa }\right){\mathbf{Q}}_{{\mathbf{x}}_{2}{\mathbf{x}}_{2}}+\widehat{\kappa }\left({\mathbf{Q}}_{{\mathbf{x}}_{2}\mathrm{z}}{{\varvec{Q}}}_{\mathbf{z}\mathbf{z}}^{-1}{\mathbf{Q}}_{\mathbf{z}{\mathbf{x}}_{2}}\right)-{\mathbf{Q}}_{{\mathbf{x}}_{2}{\mathbf{z}}_{1}}{{\varvec{Q}}}_{{\mathbf{z}}_{1}{\mathbf{z}}_{1}}^{-1}{\mathbf{Q}}_{{\mathbf{z}}_{1}{\mathbf{x}}_{2}}\right]}^{-1}\);

  2. (ii)

    \({H}_{n}\stackrel{d}{\to }\varphi ={\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)\), where \({\varvec{\Omega}}=\mathcal{A}{\left[{{\varvec{C}}}_{1}-{{\varvec{C}}}_{0}\right]}^{-1}{\mathcal{A}}^{T}\) with \(\mathcal{A}={\left(\begin{array}{cc}-{{\varvec{I}}}_{E}& {{\varvec{I}}}_{E}\end{array}\right)}^{T}\) and \({\mathcal{A}}_{2}={\left(\begin{array}{cc} \varvec{0} & {{\varvec{I}}}_{E}\end{array}\right)}^{T}\);

  3. (iii)

    \({n}^{1/2}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}-{\varvec{\beta}}\right)\stackrel{d}{\to }\Phi ={\mathcal{A}}_{2}^{T}\zeta -min\left(\frac{\mathcalligra{q}}{\varphi },1\right){\mathcal{A}}^{T}\left({\varvec{\psi}}+\zeta \right).\)

Proof

The proof of Theorem 4.1 (i) is straightforward and standard under the homoscedasticity assumption (Wang et al. 2016). Part (ii) holds once the estimators for the covariance matrices are consistent, and part (iii) follows from the continuous mapping theorem. Let \({\mathcal{A}}_{1}={\left(\begin{array}{cc}{{\varvec{I}}}_{E}& \varvec{0}\end{array}\right)}^{T}\), and set \({{\varvec{\psi}}}_{I}={\mathcal{A}}_{1}^{T}{\varvec{\psi}}\), \({{\varvec{\psi}}}_{II}={\mathcal{A}}_{2}^{T}{\varvec{\psi}}\), \({\zeta }_{I}={\mathcal{A}}_{1}^{T}\zeta \) and \({\zeta }_{II}={\mathcal{A}}_{2}^{T}\zeta .\) Under (4.2) and homoscedasticity, the convergence of the OLS estimator is outlined first to show that

$${n}^{1/2}\left({\widehat{{\varvec{\beta}}}}_{L}-{\varvec{\beta}}\right)={\sigma }^{-2}{{\varvec{C}}}_{0}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}+{n}^{-1/2}{\left({\mathbf{Q}}_{{\mathbf{x}}_{2}{\mathbf{x}}_{2}}-{\mathbf{Q}}_{{\mathbf{x}}_{2}{\mathbf{z}}_{1}}{{\varvec{Q}}}_{{\mathbf{z}}_{1}{\mathbf{z}}_{1}}^{-1}{\mathbf{Q}}_{{\mathbf{z}}_{1}{\mathbf{x}}_{2}}\right)}^{-1}{\mathbb{E}}\left({{\varvec{x}}}_{2i}\left({{\varvec{I}}}_{n}-{{\varvec{P}}}_{{{\varvec{z}}}_{1}}\right)\right){\xi }_{i}\stackrel{d}{\to }{{\varvec{\psi}}}_{I}+{\zeta }_{I}$$

with \({\zeta }_{I}={\left(\mathrm{plim}\frac{1}{n}{{\varvec{X}}}_{2}^{T}{{\varvec{M}}}_{{{\varvec{Z}}}_{1}}{{\varvec{X}}}_{2}\right)}^{-1}{\varvec{T}},\) and

$${\varvec{T}}=\frac{1}{\sqrt{n}}{{\varvec{X}}}_{2}^{T}\left({{\varvec{I}}}_{n}-{{\varvec{P}}}_{{{\varvec{Z}}}_{1}}\right){\varvec{\xi}}\stackrel{d}{\to }N\left(\varvec{0},{\sigma }^{2}\mathrm{plim}\frac{1}{n}{{\varvec{X}}}_{2}^{T}{{\varvec{M}}}_{{{\varvec{Z}}}_{1}}{{\varvec{X}}}_{2}\right)$$

Hence, \({\zeta }_{I}\sim N\left(\varvec{0}, {\varvec{C}_0}\right)\). Second, to show the convergence of the LIML:

$${\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}-{\varvec{\beta}}={\left({{\varvec{X}}}_{2}^{T}\left[{{\varvec{M}}}_{{{\varvec{Z}}}_{1}}-\widehat{\kappa }{{\varvec{M}}}_{{\varvec{Z}}}\right]{{\varvec{X}}}_{2}\right)}^{-1}{{\varvec{X}}}_{2}^{T}\left[{{\varvec{M}}}_{{{\varvec{Z}}}_{1}}-\widehat{\kappa }{{\varvec{M}}}_{{\varvec{Z}}}\right]\left({\varvec{V}}{\varvec{\rho}}+{\varvec{\xi}}\right),$$

where the term involving \({\varvec{V}}{\varvec{\rho}}\) is asymptotically negligible under the local parameterization (4.2) because \({\mathbb{E}}\left[{{\varvec{z}}}_{i}{{\varvec{v}}}_{i}^{T}\right]=\varvec{0}\). Therefore,

$${n}^{1/2}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}-{\varvec{\beta}}\right)\stackrel{d}{\to }{\zeta }_{II}\sim N\left(\varvec{0},{\sigma }^{2}{\left(\mathrm{plim}\frac{1}{n}{{\varvec{X}}}_{2}^{T}\left({{\varvec{M}}}_{{{\varvec{Z}}}_{1}}-\widehat{\kappa }{{\varvec{M}}}_{{\varvec{Z}}}\right){{\varvec{X}}}_{2}\right)}^{-1}\right).$$

The convergence in Theorem 4.1 (ii) follows directly from the consistency of the variance–covariance estimates, and (iii) follows from the continuous mapping theorem as in Theorem 1 of Hansen (2017). \(\hfill\square \)

Theorem 4.1 describes the joint asymptotic distribution of \({\widehat{{\varvec{\beta}}}}_{L}\) and \({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\), which is normally distributed, and of \({H}_{n}\) and \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}\) under the local exogeneity assumption. When \({\varvec{\eta}}\ne \varvec{0}\), \({\widehat{{\varvec{\beta}}}}_{L}\) is asymptotically biased, and \({H}_{n}\) has a noncentral chi-square distribution with noncentrality parameter \({\varvec{\psi}}\). Finally, the asymptotic distribution of the SWLIML estimator converges to a nonlinear function of the normal random vector in Theorem 4.1 (i), which is a function of \({\varvec{\psi}}\).

Let \({\mathcal{B}}_{n}\) be any estimator of \({\varvec{\beta}}\), and let \(\mathcal{W}\) be a weight matrix used to compare the risk of different estimators for a simultaneous linear model in the presence of endogeneity. The asymptotic risk is defined as

$${\mathbb{R}}\left({\mathcal{B}}_{n}\right)=\underset{\mathfrak{s}\to \infty }{\mathrm{lim}}\underset{n\to \infty }{\mathrm{lim inf}}{\mathbb{E}}\left[\mathrm{min}\left\{n{\Vert {\mathcal{B}}_{n}-{\varvec{\beta}}\Vert }_{\mathcal{W}}^{2},\mathfrak{s}\right\}\right],$$

where \(\mathfrak{s}\) denotes the truncation parameter, which controls the expected scaled loss function. If an estimator has an asymptotic distribution

$${n}^{1/2}\left({\mathcal{B}}_{n}-{\varvec{\beta}}\right)\stackrel{d}{\to }\Psi $$

for some random variable \(\Psi \), then this definition of asymptotic risk is convenient because it is simple to analyze. Following Lehmann and Casella (1998), the asymptotic risk is computed as follows:

$${\mathbb{R}}\left({\mathcal{B}}_{n}\right)={\mathbb{E}}\left[{\Psi }^{T}\mathcal{W}\Psi \right]=\mathrm{tr}\left[\mathcal{W}{\mathbb{E}}\left(\Psi {\Psi }^{T}\right)\right]$$
(4.3)
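When the limit \(\Psi \) is Gaussian, Eq. (4.3) reduces to \(\mathrm{tr}\left(\mathcal{W}{\varvec{C}}\right)\) for \(\Psi \sim N\left(\varvec{0},{\varvec{C}}\right)\). The sketch below checks this numerically with hypothetical matrices \({\varvec{C}}\) and \(\mathcal{W}\) (illustrative stand-ins for \({{\varvec{C}}}_{1}\) and \({\left[{{\varvec{C}}}_{1}-{{\varvec{C}}}_{0}\right]}^{-1}\)):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3x3 covariance C and weight matrix W (illustrative choices,
# standing in for C_1 and [C_1 - C_0]^{-1} from the text).
C = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])
W = np.linalg.inv(np.diag([1.0, 2.0, 3.0]))

# Draw Psi ~ N(0, C) and estimate the asymptotic risk E[Psi^T W Psi].
L = np.linalg.cholesky(C)
psi = rng.standard_normal((200_000, 3)) @ L.T
risk_mc = np.mean(np.einsum("ij,jk,ik->i", psi, W, psi))

# The closed form from Eq. (4.3): tr(W E[Psi Psi^T]) = tr(W C).
risk_exact = np.trace(W @ C)
print(risk_mc, risk_exact)
```

The Monte Carlo average and the closed form agree up to sampling error.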

To simplify comparisons among estimators, we set \(\mathcal{W}={\left[{{\varvec{C}}}_{1}-{{\varvec{C}}}_{0}\right]}^{-1}\). The following lemma (Stein, 1981) is stated and used in the proof of Theorem 4.2.

Lemma 1

If \(\zeta \sim N\left(0,{\varvec{C}}\right)\) is \(E\times 1\), \({\mathbb{K}}\) is \(E\times E\), and \(\mathrm{\rm H}\left(\boldsymbol{\varkappa }\right):{\mathbb{R}}^{E}\to {\mathbb{R}}^{E}\) is continuous, then

$${\mathbb{E}}\left[\mathrm{\rm H}{\left(\zeta +{\varvec{\psi}}\right)}^{T}{\mathbb{K}}\zeta \right]={\mathbb{E}}tr\left[\frac{\partial }{\partial \boldsymbol{\varkappa }}{\mathrm{\rm H}\left(\zeta +{\varvec{\psi}}\right)}^{T}{\mathbb{K}}{\varvec{C}}\right].$$

Proof

Let \({\phi }_{{\varvec{C}}}\left(\boldsymbol{\varkappa }\right)\) denote the normal density function with mean 0 and variance–covariance matrix \({\varvec{C}}\).

By multivariate integration by parts,

$${\mathbb{E}}\left[H{\left(\zeta +{\varvec{\psi}}\right)}^{T}{\mathbb{K}}\zeta \right]=\int H{\left(\boldsymbol{\varkappa }+{\varvec{\psi}}\right)}^{T}{\mathbb{K}}{\varvec{C}}{{\varvec{C}}}^{-1}\boldsymbol{\varkappa }{\phi }_{{\varvec{C}}}\left(\boldsymbol{\varkappa }\right)\mathrm{d}\boldsymbol{\varkappa }=\int \mathrm{tr}\left[\frac{\partial }{\partial \boldsymbol{\varkappa }}H{\left(\boldsymbol{\varkappa }+{\varvec{\psi}}\right)}^{T}{\mathbb{K}}{\varvec{C}}\right]{\phi }_{{\varvec{C}}}\left(\boldsymbol{\varkappa }\right)\mathrm{d}\boldsymbol{\varkappa }={\mathbb{E}}\,\mathrm{tr}\left[\frac{\partial }{\partial \boldsymbol{\varkappa }}H{\left(\zeta +{\varvec{\psi}}\right)}^{T}{\mathbb{K}}{\varvec{C}}\right] \hfill\square $$
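Lemma 1 can be sanity-checked numerically in the simplest smooth case \(H\left(\boldsymbol{\varkappa }\right)=\boldsymbol{\varkappa }\), where the derivative is the identity matrix and the lemma reduces to \({\mathbb{E}}\left[{\left(\zeta +{\varvec{\psi}}\right)}^{T}{\mathbb{K}}\zeta \right]=\mathrm{tr}\left({\mathbb{K}}{\varvec{C}}\right)\). The matrices below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative dimensions and matrices (hypothetical choices for C, K, psi).
E = 3
C = np.array([[1.0, 0.2, 0.0],
              [0.2, 2.0, 0.4],
              [0.0, 0.4, 1.0]])
K = np.arange(1.0, 10.0).reshape(3, 3)
psi = np.array([0.5, -1.0, 2.0])

# With H(x) = x, dH(x)^T/dx = I, so Lemma 1 predicts
# E[(zeta + psi)^T K zeta] = tr(I K C) = tr(K C).
L = np.linalg.cholesky(C)
zeta = rng.standard_normal((400_000, E)) @ L.T
lhs = np.mean(np.einsum("ij,jk,ik->i", zeta + psi, K, zeta))
rhs = np.trace(K @ C)
print(lhs, rhs)
```

The simulated left-hand side matches \(\mathrm{tr}\left({\mathbb{K}}{\varvec{C}}\right)\) up to Monte Carlo error.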

Theorem 4.2

Under assumptions 1.1–1.4 and Eq. (4.1), if \(\mathrm{E}>2\) and \(0<\mathcalligra{q}\le 2\left(\mathrm{E}-2\right)\), then

  1. (i)

    \({\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{LIML}\right)=tr\left[\mathcal{W}{{\varvec{C}}}_{1}\right]\) and \({\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{SWLIML}^{*}\right)=tr\left[\mathcal{W}{\mathbb{E}}\left(\Psi {\Psi }^{T}\right)\right]\), where \(\mathcal{W}={\left[{{\varvec{C}}}_{1}-{{\varvec{C}}}_{0}\right]}^{-1}\);

  2. (ii)

    \({\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{SWLIML}^{*}\right)<{\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{LIML}\right)-\frac{2\mathcalligra{q}\left(E-2\right)-{\mathcalligra{q}}^{2}}{{\sigma }^{-4}{{\varvec{\eta}}}^{T}{\boldsymbol{\Sigma }}_{{\varvec{\nu}}{\varvec{\nu}}}{{\varvec{C}}}_{0}\mathcal{W}{{\varvec{C}}}_{0}{\boldsymbol{\Sigma }}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}+E}\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad (4.4)\)

Proof

We know that \({n}^{1/2}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}-{\varvec{\beta}}\right)\stackrel{d}{\to }{\mathcal{A}}_{2}^{T}\zeta \sim N\left(0,{{\varvec{C}}}_{1}\right)\) from Theorem 4.1. Therefore, the asymptotic risk of \({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\) can be written as

$${\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\right)={\mathbb{E}}\left[{\zeta }^{T}{\mathcal{A}}_{2}^{T}\mathcal{W}{\mathcal{A}}_{2}\zeta \right]=\mathrm{tr}\left[\mathcal{W}{\mathbf{C}}_{1}\right]$$

Now

$${n}^{1/2}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}-{\varvec{\beta}}\right)\stackrel{d}{\to }\Psi ,$$

where \(\Psi \) denotes the limiting random variable. Let \({\Psi }^{*}\) denote the analogous random variable obtained without trimming the positive part:

$${\Psi }^{*}={\mathcal{A}}_{2}^{T}\zeta -\left(\frac{\mathcalligra{q}}{{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)}\right){\mathcal{A}}^{T}\left({\varvec{\psi}}+\zeta \right).$$

Since taking the positive part cannot increase the loss, the pointwise quadratic risk of \(\Psi \) is strictly smaller than that of \({\Psi }^{*}\):

$${\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}\right)={\mathbb{E}}\left({\Psi }^{T}\mathcal{W}\Psi \right)<{\mathbb{E}}\left({{\Psi }^{*}}^{T}\mathcal{W}{\Psi }^{*}\right).$$

The right-hand side of the inequality is calculated by using the expression for the random variable \({\Psi }^{*}\), as shown below:

$${\mathbb{E}}\left({{\Psi }^{*}}^{T}\mathcal{W}{\Psi }^{*}\right)={\mathbb{E}}\left[{\zeta }^{T}{\mathcal{A}}_{2}^{T}\mathcal{W} {\mathcal{A}}_{2}\zeta \right]+{\mathcalligra{q}}^{2}{\mathbb{E}}\left[\frac{{\left({\varvec{\psi}}+\zeta \right)}^{T}{\mathcal{A}}_{2}^{T}\mathcal{W} {\mathcal{A}}_{2}\left({\varvec{\psi}}+\zeta \right)}{{\left({\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)\right)}^{2}}\right]-2\mathcalligra{q}{\mathbb{E}}\left[\frac{{\left({\varvec{\psi}}+\zeta \right)}^{T}\mathcal{A}\mathcal{W}{\mathcal{A}}_{2}^{T}\zeta }{{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)}\right]=\mathrm{tr}\left(\mathcal{W}{{\varvec{C}}}_{1}\right)+{\mathcalligra{q}}^{2}{\mathbb{E}}\left[\frac{1}{{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)}\right]-2\mathcalligra{q}{\mathbb{E}}\left[\frac{{\left({\varvec{\psi}}+\zeta \right)}^{T}\mathcal{A}\mathcal{W}{\mathcal{A}}_{2}^{T}\zeta }{{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)}\right],$$

where

\({\mathcal{A}}_{2}^{\mathrm{T}}\mathbf{C}\mathcal{A}={{\varvec{C}}}_{1}-{{\varvec{C}}}_{0}={\mathcal{W}}^{-1}\) and \(\mathcal{A}\mathcal{W}{\mathcal{A}}_{2}^{\mathrm{T}}\mathbf{C}{\varvec{\Omega}}={\mathcal{A}}^{\mathrm{T}}\mathcal{W}{\mathcal{A}}_{2}^{\mathrm{T}}\mathbf{C}\mathcal{A}\mathcal{W}{\mathcal{A}}^{\mathrm{T}}={\varvec{\Omega}}\),

and apply Lemma 1 with

$$H\left(\boldsymbol{\varkappa }\right)=\frac{\boldsymbol{\varkappa }}{{\boldsymbol{\varkappa }}^{T}{\varvec{\Omega}}\boldsymbol{\varkappa }},$$
$$\frac{\partial }{\partial \boldsymbol{\varkappa }}H{\left(\boldsymbol{\varkappa }\right)}^{T}=\left(\frac{1}{{\boldsymbol{\varkappa }}^{T}{\varvec{\Omega}}\boldsymbol{\varkappa }}\right){\varvec{I}}-\frac{2}{{\left({\boldsymbol{\varkappa }}^{T}{\varvec{\Omega}}\boldsymbol{\varkappa }\right)}^{2}}{\varvec{\Omega}}\boldsymbol{\varkappa }{\boldsymbol{\varkappa }}^{T}.$$

Consequently,

$${\mathbb{E}}\left[\frac{{\left({\varvec{\psi}}+\zeta \right)}^{T}\mathcal{A}\mathcal{W}{\mathcal{A}}_{2}^{T}\zeta }{{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)}\right]={\mathbb{E}}\mathrm{tr}\left[\frac{\mathcal{A}\mathcal{W}{\mathcal{A}}_{2}^{T}{\varvec{C}}}{{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)}-\frac{2{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right){\left({\varvec{\psi}}+\zeta \right)}^{T}\mathcal{A}\mathcal{W}{\mathcal{A}}_{2}^{T}{\varvec{C}}}{{\left({\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)\right)}^{2}}\right]={\mathbb{E}}\left[\frac{\mathrm{tr}\left(\mathcal{A}\mathcal{W}{\mathcal{A}}_{2}^{T}{\varvec{C}}\right)}{{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)}\right]-2{\mathbb{E}}\left[\frac{{\left({\varvec{\psi}}+\zeta \right)}^{T}\mathcal{A}\mathcal{W}{\mathcal{A}}_{2}^{T}{\varvec{C}}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)}{{\left({\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)\right)}^{2}}\right]={\mathbb{E}}\left[\frac{E-2}{{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)}\right].$$

Hence, \({\mathbb{E}}\left({{\Psi }^{*}}^{T}\mathcal{W}{\Psi }^{*}\right)={\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\right)+{\mathcalligra{q}}^{2}{\mathbb{E}}\left[\frac{1}{{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)}\right]-2\mathcalligra{q}{\mathbb{E}}\left[\frac{E-2}{{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)}\right].\)

Then, making use of Jensen’s inequality and applying the assumption that \(\mathcalligra{q}\le 2\left(E-2\right),\)

$${\mathbb{E}}\left({{\Psi }^{*}}^{T}\mathcal{W}{\Psi }^{*}\right)\le {\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\right)-\frac{2\mathcalligra{q}\left(E-2\right)-{\mathcalligra{q}}^{2}}{{\mathbb{E}}\left[{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)\right]}$$
(4.5)

As \(\mathrm{tr}\left({\varvec{C}}{\varvec{\Omega}}\right)=\mathrm{tr}\left(\mathcal{W}{\mathcal{A}}^{\mathrm{T}}\mathbf{C}\mathcal{A}\right)=E\), we have

$${\mathbb{E}}\left[{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)\right]={{\varvec{\psi}}}^{T}{\varvec{\Omega}}{\varvec{\psi}}+\mathrm{tr}\left({\varvec{C}}{\varvec{\Omega}}\right)={{\sigma }^{-4}{\varvec{\eta}}}^{T}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{{\varvec{C}}}_{0}\mathcal{W}{{\varvec{C}}}_{0}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}+\mathrm{tr}\left(\mathcal{W}{\mathcal{A}}^{\mathrm{T}}\mathbf{C}\mathcal{A}\right).$$

Now, (4.5) can be simplified as follows:

$${\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}\right)\le {\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\right)-\frac{2\mathcalligra{q}\left(E-2\right)-{\mathcalligra{q}}^{2}}{{\sigma }^{-4}{{\varvec{\eta}}}^{T}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{{\varvec{C}}}_{0}{\left[{{\varvec{C}}}_{1}-{{\varvec{C}}}_{0}\right]}^{-1}{{\varvec{C}}}_{0}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}+E}\quad\quad\quad\quad \hfill\square $$

Theorem 4.2 holds under the mild regularity conditions in assumptions 1.1–1.4 and Eq. (4.1). Theorem 4.2 (ii) indicates that the asymptotic risk of \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}\) is strictly less than that of \({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\) when the number of endogenous variables exceeds two. As the inequality in Theorem 4.2 (ii) is strict and holds for all values of \({\varvec{\eta}}\), the proposed SWLIML estimator strictly dominates the classical LIML estimator when \(E>2\). The requirement that the number of endogenous variables exceed two is comparable to the conditions for Stein's (1956) and Srivastava and Bilodeau's (1989) estimators, as a shrinkage dimension greater than two is required for shrinkage to yield global reductions in the risk function. Furthermore, the \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}\) estimator substantially decreases the risk function when the degree of endogeneity is low, the number of endogenous variables \(\left({{\varvec{x}}}_{2i}\right)\) is large, and the instruments are weak. The expression \(\left(2\mathcalligra{q}\left(E-2\right)-{\mathcalligra{q}}^{2}\right)/\left({{\sigma }^{-4}{\varvec{\eta}}}^{T}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{{\varvec{C}}}_{0}\mathcal{W}{{\varvec{C}}}_{0}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}+E\right)\) is maximized, and the risk bound thus minimized, when the shrinkage parameter is set to \(\widehat{\mathcalligra{q}}=E-2\), where \(E\) denotes the number of endogenous variables; the inequality \(0<\mathcalligra{q}\le 2\left(E-2\right)\) is then satisfied whenever \(E>2\). There is no compelling method for selecting \(\mathcalligra{q}\) when \(E<3\); therefore, motivated by Toker (2020), Amin et al. (2020) and Asl et al. (2021), we rewrite the right-hand side of the inequality in absolute value, \(0<\mathcalligra{q}\le \left|2\left(E-2\right)\right|\), and suggest the estimator \(\widehat{\mathcalligra{q}}=1/\left|2-\left(E-2\right)\right|\) for the shrinkage parameter.
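The two-case selection rule for the shrinkage parameter described above can be transcribed directly:

```python
def shrinkage_parameter(E: int) -> float:
    """Shrinkage parameter for the SWLIML estimator as suggested in the text:
    q = E - 2 maximises the risk-reduction term when E > 2, while
    q = 1 / |2 - (E - 2)| is the suggested fallback when E < 3."""
    if E > 2:
        return float(E - 2)
    return 1.0 / abs(2 - (E - 2))

print(shrinkage_parameter(6), shrinkage_parameter(2))
```

For \(E>2\) the rule returns \(E-2\), which lies inside the admissible interval \(0<\mathcalligra{q}\le 2\left(E-2\right)\).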

4.1 Asymptotic risk reduction

In certain situations, the number of instruments may increase with the sample size while remaining smaller than it. In such cases, the LIML estimator is consistent, whereas TSLS is not, as demonstrated by Bekker (1994) and Chao and Swanson (2005). To evaluate the extent to which SWLIML and SWTSLS reduce risk, it is first necessary to define the risk bounds of both estimators. The risk bound of SWLIML is given in Theorem 4.2 (ii) and can be expressed as

$$\frac{{\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}\right)}{{\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\right)}\le 1 -\frac{2\mathcalligra{q}\left(E-2\right)-{\mathcalligra{q}}^{2}}{{{\sigma }^{-4}{\varvec{\eta}}}^{T}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{{\varvec{C}}}_{0}\mathcal{W}{{\varvec{C}}}_{0}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}+E}$$
(4.6)

For the SWTSLS estimator, its risk satisfies the bound

$${\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{SWTSLS}\right)\le {\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{TSLS}}\right) -\frac{\mathcalligra{q}\left(2\left(E-2\right)-\mathcalligra{q}\right)}{{{\sigma }^{-4}{\varvec{\eta}}}^{T}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{{\varvec{C}}}_{0}{\left({{\varvec{C}}}_{2}-{{\varvec{C}}}_{0}\right)}^{-1}{{\varvec{C}}}_{0}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}+E},$$
(4.7)

where \({\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{TSLS}}\right)=\mathrm{tr}\left({\left({{\varvec{C}}}_{2}-{{\varvec{C}}}_{0}\right)}^{-1}{{\varvec{C}}}_{2}\right)\) and \({{\varvec{C}}}_{2}={\sigma }^{2}{\left({\mathbf{Q}}_{{\mathbf{x}}_{2}\mathrm{z}}{{\varvec{Q}}}_{\mathbf{z}\mathbf{z}}^{-1}{\mathbf{Q}}_{\mathbf{z}{\mathbf{x}}_{2}}-{\mathbf{Q}}_{{\mathbf{x}}_{2}{\mathbf{z}}_{1}}{{\varvec{Q}}}_{{\mathbf{z}}_{1}{\mathbf{z}}_{1}}^{-1}{\mathbf{Q}}_{{\mathbf{z}}_{1}{\mathbf{x}}_{2}}\right)}^{-1}\), as demonstrated by Hansen (2017). Comparing the bounds of SWLIML (4.4) and SWTSLS (4.7) shows that neither estimator uniformly dominates the other: in some regions of the parameter space the SWLIML estimator has lower risk, while in others the SWTSLS estimator does. These bounds can be used to identify the regions where one estimator outperforms the other, which helps in selecting the most appropriate estimator for a particular scenario.

To quantify the magnitude of the risk improvement for SWTSLS and SWLIML, we define the following parameters:

  1. (i)

    \({d}_{E}=\frac{\mathrm{tr}\left({\left({{\varvec{C}}}_{2}-{{\varvec{C}}}_{0}\right)}^{-1}{{\varvec{C}}}_{1}\right)}{E}\) and \({h}_{E}=\frac{{{\sigma }^{-4}{\varvec{\eta}}}^{T}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{{\varvec{C}}}_{0}{\left({{\varvec{C}}}_{2}-{{\varvec{C}}}_{0}\right)}^{-1}{{\varvec{C}}}_{0}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}}{E}\) for SWTSLS

  2. (ii)

    \({d}_{E}^{*}=\frac{\mathrm{tr}\left(\mathcal{W}{{\varvec{C}}}_{1}\right)}{E}\) and \({h}_{E}^{*}=\frac{{\sigma }^{-4}{{\varvec{\eta}}}^{T}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{{\varvec{C}}}_{0}\mathcal{W}{{\varvec{C}}}_{0}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}}{E}\) for SWLIML.

The parameters \({d}_{E}\) and \({d}_{E}^{*}\) are nonlinear functions of the correlation between \({{\varvec{x}}}_{2i}\) and the instruments \({{\varvec{z}}}_{2i}\), and both satisfy \({d}_{E}, {d}_{E}^{*}\ge 1\). Conversely, the parameters \({h}_{E}\) and \({h}_{E}^{*}\) indicate the magnitude of the endogeneity \({\varvec{\eta}}\). Rewriting Eqs. (4.7) and (4.6) in relative terms yields

$$\frac{{\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{SWTSLS}\right)}{{\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{TSLS}}\right)}\le 1 -\frac{{\mathcalligra{q}}^{2}}{\mathrm{tr}\left({\left({{\varvec{C}}}_{2}-{{\varvec{C}}}_{0}\right)}^{-1}{{\varvec{C}}}_{2}\right)\left({\sigma }^{-4}{{\varvec{\eta}}}^{T}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{{\varvec{C}}}_{0}{\left({{\varvec{C}}}_{2}-{{\varvec{C}}}_{0}\right)}^{-1}{{\varvec{C}}}_{0}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}+E\right)}\simeq 1 -\frac{1}{{d}_{E}\left({h}_{E}+1\right)}$$
(4.8)

and

$$\frac{{\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}\right)}{{\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\right)}\le 1 -\frac{\mathcalligra{q}}{\mathrm{tr}\left(\mathcal{W}{{\varvec{C}}}_{1}\right)}\frac{\mathcalligra{q}}{{\sigma }^{-4}{{\varvec{\eta}}}^{T}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{{\varvec{C}}}_{0}\mathcal{W}{{\varvec{C}}}_{0}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}+E}\simeq 1 -\frac{1}{{d}_{E}^{*}\left({h}_{E}^{*}+1\right)}$$
(4.9)

respectively, where the approximations approach equality as \(E\) increases.

Equation (4.8) shows that SWTSLS achieves a percentage reduction in asymptotic risk relative to TSLS of approximately \(100/\left({d}_{E}\left({h}_{E}+1\right)\right)\), while (4.9) shows that SWLIML achieves a reduction of approximately \(100/\left({d}_{E}^{*}\left({h}_{E}^{*}+1\right)\right)\) relative to LIML. The percentage reduction in asymptotic risk due to the Stein k-class IV estimator is therefore greatest when the parameters \({d}_{E},{h}_{E},{d}_{E}^{*}\) and \({h}_{E}^{*}\) are small. The strength and number of the instruments determine \({d}_{E}\) and \({d}_{E}^{*}\), while \({h}_{E}\) and \({h}_{E}^{*}\) increase with the magnitude of endogeneity. When there are many weak instruments, the parameter \({d}_{E}\) is inflated, while \({d}_{E}^{*}\) approaches one. Therefore, the SWLIML estimator is expected to have a smaller asymptotic risk than SWTSLS; that is, the bound (4.4) is expected to be smaller than the bound (4.7). In certain situations where the instruments are extremely weak and the endogeneity is mild to moderate (\({h}_{E}\) is small), the bound (4.7) can be smaller than (4.4). When there are many strong instruments and a high level of endogeneity, the bound (4.4) can be much smaller than (4.7), since the parameter \({d}_{E}\) is inflated while \({d}_{E}^{*}\to 1\). However, when the degree of endogeneity is mild, the difference between (4.7) and (4.4) is marginal.
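The approximate percentage reductions in (4.8) and (4.9) are straightforward to evaluate. The helper below computes \(100/\left(d\left(h+1\right)\right)\); the particular \((d,h)\) values are hypothetical and only illustrate the many-weak-instruments comparison discussed above:

```python
def pct_risk_reduction(d: float, h: float) -> float:
    """Approximate percentage reduction in asymptotic risk, 100 / (d (h + 1)),
    from Eqs. (4.8)-(4.9); d >= 1 reflects instrument strength and number,
    h >= 0 the magnitude of endogeneity."""
    return 100.0 / (d * (h + 1.0))

# Many weak instruments: d_E inflated for SWTSLS, d_E* near one for SWLIML,
# so the predicted gain over the unshrunk estimator is larger for SWLIML.
swtsls_gain = pct_risk_reduction(d=3.0, h=0.5)   # hypothetical d_E, h_E
swliml_gain = pct_risk_reduction(d=1.1, h=0.5)   # hypothetical d_E*, h_E*
print(swtsls_gain, swliml_gain)
```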

The theory suggests that depending on the specific scenario, either the SWLIML estimator or the SWTSLS estimator may outperform the other. Specifically, when the number of instruments is high and the degree of endogeneity is moderate to high, the SWLIML estimator is expected to have lower asymptotic risk than the SWTSLS estimator. On the other hand, when there are not too many instruments and the degree of endogeneity is small, the SWTSLS estimator is expected to have lower risk. Therefore, we anticipate that the Stein k-class IV estimator will result in significant risk reductions, especially when the instruments are weak, the degree of endogeneity is mild to high, and/or the number of instruments is large. To gain a better understanding, we examine simulation estimates in the following section to determine which estimator performs better than the other.

5 Monte Carlo design

The simulation is conducted by following Hahn and Inoue (2002), Kuersteiner and Okui (2010) and Hansen (2017). The observations \(\left({y}_{i},{{\varvec{x}}}_{i},{{\varvec{z}}}_{i},{\mu }_{i},{{\varvec{\nu}}}_{i}; i=1,...,n\right)\) are produced by considering a simultaneous equations model with endogenous variables:

$${y}_{i} ={{\varvec{x}}}_{i}^{T}{\varvec{\beta}}+\alpha + {\mu }_{i},$$
$${{\varvec{x}}}_{i}={{\varvec{\Pi}}}^{T}{{\varvec{z}}}_{i} +\gamma + {{\varvec{v}}}_{i},$$

where \({{\varvec{z}}}_{i}\stackrel{i.i.d}{\sim } N\left(0,{{\varvec{I}}}_{K}\right)\) are the instruments, \(\left(\begin{array}{c}{\mu }_{i}\\ {{\varvec{v}}}_{i}\end{array}\right)\stackrel{i.i.d}{\sim }N\left(0,{\varvec{\Omega}}\right)\) with \({\varvec{\Omega}}=\left[\begin{array}{cccc}1& {\rho }^{*}& \dots & {\rho }^{*}\\ {\rho }^{*}& 1& 0& 0\\ \vdots & 0& \ddots & 0\\ {\rho }^{*}& 0& 0& 1\end{array}\right],\) \({\rho }^{*}=\rho /\sqrt{E}\) is the correlation between \({\mu }_{i}\) and \({{\varvec{v}}}_{i}\), which measures the degree of endogeneity, and \(\rho =0\) indicates that there is no endogeneity issue. The determinant of \({\varvec{\Omega}}\) is \(1-{\rho }^{2}\), which is positive if and only if \(\left|\rho \right|<1\). We therefore parameterize the correlation as \({\rho }^{*}=\rho /\sqrt{E}\) with \(\rho \in \left(-1, 1\right)\), which is a natural parameterization. The integers \(E\) and \(K\) are the total numbers of endogenous variables and instruments, respectively. The parameter of interest is the vector \({\varvec{\beta}}\), which is set to \({\left(-1, 1\right)}^{T}\), \({\left(-1, 1, -1, 1\right)}^{T}\) and \({\left(-1, 1, -1, 1, -1, 1\right)}^{T}\) when \(E=2\), \(E=4\), and \(E=6\), respectively, and the parameters \(\alpha \) and \(\gamma \) are set to zero. The reduced-form matrix \({\varvec{\Pi}}\) is set according to the concentration parameter (CP), a unitless measure of the strength of the instruments (Stock et al. 2002). The CP can be specified for multiple endogenous variables as \({{\varvec{\Sigma}}}_{{\varvec{v}}{\varvec{v}}}^{-1/2}{{\varvec{\Pi}}}^{T}{\mathbf{Z}}^{T}\mathbf{Z}{\varvec{\Pi}}{{\varvec{\Sigma}}}_{{\varvec{v}}{\varvec{v}}}^{-1/2}.\) According to Stock et al. (2002), the CP plays an important role since it determines the strength of the instruments, and there is a one-to-one correspondence between the \({R}^{2}\) value and the CP.
The matrix \({\varvec{\Pi}}\) is \(K\times E\): its first \(E\) rows and \(E\) columns form a diagonal block with a common scalar entry, while all other elements are zero. This scalar is set to \({\left(\frac{{R}^{2}}{1-{R}^{2}}\right)}^{1/2}\) so that \({R}^{2}\) is the reduced-form population R-squared for each \({x}_{ji}\). In the design of the experiment, the values of \(\left({R}^{2},n, \rho ,K, E\right)\) are selected from the specifications below:

$${R}^{2}\in \left\{0.01, 0.10, 0.30\right\}; n\in \left\{100, 500, 1000, 2000\right\}; \rho \in \left\{0, 0.25, 0.50, 0.75, 0.90\right\}; K\in \left\{10, 30, 60\right\};E\in \left\{2, 4, 6\right\}$$

The parameter \({R}^{2}\) signifies the strength of the instruments, and \({R}^{2}=0.01\) indicates weak instruments (Donald and Newey, 2001). \(n\) represents the sample size, and \(\rho \) controls the degree of endogeneity (the regressor is exogenous when \(\rho =0\), and a large value of \(\rho \) indicates severe endogeneity). Note that the dimension of \({{\varvec{z}}}_{i}\) is greater than that of \({{\varvec{x}}}_{i}\) \(\left(K>E\right)\), so the model is overidentified. LIML is equivalent to TSLS when the model is just identified; therefore, the overidentified model is used to examine the performance of the Stein k-class IV estimator. OLS, TSLS, LIML, CE (Sawa, 1973), SWTSLS, SWLIML and pretest estimators are examined, with all simulated results based on 5000 replications. The performance of each estimator is compared via the median squared error; i.e., for any estimator \(\widehat{{\varvec{\beta}}}\),

$$M\left(\widehat{{\varvec{\beta}}}\right)=median\left[{\left(\widehat{{\varvec{\beta}}}-{\varvec{\beta}}\right)}^{T}\left(\widehat{{\varvec{\beta}}}-{\varvec{\beta}}\right)\right]$$

As in Donald et al. (2009), we report the median squared error, a robust measure, instead of the mean squared error for each estimator, because the TSLS and LIML estimators may not have finite variances in finite samples. The use of the median squared error is also common in the simultaneous equations literature.
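A minimal sketch of this data-generating process and of the median squared error criterion, under the stated design (the authors' exact \({\varvec{\Pi}}\) construction may differ in detail):

```python
import numpy as np

def simulate_design(n=500, E=2, K=10, rho=0.5, R2=0.10, seed=0):
    """Generate (y, x, z) from the simultaneous equations design of Section 5."""
    rng = np.random.default_rng(seed)
    beta = np.tile([-1.0, 1.0], (E + 1) // 2)[:E]        # (-1, 1, -1, 1, ...)
    # Errors: unit variances, corr(mu_i, v_ij) = rho / sqrt(E).
    Omega = np.eye(E + 1)
    Omega[0, 1:] = Omega[1:, 0] = rho / np.sqrt(E)
    errors = rng.multivariate_normal(np.zeros(E + 1), Omega, size=n)
    mu, v = errors[:, 0], errors[:, 1:]
    # Instruments and reduced form: leading E x E block of Pi is diagonal
    # with scalar (R^2 / (1 - R^2))^(1/2); all other entries are zero.
    z = rng.standard_normal((n, K))
    Pi = np.zeros((K, E))
    Pi[:E, :E] = np.sqrt(R2 / (1.0 - R2)) * np.eye(E)
    x = z @ Pi + v
    y = x @ beta + mu
    return y, x, z, beta

def median_squared_error(beta_hats, beta):
    """median[(b - beta)^T (b - beta)] over replications, as in Section 5."""
    diffs = np.asarray(beta_hats) - beta
    return float(np.median(np.sum(diffs ** 2, axis=1)))
```

Each replication draws a fresh seed, estimates \({\varvec{\beta}}\) with every estimator under study, and the median squared errors are computed over the 5000 replications.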

5.1 Monte Carlo results

To simplify the presentation, the simulated relative efficiencies (SREs) are reported as a measure of performance. The SREs are defined as the ratio of the simulated median squared error of any of the estimators relative to the simulated median squared error of \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}\). The SRE of any estimator \({\widehat{{\varvec{\beta}}}}^{*}\) is characterized as

$$\mathrm{SRE}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}},{\widehat{{\varvec{\beta}}}}^{*}\right)=\frac{median\left[{\left({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}-{\varvec{\beta}}\right)}^{T}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}-{\varvec{\beta}}\right)\right]}{median\left[{\left({\widehat{{\varvec{\beta}}}}^{*}-{\varvec{\beta}}\right)}^{T}\left({\widehat{{\varvec{\beta}}}}^{*}-{\varvec{\beta}}\right)\right]},$$

where \({\widehat{{\varvec{\beta}}}}^{*}\) is the \({\widehat{{\varvec{\beta}}}}_{L}\), \({\widehat{{\varvec{\beta}}}}_{TSLS}\), \({\widehat{{\varvec{\beta}}}}_{LIML}\), \({\widehat{{\varvec{\beta}}}}_{CE}\), \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWTSLS}}^{*}\), or \({\widehat{{\varvec{\beta}}}}_{PT}\) estimator. Note that \(\mathrm{SRE}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}},{\widehat{{\varvec{\beta}}}}^{*}\right)<1\) indicates that \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}\) performs better than the competing estimator, whereas \(\mathrm{SRE}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}},{\widehat{{\varvec{\beta}}}}^{*}\right)>1\) indicates worse performance of \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}\). The SREs of the estimators are presented in Tables 1, 2, 3 and 4: Table 1 corresponds to the scenario where \(E=2\) and \(K=30\), Table 2 to \(E=2\) and \(K=60\), Table 3 to \(E=4\) and \(K=30\), and Table 4 to \(E=4\) and \(K=60\). In the supplementary materials, Tables I–IV present the SREs of the estimators for various additional cases, including situations with many variables that address different empirically relevant problems. The main findings of the simulation study are summarized below:

Table 1 Simulated relative efficiencies (SREs) of the median squared errors of OLS, TSLS, LIML, CE and SWTSLS relative to SWLIML for \(E=2,\) \(K=30,\) and \(\rho =0, 0.25, 0.50, 0.75\,\mathrm{and}\,0.90\)
Table 2 Simulated relative efficiencies (SREs) of the median squared errors of OLS, TSLS, LIML, CE and SWTSLS relative to SWLIML for \(E=2,\) \(K=60,\) and \(\rho =0, 0.25, 0.50, 0.75\,\mathrm{and}\,0.90\)
Table 3 Simulated relative efficiencies (SREs) of the median squared errors of OLS, TSLS, LIML, CE and SWTSLS relative to SWLIML for \(E=4,\) \(K=30,\) and \(\rho =0, 0.25, 0.50, 0.75\,\mathrm{and}\,0.90\)
Table 4 Simulated relative efficiencies (SREs) of the median squared errors of OLS, TSLS, LIML, CE and SWTSLS relative to SWLIML for \(E=4,\) \(K=60,\) and \(\rho =0, 0.25, 0.50, 0.75\,\mathrm{and}\,0.90\)
  • Monte Carlo simulation results indicate that the performance of the proposed Stein k-class IV estimator is satisfactory relative to the CE, OLS, TSLS, and LIML estimators in the presence of moderate to strong endogeneity.

  • The median squared error of the estimators decreases substantially as the sample size increases, but it increases with the number of endogenous variables. The Stein k-class IV estimator performs best in the case of weak instruments and endogeneity, while OLS, TSLS, LIML and CE perform worst for all parameter values when the degree of endogeneity is high.

  • In the case of no endogeneity \(\left(\rho =0\right)\), the OLS estimator has a lower median squared error than the other estimators, but this ranking is reversed in the presence of endogeneity. In the case of weak instruments \(\left({R}^{2}=0.01\right)\) and a low degree of endogeneity \(\left(\rho =0.25\right)\), the OLS estimator has the lowest dispersion among the estimators, particularly for small sample sizes; these results are similar to those of Hansen (2017).

  • Looking across the tables where \(E=2\), it is clear that SWLIML dominates for moderate to large values of \(\rho \), and its performance is greatly enhanced when there are many instruments.

  • In most cases, the SREs of OLS, TSLS, LIML, CE, and SWTSLS relative to SWLIML are below one; the proposed estimator is therefore more robust to many weak instruments and intermediate to large values of \(\rho \). The SREs of TSLS and SWTSLS exceed one in a few cases for \({R}^{2}=0.01\), \(n=100\) and \(E=4\).

  • In the supplementary materials, Table I shows that SWLIML has a lower median squared error than the other estimators when \(\rho \) is large, while SWTSLS, TSLS and CE have better performance for small \(\rho \). The SWTSLS exhibits a lower median squared error than TSLS for small values of \(\rho \), but the opposite is true for large \(\rho \), as shown in Table II.

  • When \(K\) is small and \(E=6\), in most cases, SWTSLS performs marginally better than SWLIML but significantly outperforms other estimators, except for TSLS in the case of large values of \(\rho \). Conversely, for a large value of \(K\), SWLIML exhibits a significantly superior performance to other existing estimators, as demonstrated by Tables IV and V.

Figure 1 presents the relative median squared errors of the estimators as \(\rho \) varies from 0 to 1, with \(n=500\) and \({R}^{2}=0.10\), where \({\alpha }^{*}=0.05, 0.15, 0.25\) represents the minimum and maximum assured efficiency of the pretest estimator (Kibria and Saleh, 2006). In addition, Figs. a−h in the supplementary materials depict real data analysis situations with more covariates in the model.

  • The median squared errors of the pretest estimators and OLS are smaller than those of the other estimators when \(\rho \) is small. The performance of the pretest estimator is generally similar to that of OLS for low \(\rho \) and, in many cases, similar to that of LIML when \(\rho \) is high. However, the SRE of the pretest estimators is naturally lower than that of LIML for intermediate values of \(\rho \).

  • When \({R}^{2}=0.10\) and \(\rho \) take moderate values, the SRE of SWTSLS and TSLS is higher than that of SWLIML, but for large \({R}^{2}\), the SRE is much lower than that of SWLIML.

  • The improvements from the SWLIML estimator are most significant under large values of \(\rho \) and many instruments.

Fig. 1

Relative median squared errors of the OLS, TSLS, LIML, CE, SWTSLS and pretest estimators a when \(K=30\) and \(E=2\), b when \(K=60\) and \(E=2\), c when \(K=30\) and \(E=4\), and d when \(K=60\) and \(E=4\)

Overall, based on the Monte Carlo simulation evidence, the improvements made by the SWLIML estimator relative to the SWTSLS, CE, TSLS, LIML, and OLS estimators are greatest in the presence of many instruments and moderate to large values of \(\rho \). The SWTSLS method performs better when the number of instruments and degree of endogeneity are low and the number of endogenous variables is high. However, as the level of endogeneity increases, TSLS may have a higher SRE due to overidentification, reducing the benefits of shrinkage. Sawa (1973) proposed a CE method where the weight depends on the number of exogenous variables, and this method has a significantly lower SRE when the number of instruments is high. Additionally, when the model is just identified, the performance of CE is almost identical to that of TSLS. When there are more than two endogenous variables, the inequality in Theorem 4.2 is satisfied, and SWLIML uniformly dominates LIML. When there are fewer than three endogenous variables, SWLIML has either a smaller SRE than LIML or a slightly greater SRE. The promising results from the simulation suggest that SWLIML should be utilized in situations where there is a need to estimate a linear IV model with many instruments.
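The SRE computations reported in the tables can be sketched as follows, using OLS and TSLS as stand-ins for the full set of estimators (LIML, CE and the shrinkage variants follow the same replication pattern):

```python
import numpy as np

def ols(y, x):
    """Ordinary least squares coefficient vector."""
    return np.linalg.lstsq(x, y, rcond=None)[0]

def tsls(y, x, z):
    """Two-stage least squares: regress x on the instruments z, then y on the
    fitted values (algebraically identical to the usual TSLS formula)."""
    x_hat = z @ np.linalg.lstsq(z, x, rcond=None)[0]
    return np.linalg.lstsq(x_hat, y, rcond=None)[0]

def sre(draws_ref, draws_other, beta):
    """Simulated relative efficiency: median squared error of the reference
    estimator (SWLIML in the text) over that of a competitor; values below
    one favour the reference estimator."""
    mse = lambda b: np.median(np.sum((np.asarray(b) - beta) ** 2, axis=1))
    return float(mse(draws_ref) / mse(draws_other))
```

With endogenous regressors, TSLS recovers \({\varvec{\beta}}\) while OLS is asymptotically biased, which is the pattern the SRE tables quantify.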

6 Application: green patents

The empirical relevance of the proposed method is also demonstrated by analyzing the green patent dataset from Maasoumi et al. (2021). The data cover the period 1990–2016, and the total sample includes 140 observations from the Nordic countries (Denmark, Finland, Iceland, Norway and Sweden). The primary purpose of this dataset is to facilitate investigations of the impact of regulations and strategies on green patent production and the advancement of renewable energy technologies in the Nordic countries. In addition, the influences of environmental stringency, environmental taxes and policies on renewable energy patents are considered. More examples of innovative energy strategies and different public environmental policies can be found in Popp (2002), Faber and Hesen (2004), Taylor (2016), and others. Heshmati et al. (2015) and Schiederig et al. (2012) discussed the growth of renewable energy and green innovation, respectively, in detail. The literature on renewable energy and the factors influencing the use of technology to achieve the United Nations Sustainable Development Goals is growing. In a recent study, Maasoumi et al. (2021) examined the impact of regulations and policies on green patent generation and the evolution of renewable energy technologies in the Organization for Economic Co-operation and Development (OECD) countries. Analyzing these competing policy interests is therefore highly relevant, which makes this specific dataset especially useful.

Green research and development investment (GRDI), research and development personnel (RDP) and government expenditure on tertiary education (GETE) might be endogenous variables; therefore, both the OLS and the k-class estimators are examined. A detailed description of the variables can be found in Maasoumi et al. (2021); a brief description follows. The dependent variable is green patents. The instruments are variables measuring the total capital stock (in billions of $); gross domestic product (GDP); openness, defined as the sum of the export and import shares of GDP; a financial index; an environmental index; and an emissions index. The parameters of interest are the regression coefficients on the endogenous variables (i.e., GRDI, RDP and GETE). The Monte Carlo simulations and theoretical results above indicate that the newly proposed SWLIML estimator is superior to the other estimators; therefore, the OLS, LIML and SWLIML estimators are used to analyze this dataset. The Durbin–Wu–Hausman test (3.3) is used to test the hypothesis that the variables GRDI, RDP and GETE are exogenous. The value of the test statistic \({H}_{n}\) is 36.423, indicating that there is an endogeneity issue in this dataset on the Nordic countries. In contrast, Maasoumi et al. (2021) clearly reject the hypothesis that endogenous explanatory variables are present in the OECD countries dataset.
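A standard way to compute a Durbin–Wu–Hausman-type statistic is through a control-function auxiliary regression: regress each suspected-endogenous variable on the instruments, append the first-stage residuals to the structural regression, and jointly test whether their coefficients are zero. The sketch below implements this generic version and is not necessarily the exact form of statistic (3.3):

```python
import numpy as np

def dwh_statistic(y, X_endog, Z):
    """Durbin-Wu-Hausman exogeneity test, control-function form.

    y:        (n,) outcome
    X_endog:  (n, p) suspected-endogenous regressors
    Z:        (n, k) instruments, k >= p
    Returns a Wald statistic, approximately chi2(p) under exogeneity.
    """
    n = len(y)
    # first-stage residuals for each suspected-endogenous column
    V = X_endog - Z @ np.linalg.lstsq(Z, X_endog, rcond=None)[0]
    W = np.column_stack([np.ones(n), X_endog, V])
    coef, *_ = np.linalg.lstsq(W, y, rcond=None)
    resid = y - W @ coef
    sigma2 = resid @ resid / (n - W.shape[1])
    cov = sigma2 * np.linalg.inv(W.T @ W)
    p = X_endog.shape[1]
    gamma = coef[-p:]            # coefficients on the residual terms
    return gamma @ np.linalg.solve(cov[-p:, -p:], gamma)
```

A large value of the statistic relative to the chi-squared critical value, as with the 36.423 reported above, rejects exogeneity of the tested regressors.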

Bootstrapping is used to obtain the standard errors of the estimates and the root mean squared error (RMSE). The data are bootstrapped B = 50,000 times with resampling across individuals (Huang et al. 2020). The results, namely, the coefficient estimates for the endogenous variables GRDI, RDP and GETE and the RMSE of each estimator, are shown in Table 5. The SWLIML estimates are based on the weights for the OLS and LIML estimators. The coefficient on GRDI obtained with the OLS estimator is negative, contrary to what Maasoumi et al. (2021) found. The impact of GRDI on green patents should be positive, and it is positive when the LIML and SWLIML estimators are used to estimate the corresponding coefficients. The other variables (RDP and GETE) are found to have positive effects on green patents, which is similar to the results of Maasoumi et al. (2021). Furthermore, the bootstrapped standard errors for the SWLIML estimator are smaller than those for the LIML estimator irrespective of the magnitude of the estimated coefficients. The RMSE values for the OLS, LIML and SWLIML estimators are 3.154, 2.341 and 2.112, respectively. The SWLIML estimator thus has a smaller RMSE than the OLS and LIML estimators, and the quality of its predictions is much better. This indicates the stability of the SWLIML estimator and confirms the simulation results showing that SWLIML has lower SREs in finite samples. Based on the simulated results and those from this application, we conclude that the proposed SWLIML estimator performs better than the other estimators and recommend that applied researchers use it.
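The pairs bootstrap used here resamples whole observations with replacement and re-estimates the model on each draw; the standard deviation of the draws gives the bootstrap standard error. A minimal sketch follows, pairing the resampler with a plain TSLS plug-in for illustration (the SWLIML estimator would slot in the same way); B is kept small here, unlike the B = 50,000 used in the paper:

```python
import numpy as np

def tsls(y, X, Z):
    """Plain TSLS plug-in: project X on Z, regress y on the fitted values."""
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(Xhat, y, rcond=None)[0]

def bootstrap_se(estimator, y, X, Z, B=200, seed=0):
    """Pairs (case-resampling) bootstrap standard errors of `estimator`,
    which maps (y, X, Z) to a coefficient vector."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)   # resample rows with replacement
        draws.append(estimator(y[idx], X[idx], Z[idx]))
    return np.asarray(draws).std(axis=0, ddof=1)
```

Resampling entire rows keeps the dependence among (y, X, Z) within each observation intact, which is what resampling across individuals requires.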

Table 5 Estimates for the green patent dataset

7 Conclusions

This paper considers a SWLIML estimator for estimating the regression coefficients from the reduced form of a set of simultaneous equations. It is shown that the SWLIML estimator is superior to the LIML estimator because it has lower risk, in particular a smaller asymptotic mean squared error. Monte Carlo simulation results show that the SWLIML estimator is the preferred estimator or close to the optimal one, which is a major improvement given the uncertainty involved in choosing between OLS and LIML, specifically regarding the stability of the asymptotic mean squared error in the presence of endogeneity and many instruments. The SWLIML estimator also dominates the SWTSLS estimator when there are many instruments, and it is more robust to weak or many instruments than the TSLS estimator. The use of SWLIML enables applied researchers to conduct efficient estimation in the presence of endogeneity and many instruments.