1 Introduction

Endogeneity is a common methodological problem in applied statistics, especially when regression analysis is relied upon for causal inference. Endogeneity occurs when (nonexogenous) explanatory variables are correlated with the error term in a regression model. The consequence of violating the exogeneity assumption in ordinary least squares (OLS) multivariate regression analysis is biased and inconsistent estimates (Mayston 2009). Thus, in the presence of endogeneity, hypothesis tests for regression coefficients obtained via OLS estimation can be severely misleading. Nevertheless, a substantial body of empirical research based on administrative data is plagued by endogeneity issues.

The traditional remedies for endogeneity are instrumental variables (Hausman, 1983; Angrist and Krueger, 2001) and proxy variables (Levinsohn and Petrin, 2003; Wooldridge, 2009), which are derived from additional exclusion restrictions or equations (see, e.g., Galvao and Montes-Rojas, 2010). Klein and Vella (2010) developed a control function estimator for the triangular simultaneous equations model to reduce the influence of endogeneity when the exclusion restrictions needed to construct appropriate instruments are not available. Chalak and White (2011) provided a comprehensive description of how to identify conditional exogeneity relationships and explained how structural relations determine exogeneity and the exclusion restrictions that produce the moment restrictions supporting identification; they also analyzed the necessary and sufficient moment restrictions for identifying effects using extended instrumental variables in a linear structural equation model. Lewbel (2012) proposed a method for identifying structural parameters in models with endogenous variables in the presence of heteroscedasticity. Another approach is to place a prior distribution on the endogeneity bias when additional information, such as that contained in proxy variables or instrumental variables, is not available (Galvao and Montes-Rojas, 2010). Altonji et al. (2005, 2008) used a similar strategy to extract information regarding the endogeneity bias from observables: an index of observables can be used to identify the parameter associated with the endogenous variable. Vansteelandt and Didelez (2018) proposed a double-robust estimation method to improve the efficiency of instrumental variables estimators. Song and Wang (2019) suggested generalized method of moments nonparametric correction approaches for the logit model. Birke et al. (2017) introduced an estimation method, formulated as the solution of an ill-posed inverse problem, for the parametric component of a model with endogenous explanatory variables. Ghosh et al. (2021) derived an estimator and test procedure based on a control function method for heterogeneous endogeneity. Thus, numerous methods have been proposed in the literature to improve the estimation of the coefficients in a system of simultaneous structural equations under different regularity conditions. This paper considers several estimators, including the classical two-stage least squares (TSLS), limited information maximum likelihood (LIML), and OLS estimators. TSLS is preferred to OLS because it provides consistent estimates in the presence of endogeneity, while OLS is inconsistent in such cases (Hansen, 2017). In addition, inference using TSLS is valid even when there are few instruments. However, in the presence of many instruments, LIML remains consistent (Anatolyev, 2013) and is preferred to TSLS and OLS (Hahn and Inoue, 2002).

In small samples, the TSLS estimator may have larger variance than the OLS estimator. To address this issue, Maasoumi (1978) and Hansen (2017) proposed Stein-type estimators for more efficient estimation of unknown parameter vectors in simultaneous equations. The literature shows the use of Stein-type estimators in linear, nonlinear, and panel data models; for example, Sawa (1973) introduced a combined estimator (CE) that is a linear combination of OLS and TSLS, where the coefficients depend only upon the number of included and excluded variables from the relevant equation and sample size. Kim and White (2001) investigated James–Stein-type estimators in the linear regression model by choosing the least absolute deviation estimator as the base estimator and OLS as the data-driven estimator in scenarios where the sample size approaches infinity. DiTraglia (2016) proposed a focused moment selection criterion (FMSC) for the generalized method of moments. The average estimator with minimum asymptotic mean squared error and the FMSC are developed for a scalar parameter of interest, which implies that Stein-type outcomes are not applicable (DiTraglia, 2016). Wang et al. (2016) combined random effect and fixed effect estimators for linear panel data models, and Huang et al. (2020) proposed a weighted estimator for a semiparametric panel data model based on the Hausman statistic. Recent studies have shown that Stein-type estimators have been widely used in various regression models, utilizing different types of estimators. This study examines the effectiveness of the Stein k-class shrinkage estimator in the linear instrumental variables (IV) model using asymptotic tools.

LIML has optimal properties when the numbers of instruments and observations are both large (Anderson et al. 2010), and it is consistent and more robust to weak instruments than TSLS (Chao and Swanson, 2005; Hansen, 2021). In this article, the LIML estimator is improved upon by proposing a weighted estimator that combines the OLS and LIML estimators and depends on the Wu-Hausman statistic \(\left({H}_{n}\right)\). This combination of OLS and LIML provides a risk-based rationalization for \({H}_{n}\) and leads to the Stein-weighted LIML (SWLIML) estimator. The asymptotic distribution and risk of SWLIML are derived, and a Monte Carlo simulation is conducted to show the empirical performance of SWLIML in the presence of endogeneity and many weak instruments. The SWLIML estimator is then applied to the green patent dataset from Maasoumi et al. (2021), where it shows a substantial improvement in predictive power compared to OLS and LIML.

The outline of this article is as follows. Section 2 defines the model, the notation and LIML. The SWLIML estimator is characterized in Sect. 3. The asymptotic properties of the proposed SWLIML estimator are established in Sect. 4, where its asymptotic distribution and asymptotic risk are derived. In Sect. 5, a Monte Carlo simulation is conducted to evaluate the performance of the estimators in the presence of endogeneity, multiple endogenous variables, and weak and/or many instruments. Section 6 presents empirical applications that illustrate the benefits of the proposed method. Concluding remarks are made in Sect. 7.

2 Model and notation

In this paper, a linear model is considered, and it is assumed that there is endogeneity in the model:

$${y}_{i} ={{\varvec{x}}}_{i}^{T}{\varvec{\delta}}+ {\mu }_{i}= {{\varvec{z}}}_{1i}^{T}\boldsymbol{\alpha } +{{\varvec{x}}}_{2i}^{T}{\varvec{\beta}}+ {\mu }_{i}$$
(2.1)
$${\mathbb{E}}\left[{{\varvec{z}}}_{1i}{\mu }_{i}\right]=0,{\mathbb{E}}\left[{{\varvec{x}}}_{2i}{\mu }_{i}\right]\ne 0$$

In matrix notation, model (2.1) can be written as

$${\varvec{y}} ={\varvec{X}}{\varvec{\delta}}\boldsymbol{ }+\boldsymbol{ }{\varvec{\mu}}= {{\varvec{Z}}}_{1}\boldsymbol{\alpha }\boldsymbol{ }+{{\varvec{X}}}_{2}{\varvec{\beta}}\boldsymbol{ }+\boldsymbol{ }{\varvec{\mu}},$$
(2.2)

where \({\varvec{X}}\) is the matrix of regressors, which is divided into two blocks: \({{\varvec{Z}}}_{1}\) is the \(n\times {K}_{1}\) matrix of predetermined exogenous variables, and \({{\varvec{X}}}_{2}\) is the \(n\times E\) matrix of endogenous variables. \({\varvec{\mu}}\) is the \(n\times 1\) vector of error terms, and the parameter vector \({\varvec{\delta}}\) is divided into two subvectors, \(\boldsymbol{\alpha }\) and \({\varvec{\beta}}.\) The parameter of interest is the endogenous regression parameter vector \({\varvec{\beta}}\). Two classical estimators for model (2.1) are OLS and LIML (Anderson and Rubin, 1949). Let \({\widehat{{\varvec{\beta}}}}_{L}\) (OLS) and \({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\) denote the two estimators of \({\varvec{\beta}}\). In the presence of endogeneity, the LIML estimator is generally preferred to OLS because OLS is inconsistent.

The asymptotic distribution of SWLIML may be attained by placing the model under a local-to-exogenous asymptotic assumption. Therefore, a reduced form equation for the endogenous variable \({{\varvec{x}}}_{2i}\) is defined as

$${{\varvec{x}}}_{2i}={{\varvec{\pi}}}_{1}^{T}{{\varvec{z}}}_{1i} + {{\varvec{\pi}}}_{2}^{T}{{\varvec{z}}}_{2i} + {{\varvec{v}}}_{i},$$
(2.3)

and the matrix form of (2.3) corresponds to

$${{\varvec{X}}}_{2}={{\varvec{Z}}}_{1}{{\varvec{\Pi}}}_{1} + {{\varvec{Z}}}_{2}{{\varvec{\Pi}}}_{2} + {\varvec{V}}$$
(2.4)

with \({{\varvec{z}}}_{i}=\left(\begin{array}{c}{{\varvec{z}}}_{1i}\\ {{\varvec{z}}}_{2i}\end{array}\right)\), where \({{\varvec{z}}}_{i}\) is a \(K\)-dimensional instrument with \(K>E\), \({{\varvec{z}}}_{2i}\) is \({K}_{2}\times 1\), and \({\mathbb{E}}\left[{{\varvec{z}}}_{i}{{\varvec{v}}}_{i}^{T}\right]=0\). Thus, we simply refer to \({{\varvec{z}}}_{1i}\) as the vector of \({K}_{1}\) exogenous variables in (2.1) and to \({{\varvec{z}}}_{2i}\) as the excluded valid instruments (Anderson et al. 2010).
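For concreteness, the data-generating process in (2.1)–(2.4) can be simulated as follows. This is a minimal sketch in Python; the dimensions \({K}_{1}\), \({K}_{2}\), the error correlation, and all coefficient values are arbitrary illustrative choices, with a single endogenous regressor \((E=1)\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, K1, K2 = 500, 2, 3                    # sample size, included exogenous, excluded instruments
Z1 = rng.normal(size=(n, K1))            # predetermined exogenous variables z_1i
Z2 = rng.normal(size=(n, K2))            # excluded instruments z_2i

# correlated errors (mu_i, v_i) induce endogeneity of x_2i (here E = 1)
corr = 0.8
err = rng.multivariate_normal([0.0, 0.0], [[1.0, corr], [corr, 1.0]], size=n)
mu, v = err[:, 0], err[:, 1]

pi1, pi2 = np.ones(K1), np.ones(K2)
x2 = Z1 @ pi1 + Z2 @ pi2 + v             # reduced form (2.3)

alpha, beta = np.ones(K1), 1.0
y = Z1 @ alpha + beta * x2 + mu          # structural equation (2.1)

X = np.column_stack([Z1, x2])            # X = [Z1  X2]
delta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_ols = delta_ols[-1]                 # OLS coefficient on the endogenous x2
```

Because \({\mathbb{E}}\left[{{\varvec{x}}}_{2i}{\mu }_{i}\right]\ne 0\) in this design, the OLS coefficient on \(x_2\) concentrates above the true value \(\beta = 1\), which is the inconsistency described above.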

Maximum likelihood estimation of (2.1), treated as part of a system together with (2.4), yields the LIML estimator. The LIML estimator was originally derived by Anderson and Rubin (1949), and these results may be found in Davidson and MacKinnon (1993). The LIML estimator for \({\varvec{\delta}}\) can be defined as

$${\widehat{{\varvec{\delta}}}}_{\mathrm{LIML}}\left(k\right)={\left({{\varvec{X}}}^{T}\left({{\varvec{I}}}_{n}-k{{\varvec{M}}}_{{\varvec{Z}}}\right){\varvec{X}}\right)}^{-1}{{\varvec{X}}}^{T}\left({{\varvec{I}}}_{n}-k{{\varvec{M}}}_{{\varvec{Z}}}\right){\varvec{y}},$$
(2.5)

where, for LIML, \(k\) is set to \(\widehat{\kappa }\), the smallest eigenvalue of the matrix

$${\left[{\mathbb{W}}^{T}{{\varvec{M}}}_{{\varvec{Z}}}{\mathbb{W}}\right]}^{-\frac{1}{2}} {\mathbb{W}}^{T}{{\varvec{M}}}_{{{\varvec{Z}}}_{1}}{\mathbb{W}} {\left[{\mathbb{W}}^{T}{{\varvec{M}}}_{{\varvec{Z}}}{\mathbb{W}}\right]}^{-\frac{1}{2}},$$

which depends only on observable data and not on any unknown parameters (Davidson and MacKinnon, 2004). Here, \({\mathbb{W}}=\left[\begin{array}{cc}{\varvec{y}}& {{\varvec{X}}}_{2}\end{array}\right]\), and \({{\varvec{M}}}_{{{\varvec{Z}}}_{1}} ={{\varvec{I}}}_{n}-{{\varvec{P}}}_{{{\varvec{Z}}}_{1}}\) annihilates the predetermined variables included in (2.1), with \({{\varvec{P}}}_{{{\varvec{Z}}}_{1}}= {{\varvec{Z}}}_{1}{\left({{\varvec{Z}}}_{1}^{T}{{\varvec{Z}}}_{1}\right)}^{-1} {{\varvec{Z}}}_{1}^{T}\). Similarly, \({{\varvec{M}}}_{{\varvec{Z}}}={{\varvec{I}}}_{n}-{{\varvec{P}}}_{{\varvec{Z}}}\) annihilates all the instruments, with \({{\varvec{P}}}_{{\varvec{Z}}}={\varvec{Z}}{\left({{\varvec{Z}}}^{T}{\varvec{Z}}\right)}^{-1}{{\varvec{Z}}}^{T}\). The matrix \({\varvec{Z}}=\left[\begin{array}{cc}{{\varvec{Z}}}_{1}& {{\varvec{Z}}}_{2}\end{array}\right]\) comprises all the instruments, and \(\widehat{\kappa }\ge 1\) holds because \({\left({\varvec{y}}-{{\varvec{X}}}_{2}{\varvec{\beta}}\right)}^{T}{{\varvec{M}}}_{{{\varvec{Z}}}_{1}}\left({\varvec{y}}-{{\varvec{X}}}_{2}{\varvec{\beta}}\right)\) cannot be smaller than \({\left({\varvec{y}}-{{\varvec{X}}}_{2}{\varvec{\beta}}\right)}^{T}{{\varvec{M}}}_{{\varvec{Z}}}\left({\varvec{y}}-{{\varvec{X}}}_{2}{\varvec{\beta}}\right)\). The estimator in (2.5) with arbitrary \(k\) is called a \(k\)-class estimator, which includes the following estimators as special cases:

\(\underset{k\to 0}{\mathrm{lim}}{\widehat{{\varvec{\delta}}}}_{\mathrm{LIML}}\left(k\right)={\widehat{{\varvec{\delta}}}}_{\mathrm{LIML}}\left(0\right)={\left({{\varvec{X}}}^{T}{\varvec{X}}\right)}^{-1}{{\varvec{X}}}^{T}{\varvec{y}}\), the OLS estimator.

\(\underset{k\to 1}{\mathrm{lim}}{\widehat{{\varvec{\delta}}}}_{\mathrm{LIML}}\left(k\right)={\widehat{{\varvec{\delta}}}}_{\mathrm{LIML}}\left(1\right)={\left({{\varvec{X}}}^{T}{{\varvec{P}}}_{{\varvec{Z}}}{\varvec{X}}\right)}^{-1}{{\varvec{X}}}^{T}{{\varvec{P}}}_{{\varvec{Z}}}{\varvec{y}}\), the TSLS estimator.

\(\underset{k\to \widehat{\kappa }}{\mathrm{lim}}{\widehat{{\varvec{\delta}}}}_{\mathrm{LIML}}\left(k\right)={\widehat{{\varvec{\delta}}}}_{\mathrm{LIML}}\left(\widehat{\kappa }\right)={\left({{\varvec{X}}}^{T}\left({{\varvec{I}}}_{n}-\widehat{\kappa }{{\varvec{M}}}_{{\varvec{Z}}}\right){\varvec{X}}\right)}^{-1}{{\varvec{X}}}^{T}\left({{\varvec{I}}}_{n}-\widehat{\kappa }{{\varvec{M}}}_{{\varvec{Z}}}\right){\varvec{y}}\), the LIML estimator.
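The three special cases can be verified numerically from a single implementation of the \(k\)-class formula (2.5). The sketch below is illustrative only: the simulated design (one endogenous regressor, an intercept as the sole included exogenous variable, three excluded instruments, and all parameter values) is an arbitrary choice, and \(\widehat{\kappa }\) is computed as the smallest eigenvalue of \({\left[{\mathbb{W}}^{T}{{\varvec{M}}}_{{\varvec{Z}}}{\mathbb{W}}\right]}^{-1}{\mathbb{W}}^{T}{{\varvec{M}}}_{{{\varvec{Z}}}_{1}}{\mathbb{W}}\), which has the same spectrum as the symmetrized form in the text:

```python
import numpy as np

def k_class(y, X, Z, k):
    """k-class estimator (X'(I_n - k*M_Z)X)^{-1} X'(I_n - k*M_Z)y, cf. (2.5)."""
    n = len(y)
    M_Z = np.eye(n) - Z @ np.linalg.solve(Z.T @ Z, Z.T)
    A = np.eye(n) - k * M_Z
    return np.linalg.solve(X.T @ A @ X, X.T @ A @ y)

def liml_kappa(y, X2, Z1, Z):
    """Smallest eigenvalue of [W'M_Z W]^{-1} W'M_{Z1} W, where W = [y X2]."""
    n = len(y)
    annihilate = lambda A: np.eye(n) - A @ np.linalg.solve(A.T @ A, A.T)
    W = np.column_stack([y, X2])
    B = W.T @ annihilate(Z1) @ W      # W' M_{Z1} W
    C = W.T @ annihilate(Z) @ W       # W' M_Z W
    return np.linalg.eigvals(np.linalg.solve(C, B)).real.min()

# illustrative data: one endogenous regressor, three excluded instruments
rng = np.random.default_rng(1)
n = 300
Z1 = np.ones((n, 1))                                  # K1 = 1 (intercept)
Z2 = rng.normal(size=(n, 3))                          # K2 = 3
e = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=n)
x2 = Z2.sum(axis=1) + e[:, 1]
y = 0.5 + x2 + e[:, 0]                                # true beta = 1
X = np.column_stack([Z1, x2])
Z = np.column_stack([Z1, Z2])

ols = k_class(y, X, Z, 0.0)                           # k = 0: OLS
tsls = k_class(y, X, Z, 1.0)                          # k = 1: TSLS
kappa = liml_kappa(y, x2[:, None], Z1, Z)             # kappa-hat >= 1
liml = k_class(y, X, Z, kappa)                        # k = kappa-hat: LIML
```

With \(k=0\) the formula reduces algebraically to OLS, while the OLS slope on the endogenous regressor is biased upward in this design and the LIML slope is approximately centered on the true value.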

Let the superscript “\(\perp \)” represent the residuals from the projection on \({{\varvec{Z}}}_{1}\), e.g., \({{\varvec{X}}}^{\perp }={{\varvec{M}}}_{{{\varvec{Z}}}_{1}}{{\varvec{X}}}_{2}\). By using this notation, the OLS estimator of \({\varvec{\beta}}\) is \({\left({{{\varvec{X}}}^{\perp }}^{T}{{\varvec{X}}}^{\perp }\right)}^{-1}{{{\varvec{X}}}^{\perp }}^{T}{\varvec{y}}\), and therefore, the desired expression for the LIML estimator for the vector of endogenous regression parameters \({\varvec{\beta}}\) can be defined as

$${\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}={\left({{{\varvec{X}}}^{\perp }}^{T}\left[{{\varvec{I}}}_{n}-\widehat{\kappa }{{\varvec{M}}}_{{{\varvec{Z}}}_{2}^{\perp }}\right]{{\varvec{X}}}^{\perp }\right)}^{-1}{{{\varvec{X}}}^{\perp }}^{T}\left[{{\varvec{I}}}_{n}-\widehat{\kappa }{{\varvec{M}}}_{{{\varvec{Z}}}_{2}^{\perp }}\right]{\varvec{y}}$$
(2.6)

Bekker (1994) proposed a covariance matrix estimator for k-class estimators with normal error terms that is more robust to weak instruments; under this approach, the variance of LIML is estimated as

$$ {\widehat{{\varvec{\Sigma }}}}_{{LIML}} = \mathbb{H}^{{ - 1}} {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{\Sigma } }} \mathbb{H}^{{ - 1}} , $$

where \({\mathbb{H}}={{\varvec{X}}}^{T}{{\varvec{P}}}_{{\varvec{Z}}}{\varvec{X}}-\gamma \left({{\varvec{X}}}^{T}{\varvec{X}}\right)\), \( {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{\Sigma } }} = \hat{\sigma }^{2} \left[ {\gamma ^{2} \left( {\user2{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{X} }^{T} \user2{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{X} }} \right) + \left( {1 - 2\gamma } \right)\user2{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{X} }^{T} \user2{P}_{\user2{Z}} \user2{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{X} }} \right] \), and \(\gamma \) is the smallest eigenvalue of the matrix \( \left( {\user2{\underset{\raise0.3em\hbox{$\smash{\scriptscriptstyle-}$}}{X} }^{T} \user2{\underset{\raise0.3em\hbox{$\smash{\scriptscriptstyle-}$}}{X} }} \right)^{{ - 1}} \user2{\underset{\raise0.3em\hbox{$\smash{\scriptscriptstyle-}$}}{X} }^{T} \user2{P}_{\user2{Z}} \user2{\underset{\raise0.3em\hbox{$\smash{\scriptscriptstyle-}$}}{X} } \), where \( \user2{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{X} } = \user2{X} - \user2{\hat{\mu }}\frac{{\user2{\hat{\mu }}^{T} \user2{X}}}{{\user2{\hat{\mu }}^{T} \user2{\hat{\mu }}}} \) and \( \user2{\underset{\raise0.3em\hbox{$\smash{\scriptscriptstyle-}$}}{X} } = \left[ {\begin{array}{*{20}c} \user2{y} & \user2{X} \\ \end{array} } \right] \).

3 Stein-weighted LIML estimator

The SWLIML estimator is considered by following the ideas of Maasoumi (1978) and Hansen (2016) to see if a combination of the OLS and the LIML estimators may improve the precision of estimates of the structural parameter \({\varvec{\beta}}\). The SWLIML estimator for \({\varvec{\beta}}\), which is a weighted combination of \({\widehat{{\varvec{\beta}}}}_{L}\) and \({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\) that has weights that are inversely proportional to the Hausman test statistic, is defined as follows:

$${\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}=\lambda {\widehat{{\varvec{\beta}}}}_{L}+\left(1-\lambda \right){\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}={\widehat{{\varvec{\beta}}}}_{L}+{\left(1-\frac{\mathcalligra{q}}{{H}_{n}}\right)}_{+}\left[{\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}-{\widehat{{\varvec{\beta}}}}_{L}\right],$$
(3.1)

where

$$ \lambda = \left\{ {\begin{array}{*{20}c} {1,\,{\text{if}}\,H_{n} < \mathcalligra{q}} & {} \\ {\frac{\mathcalligra{q}}{{H_{n} }},\,{\text{if}}\,H_{n} \ge \mathcalligra{q}},\end{array} } \right.~ $$
(3.2)

and

$${H}_{n}={\left({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}-{\widehat{{\varvec{\beta}}}}_{L}\right)}^{T}\widehat{{\varvec{G}}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}-{\widehat{{\varvec{\beta}}}}_{L}\right),$$
(3.3)

where \(\lambda \in \left[\mathrm{0,1}\right]\), \(\mathcalligra{q}\) is the shrinkage parameter, \({H}_{n}\) is the Wu-Hausman statistic, \(\widehat{{\varvec{G}}}={\left[var\left({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\right)-var\left({\widehat{{\varvec{\beta}}}}_{L}\right)\right]}^{-1}\), and \((\varkappa)_{+} \) is the positive-part trimming function such that \((\varkappa)_{+}= \varkappa \) if \(\varkappa >0\) and \(0\) if \(\varkappa \le 0\), as in the positive-part James–Stein estimator. Under the null hypothesis of \({H}_{n}\), both \({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\) and \({\widehat{{\varvec{\beta}}}}_{L}\) are consistent, but the latter is more efficient. Under the alternative hypothesis, \({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\) is consistent, while \({\widehat{{\varvec{\beta}}}}_{L}\) is not, since it ignores the endogeneity. Therefore, the discrepancy between \({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\) and \({\widehat{{\varvec{\beta}}}}_{L}\) provides a reliable measure of endogeneity. The restriction \(\lambda \in \left[\mathrm{0,1}\right]\) implicit in (3.2) is similar to the “positive part” restriction of the classic James–Stein estimator (Baranchik 1970). The degree of shrinkage depends on the ratio \(\frac{\mathcalligra{q}}{{H}_{n}}\). A high value of the test statistic \({H}_{n}\) indicates endogeneity of the regressor \({{\varvec{x}}}_{2i}\), whereas a low value suggests exogeneity. Therefore, \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}\) gives more weight to \({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\) when there is evidence of endogeneity; otherwise, it gives more weight to \({\widehat{{\varvec{\beta}}}}_{L}\). In particular, \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}={\widehat{{\varvec{\beta}}}}_{L}\) when \({H}_{n}\) is sufficiently small (less than \(\mathcalligra{q}\)), whereas when \({H}_{n}\ge \mathcalligra{q}\), \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}\) is a weighted average of \({\widehat{{\varvec{\beta}}}}_{L}\) and \({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\). Different values of \(k\) and \(\lambda \) in (3.1) produce different types of estimators, as the following special cases show:

  1. \(\mathop{{lim}}\limits_{\begin{subarray}{c}{k\to 0}\\{\lambda \to 1}\end{subarray}}{\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}={\widehat{{\varvec{\beta}}}}_{L}\), the OLS estimator.

  2. \(\mathop{{lim}}\limits_{\begin{subarray}{c}{k\to \widehat{\kappa }}\\{\lambda \to 0}\end{subarray}}{\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}={\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\), the LIML estimator.

  3. \(\underset{k\to 1}{\mathrm{lim}}{\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}={\widehat{{\varvec{\beta}}}}_{\mathrm{SWTSLS}}=\lambda {\widehat{{\varvec{\beta}}}}_{L}+\left(1-\lambda \right){\widehat{{\varvec{\beta}}}}_{\mathrm{TSLS}}\), the SWTSLS estimator.
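The shrinkage step in (3.1)–(3.3) amounts to only a few lines of code. The sketch below is illustrative: the inputs (the two estimates, \(\widehat{{\varvec{G}}}\), and the shrinkage parameter \(\mathcalligra{q}\)) are hypothetical numbers chosen to exhibit the positive-part weighting, not output from a fitted model:

```python
import numpy as np

def swliml(beta_ols, beta_liml, G_hat, q):
    """Stein-weighted combination (3.1):
    beta_ols + (1 - q/H_n)_+ * (beta_liml - beta_ols)."""
    d = beta_liml - beta_ols
    H_n = float(d @ G_hat @ d)                          # Wu-Hausman statistic (3.3)
    w = max(0.0, 1.0 - q / H_n) if H_n > 0 else 0.0     # positive-part trimming
    return beta_ols + w * d, H_n

beta_ols = np.array([1.20])
beta_liml = np.array([1.00])
G_hat = np.array([[100.0]])          # [var(LIML) - var(OLS)]^{-1}, hypothetical value

# large discrepancy: H_n = (0.2)^2 * 100 = 4 >= q = 1, so 3/4 of the
# LIML direction is applied: beta* = 1.20 + 0.75 * (-0.20) = 1.05
b_star, H_n = swliml(beta_ols, beta_liml, G_hat, q=1.0)

# small discrepancy: H_n = 0.01 < q, so the estimator collapses to OLS
b_small, H_small = swliml(beta_ols, np.array([1.19]), G_hat, q=1.0)
```

The example makes the continuity of the weighting visible: as \({H}_{n}\) grows, the weight on the LIML direction rises smoothly toward 1, whereas below \(\mathcalligra{q}\) the estimator equals OLS exactly.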

Based on Hausman’s statistic, a pretest estimator (PT) is defined as follows:

$${\widehat{{\varvec{\beta}}}}_{PT}=I{\left({H}_{n}\right)}_{\left[0,{c}_{\alpha }\right.)}{\widehat{{\varvec{\beta}}}}_{L}+I{\left({H}_{n}\right)}_{\left[{c}_{\alpha },\infty \right.)}{\widehat{{\varvec{\beta}}}}_{LIML},$$

where \({c}_{\alpha }\) is the critical value, at nominal level \(\alpha \), of the chi-square distribution \(\left({\chi }_{E}^{2}\right)\) with \(E\) degrees of freedom, and \(I{\left({H}_{n}\right)}_{\left[0,{c}_{\alpha }\right.)}\) is an indicator function that takes the value 1 when the test statistic falls within the indicated interval and 0 otherwise. This estimator replicates the standard procedure by which researchers select between models on the basis of a statistical test.
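The pretest rule can be sketched analogously. Here 3.841 is the familiar \({\chi }_{1}^{2}\) critical value at \(\alpha =0.05\) (so \(E=1\)), and the two estimates are hypothetical numbers:

```python
import numpy as np

def pretest(beta_ols, beta_liml, H_n, c_alpha):
    """Pretest estimator: keep OLS when H_n < c_alpha, otherwise switch to LIML."""
    return np.where(H_n < c_alpha, beta_ols, beta_liml)

c_05 = 3.841                                 # chi-square(1) critical value, alpha = 0.05
pt_reject = pretest(np.array([1.2]), np.array([1.0]), H_n=4.0, c_alpha=c_05)  # -> LIML
pt_keep = pretest(np.array([1.2]), np.array([1.0]), H_n=2.0, c_alpha=c_05)    # -> OLS
```

Unlike the smooth SWLIML weighting in (3.1), the pretest estimator switches discontinuously at \({c}_{\alpha }\), which is the behavior the Stein weighting is designed to avoid.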

The proposed estimator is a flexible Stein k-class IV estimator that can be expanded to incorporate other IV estimators, such as the generalized method of moments and modified LIML (Fuller 1977). Furthermore, incorporating a modification to account for heteroskedasticity and/or serial correlation would be advantageous. The implementation of these enhancements would hold significant value for future research.

4 Asymptotic properties of the SWLIML estimator

In this section, the asymptotic distribution and risk of the SWLIML estimator under a local asymptotic framework are derived by following Wang et al. (2016). To establish the asymptotic distribution of \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}\), consider the reduced form Eq. (2.3) for the endogenous variables \({{\varvec{x}}}_{2i}\). The instrumental variable assumption requires that \({\mathbb{E}}\left[{{\varvec{z}}}_{i}{{\varvec{v}}}_{i}^{T}\right]=0\). Therefore, the linear projection of the structural equation error \({\mu }_{i}\) onto \({{\varvec{v}}}_{i}\) is written as

$${\mu }_{i}={{\varvec{v}}}_{i}^{T}{\varvec{\rho}}+{\xi }_{i}$$
(4.1)
$${\mathbb{E}}\left[{{\varvec{v}}}_{i}{\xi }_{i}\right]=\varvec{0},$$

Equations (4.1) and (2.3) are written without loss of generality, as both may be characterized as projections. In particular, it is assumed that \({\varvec{\rho}}\) in (4.1) is local to zero:

$${\varvec{\rho}}=\frac{1}{\sqrt{n}}{\varvec{\eta}},$$
(4.2)

where \({\varvec{\eta}}\) is the \(E\times 1\) localizing parameter for the degree of correlation between \({\mu }_{i}\) and \({{\varvec{v}}}_{i}\). Thus, \({{\varvec{x}}}_{2i}\) is exogenous when \({\varvec{\eta}}=\varvec{0}\) and endogenous when \({\varvec{\eta}}\ne \varvec{0} \). The \(\frac{1}{\sqrt{n}}\) scaling is used in (4.2) because it provides the stability needed to derive a joint distributional approximation for OLS and LIML. It is helpful to begin by stating the regularity conditions for the asymptotic distribution of \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}\). Set \({\mathbb{E}}\left[{{\varvec{\nu}}}_{i}{{\varvec{\nu}}}_{i}^{T}\right]={{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}\), \({\mathbf{Q}}_{\mathbf{z}\mathbf{z}}={\mathbb{E}}\left[{\mathbf{z}}_{i}{\mathbf{z}}_{i}^{T}\right],\) \({\mathbf{Q}}_{{\mathbf{z}}_{1}{\mathbf{z}}_{1}}={\mathbb{E}}\left[{\mathbf{z}}_{1i}{\mathbf{z}}_{1i}^{T}\right],\) \({\mathbf{Q}}_{{\mathbf{x}}_{2}{\mathbf{x}}_{2}}={\mathbb{E}}\left[{{\varvec{x}}}_{2i}{{\varvec{x}}}_{2i}^{T}\right]\), \({\mathbf{Q}}_{{\mathbf{x}}_{2}\mathbf{z}}={\mathbb{E}}\left[{{\varvec{x}}}_{2i}{\mathbf{z}}_{i}^{T}\right]\), \({\mathbf{Q}}_{\mathbf{z}{\mathbf{x}}_{2}}={\mathbb{E}}\left[{{\varvec{z}}}_{i}{{\varvec{x}}}_{2i}^{T}\right]\), \({\mathbf{Q}}_{{\mathbf{x}}_{2}{\mathbf{z}}_{1}}={\mathbb{E}}\left[{{\varvec{x}}}_{2i}{\mathbf{z}}_{1i}^{T}\right]\), \({\mathbf{Q}}_{{\mathbf{z}}_{1}{\mathbf{x}}_{2}}={\mathbb{E}}\left[{{\varvec{z}}}_{1i}{{\varvec{x}}}_{2i}^{T}\right]\), and \({\sigma }^{2}={\mathbb{E}}\left[{\xi }_{i}^{2}\right]\) to simplify the expressions for the asymptotic variance–covariance matrices. Following Huang et al. (2020), we state the regularity conditions.
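The role of the \(1/\sqrt{n}\) scaling in (4.2) can also be checked numerically: under \({\varvec{\rho}}={\varvec{\eta}}/\sqrt{n}\), the \(\sqrt{n}\)-scaled OLS bias stabilizes as \(n\) grows instead of vanishing or diverging. The sketch below uses a scalar case with no included exogenous variables and arbitrary illustrative values \(\eta =2\) and \(\pi =1\), for which the limit of the scaled bias is \(\eta /(\pi ^{2}+1)=1\):

```python
import numpy as np

def scaled_ols_bias(n, reps, eta, pi=1.0, seed=0):
    """Monte Carlo mean of sqrt(n)*(beta_ols - beta) when rho = eta/sqrt(n)."""
    rng = np.random.default_rng(seed)
    rho = eta / np.sqrt(n)
    z = rng.normal(size=(reps, n))     # single instrument
    v = rng.normal(size=(reps, n))     # reduced-form error
    xi = rng.normal(size=(reps, n))    # projection error in (4.1)
    x2 = pi * z + v                    # reduced form (2.3), scalar case
    mu = rho * v + xi                  # structural error via projection (4.1)
    # true beta = 0, so beta_hat itself is the estimation error
    beta_hat = np.sum(x2 * mu, axis=1) / np.sum(x2 * x2, axis=1)
    return np.sqrt(n) * beta_hat.mean()

# scaled bias is approximately eta / (pi**2 + 1) = 1 at both sample sizes
b_400 = scaled_ols_bias(400, 2000, eta=2.0)
b_1600 = scaled_ols_bias(1600, 2000, eta=2.0)
```

This stability is exactly what makes the joint approximation in Theorem 4.1 possible: with a fixed \({\varvec{\rho}}\) the scaled OLS bias would diverge, while with \({\varvec{\rho}}=o\left({n}^{-1/2}\right)\) it would vanish.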

Assumption 1

  1. 1.

    \(\left({y}_{i},{{\varvec{x}}}_{i},{{\varvec{z}}}_{i},{\mu }_{i},{{\varvec{\nu}}}_{i}: i=1,...,n\right)\) are independently and identically distributed.

  2. 2.

    \({\mathbb{E}}{\Vert {\xi }_{i}\Vert }^{4}<\infty \), \({\mathbb{E}}{\Vert {{\varvec{\nu}}}_{i}\Vert }^{4}<\infty \) and \({\mathbb{E}}{\Vert {{\varvec{z}}}_{i}\Vert }^{4}<\infty \).

  3. 3.

    \(\mathrm{rank}\left[{{\varvec{\Pi}}}_{2}\right]=E\), \({{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}\) and \({\mathbf{Q}}_{\mathbf{z}\mathbf{z}}\) are positive definite.

  4. 4.

    \({\mathbb{E}}\left[{\xi }_{i}^{2}|{{\varvec{z}}}_{i},{{\varvec{x}}}_{2i}\right]={\sigma }^{2}\); \({\sigma }^{2}{\left(\mathrm{plim}\frac{{{\varvec{X}}}_{2}^{T}{{\varvec{M}}}_{{{\varvec{Z}}}_{1}}{{\varvec{X}}}_{2}}{n}\right)}^{-1}={{\varvec{V}}}_{{\varvec{\beta}}}^{L}\) and \({\sigma }^{2}{\left(\mathrm{plim}\frac{{{\varvec{X}}}_{2}^{T}\left({{\varvec{M}}}_{{{\varvec{Z}}}_{1}}-\widehat{\kappa }{{\varvec{M}}}_{{\varvec{Z}}}\right){{\varvec{X}}}_{2}}{n}\right)}^{-1}={{\varvec{V}}}_{{\varvec{\beta}}}^{LIML}\), where plim represents the probability limit operator as \(n\to \infty \).

Assumption 1.1 is a basic assumption in the literature specifying that the observations are i.i.d. Assumption 1.2 states that the error terms and instruments have finite fourth moments; this is needed to invoke the central limit theorem (CLT). Assumption 1.3 requires that the usual identification condition \(\left(\mathrm{i}.\mathrm{e}., \mathrm{rank}\left[{{\varvec{\Pi}}}_{2}\right]=E\right)\) is satisfied so that the coefficient on the endogenous variable is identified; the full rank condition on \({{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}\) is equivalent to supposing that \({{\varvec{x}}}_{2i}\) and \({{\varvec{z}}}_{i}\) have no common elements. Assumption 1.4 first imposes conditional homoscedasticity on the error \({\xi }_{i}\) given the instruments and regressors and, second, assumes that the asymptotic variances of the OLS and LIML estimators simplify under this conditional homoscedasticity condition.

Theorem 4.1

When assumptions 1.1–1.4 and Eq. (4.1) hold, the following convergence results hold jointly:

  1. (i)

    \({n}^{1/2}\left(\begin{array}{c}{\widehat{{\varvec{\beta}}}}_{L}-{\varvec{\beta}}\\ {\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}-{\varvec{\beta}}\end{array}\right)\stackrel{d}{\to }{\varvec{\psi}}+\zeta \), where \({\varvec{\psi}}=\left(\begin{array}{c}\frac{1}{{\sigma }^{2}}{{\varvec{C}}}_{0}{\boldsymbol{\Sigma }}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}\\ 0\end{array}\right)\) and \(\zeta \sim N\left(0,{\varvec{C}}\right)\) with \({\varvec{C}}=\left[\begin{array}{cc}{{\varvec{C}}}_{0}& {{\varvec{C}}}_{0}\\ {{\varvec{C}}}_{0}& {{\varvec{C}}}_{1}\end{array}\right]\), \({{\varvec{C}}}_{0}={\sigma }^{2}{\left[{\mathbf{Q}}_{{\mathbf{x}}_{2}{\mathbf{x}}_{2}}-{\mathbf{Q}}_{{\mathbf{x}}_{2}{\mathbf{z}}_{1}}{{\varvec{Q}}}_{{\mathbf{z}}_{1}{\mathbf{z}}_{1}}^{-1}{\mathbf{Q}}_{{\mathbf{z}}_{1}{\mathbf{x}}_{2}}\right]}^{-1}\) and \({{\varvec{C}}}_{1}={\sigma }^{2}{\left[\left(1-\widehat{\kappa }\right){\mathbf{Q}}_{{\mathbf{x}}_{2}{\mathbf{x}}_{2}}+\widehat{\kappa }\left({\mathbf{Q}}_{{\mathbf{x}}_{2}\mathrm{z}}{{\varvec{Q}}}_{\mathbf{z}\mathbf{z}}^{-1}{\mathbf{Q}}_{\mathbf{z}{\mathbf{x}}_{2}}\right)-{\mathbf{Q}}_{{\mathbf{x}}_{2}{\mathbf{z}}_{1}}{{\varvec{Q}}}_{{\mathbf{z}}_{1}{\mathbf{z}}_{1}}^{-1}{\mathbf{Q}}_{{\mathbf{z}}_{1}{\mathbf{x}}_{2}}\right]}^{-1}\);

  2. (ii)

    \({H}_{n}\stackrel{d}{\to }\varphi ={\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)\), where \({\varvec{\Omega}}=\mathcal{A}{\left[{{\varvec{C}}}_{1}-{{\varvec{C}}}_{0}\right]}^{-1}{\mathcal{A}}^{T}\) with \(\mathcal{A}={\left(\begin{array}{cc}-{{\varvec{I}}}_{E}& {{\varvec{I}}}_{E}\end{array}\right)}^{T}\) and \({\mathcal{A}}_{2}={\left(\begin{array}{cc} \varvec{0} & {{\varvec{I}}}_{E}\end{array}\right)}^{T}\);

  3. (iii)

    \({n}^{1/2}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}-{\varvec{\beta}}\right)\stackrel{d}{\to }\Phi ={\mathcal{A}}_{2}^{T}\zeta -min\left(\frac{\mathcalligra{q}}{\varphi },1\right){\mathcal{A}}^{T}\left({\varvec{\psi}}+\zeta \right).\)

Proof

The proof of Theorem 4.1 (i) is straightforward and standard under the homoscedasticity assumption (Wang et al. 2016). Part (ii) holds once the estimators for the covariance matrices are consistent, and part (iii) follows from the continuous mapping theorem. Let \({\mathcal{A}}_{1}={\left(\begin{array}{cc}{{\varvec{I}}}_{E}& \varvec{0}\end{array}\right)}^{T}\), and set \({{\varvec{\psi}}}_{I}={\mathcal{A}}_{1}^{T}{\varvec{\psi}}\), \({{\varvec{\psi}}}_{II}={\mathcal{A}}_{2}^{T}{\varvec{\psi}}\), \({\zeta }_{I}={\mathcal{A}}_{1}^{T}\zeta \) and \({\zeta }_{II}={\mathcal{A}}_{2}^{T}\zeta .\) Under (4.2) and homoscedasticity, the convergence of the OLS estimator is outlined first to show that

$${n}^{1/2}\left({\widehat{{\varvec{\beta}}}}_{L}-{\varvec{\beta}}\right)={\sigma }^{-2}{{\varvec{C}}}_{0}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}+{n}^{-1/2}{\left({\mathbf{Q}}_{{\mathbf{x}}_{2}{\mathbf{x}}_{2}}-{\mathbf{Q}}_{{\mathbf{x}}_{2}{\mathbf{z}}_{1}}{{\varvec{Q}}}_{{\mathbf{z}}_{1}{\mathbf{z}}_{1}}^{-1}{\mathbf{Q}}_{{\mathbf{z}}_{1}{\mathbf{x}}_{2}}\right)}^{-1}{\mathbb{E}}\left({{\varvec{x}}}_{2i}\left({{\varvec{I}}}_{n}-{{\varvec{P}}}_{{{\varvec{z}}}_{1}}\right)\right){\xi }_{i}\stackrel{d}{\to }{{\varvec{\psi}}}_{I}+{\zeta }_{I}$$

with \({\zeta }_{I}={\left(\mathrm{plim}\frac{1}{n}{{\varvec{X}}}_{2}^{T}{{\varvec{M}}}_{{{\varvec{Z}}}_{1}}{{\varvec{X}}}_{2}\right)}^{-1}{\varvec{T}},\) and

$${\varvec{T}}=\frac{1}{\sqrt{n}}{{\varvec{X}}}_{2}^{T}\left({{\varvec{I}}}_{n}-{{\varvec{P}}}_{{{\varvec{Z}}}_{1}}\right){\varvec{\xi}}\stackrel{d}{\to }N\left(\varvec{0},{\sigma }^{2}\mathrm{plim}\frac{1}{n}{{\varvec{X}}}_{2}^{T}{{\varvec{M}}}_{{{\varvec{Z}}}_{1}}{{\varvec{X}}}_{2}\right)$$

Hence, \({\zeta }_{I}\sim N\left(\varvec{0}, {\varvec{C}_0}\right)\). Second, to show the convergence of the LIML:

$${\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}-{\varvec{\beta}}={\left({{\varvec{X}}}_{2}^{T}\left[{{\varvec{M}}}_{{{\varvec{Z}}}_{1}}-\widehat{\kappa }{{\varvec{M}}}_{{\varvec{Z}}}\right]{{\varvec{X}}}_{2}\right)}^{-1}{{\varvec{X}}}_{2}^{T}\left[{{\varvec{M}}}_{{{\varvec{Z}}}_{1}}-\widehat{\kappa }{{\varvec{M}}}_{{\varvec{Z}}}\right]\left({\varvec{V}}{\varvec{\rho}}+{\varvec{\xi}}\right),$$

where the term involving \({\varvec{V}}{\varvec{\rho}}\) is asymptotically negligible under the local parameterization (4.2) because \({\mathbb{E}}\left[{{\varvec{z}}}_{i}{{\varvec{v}}}_{i}^{T}\right]=\varvec{0}\). Therefore,

$${n}^{1/2}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}-{\varvec{\beta}}\right)\stackrel{d}{\to }{\zeta }_{II}\sim N\left(\varvec{0},{\sigma }^{2}{\left(\mathrm{plim}\frac{1}{n}{{\varvec{X}}}_{2}^{T}\left({{\varvec{M}}}_{{{\varvec{Z}}}_{1}}-\widehat{\kappa }{{\varvec{M}}}_{{\varvec{Z}}}\right){{\varvec{X}}}_{2}\right)}^{-1}\right).$$

The convergence in Theorem 4.1 (ii) follows directly from the consistency of the variance–covariance estimates, and (iii) follows from the continuous mapping theorem as in Theorem 1 of Hansen (2017). \(\hfill\square \)

Theorem 4.1 describes the joint asymptotic distribution of \({\widehat{{\varvec{\beta}}}}_{L}\) and \({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\), which is normally distributed, and of \({H}_{n}\) and \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}\) under the local exogeneity assumption. When \({\varvec{\eta}}\ne \varvec{0}\), \({\widehat{{\varvec{\beta}}}}_{L}\) is asymptotically biased, and \({H}_{n}\) has a noncentral chi-square distribution with noncentrality parameter \({\varvec{\psi}}\). Finally, the asymptotic distribution of the SWLIML estimator converges to a nonlinear function of the normal random vector in Theorem 4.1 (i), which is a function of \({\varvec{\psi}}\).

Let \({\mathcal{B}}_{n}\) be any estimator of \({\varvec{\beta}}\), and let \(\mathcal{W}\) be a weight matrix used to compare the risk of different estimators for a simultaneous linear model in the presence of endogeneity. The asymptotic risk is defined as

$${\mathbb{R}}\left({\mathcal{B}}_{n}\right)=\underset{\mathfrak{s}\to \infty }{\mathrm{lim}}\underset{n\to \infty }{\mathrm{lim inf}}{\mathbb{E}}\left[\mathrm{min}\left\{n{\Vert {\mathcal{B}}_{n}-{\varvec{\beta}}\Vert }_{\mathcal{W}}^{2},\mathfrak{s}\right\}\right],$$

where \(\mathfrak{s}\) denotes the truncation parameter, which controls the expected scaled loss function. If an estimator has an asymptotic distribution

$${n}^{1/2}\left({\mathcal{B}}_{n}-{\varvec{\beta}}\right)\stackrel{d}{\to }\Psi $$

for some random variable \(\Psi \), then this definition of asymptotic risk is convenient because it is simple to analyze. Following Lehmann and Casella (1998), the asymptotic risk is computed as follows:

$${\mathbb{R}}\left({\mathcal{B}}_{n}\right)={\mathbb{E}}\left[{\Psi }^{T}\mathcal{W}\Psi \right]=\mathrm{tr}\left[\mathcal{W}{\mathbb{E}}\left(\Psi {\Psi }^{T}\right)\right]$$
(4.3)
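When the limit \(\Psi \) is Gaussian, Eq. (4.3) reduces to \(\mathrm{tr}\left(\mathcal{W}{\varvec{C}}\right)\) for \(\Psi \sim N\left(\varvec{0},{\varvec{C}}\right)\). The sketch below checks this numerically with hypothetical matrices \({\varvec{C}}\) and \(\mathcal{W}\) (illustrative stand-ins for \({{\varvec{C}}}_{1}\) and \({\left[{{\varvec{C}}}_{1}-{{\varvec{C}}}_{0}\right]}^{-1}\)):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3x3 covariance C and weight matrix W (illustrative choices,
# standing in for C_1 and [C_1 - C_0]^{-1} from the text).
C = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])
W = np.linalg.inv(np.diag([1.0, 2.0, 3.0]))

# Draw Psi ~ N(0, C) and estimate the asymptotic risk E[Psi^T W Psi].
L = np.linalg.cholesky(C)
psi = rng.standard_normal((200_000, 3)) @ L.T
risk_mc = np.mean(np.einsum("ij,jk,ik->i", psi, W, psi))

# The closed form from Eq. (4.3): tr(W E[Psi Psi^T]) = tr(W C).
risk_exact = np.trace(W @ C)
print(risk_mc, risk_exact)
```

The Monte Carlo average and the closed form agree up to sampling error.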

To simplify comparisons among estimators, we set \(\mathcal{W}={\left[{{\varvec{C}}}_{1}-{{\varvec{C}}}_{0}\right]}^{-1}\). The following lemma (Stein, 1981) is stated and used in the proof of Theorem 4.2.

Lemma 1

If \(\zeta \sim N\left(0,{\varvec{C}}\right)\) is \(E\times 1\), \({\mathbb{K}}\) is \(E\times E\), and \(\mathrm{\rm H}\left(\boldsymbol{\varkappa }\right):{\mathbb{R}}^{E}\to {\mathbb{R}}^{E}\) is continuous, then

$${\mathbb{E}}\left[\mathrm{\rm H}{\left(\zeta +{\varvec{\psi}}\right)}^{T}{\mathbb{K}}\zeta \right]={\mathbb{E}}tr\left[\frac{\partial }{\partial \boldsymbol{\varkappa }}{\mathrm{\rm H}\left(\zeta +{\varvec{\psi}}\right)}^{T}{\mathbb{K}}{\varvec{C}}\right].$$

Proof

Let \({\phi }_{{\varvec{C}}}\left(\boldsymbol{\varkappa }\right)\) denote the normal density function with mean 0 and variance–covariance matrix \({\varvec{C}}\).

By multivariate integration by parts,

$${\mathbb{E}}\left[H{\left(\zeta +{\varvec{\psi}}\right)}^{T}{\mathbb{K}}\zeta \right]=\int H{\left(\boldsymbol{\varkappa }+{\varvec{\psi}}\right)}^{T}{\mathbb{K}}{\varvec{C}}{{\varvec{C}}}^{-1}\boldsymbol{\varkappa }{\phi }_{{\varvec{C}}}\left(\boldsymbol{\varkappa }\right)\mathrm{d}\boldsymbol{\varkappa }=\int \mathrm{tr}\left[\frac{\partial }{\partial \boldsymbol{\varkappa }}H{\left(\boldsymbol{\varkappa }+{\varvec{\psi}}\right)}^{T}{\mathbb{K}}{\varvec{C}}\right]{\phi }_{{\varvec{C}}}\left(\boldsymbol{\varkappa }\right)\mathrm{d}\boldsymbol{\varkappa }={\mathbb{E}}\,\mathrm{tr}\left[\frac{\partial }{\partial \boldsymbol{\varkappa }}H{\left(\zeta +{\varvec{\psi}}\right)}^{T}{\mathbb{K}}{\varvec{C}}\right] \hfill\square $$
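Lemma 1 can be sanity-checked numerically in the simplest smooth case \(H\left(\boldsymbol{\varkappa }\right)=\boldsymbol{\varkappa }\), where the derivative is the identity matrix and the lemma reduces to \({\mathbb{E}}\left[{\left(\zeta +{\varvec{\psi}}\right)}^{T}{\mathbb{K}}\zeta \right]=\mathrm{tr}\left({\mathbb{K}}{\varvec{C}}\right)\). The matrices below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative dimensions and matrices (hypothetical choices for C, K, psi).
E = 3
C = np.array([[1.0, 0.2, 0.0],
              [0.2, 2.0, 0.4],
              [0.0, 0.4, 1.0]])
K = np.arange(1.0, 10.0).reshape(3, 3)
psi = np.array([0.5, -1.0, 2.0])

# With H(x) = x, dH(x)^T/dx = I, so Lemma 1 predicts
# E[(zeta + psi)^T K zeta] = tr(I K C) = tr(K C).
L = np.linalg.cholesky(C)
zeta = rng.standard_normal((400_000, E)) @ L.T
lhs = np.mean(np.einsum("ij,jk,ik->i", zeta + psi, K, zeta))
rhs = np.trace(K @ C)
print(lhs, rhs)
```

The simulated left-hand side matches \(\mathrm{tr}\left({\mathbb{K}}{\varvec{C}}\right)\) up to Monte Carlo error.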

Theorem 4.2

Under assumptions 1.1–1.4 and Eq. (4.1), if \(\mathrm{E}>2\) and \(0<\mathcalligra{q}\le 2\left(\mathrm{E}-2\right)\), then

  1. (i)

    \({\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{LIML}\right)=tr\left[\mathcal{W}{{\varvec{C}}}_{1}\right]\) and \({\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{SWLIML}^{*}\right)=tr\left[\mathcal{W}{\mathbb{E}}\left(\Psi {\Psi }^{T}\right)\right]\), where \(\mathcal{W}={\left[{{\varvec{C}}}_{1}-{{\varvec{C}}}_{0}\right]}^{-1}\);

  2. (ii)

    \({\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{SWLIML}^{*}\right)<{\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{LIML}\right)-\frac{2\mathcalligra{q}\left(E-2\right)-{\mathcalligra{q}}^{2}}{{\sigma }^{-4}{{\varvec{\eta}}}^{T}{\boldsymbol{\Sigma }}_{{\varvec{\nu}}{\varvec{\nu}}}{{\varvec{C}}}_{0}\mathcal{W}{{\varvec{C}}}_{0}{\boldsymbol{\Sigma }}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}+E}\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad (4.4)\)

Proof

We know that \({n}^{1/2}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}-{\varvec{\beta}}\right)\stackrel{d}{\to }{\mathcal{A}}_{2}^{T}\zeta \sim N\left(0,{{\varvec{C}}}_{1}\right)\) from Theorem 4.1. Therefore, the asymptotic risk of \({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\) can be written as

$${\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\right)={\mathbb{E}}\left[{\zeta }^{T}{\mathcal{A}}_{2}^{T}\mathcal{W}{\mathcal{A}}_{2}\zeta \right]=\mathrm{tr}\left[\mathcal{W}{\mathbf{C}}_{1}\right]$$

Now

$${n}^{1/2}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}-{\varvec{\beta}}\right)\stackrel{d}{\to }\Psi ,$$

where \(\Psi \) denotes the limiting random variable. Let \({\Psi }^{*}\) denote the analogous random variable obtained without trimming the positive part:

$${\Psi }^{*}={\mathcal{A}}_{2}^{T}\zeta -\left(\frac{\mathcalligra{q}}{{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)}\right){\mathcal{A}}^{T}\left({\varvec{\psi}}+\zeta \right).$$

Since taking the positive part cannot increase the loss, the pointwise quadratic risk of \(\Psi \) is strictly smaller than that of \({\Psi }^{*}\):

$${\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}\right)={\mathbb{E}}\left({\Psi }^{T}\mathcal{W}\Psi \right)<{\mathbb{E}}\left({{\Psi }^{*}}^{T}\mathcal{W}{\Psi }^{*}\right).$$

The right-hand side of the inequality is calculated by using the expression for the random variable \({\Psi }^{*}\), as shown below:

$${\mathbb{E}}\left({{\Psi }^{*}}^{T}\mathcal{W}{\Psi }^{*}\right)={\mathbb{E}}\left[{\zeta }^{T}{\mathcal{A}}_{2}^{T}\mathcal{W} {\mathcal{A}}_{2}\zeta \right]+{\mathcalligra{q}}^{2}{\mathbb{E}}\left[\frac{{\left({\varvec{\psi}}+\zeta \right)}^{T}{\mathcal{A}}_{2}^{T}\mathcal{W} {\mathcal{A}}_{2}\left({\varvec{\psi}}+\zeta \right)}{{\left({\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)\right)}^{2}}\right]-2\mathcalligra{q}{\mathbb{E}}\left[\frac{{\left({\varvec{\psi}}+\zeta \right)}^{T}\mathcal{A}\mathcal{W}{\mathcal{A}}_{2}^{T}\zeta }{{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)}\right]=\mathrm{tr}\left(\mathcal{W}{{\varvec{C}}}_{1}\right)+{\mathcalligra{q}}^{2}{\mathbb{E}}\left[\frac{1}{{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)}\right]-2\mathcalligra{q}{\mathbb{E}}\left[\frac{{\left({\varvec{\psi}}+\zeta \right)}^{T}\mathcal{A}\mathcal{W}{\mathcal{A}}_{2}^{T}\zeta }{{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)}\right],$$

where

\({\mathcal{A}}_{2}^{\mathrm{T}}\mathbf{C}\mathcal{A}={{\varvec{C}}}_{1}-{{\varvec{C}}}_{0}={\mathcal{W}}^{-1}\) and \(\mathcal{A}\mathcal{W}{\mathcal{A}}_{2}^{\mathrm{T}}\mathbf{C}{\varvec{\Omega}}={\mathcal{A}}^{\mathrm{T}}\mathcal{W}{\mathcal{A}}_{2}^{\mathrm{T}}\mathbf{C}\mathcal{A}\mathcal{W}{\mathcal{A}}^{\mathrm{T}}={\varvec{\Omega}}\),

and apply Lemma 1 with

$$H\left(\boldsymbol{\varkappa }\right)=\frac{\boldsymbol{\varkappa }}{{\boldsymbol{\varkappa }}^{T}{\varvec{\Omega}}\boldsymbol{\varkappa }},$$
$$\frac{\partial }{\partial \boldsymbol{\varkappa }}H{\left(\boldsymbol{\varkappa }\right)}^{T}=\left(\frac{1}{{\boldsymbol{\varkappa }}^{T}{\varvec{\Omega}}\boldsymbol{\varkappa }}\right){\varvec{I}}-\frac{2}{{\left({\boldsymbol{\varkappa }}^{T}{\varvec{\Omega}}\boldsymbol{\varkappa }\right)}^{2}}{\varvec{\Omega}}\boldsymbol{\varkappa }{\boldsymbol{\varkappa }}^{T}.$$

Consequently,

$${\mathbb{E}}\left[\frac{{\left({\varvec{\psi}}+\zeta \right)}^{T}\mathcal{A}\mathcal{W}{\mathcal{A}}_{2}^{T}\zeta }{{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)}\right]={\mathbb{E}}\mathrm{tr}\left[\frac{\mathcal{A}\mathcal{W}{\mathcal{A}}_{2}^{T}{\varvec{C}}}{{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)}-\frac{2{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right){\left({\varvec{\psi}}+\zeta \right)}^{T}\mathcal{A}\mathcal{W}{\mathcal{A}}_{2}^{T}{\varvec{C}}}{{\left({\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)\right)}^{2}}\right]={\mathbb{E}}\left[\frac{\mathrm{tr}\left(\mathcal{A}\mathcal{W}{\mathcal{A}}_{2}^{T}{\varvec{C}}\right)}{{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)}\right]-2{\mathbb{E}}\left[\frac{{\left({\varvec{\psi}}+\zeta \right)}^{T}\mathcal{A}\mathcal{W}{\mathcal{A}}_{2}^{T}{\varvec{C}}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)}{{\left({\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)\right)}^{2}}\right]={\mathbb{E}}\left[\frac{E-2}{{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)}\right].$$

Hence, \({\mathbb{E}}\left({{\Psi }^{*}}^{T}\mathcal{W}{\Psi }^{*}\right)={\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\right)+{\mathcalligra{q}}^{2}{\mathbb{E}}\left[\frac{1}{{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)}\right]-2\mathcalligra{q}{\mathbb{E}}\left[\frac{E-2}{{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)}\right].\)

Then, making use of Jensen’s inequality and applying the assumption that \(\mathcalligra{q}\le 2\left(E-2\right),\)

$${\mathbb{E}}\left({{\Psi }^{*}}^{T}\mathcal{W}{\Psi }^{*}\right)\le {\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\right)-\frac{2\mathcalligra{q}\left(E-2\right)-{\mathcalligra{q}}^{2}}{{\mathbb{E}}\left[{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)\right]}$$
(4.5)

As \(\mathrm{tr}\left({\varvec{C}}{\varvec{\Omega}}\right)=\mathrm{tr}\left(\mathcal{W}{\mathcal{A}}^{\mathrm{T}}\mathbf{C}\mathcal{A}\right)=E\), we have

$${\mathbb{E}}\left[{\left({\varvec{\psi}}+\zeta \right)}^{T}{\varvec{\Omega}}\left({\varvec{\psi}}+\zeta \right)\right]={{\varvec{\psi}}}^{T}{\varvec{\Omega}}{\varvec{\psi}}+\mathrm{tr}\left({\varvec{C}}{\varvec{\Omega}}\right)={{\sigma }^{-4}{\varvec{\eta}}}^{T}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{{\varvec{C}}}_{0}\mathcal{W}{{\varvec{C}}}_{0}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}+\mathrm{tr}\left(\mathcal{W}{\mathcal{A}}^{\mathrm{T}}\mathbf{C}\mathcal{A}\right).$$

Now, (4.5) can be simplified as follows:

$${\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}\right)\le {\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\right)-\frac{2\mathcalligra{q}\left(E-2\right)-{\mathcalligra{q}}^{2}}{{\sigma }^{-4}{{\varvec{\eta}}}^{T}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{{\varvec{C}}}_{0}{\left[{{\varvec{C}}}_{1}-{{\varvec{C}}}_{0}\right]}^{-1}{{\varvec{C}}}_{0}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}+E}\quad\quad\quad\quad \hfill\square $$

Theorem 4.2 holds under the mild regularity conditions in assumptions 1.1–1.4 and Eq. (4.1). Theorem 4.2 (ii) indicates that the asymptotic risk of \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}\) is strictly less than that of \({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\) when the number of endogenous variables exceeds two. As the inequality in Theorem 4.2 (ii) is strict and holds for all values of \({\varvec{\eta}}\), the proposed SWLIML estimator strictly dominates the classical LIML estimator when \(E>2\). The requirement that the number of endogenous variables exceed two is comparable to the conditions for Stein's (1956) and Srivastava and Bilodeau's (1989) estimators, as a shrinkage dimension greater than two is required for shrinkage to yield global reductions in the risk function. Furthermore, the \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}\) estimator substantially decreases the risk function when the degree of endogeneity is low, the number of endogenous variables \(\left({{\varvec{x}}}_{2i}\right)\) is large, and the instruments are weak. The expression \(\left(2\mathcalligra{q}\left(E-2\right)-{\mathcalligra{q}}^{2}\right)/\left({{\sigma }^{-4}{\varvec{\eta}}}^{T}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{{\varvec{C}}}_{0}\mathcal{W}{{\varvec{C}}}_{0}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}+E\right)\) is maximized, and the risk bound thus minimized, when the shrinkage parameter is set to \(\widehat{\mathcalligra{q}}=E-2\), where \(E\) denotes the number of endogenous variables; the inequality \(0<\mathcalligra{q}\le 2\left(E-2\right)\) is then satisfied whenever \(E>2\). There is no compelling method for selecting \(\mathcalligra{q}\) when \(E<3\); therefore, motivated by Toker (2020), Amin et al. (2020) and Asl et al. (2021), we rewrite the right-hand side of the inequality in absolute value, \(0<\mathcalligra{q}\le \left|2\left(E-2\right)\right|\), and suggest the estimator \(\widehat{\mathcalligra{q}}=1/\left|2-\left(E-2\right)\right|\) for the shrinkage parameter.
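The two-case selection rule for the shrinkage parameter described above can be transcribed directly:

```python
def shrinkage_parameter(E: int) -> float:
    """Shrinkage parameter for the SWLIML estimator as suggested in the text:
    q = E - 2 maximises the risk-reduction term when E > 2, while
    q = 1 / |2 - (E - 2)| is the suggested fallback when E < 3."""
    if E > 2:
        return float(E - 2)
    return 1.0 / abs(2 - (E - 2))

print(shrinkage_parameter(6), shrinkage_parameter(2))
```

For \(E>2\) the rule returns \(E-2\), which lies inside the admissible interval \(0<\mathcalligra{q}\le 2\left(E-2\right)\).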

4.1 Asymptotic risk reduction

In certain situations, the number of instruments may increase with the sample size while remaining smaller than it. In such cases, the LIML estimator is consistent, whereas TSLS is not, as demonstrated by Bekker (1994) and Chao and Swanson (2005). To evaluate the extent to which SWLIML and SWTSLS reduce risk, it is first necessary to define the risk bounds of both estimators. The risk bound of SWLIML is given in Theorem 4.2 (ii) and can be expressed as

$$\frac{{\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}\right)}{{\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\right)}\le 1 -\frac{2\mathcalligra{q}\left(E-2\right)-{\mathcalligra{q}}^{2}}{{{\sigma }^{-4}{\varvec{\eta}}}^{T}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{{\varvec{C}}}_{0}\mathcal{W}{{\varvec{C}}}_{0}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}+E}$$
(4.6)

For the SWTSLS estimator, its risk satisfies the bound

$${\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{SWTSLS}\right)\le {\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{TSLS}}\right) -\frac{\mathcalligra{q}\left(2\left(E-2\right)-\mathcalligra{q}\right)}{{{\sigma }^{-4}{\varvec{\eta}}}^{T}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{{\varvec{C}}}_{0}{\left({{\varvec{C}}}_{2}-{{\varvec{C}}}_{0}\right)}^{-1}{{\varvec{C}}}_{0}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}+E},$$
(4.7)

where \({\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{TSLS}}\right)=\mathrm{tr}\left({\left({{\varvec{C}}}_{2}-{{\varvec{C}}}_{0}\right)}^{-1}{{\varvec{C}}}_{2}\right)\) and \({{\varvec{C}}}_{2}={\sigma }^{2}{\left({\mathbf{Q}}_{{\mathbf{x}}_{2}\mathrm{z}}{{\varvec{Q}}}_{\mathbf{z}\mathbf{z}}^{-1}{\mathbf{Q}}_{\mathbf{z}{\mathbf{x}}_{2}}-{\mathbf{Q}}_{{\mathbf{x}}_{2}{\mathbf{z}}_{1}}{{\varvec{Q}}}_{{\mathbf{z}}_{1}{\mathbf{z}}_{1}}^{-1}{\mathbf{Q}}_{{\mathbf{z}}_{1}{\mathbf{x}}_{2}}\right)}^{-1}\), as demonstrated by Hansen (2017). Comparing the bounds of SWLIML (4.4) and SWTSLS (4.7) shows that neither estimator uniformly dominates the other: in some regions of the parameter space the SWLIML estimator has lower risk, while in others the SWTSLS estimator does. These bounds can be used to identify the regions where one estimator outperforms the other, which helps in selecting the most appropriate estimator for a particular scenario.

To quantify the magnitude of the risk improvement for SWTSLS and SWLIML, we define the following parameters:

  1. (i)

    \({d}_{E}=\frac{\mathrm{tr}\left({\left({{\varvec{C}}}_{2}-{{\varvec{C}}}_{0}\right)}^{-1}{{\varvec{C}}}_{1}\right)}{E}\) and \({h}_{E}=\frac{{{\sigma }^{-4}{\varvec{\eta}}}^{T}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{{\varvec{C}}}_{0}{\left({{\varvec{C}}}_{2}-{{\varvec{C}}}_{0}\right)}^{-1}{{\varvec{C}}}_{0}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}}{E}\) for SWTSLS

  2. (ii)

    \({d}_{E}^{*}=\frac{\mathrm{tr}\left(\mathcal{W}{{\varvec{C}}}_{1}\right)}{E}\) and \({h}_{E}^{*}=\frac{{\sigma }^{-4}{{\varvec{\eta}}}^{T}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{{\varvec{C}}}_{0}\mathcal{W}{{\varvec{C}}}_{0}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}}{E}\) for SWLIML.

The parameters \({d}_{E}\) and \({d}_{E}^{*}\) are nonlinear functions of the correlation between \({{\varvec{x}}}_{2i}\) and the instruments \({{\varvec{z}}}_{2i}\), and both satisfy \({d}_{E}, {d}_{E}^{*}\ge 1\). Conversely, the parameters \({h}_{E}\) and \({h}_{E}^{*}\) indicate the magnitude of the endogeneity \({\varvec{\eta}}\). Rewriting Eqs. (4.7) and (4.6) in relative terms yields

$$\frac{{\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{SWTSLS}\right)}{{\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{TSLS}}\right)}\le 1 -\frac{{\mathcalligra{q}}^{2}}{\mathrm{tr}\left({\left({{\varvec{C}}}_{2}-{{\varvec{C}}}_{0}\right)}^{-1}{{\varvec{C}}}_{2}\right)\left({\sigma }^{-4}{{\varvec{\eta}}}^{T}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{{\varvec{C}}}_{0}{\left({{\varvec{C}}}_{2}-{{\varvec{C}}}_{0}\right)}^{-1}{{\varvec{C}}}_{0}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}+E\right)}\simeq 1 -\frac{1}{{d}_{E}\left({h}_{E}+1\right)}$$
(4.8)

and

$$\frac{{\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}^{*}\right)}{{\mathbb{R}}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{LIML}}\right)}\le 1 -\frac{\mathcalligra{q}}{\mathrm{tr}\left(\mathcal{W}{{\varvec{C}}}_{1}\right)}\frac{\mathcalligra{q}}{{\sigma }^{-4}{{\varvec{\eta}}}^{T}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{{\varvec{C}}}_{0}\mathcal{W}{{\varvec{C}}}_{0}{{\varvec{\Sigma}}}_{{\varvec{\nu}}{\varvec{\nu}}}{\varvec{\eta}}+E}\simeq 1 -\frac{1}{{d}_{E}^{*}\left({h}_{E}^{*}+1\right)}$$
(4.9)

respectively, where the approximations approach equality as \(E\) increases.

Equation (4.8) shows that SWTSLS achieves a percentage reduction in asymptotic risk relative to TSLS of approximately \(100/\left({d}_{E}\left({h}_{E}+1\right)\right)\), while (4.9) shows that SWLIML achieves a reduction of approximately \(100/\left({d}_{E}^{*}\left({h}_{E}^{*}+1\right)\right)\) relative to LIML. The percentage reduction in asymptotic risk due to the Stein k-class IV estimator is therefore greatest when the parameters \({d}_{E},{h}_{E},{d}_{E}^{*}\) and \({h}_{E}^{*}\) are small. The strength and number of the instruments determine \({d}_{E}\) and \({d}_{E}^{*}\), while \({h}_{E}\) and \({h}_{E}^{*}\) increase with the magnitude of endogeneity. When there are many weak instruments, the parameter \({d}_{E}\) is inflated, while \({d}_{E}^{*}\) approaches one. Therefore, the SWLIML estimator is expected to have a smaller asymptotic risk than SWTSLS; that is, the bound (4.4) is expected to be smaller than the bound (4.7). In certain situations where the instruments are extremely weak and the endogeneity is mild to moderate (\({h}_{E}\) is small), the bound (4.7) can be smaller than (4.4). When there are many strong instruments and a high level of endogeneity, the bound (4.4) can be much smaller than (4.7), since the parameter \({d}_{E}\) is inflated while \({d}_{E}^{*}\to 1\). However, when the degree of endogeneity is mild, the difference between (4.7) and (4.4) is marginal.
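The approximate percentage reductions in (4.8) and (4.9) are straightforward to evaluate. The helper below computes \(100/\left(d\left(h+1\right)\right)\); the particular \((d,h)\) values are hypothetical and only illustrate the many-weak-instruments comparison discussed above:

```python
def pct_risk_reduction(d: float, h: float) -> float:
    """Approximate percentage reduction in asymptotic risk, 100 / (d (h + 1)),
    from Eqs. (4.8)-(4.9); d >= 1 reflects instrument strength and number,
    h >= 0 the magnitude of endogeneity."""
    return 100.0 / (d * (h + 1.0))

# Many weak instruments: d_E inflated for SWTSLS, d_E* near one for SWLIML,
# so the predicted gain over the unshrunk estimator is larger for SWLIML.
swtsls_gain = pct_risk_reduction(d=3.0, h=0.5)   # hypothetical d_E, h_E
swliml_gain = pct_risk_reduction(d=1.1, h=0.5)   # hypothetical d_E*, h_E*
print(swtsls_gain, swliml_gain)
```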

The theory suggests that depending on the specific scenario, either the SWLIML estimator or the SWTSLS estimator may outperform the other. Specifically, when the number of instruments is high and the degree of endogeneity is moderate to high, the SWLIML estimator is expected to have lower asymptotic risk than the SWTSLS estimator. On the other hand, when there are not too many instruments and the degree of endogeneity is small, the SWTSLS estimator is expected to have lower risk. Therefore, we anticipate that the Stein k-class IV estimator will result in significant risk reductions, especially when the instruments are weak, the degree of endogeneity is mild to high, and/or the number of instruments is large. To gain a better understanding, we examine simulation estimates in the following section to determine which estimator performs better than the other.

5 Monte Carlo design

The simulation is conducted by following Hahn and Inoue (2002), Kuersteiner and Okui (2010) and Hansen (2017). The observations \(\left({y}_{i},{{\varvec{x}}}_{i},{{\varvec{z}}}_{i},{\mu }_{i},{{\varvec{\nu}}}_{i}; i=1,...,n\right)\) are produced by considering a simultaneous equations model with endogenous variables:

$${y}_{i} ={{\varvec{x}}}_{i}^{T}{\varvec{\beta}}+\alpha + {\mu }_{i},$$
$${{\varvec{x}}}_{i}={{\varvec{\Pi}}}^{T}{{\varvec{z}}}_{i} +\gamma + {{\varvec{v}}}_{i},$$

where \({{\varvec{z}}}_{i}\stackrel{i.i.d}{\sim } N\left(0,{{\varvec{I}}}_{K}\right)\) are the instruments, \(\left(\begin{array}{c}{\mu }_{i}\\ {{\varvec{v}}}_{i}\end{array}\right)\stackrel{i.i.d}{\sim }N\left(0,{\varvec{\Omega}}\right)\) with \({\varvec{\Omega}}=\left[\begin{array}{cccc}1& {\rho }^{*}& \dots & {\rho }^{*}\\ {\rho }^{*}& 1& 0& 0\\ \vdots & 0& \ddots & 0\\ {\rho }^{*}& 0& 0& 1\end{array}\right],\) \({\rho }^{*}=\rho /\sqrt{E}\) is the correlation between \({\mu }_{i}\) and \({{\varvec{v}}}_{i}\), which measures the degree of endogeneity, and \(\rho =0\) indicates that there is no endogeneity issue. The determinant of \({\varvec{\Omega}}\) is \(1-{\rho }^{2}\), which is positive if and only if \(\left|\rho \right|<1\). We therefore parameterize the correlation as \({\rho }^{*}=\rho /\sqrt{E}\) with \(\rho \in \left(-1, 1\right)\), which is a natural parameterization. The integers \(E\) and \(K\) are the total numbers of endogenous variables and instruments, respectively. The parameter of interest is the vector \({\varvec{\beta}}\), which is set to \({\left(-1, 1\right)}^{T}\), \({\left(-1, 1, -1, 1\right)}^{T}\) and \({\left(-1, 1, -1, 1, -1, 1\right)}^{T}\) when \(E=2\), \(E=4\), and \(E=6\), respectively, and the parameters \(\alpha \) and \(\gamma \) are set to zero. The reduced-form matrix \({\varvec{\Pi}}\) is set according to the concentration parameter (CP), a unitless measure of the strength of the instruments (Stock et al. 2002). The CP can be specified for multiple endogenous variables as \({{\varvec{\Sigma}}}_{{\varvec{v}}{\varvec{v}}}^{-1/2}{{\varvec{\Pi}}}^{T}{\mathbf{Z}}^{T}\mathbf{Z}{\varvec{\Pi}}{{\varvec{\Sigma}}}_{{\varvec{v}}{\varvec{v}}}^{-1/2}.\) According to Stock et al. (2002), the CP plays an important role since it determines the strength of the instruments, and there is a one-to-one correspondence between the \({R}^{2}\) value and the CP.
The matrix \({\varvec{\Pi}}\) is \(K\times E\): its first \(E\) rows and \(E\) columns form a diagonal block with a common scalar entry, while all other elements are zero. This scalar is set to \({\left(\frac{{R}^{2}}{1-{R}^{2}}\right)}^{1/2}\) so that \({R}^{2}\) is the reduced-form population R-squared for each \({x}_{ji}\). In the design of the experiment, the values of \(\left({R}^{2},n, \rho ,K, E\right)\) are selected from the specifications below:

$${R}^{2}\in \left\{0.01, 0.10, 0.30\right\}; n\in \left\{100, 500, 1000, 2000\right\}; \rho \in \left\{0, 0.25, 0.50, 0.75, 0.90\right\}; K\in \left\{10, 30, 60\right\};E\in \left\{2, 4, 6\right\}$$

The parameter \({R}^{2}\) signifies the strength of the instruments, and \({R}^{2}=0.01\) indicates weak instruments (Donald and Newey, 2001). \(n\) represents the sample size, and \(\rho \) controls the degree of endogeneity (the regressor is exogenous when \(\rho =0\), and a large value of \(\rho \) indicates severe endogeneity). Note that the dimension of \({{\varvec{z}}}_{i}\) is greater than that of \({{\varvec{x}}}_{i}\) \(\left(K>E\right)\), so the model is overidentified. LIML is equivalent to TSLS when the model is just identified; therefore, the overidentified model is used to examine the performance of the Stein k-class IV estimator. OLS, TSLS, LIML, CE (Sawa, 1973), SWTSLS, SWLIML and pretest estimators are examined, with all simulated results based on 5000 replications. The performance of each estimator is compared via the median squared error; i.e., for any estimator \(\widehat{{\varvec{\beta}}}\),

$$M\left(\widehat{{\varvec{\beta}}}\right)=median\left[{\left(\widehat{{\varvec{\beta}}}-{\varvec{\beta}}\right)}^{T}\left(\widehat{{\varvec{\beta}}}-{\varvec{\beta}}\right)\right]$$

As in Donald et al. (2009), we report the median squared error, a robust measure, instead of the mean squared error for each estimator, because the TSLS and LIML estimators may not have finite variances in finite samples. The use of the median squared error is also common in the simultaneous equations literature.
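A minimal sketch of this data-generating process and of the median squared error criterion, under the stated design (the authors' exact \({\varvec{\Pi}}\) construction may differ in detail):

```python
import numpy as np

def simulate_design(n=500, E=2, K=10, rho=0.5, R2=0.10, seed=0):
    """Generate (y, x, z) from the simultaneous equations design of Section 5."""
    rng = np.random.default_rng(seed)
    beta = np.tile([-1.0, 1.0], (E + 1) // 2)[:E]        # (-1, 1, -1, 1, ...)
    # Errors: unit variances, corr(mu_i, v_ij) = rho / sqrt(E).
    Omega = np.eye(E + 1)
    Omega[0, 1:] = Omega[1:, 0] = rho / np.sqrt(E)
    errors = rng.multivariate_normal(np.zeros(E + 1), Omega, size=n)
    mu, v = errors[:, 0], errors[:, 1:]
    # Instruments and reduced form: leading E x E block of Pi is diagonal
    # with scalar (R^2 / (1 - R^2))^(1/2); all other entries are zero.
    z = rng.standard_normal((n, K))
    Pi = np.zeros((K, E))
    Pi[:E, :E] = np.sqrt(R2 / (1.0 - R2)) * np.eye(E)
    x = z @ Pi + v
    y = x @ beta + mu
    return y, x, z, beta

def median_squared_error(beta_hats, beta):
    """median[(b - beta)^T (b - beta)] over replications, as in Section 5."""
    diffs = np.asarray(beta_hats) - beta
    return float(np.median(np.sum(diffs ** 2, axis=1)))
```

Each replication draws a fresh seed, estimates \({\varvec{\beta}}\) with every estimator under study, and the median squared errors are computed over the 5000 replications.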

5.1 Monte Carlo results

To simplify the presentation, the simulated relative efficiencies (SREs) are reported as a measure of performance. The SREs are defined as the ratio of the simulated median squared error of any of the estimators relative to the simulated median squared error of \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}\). The SRE of any estimator \({\widehat{{\varvec{\beta}}}}^{*}\) is characterized as

$$\mathrm{SRE}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}},{\widehat{{\varvec{\beta}}}}^{*}\right)=\frac{median\left[{\left({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}-{\varvec{\beta}}\right)}^{T}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}-{\varvec{\beta}}\right)\right]}{median\left[{\left({\widehat{{\varvec{\beta}}}}^{*}-{\varvec{\beta}}\right)}^{T}\left({\widehat{{\varvec{\beta}}}}^{*}-{\varvec{\beta}}\right)\right]},$$

where \({\widehat{{\varvec{\beta}}}}^{*}\) is the \({\widehat{{\varvec{\beta}}}}_{L}\), \({\widehat{{\varvec{\beta}}}}_{TSLS}\), \({\widehat{{\varvec{\beta}}}}_{LIML}\), \({\widehat{{\varvec{\beta}}}}_{CE}\), \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWTSLS}}^{*}\), or \({\widehat{{\varvec{\beta}}}}_{PT}\) estimator. Note that \(\mathrm{SRE}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}},{\widehat{{\varvec{\beta}}}}^{*}\right)<1\) indicates that \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}\) performs better than the competing estimator, whereas \(\mathrm{SRE}\left({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}},{\widehat{{\varvec{\beta}}}}^{*}\right)>1\) indicates worse performance of \({\widehat{{\varvec{\beta}}}}_{\mathrm{SWLIML}}\). The SREs of the estimators are presented in Tables 1, 2, 3 and 4: Table 1 corresponds to the scenario where \(E=2\) and \(K=30\), Table 2 to \(E=2\) and \(K=60\), Table 3 to \(E=4\) and \(K=30\), and Table 4 to \(E=4\) and \(K=60\). In the supplementary materials, Tables I–IV present the SREs of the estimators for various additional cases, including situations with many variables that address different empirically relevant problems. The main findings of the simulation study are summarized below:

Table 1 Simulated relative efficiencies (SREs) of the median squared errors of OLS, TSLS, LIML, CE and SWTSLS relative to SWLIML for \(E=2,\) \(K=30,\) and \(\rho =0, 0.25, 0.50, 0.75\,\mathrm{and}\,0.90\)
Table 2 Simulated relative efficiencies (SREs) of the median squared errors of OLS, TSLS, LIML, CE and SWTSLS relative to SWLIML for \(E=2,\) \(K=60,\) and \(\rho =0, 0.25, 0.50, 0.75\,\mathrm{and}\,0.90\)
Table 3 Simulated relative efficiencies (SREs) of the median squared errors of OLS, TSLS, LIML, CE and SWTSLS relative to SWLIML for \(E=4,\) \(K=30,\) and \(\rho =0, 0.25, 0.50, 0.75\,\mathrm{and}\,0.90\)
Table 4 Simulated relative efficiencies (SREs) of the median squared errors of OLS, TSLS, LIML, CE and SWTSLS relative to SWLIML for \(E=4,\) \(K=60,\) and \(\rho =0, 0.25, 0.50, 0.75\,\mathrm{and}\,0.90\)
  • Monte Carlo simulation results indicate that the performance of the proposed Stein k-class IV estimator is satisfactory relative to the CE, OLS, TSLS, and LIML estimators in the presence of moderate to strong endogeneity.

  • The median squared error of the estimators decreases substantially as the sample size increases, but it increases with the number of endogenous variables. The Stein k-class IV estimator performs best in the case of weak instruments and endogeneity, while OLS, TSLS, LIML and CE perform worst for all parameter values when the degree of endogeneity is high.

  • In the case of no endogeneity \(\left(\rho =0\right)\), the OLS estimator has a lower median squared error than the other estimators, but this ranking is reversed in the presence of endogeneity. In the case of weak instruments \(\left({R}^{2}=0.01\right)\) and a low degree of endogeneity \(\left(\rho =0.25\right)\), the OLS estimator has the lowest dispersion among the estimators, particularly for small sample sizes; these results are similar to those of Hansen (2017).

  • Looking across the tables where \(E=2\), it is clear that SWLIML dominates for moderate to large values of \(\rho \), and its performance is greatly enhanced when there are many instruments.

  • In most cases, the SREs of OLS, TSLS, LIML, CE, and SWTSLS relative to SWLIML are below one; the proposed estimator is therefore more robust to many weak instruments and intermediate to large values of \(\rho \). The SREs of TSLS and SWTSLS exceed one in a few cases for \({R}^{2}=0.01\), \(n=100\) and \(E=4\).

  • In the supplementary materials, Table I shows that SWLIML has a lower median squared error than the other estimators when \(\rho \) is large, while SWTSLS, TSLS and CE have better performance for small \(\rho \). The SWTSLS exhibits a lower median squared error than TSLS for small values of \(\rho \), but the opposite is true for large \(\rho \), as shown in Table II.

  • When \(K\) is small and \(E=6\), in most cases, SWTSLS performs marginally better than SWLIML but significantly outperforms other estimators, except for TSLS in the case of large values of \(\rho \). Conversely, for a large value of \(K\), SWLIML exhibits a significantly superior performance to other existing estimators, as demonstrated by Tables IV and V.

Figure 1 presents the relative median squared errors of the estimators as \(\rho \) varies from 0 to 1, with \(n=500\) and \({R}^{2}=0.10\), where \({\alpha }^{*}=0.05, 0.15, 0.25\) represents the minimum and maximum assured efficiency of the pretest estimator (Kibria and Saleh, 2006). In addition, Figs. a−h in the supplementary materials depict real data analysis situations with more covariates in the model.

  • The median squared errors of the pretest estimators and OLS are smaller than those of the other estimators when \(\rho \) is small. The performance of the pretest estimator is generally similar to that of OLS for low \(\rho \) and, in many cases, similar to that of LIML when \(\rho \) is high. However, the SRE of the pretest estimators is naturally lower than that of LIML for intermediate values of \(\rho \).

  • When \({R}^{2}=0.10\) and \(\rho \) take moderate values, the SRE of SWTSLS and TSLS is higher than that of SWLIML, but for large \({R}^{2}\), the SRE is much lower than that of SWLIML.

  • The improvements from the SWLIML estimator are most significant under large values of \(\rho \) and many instruments.

Fig. 1

Relative median squared errors of the OLS, TSLS, LIML, CE, SWTSLS and pretest estimators a when \(K=30\) and \(E=2\), b when \(K=60\) and \(E=2\), c when \(K=30\) and \(E=4\), and d when \(K=60\) and \(E=4\)

Overall, based on the Monte Carlo simulation evidence, the improvements made by the SWLIML estimator relative to the SWTSLS, CE, TSLS, LIML, and OLS estimators are greatest in the presence of many instruments and moderate to large values of \(\rho \). The SWTSLS method performs better when the number of instruments and degree of endogeneity are low and the number of endogenous variables is high. However, as the level of endogeneity increases, TSLS may have a higher SRE due to overidentification, reducing the benefits of shrinkage. Sawa (1973) proposed a CE method where the weight depends on the number of exogenous variables, and this method has a significantly lower SRE when the number of instruments is high. Additionally, when the model is just identified, the performance of CE is almost identical to that of TSLS. When there are more than two endogenous variables, the inequality in Theorem 4.2 is satisfied, and SWLIML uniformly dominates LIML. When there are fewer than three endogenous variables, SWLIML has either a smaller SRE than LIML or a slightly greater SRE. The promising results from the simulation suggest that SWLIML should be utilized in situations where there is a need to estimate a linear IV model with many instruments.
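The SRE computations reported in the tables can be sketched as follows, using OLS and TSLS as stand-ins for the full set of estimators (LIML, CE and the shrinkage variants follow the same replication pattern):

```python
import numpy as np

def ols(y, x):
    """Ordinary least squares coefficient vector."""
    return np.linalg.lstsq(x, y, rcond=None)[0]

def tsls(y, x, z):
    """Two-stage least squares: regress x on the instruments z, then y on the
    fitted values (algebraically identical to the usual TSLS formula)."""
    x_hat = z @ np.linalg.lstsq(z, x, rcond=None)[0]
    return np.linalg.lstsq(x_hat, y, rcond=None)[0]

def sre(draws_ref, draws_other, beta):
    """Simulated relative efficiency: median squared error of the reference
    estimator (SWLIML in the text) over that of a competitor; values below
    one favour the reference estimator."""
    mse = lambda b: np.median(np.sum((np.asarray(b) - beta) ** 2, axis=1))
    return float(mse(draws_ref) / mse(draws_other))
```

With endogenous regressors, TSLS recovers \({\varvec{\beta}}\) while OLS is asymptotically biased, which is the pattern the SRE tables quantify.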

6 Application: green patents

The empirical relevance of the proposed method is also demonstrated by analyzing the green patent dataset from Maasoumi et al. (2021). The data cover the period 1990–2016, and the total sample includes 140 observations from the Nordic countries (Denmark, Finland, Iceland, Norway and Sweden). The primary purpose of this dataset is to facilitate investigations of the impact of regulations and strategies on green patent production and the advancement of renewable energy technologies in the Nordic countries. In addition, the influences of environmental stringency, environmental taxes and policies on renewable energy patents are considered. More examples of innovative energy strategies and different public environmental policies can be found in Popp (2002), Faber and Hesen (2004), Taylor (2016), and others. Heshmati et al. (2015) and Schiederig et al. (2012) discussed the growth of renewable energy and green innovation, respectively, in detail. The literature on renewable energy and the factors influencing the use of technology to achieve the United Nations Sustainable Development Goals is growing. In a recent study, Maasoumi et al. (2021) examined the impact of regulations and policies on green patent generation and the evolution of renewable energy technologies in the Organization for Economic Co-operation and Development (OECD) countries. Analyzing these competing policy interests is therefore highly relevant, which makes this specific dataset especially useful.

Green research and development investment (GRDI), research and development personnel (RDP) and government expenditure on tertiary education (GETE) might be endogenous variables; therefore, both the OLS and the k-class estimators are examined. A detailed description of the variables can be found in Maasoumi et al. (2021); a brief description follows. The dependent variable is green patents. The instruments are variables measuring the total capital stock (in billions of $); gross domestic product (GDP); openness, defined as the sum of the export and import shares of GDP; a financial index; an environmental index; and an emissions index. The parameters of interest are the regression coefficients on the endogenous variables (i.e., GRDI, RDP and GETE). The Monte Carlo simulations and theoretical results above indicate that the newly proposed SWLIML estimator is superior to the other estimators; therefore, the OLS, LIML and SWLIML estimators are used to analyze this dataset. The Durbin–Wu–Hausman test (3.3) is used to test the hypothesis that the variables GRDI, RDP and GETE are exogenous. The value of the test statistic \({H}_{n}\) is 36.423, indicating that there is an endogeneity issue in this dataset on the Nordic countries. In contrast, Maasoumi et al. (2021) clearly reject the hypothesis that endogenous explanatory variables are present in the OECD countries dataset.
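A standard way to compute a Durbin–Wu–Hausman-type statistic is through a control-function auxiliary regression: regress each suspected-endogenous variable on the instruments, append the first-stage residuals to the structural regression, and jointly test whether their coefficients are zero. The sketch below implements this generic version and is not necessarily the exact form of statistic (3.3):

```python
import numpy as np

def dwh_statistic(y, X_endog, Z):
    """Durbin-Wu-Hausman exogeneity test, control-function form.

    y:        (n,) outcome
    X_endog:  (n, p) suspected-endogenous regressors
    Z:        (n, k) instruments, k >= p
    Returns a Wald statistic, approximately chi2(p) under exogeneity.
    """
    n = len(y)
    # first-stage residuals for each suspected-endogenous column
    V = X_endog - Z @ np.linalg.lstsq(Z, X_endog, rcond=None)[0]
    W = np.column_stack([np.ones(n), X_endog, V])
    coef, *_ = np.linalg.lstsq(W, y, rcond=None)
    resid = y - W @ coef
    sigma2 = resid @ resid / (n - W.shape[1])
    cov = sigma2 * np.linalg.inv(W.T @ W)
    p = X_endog.shape[1]
    gamma = coef[-p:]            # coefficients on the residual terms
    return gamma @ np.linalg.solve(cov[-p:, -p:], gamma)
```

A large value of the statistic relative to the chi-squared critical value, as with the 36.423 reported above, rejects exogeneity of the tested regressors.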

Bootstrapping is used to obtain the standard errors of the estimates and the root mean squared error (RMSE). The data are bootstrapped B = 50,000 times with resampling across individuals (Huang et al. 2020). The results, namely, the coefficient estimates for the endogenous variables GRDI, RDP and GETE and the RMSE of each estimator, are shown in Table 5. The SWLIML estimates are based on the weights for the OLS and LIML estimators. The coefficient on GRDI obtained with the OLS estimator is negative, contrary to what Maasoumi et al. (2021) found. The impact of GRDI on green patents should be positive, and it is positive when the LIML and SWLIML estimators are used to estimate the corresponding coefficients. The other variables (RDP and GETE) are found to have positive effects on green patents, which is similar to the results of Maasoumi et al. (2021). Furthermore, the bootstrapped standard errors for the SWLIML estimator are smaller than those for the LIML estimator irrespective of the magnitude of the estimated coefficients. The RMSE values for the OLS, LIML and SWLIML estimators are 3.154, 2.341 and 2.112, respectively. The SWLIML estimator thus has a smaller RMSE than the OLS and LIML estimators, and the quality of its predictions is much better. This indicates the stability of the SWLIML estimator and confirms the simulation results showing that SWLIML has lower SREs in finite samples. Based on the simulated results and those from this application, we conclude that the proposed SWLIML estimator performs better than the other estimators and recommend that applied researchers use it.
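The pairs bootstrap used here resamples whole observations with replacement and re-estimates the model on each draw; the standard deviation of the draws gives the bootstrap standard error. A minimal sketch follows, pairing the resampler with a plain TSLS plug-in for illustration (the SWLIML estimator would slot in the same way); B is kept small here, unlike the B = 50,000 used in the paper:

```python
import numpy as np

def tsls(y, X, Z):
    """Plain TSLS plug-in: project X on Z, regress y on the fitted values."""
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(Xhat, y, rcond=None)[0]

def bootstrap_se(estimator, y, X, Z, B=200, seed=0):
    """Pairs (case-resampling) bootstrap standard errors of `estimator`,
    which maps (y, X, Z) to a coefficient vector."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)   # resample rows with replacement
        draws.append(estimator(y[idx], X[idx], Z[idx]))
    return np.asarray(draws).std(axis=0, ddof=1)
```

Resampling entire rows keeps the dependence among (y, X, Z) within each observation intact, which is what resampling across individuals requires.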

Table 5 Estimates for the green patent dataset

7 Conclusions

This paper considers a SWLIML estimator for estimating the regression coefficients from the reduced form of a set of simultaneous equations. It is shown that the SWLIML estimator is superior to the LIML estimator because it has lower risk, in particular a smaller asymptotic mean squared error. Monte Carlo simulation results show that the SWLIML estimator is the preferred estimator or close to the optimal one, which is a major improvement given the uncertainty involved in choosing between OLS and LIML, specifically regarding the stability of the asymptotic mean squared error in the presence of endogeneity and many instruments. The SWLIML estimator also dominates the SWTSLS estimator when there are many instruments, and it is more robust to weak or many instruments than the TSLS estimator. The use of SWLIML enables applied researchers to conduct efficient estimation in the presence of endogeneity and many instruments.