1 Introduction

The optimal taxation of commodities remains a contentious issue. International organizations and academic reviews recommend that VAT rates be uniform (for example, OECD, 2020, p. 42; Mirrlees et al., 2011, ch. 9). Yet most governments apply reduced VAT rates. By doing so, they aim to increase the purchasing power of low-income groups, to stimulate the consumption of merit goods such as culture, and to redistribute towards specific groups, for example, through reduced rates for female hygiene products.Footnote 1

The recommendation that commodities are taxed at uniform rates is rooted in a seminal paper by Atkinson and Stiglitz (1976). They show that when preferences are weakly separable between leisure and consumption, and taxpayers differ only in their labour productivities, commodities should indeed be taxed at uniform rates. The reason is that with such a one-dimensional population, if all households have equal preferences, individual consumption patterns contain no information about the households’ types beyond what is already revealed by the households’ labour incomes. Given then that any redistribution attained by differentiated commodity taxes can also be attained by a tax on labour income with equal distortions of labour supply but without distortions of consumption decisions, it is best to not use differentiated commodity taxes for redistribution. Only when there are clear complementarities between labour supply and the consumption of specific goods, most notably childcare and transportation to work, can differentiated commodity taxes help reduce the distortions caused by a progressive tax on labour income (Gordon & Kopczuk, 2014; Pirttilä & Suoniemi, 2014).

A notable shortcoming of the model of Atkinson and Stiglitz (1976) is that taxpayers differ only in their labour productivities. Saez (2002) studies the desirability of differentiated linear commodity taxes when taxpayers also differ in their tastes. Studying linear commodity taxes alongside a non-linear income tax is useful because arbitrage between taxpayers makes non-linear commodity taxes hard to enforce, and because linear commodity taxes can be levied at the point of sale rather than at the individual level. Saez (2002) shows that starting from a situation without commodity taxes, introducing a small linear tax on one commodity increases social welfare if, on average, higher-income taxpayers have a relatively higher taste for consuming this commodity.Footnote 2

Ferey et al. (2022) derive optimal tax conditions in the setting studied by Saez (2002), still for the case where households earn a single labour income. The relation between their conditions for the linear optimum, and the desirability conditions found by Saez (2002), remains unclear. Saez (2002, p. 229) leaves this as an “extremely useful” task for future research.

In this paper, I derive formulas for the optimal linear taxation of commodities in a broad generalization of the model of Atkinson and Stiglitz (1976). By doing so, I identify the sufficient statistics that allow quantifying the optimal commodity tax rates. Extending Saez (2002), I allow for taxpayers to differ in multiple characteristics such as tastes, productivities, and gender. Furthermore, I allow for taxpayers to earn multiple labour incomes. Allowing multiple labour incomes in the household allows studying, for example, the effect of different complementarities of primary and secondary labour supply with the consumption of certain goods (for example, childcare). I briefly characterize the optimal non-linear schedule for the multidimensional labour income tax and then focus on the optimal linear tax rates on commodities.

Main Results My results are as follows: If the tax schedule on the labour incomes is optimal, then one should only deviate from uniform commodity taxes if doing so either contributes to the distributional objectives of the government in ways that cannot be accomplished through the tax on labour income, or if it reduces the distortions caused by the taxation of labour income.

A first reason to deviate from uniform commodity taxes is when, on average and conditional on labour income, the covariance between the social marginal utilities and the consumption of a particular good differs from zero. This correlation could be due to differences in preferences affecting the marginal utility of consumption, or it could be influenced by societal judgments incorporated in the welfare function and the cardinalization of utility functions. For instance, the government might find a strong taste for a certain good to be socially beneficial, or it might aim to compensate those who consume specific goods to offset other injustices.Footnote 3 The larger the covariance between the social marginal utilities and the consumption of a specific good among individuals with the same income, the greater the distortion that the government should tolerate from taxing that good. This motive for deviating from uniform commodity taxes is absent if taxpayers differ only in their labour productivities.

A second reason to deviate from uniform commodity taxes arises when, on average and conditional on labour income, there is a difference between the cross-sectional variation and the individual variation in the consumption of some good as one changes one of the labour incomes. I show that this can be the case for two reasons: First, around the labour incomes under consideration, taxpayers with different incomes may have different consumption preferences. Second, again around the labour incomes under consideration, taxpayers who earn the same disposable incomes but supply different quantities of labour may consume different quantities of the good. This could be due to complementarities between leisure and consumption of that good. If there is a difference between the cross-sectional and individual variations for either of these reasons, then consumption patterns reveal information about taxpayer types, enabling more efficient redistribution than through the tax on labour income alone. The larger this difference, the larger the acceptable distortion from taxing that good. Moreover, the optimal distortion is larger if the marginal excess burden of taxing labour income is larger, because shifting taxation from labour to consumption yields greater benefits.

A third reason to deviate from uniform commodity taxes is more technical. If on average and conditional on labour income, households for whom the individual variation of the consumption of some good with one of the labour incomes is larger also face a larger marginal excess burden of taxing that labour income, then shifting the burden of taxation from that labour income to the good under consideration increases the efficiency of the tax system. As noted by Saez (2002), such covariance would be difficult to measure empirically. This third term is absent in models where households differ only in their labour productivities.

Saez (2002) outlines three conditions under which, in the presence of a single labour income, optimal commodity taxes would be uniform. However, he does not show how these conditions relate to a condition that characterizes the optimum. In this paper, I do characterize the optimum in the presence of multiple optimally taxed labour incomes. I show how the various marginal costs and benefits of perturbing a tax rate, as identified by Saez (2002), should be balanced against each other. When all three conditions identified by Saez (2002) are met, some terms remain in the optimal tax expression, but the resulting expression indeed implies uniform commodity taxation, even in the presence of multiple labour incomes.

Relationship to Existing Literature

This paper makes several contributions to the literature. Saez (2002) studies the linear taxation of commodities when taxpayers differ in their labour productivities and their tastes, but earn only one labour income. He only finds desirability conditions for non-uniform commodity taxes and does not characterize the optimal tax rates. Jacobs and Boadway (2014) do derive optimal linear commodity taxes, for the case where households differ only in their labour productivities and earn only one labour income. Because they consider a population that differs in one dimension only, they find fewer reasons than Saez (2002) to levy differentiated commodity taxes. Kaplow (2008) studies various ways in which preference heterogeneity leads to correlations between the social marginal utilities and the consumption of particular goods. He does not show how these correlations fit into an optimal tax expression. Ferey et al. (2022) derive optimal linear commodity taxes for the case studied by Saez (2002), without showing how their optimality conditions relate to the desirability conditions found by Saez (2002). I extend the literature by deriving the optimal linear commodity tax rates for the general case, allowing households to differ in characteristics beyond their labour productivities and preferences and to earn multiple labour incomes. I demonstrate how the three conditions for uniform commodity taxes, as identified by Saez (2002), correspond to the terms in the optimal tax expression.

Road Map The paper proceeds as follows: I introduce the model in section 2. In section 3, I derive properties of taxpayer behaviour, and I introduce sufficient statistics conditional on labour income. I present the optimal tax schedules and their relation to the findings of Saez (2002) in section 4, before concluding in section 5.

2 The model

2.1 Households

Households differ in labour productivities \({\varvec{w}}\equiv (w^{1},\ldots ,w^{L})\in \mathscr {W}\), where \(L\ge 1\) and \(\mathscr {W}\subset {\mathbb {R}}_{+}^{L}\) is closed and convex. For each productivity dimension \(l=1,\ldots ,L\), households supply labour \(\ell ^{l}\), yielding labour income \(z^{l}=w^{l}\ell ^{l}\). Households pay labour income taxes \(T({\varvec{z}})\), where the tax schedule can depend on all labour incomes \({\varvec{z}}\equiv (z^{1},\ldots ,z^{L})\) simultaneously in complicated ways.

Households spend their disposable income on numéraire good c and commodities \({\varvec{x}}\equiv (x^{1},\ldots ,x^{J})\). Producer prices are normalized to one, and commodities are taxed at linear rates \({\varvec{t}}\equiv (t^{1},\ldots ,t^{J})\). The budget constraint for a household is:

$$\begin{aligned} c+\sum _{j=1}^{J}x^{j}(1+t^{j}) =\sum _{l=1}^{L}z^{l}-T({\varvec{z}}). \end{aligned}$$
(1)
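As a minimal sketch of how budget constraint (1) pins down numéraire consumption once the other choices are fixed, the following Python snippet solves (1) for c. The function names and the example tax schedule are hypothetical and serve only as illustration; they are not part of the model.

```python
def numeraire_consumption(z, x, t, income_tax):
    """Solve budget constraint (1) for the numeraire c.

    z: labour incomes, x: commodity quantities, t: linear commodity tax rates,
    income_tax: callable returning T(z) for the full income vector.
    """
    if len(x) != len(t):
        raise ValueError("x and t must have the same length J")
    disposable_income = sum(z) - income_tax(z)
    # Producer prices are normalized to one, so the consumer price is 1 + t^j.
    commodity_spending = sum(xj * (1.0 + tj) for xj, tj in zip(x, t))
    return disposable_income - commodity_spending


def example_income_tax(z):
    # Hypothetical joint schedule: 30% on total income above an allowance of 10.
    return 0.3 * max(sum(z) - 10.0, 0.0)


# Two labour incomes, two commodities, only the first commodity is taxed.
c = numeraire_consumption(z=[40.0, 20.0], x=[5.0, 8.0], t=[0.2, 0.0],
                          income_tax=example_income_tax)
```

Because the schedule receives the whole vector of incomes, the same function accommodates joint as well as separable taxation of the labour incomes.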

Besides their labour productivities, households can differ in other dimensions. We parametrize the additional characteristics using the vector \(\varvec{\theta }\equiv (\theta ^{1},\ldots ,\theta ^{K})\in \Theta \subset {\mathbb {R}}_{+}^{K}\). We denote the type vectors as \(\varvec{\omega }\equiv ({\varvec{w}},\varvec{\theta })\in \Omega \equiv \mathscr {W}\times \Theta\). We denote the joint cumulative distribution of the types as \(G^{\varvec{\omega }}(\varvec{\omega })\) with corresponding twice continuously differentiable density function \(g^{\varvec{\omega }}(\varvec{\omega })\). We do not make any assumptions about the joint distribution of the types. The only exception is that we assume there are no deterministic relationships between the productivity parameters \({\varvec{w}}\) and the additional characteristics \(\varvec{\theta }\).

Each household makes its decisions as one unit. They choose consumptions c and \({\varvec{x}}\), and labour incomes \({\varvec{z}}\), taking into account their budget constraint (1), to maximize a thrice continuously differentiable, weakly concave utility function:

$$\begin{aligned} \max _{c,{\varvec{x}},{\varvec{z}}}u\left( c,{\varvec{x}},\frac{z^{1}}{w^{1}},\ldots ,\frac{z^{L}}{w^{L}},\varvec{\omega }\right) , \end{aligned}$$
(2)

with partial derivatives \(u_{c}>0\), \(u_{x^{j}}>0\) and \(u_{\ell ^{l}}<0\), and with indifference sets that are strictly convex in \((c,{\varvec{x}},{\varvec{z}})\) for all utility levels and types. We assume that for all households, the second-order conditions hold for optimization problem (1)–(2), and that the household’s optimization problem admits a single global maximum. These assumptions, together with the assumption that the tax schedule \({\varvec{z}}\mapsto T({\varvec{z}})\) is thrice continuously differentiable, prevent small tax perturbations from causing “jumps” in the choices of the households.Footnote 4 For a household of type \(\varvec{\omega }\) and for tax policies T and \({\varvec{t}}\), we denote indirect utility as \(v(\varvec{\omega },T,{\varvec{t}})\).

2.2 Government

The government sets its tax policies to maximize an additive social welfare function subject to a budget constraint. The social welfare function isFootnote 5:

$$\begin{aligned} \max _{T,{\varvec{t}}}\iint _{\Omega }W\left( v(\varvec{\omega },T, {\varvec{t}}),\varvec{\omega }\right) \textrm{d}G^{\varvec{\omega }}(\varvec{\omega }), \end{aligned}$$
(3)

where \((v,\varvec{\omega })\mapsto W(v,\varvec{\omega })\) is a twice continuously differentiable, increasing, and weakly concave transformation of household utility that captures the government’s normative preferences. Note that the weight given to a household can depend on its type \(\varvec{\omega }\).Footnote 6 The government’s budget constraint is:

$$\begin{aligned} \iint _{\Omega }\left( T({\varvec{z}})+\sum _{j=1}^{J}t^{j}x^{j}\right) \textrm{d}G^{\varvec{\omega }}(\varvec{\omega })\ge E, \end{aligned}$$
(4)

with E an exogenously given revenue requirement.

3 Taxpayer behaviour

The first-order conditions for household optimization problem (1)–(2) for labour incomes \(l=1,\ldots ,L\) and commodities \(j=1,\ldots ,J\) are, respectively:

$$\begin{aligned} \frac{u_{\ell ^{l}}}{u_{c}}=-w^{l}(1-T_{z^{l}})\text { and }\frac{u_{x^{j}}}{u_{c}}=1+t^{j}. \end{aligned}$$
(5)

For each household type \(\varvec{\omega }\), we denote the optimal labour incomes as \({\varvec{Z}}(\varvec{\omega })\equiv (Z^{1}(\varvec{\omega }),\ldots ,Z^{L}(\varvec{\omega }))\), the optimal consumption of the numéraire good as \(C(\varvec{\omega })\), and the optimal consumption of the other commodities as \({\varvec{X}}(\varvec{\omega })\equiv (X^{1}(\varvec{\omega }),\ldots ,X^{J}(\varvec{\omega }))\). We define the income space as the set of income bundles that are chosen by at least one household type: \({\mathcal {Z}}\equiv \{{\varvec{z}}\in {{\mathbb {R}}}_{+}^{L}|\exists \varvec{\omega }\in \Omega :{\varvec{Z}}(\varvec{\omega })={\varvec{z}}\}\).

With the assumptions made in section 2.1, the households’ responses to small perturbations of the tax policies are well defined. Let \(\rho\) denote a lump sum perturbation, and let \(\tau ^{l}\) denote a compensated perturbation of the l-th marginal tax rate, such that the perturbed tax schedule is:

$$\begin{aligned} {\varvec{z}}\mapsto T({\varvec{z}})-\rho +\sum _{l=1}^{L}\tau ^{l}\cdot (z^{l}-Z^{l}(\varvec{\omega })). \end{aligned}$$
(6)
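To make the role of the two perturbation parameters in (6) concrete, the sketch below builds the perturbed schedule as a Python closure. The baseline schedule and all names are hypothetical. The point it illustrates is that the perturbation \(\tau ^{l}\) is compensated: it changes the marginal tax rate on the l-th income but leaves the tax liability at the household's original choice \({\varvec{Z}}(\varvec{\omega })\) unchanged.

```python
def perturb_schedule(T, Z_ref, rho=0.0, tau=None):
    """Return the perturbed schedule of (6) around the reference incomes Z_ref.

    T: baseline schedule (callable on an income vector),
    rho: lump-sum transfer, tau: perturbations of the marginal tax rates.
    """
    if tau is None:
        tau = [0.0] * len(Z_ref)

    def T_perturbed(z):
        # The tilt vanishes at z = Z_ref, which is what makes tau "compensated".
        tilt = sum(tl * (zl - Zl) for tl, zl, Zl in zip(tau, z, Z_ref))
        return T(z) - rho + tilt

    return T_perturbed


def baseline_tax(z):
    # Hypothetical baseline: a flat 25% tax on the sum of the labour incomes.
    return 0.25 * sum(z)


Z_ref = [30.0, 10.0]
T_new = perturb_schedule(baseline_tax, Z_ref, rho=0.0, tau=[0.01, 0.0])

# Liability at the original choice is unchanged by the compensated perturbation.
liability_change_at_Z = T_new(Z_ref) - baseline_tax(Z_ref)
# The marginal tax rate on the first income rises by tau^1 = 0.01.
marginal_rate_first_income = T_new([31.0, 10.0]) - T_new([30.0, 10.0])
```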

We denote the behavioural responses to an increase in the lump sum income for any commodity j as \(\partial X^{j}(\varvec{\omega })/\partial \rho\), and for any labour income l as \(\partial Z^{l}(\varvec{\omega })/\partial \rho\). We denote the behavioural responses to a compensated perturbation of the m-th marginal income tax rate as \(\partial X^{j*}(\varvec{\omega })/\partial \tau ^{m}\) and \(\partial Z^{l*}(\varvec{\omega })/\partial \tau ^{m}\). Lastly, we denote the behavioural responses to a change of the linear tax rate on the i-th commodity as \(\partial X^{j}(\varvec{\omega })/\partial t^{i}\) and \(\partial Z^{l}(\varvec{\omega })/\partial t^{i}\).Footnote 7

Standard derivations yield Slutsky decompositions (omitting function arguments, and with asterisks denoting compensated effects):

$$\begin{aligned} \forall j,m:\frac{\partial X^{m}}{\partial t^{j}}=\frac{\partial X^{m*}}{\partial t^{j}}-\frac{\partial X^{m}}{\partial \rho }X^{j}, \end{aligned}$$
(7)

and:

$$\begin{aligned} \forall j,l:\frac{\partial Z^{l}}{\partial t^{j}}=\frac{\partial Z^{l*}}{\partial t^{j}}-\frac{\partial Z^{l}}{\partial \rho }X^{j}, \end{aligned}$$
(8)

and Slutsky symmetries:

$$\begin{aligned} \forall l,m,j,k:&\frac{\partial Z^{l*}}{\partial \tau ^{m}}=\frac{\partial Z^{m*}}{\partial \tau ^{l}},\quad \frac{\partial X^{j*}}{\partial \tau ^{l}}=\frac{\partial Z^{l*}}{\partial t^{j}}\quad \text {and}\quad \frac{\partial X^{k*}}{\partial t^{j}}=\frac{\partial X^{j*}}{\partial t^{k}}. \end{aligned}$$
(9)
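The decomposition (7) and the last symmetry in (9) can be checked numerically in a simple special case. The sketch below assumes, purely for illustration, Cobb-Douglas commodity demands with fixed budget shares and a fixed disposable income (so labour supply responses are switched off). It recovers the compensated price effects by adding the income effect back to the uncompensated effects, as in (7), using finite differences. All function names and parameter values are hypothetical.

```python
def demand(t, m, shares):
    """Cobb-Douglas demands x^j = a^j * m / (1 + t^j) for disposable income m."""
    return [a * m / (1.0 + tj) for a, tj in zip(shares, t)]


def d_demand_dt(t, m, shares, j, h=1e-6):
    """Uncompensated effect of t^j on all demands (central finite difference)."""
    up, down = list(t), list(t)
    up[j] += h
    down[j] -= h
    x_up, x_down = demand(up, m, shares), demand(down, m, shares)
    return [(u - d) / (2.0 * h) for u, d in zip(x_up, x_down)]


def d_demand_drho(t, m, shares, h=1e-6):
    """Income effects: response of all demands to a lump-sum transfer."""
    x_up, x_down = demand(t, m + h, shares), demand(t, m - h, shares)
    return [(u - d) / (2.0 * h) for u, d in zip(x_up, x_down)]


def compensated_effects(t, m, shares):
    """Matrix S[k][j] = dX^{k*}/dt^j, obtained by rearranging decomposition (7)."""
    x = demand(t, m, shares)
    income = d_demand_drho(t, m, shares)
    J = len(t)
    S = [[0.0] * J for _ in range(J)]
    for j in range(J):
        uncompensated = d_demand_dt(t, m, shares, j)
        for k in range(J):
            S[k][j] = uncompensated[k] + income[k] * x[j]
    return S


S = compensated_effects(t=[0.1, 0.25], m=100.0, shares=[0.2, 0.3])
```

In this special case the uncompensated cross effects are zero, so the symmetric off-diagonal entries of `S` stem entirely from the income effects, while the compensated own-price effects on the diagonal are negative.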

Following Christiansen (1984), Saez (2002), and Jacobs and Boadway (2014), we split the decision process of the households into two phases. In the second phase, households take their labour incomes and thus their disposable incomes as given, and they decide how to spend their money on consumption goods. In the first phase, households choose their labour incomes, taking the ensuing outcome of the second phase as given. Conditional on the combination of labour incomes \({\varvec{z}}\), we denote the corresponding optimal consumption of the j-th good as \(X^{cj}(\varvec{\omega },{\varvec{z}})\).

4 Optimal policies

Saez (2001) was the first to characterize the optimal labour income tax schedule in terms of sufficient statistics. Werquin et al. (2015) did the same for the case with multiple labour incomes. The optimal taxation of multiple labour incomes is complicated, and its interpretation is beyond the scope of the present paper. I simply state the optimality conditions for the tax on the labour incomes in section 4.1 and take these optimality conditions as given in section 4.2 to characterize the optimal linear commodity taxes. In section 4.3, I state the conditions for uniform commodity taxation identified by Saez (2002) in the notations of the present paper, and I relate those conditions to the equation that characterizes the optimum.

4.1 Optimal tax on labour incomes

Before stating the optimality condition for the tax schedule on the labour incomes, I introduce some notations. We denote the marginal excess burden of a tax on the l-th labour income as:

$$\begin{aligned} \forall \varvec{\omega }:{\mathcal {W}}^{l}(\varvec{\omega })\equiv -\sum _{m=1}^{L}T_{z^{m}}\frac{\partial Z^{m*}(\varvec{\omega })}{\partial \tau ^{l}}-\sum _{j=1}^{J}t^{j}\frac{\partial X^{j*}(\varvec{\omega })}{\partial \tau ^{l}}. \end{aligned}$$
(10)

The marginal excess burden of a tax on the l-th labour income indicates the compensated loss in tax revenues caused by an increase in the l-th marginal tax rate.

We denote the social marginal utility of income as (with \(\lambda\) denoting the government’s budget Lagrange multiplier)Footnote 8:

$$\begin{aligned} b(\varvec{\omega })\equiv \frac{W_{v}u_{c}}{\lambda }, \end{aligned}$$

and the corresponding net social marginal valuation of income:

$$\begin{aligned} \forall \varvec{\omega }:\beta (\varvec{\omega })\equiv b(\varvec{\omega })+B_{\rho }(\varvec{\omega }). \end{aligned}$$
(11)

Here, \(B_{\rho }(\varvec{\omega })\) captures the effect on tax revenues caused by behavioural responses to a marginal change in \(\rho\):

$$\begin{aligned} B_{\rho }(\varvec{\omega })\equiv \sum _{l=1}^{L}T_{z^{l}}\frac{\partial Z^{l}(\varvec{\omega })}{\partial \rho }+\sum _{j=1}^{J}t^{j}\frac{\partial X^{j}(\varvec{\omega })}{\partial \rho }. \end{aligned}$$
(12)

It is the marginal propensity to pay taxes out of additional income.
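As a small numerical illustration of definitions (11) and (12), the sketch below computes \(B_{\rho }\) and \(\beta\) for a single household. All numbers are made up for illustration, and the function names are hypothetical.

```python
def revenue_income_effect(marginal_income_taxes, dZ_drho, commodity_taxes, dX_drho):
    """B_rho of (12): change in tax revenue caused by the behavioural
    responses to a marginal lump-sum transfer rho."""
    labour_part = sum(T * dz for T, dz in zip(marginal_income_taxes, dZ_drho))
    commodity_part = sum(t * dx for t, dx in zip(commodity_taxes, dX_drho))
    return labour_part + commodity_part


def net_social_marginal_valuation(b, B_rho):
    """beta of (11): social marginal utility of income plus revenue effect."""
    return b + B_rho


# Hypothetical household: the transfer reduces both labour incomes
# (income effects on labour supply) and raises consumption of a taxed good.
B_rho = revenue_income_effect(
    marginal_income_taxes=[0.4, 0.3], dZ_drho=[-0.10, -0.05],
    commodity_taxes=[0.2], dX_drho=[0.3],
)
beta = net_social_marginal_valuation(b=0.9, B_rho=B_rho)
```

In this example the revenue lost through lower labour incomes (0.055) is slightly smaller than the revenue gained through higher taxed consumption (0.06), so \(\beta\) lies marginally above \(b\).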

The derivations and the interpretation of the optimal tax schedule on the labour incomes are beyond the scope of this paper. Werquin et al. (2015) and Spiritus et al. (2022) show that, given our assumptions, the optimal tax schedule is characterized byFootnote 9:

$$\begin{aligned} \forall {\varvec{z}}:\sum _{l=1}^{L}\frac{\partial }{\partial z^{l}}\left( \overline{{\mathcal {W}}^{l}}({\varvec{z}})g^{\textrm{z}}({\varvec{z}})\right) =-(1-\overline{\beta }({\varvec{z}}))g^{\textrm{z}}({\varvec{z}}), \end{aligned}$$
(13)

where for any function \(\varvec{\omega }\mapsto h(\varvec{\omega })\), the notation \({\varvec{z}}\mapsto \overline{h}({\varvec{z}})\) denotes the average of \(h(\varvec{\omega })\) over all types \(\varvec{\omega }\) who choose labour income \({\varvec{Z}}(\varvec{\omega })={\varvec{z}}\); \({\varvec{z}}\mapsto g^{\textrm{z}}({\varvec{z}})\) denotes the probability density function for the labour incomes; and \({\varvec{z}}\mapsto G^{\textrm{z}}({\varvec{z}})\) denotes the cumulative distribution function.

At each point \({\varvec{z}}\) on the boundary of the income space \({\mathcal {Z}}\), the following boundary condition must holdFootnote 10:

$$\begin{aligned} \sum _{l=1}^{L}\overline{{\mathcal {W}}^{l}}({\varvec{z}})e^{l}({\varvec{z}})=0, \end{aligned}$$
(14)

where \(e^{l}({\varvec{z}})\) is the l-th component of the unit vector normal to the boundary of the income space at the point \({\varvec{z}}\). Applying the divergence theorem to condition (13) and substituting the boundary conditions (14) yields the traditional condition that the net social marginal valuations of income should average to one:

$$\begin{aligned} \iint _{\Omega }\beta (\varvec{\omega })\textrm{d}G^{\varvec{\omega }}(\varvec{\omega })=1. \end{aligned}$$
(15)

4.2 Optimal commodity taxes

The linear tax rate on the j-th commodity is optimal if perturbing it leaves social welfare unaffected. Demanding that the sum of all effects of a perturbation of \(t^{j}\) on social welfare equals zero, the government’s first-order condition for the optimal linear tax on the j-th commodity is:

$$\begin{aligned} \iint _{\Omega }\left\{ \left( 1-\frac{W^{\prime }u_{c}}{\lambda }\right) X^{j}+\sum _{l=1}^{L}T_{z^{l}}\frac{\partial Z^{l}}{\partial t^{j}}+\sum _{k=1}^{J}t^{k}\frac{\partial X^{k}}{\partial t^{j}}\right\} \textrm{d}G^{\varvec{\omega }}(\varvec{\omega })=0, \end{aligned}$$
(16)

where we omit the function arguments in the integrand to simplify notations.

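Substituting Slutsky decompositions (7)–(8) into (16), the income effects sum to \(-B_{\rho }X^{j}\) by definition (12). Combining them with the social marginal utility of income using definition (11) of the net social marginal valuations of income yields: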

$$\begin{aligned} \iint _{\Omega }\left\{ \left( 1-\beta \right) X^{j}+\sum _{l=1}^{L}T_{z^{l}}\frac{\partial Z^{l*}}{\partial t^{j}}+\sum _{k=1}^{J}t^{k}\frac{\partial X^{k*}}{\partial t^{j}}\right\} \textrm{d}G^{\varvec{\omega }}(\varvec{\omega })=0. \end{aligned}$$
(17)

Condition (17) characterizes the optimal linear taxes on the commodities, even when the tax schedule on the labour incomes is not optimal. We are interested in the optimal tax rates on the commodities when the tax schedule on the labour incomes is optimal, so that optimal tax condition (13) and boundary conditions (14) are fulfilled.Footnote 11

I will start by introducing the theorem that characterizes the optimal commodity taxes. In the subsections that follow, I will provide intuitions for the most important terms. Lastly, I will restate the conditions for uniform commodity taxation in the optimum, as identified by Saez (2002), using this paper’s notations, and I will show how they correspond to the terms in the optimal tax expression.

4.2.1 Condition for optimal commodity taxes

Theorem 1

Suppose households differ in one or more unobservable characteristics, they earn one or more labour incomes, and they make their decisions as one unit. Suppose the tax schedule applied to the labour incomes is optimal. Differentiated commodity taxes then may offer distributional benefits over and above those offered by the taxes on labour income. Moreover, differentiated commodity taxes may help to reduce the distortions that are caused by the tax on the labour incomes. If these benefits are present, then the optimal commodity taxes are differentiated. These benefits must then be balanced against the costs of distorting the household’s consumption choices. The following condition characterizes the optimal linear commodity taxes, for all commodities \(x^{j}\):

$$\begin{aligned} \iint _{{\mathcal {Z}}}\sum _{k=1}^{J}t^{k}\overline{\frac{\partial X^{ck*}}{\partial t^{j}}}\textrm{d}G^{\textrm{z}}({\varvec{z}})=&\iint _{{\mathcal {Z}}}\textrm{cov}(b,X^{j}|{\varvec{z}})\textrm{d}G^{\textrm{z}}({\varvec{z}})\nonumber \\ \quad -&\sum _{l=1}^{L}\iint _{{\mathcal {Z}}}\overline{{\mathcal {W}}^{l}}\left( \frac{\partial \overline{X^{j}}}{\partial z^{l}}-\frac{\overline{\partial X^{cj}}}{\partial z^{l}}\right) \textrm{d}G^{\textrm{z}}({\varvec{z}})\nonumber \\ \quad+&\sum _{l=1}^{L}\iint _{{\mathcal {Z}}}\textrm{cov}\left( {\mathcal {W}}^{l},\frac{\partial X^{cj}}{\partial z^{l}}\Bigg |{\varvec{z}}\right) \textrm{d}G^{\textrm{z}}({\varvec{z}})\nonumber \\ \quad+&\iint _{{\mathcal {Z}}}\textrm{cov}(B_{\rho },X^{j}|{\varvec{z}})\textrm{d}G^{\textrm{z}}({\varvec{z}}). \end{aligned}$$
(19)

Proof

See Appendix 1. \(\square\)

I will discuss the different terms of theorem 1 in the following subsections. The key distinction between conditions (17) and (19) lies in the assumption of the optimality of the tax schedule on labour incomes in the latter. Commodity taxes, to some extent, act as a surrogate for a tax on labour income. They modify the purchasing power associated with any given labour income, leading to distributional effects and influencing labour supply decisions. To the extent that commodity taxes mirror a tax on labour income, their associated costs and benefits are already balanced at the margin due to the optimality of the labour income tax schedule. Therefore, condition (19) balances the costs and benefits of commodity taxes only to the extent that they cannot be duplicated by a tax on labour incomes.

If the right-hand side of (19) differs from zero, then in the optimum, at least one of the commodity tax rates must differ from zero. If we recall that the numéraire good c remains untaxed, it follows that if the right-hand side of (19) differs from zero, then it is desirable to levy differentiated taxes on the commodities.

4.2.2 Marginal excess burden conditional on labour income

First, let us study the left-hand side of condition (19). It captures the compensated revenue costs associated with a small increase in the tax rate on the j-th commodity, holding households’ labour incomes (and therefore their disposable incomes) constant. This part of Eq. (19) indicates that the optimal marginal excess burden induced by taxing the good under consideration, conditional on labour income, is larger when the right-hand side of (19) is larger.

Condition (19) presents a challenge because it does not provide a straightforward formula for determining the optimal tax rates for each good. The main issue is that the right-hand side of condition (19) is influenced by the tax rates in complex ways. If we overlook this complexity, we could consider the right-hand side and the behavioural elasticities on the left-hand side of condition (19) as fixed, and view the optimal tax conditions for each good as a system of linear equations. To see this, note that the commodity tax rates are the same for all households and can therefore be moved outside of the integral:

$$\begin{aligned} \iint _{{\mathcal {Z}}}\sum _{k=1}^{J}t^{k}\overline{\frac{\partial X^{ck*}}{\partial t^{j}}}\textrm{d}G^{\textrm{z}}({\varvec{z}})=\sum _{k=1}^{J}t^{k}\iint _{{\mathcal {Z}}}\overline{\frac{\partial X^{ck*}}{\partial t^{j}}}\textrm{d}G^{\textrm{z}}({\varvec{z}}). \end{aligned}$$

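The linear-system reading of condition (19) can be sketched as follows, under the simplification discussed above that the right-hand side and the averaged compensated responses are treated as fixed numbers. The matrix entries and right-hand-side values below are made up for illustration, and the solver is a generic Gaussian elimination rather than a method taken from this paper.

```python
def solve_linear_system(A, r):
    """Solve sum_k A[j][k] * t[k] = r[j] by Gaussian elimination with pivoting.

    A[j][k] stands for the population average of dX^{ck*}/dt^j,
    r[j] for the right-hand side of (19) for commodity j.
    """
    n = len(r)
    M = [[float(v) for v in A[i]] + [float(r[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix of compensated responses is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= factor * M[col][k]
    t = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(M[i][k] * t[k] for k in range(i + 1, n))
        t[i] = (M[i][n] - rest) / M[i][i]
    return t


# Hypothetical compensated responses: negative own effects, symmetric cross effects.
A = [[-2.0, 0.5],
     [0.5, -1.0]]
r = [0.3, -0.1]  # hypothetical right-hand sides of (19)
t_opt = solve_linear_system(A, r)

# Without cross effects, each rate is r[j] / A[j][j] and has the opposite sign of r[j].
t_diag = solve_linear_system([[-2.0, 0.0], [0.0, -1.0]], r)
```

Since the right-hand side of (19) in fact depends on the tax rates, such a solution would at best be one step of a fixed-point iteration rather than the optimum itself.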
If the compensated cross-price elasticities are zero (so \(\partial X^{ck*}/\partial t^{j}=0\) for \(k\ne j\)), then a commodity with a lower own-price elasticity will have a higher tax rate. Since \(\partial X^{ck*}/\partial t^{k}\) is negative, it then follows that the tax rate \(t^{k}\) must have the opposite sign of the right-hand side of (19). However, when compensated cross-price elasticities are not zero, it seems that there is no simple equation for expressing individual optimal tax rates in terms of sufficient statistics.Footnote 12,Footnote 13 Therefore, moving forward, we will primarily focus on discussing optimal marginal excess burdens rather than individual commodity tax rates.

4.2.3 Distributional characteristics conditional on labour income

We now turn to the first term on the right-hand side of (19). This term shows the covariance between the social marginal utilities and the consumption of the commodity under consideration, among households who earn certain levels of income from labour. It tells us whether, conditional on the labour incomes, the government considers taxpayers who consume more of the good under consideration to be more deserving of additional resources. This first term is zero in models in which taxpayers only differ in their labour productivities, such as the models of Atkinson and Stiglitz (1976) and Jacobs and Boadway (2014).
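The first term on the right-hand side of (19) could be estimated from household-level data along the following lines. The sketch assumes a discrete sample in which each household is described by its income bundle, its social marginal utility of income b, and its consumption of the good; the data and function names are hypothetical.

```python
from collections import defaultdict


def average_conditional_covariance(households):
    """Population-weighted average over income bundles z of cov(b, X^j | z).

    households: list of (z, b, x) with z a hashable income bundle (a tuple).
    """
    groups = defaultdict(list)
    for z, b, x in households:
        groups[z].append((b, x))
    total = 0.0
    for members in groups.values():
        n_z = len(members)
        mean_b = sum(b for b, _ in members) / n_z
        mean_x = sum(x for _, x in members) / n_z
        cov = sum((b - mean_b) * (x - mean_x) for b, x in members) / n_z
        total += cov * n_z / len(households)  # weight by the share of households at z
    return total


# Taste heterogeneity: at equal incomes, households that consume more of the
# good also carry a higher social marginal utility of income.
heterogeneous = [((30.0,), 1.2, 4.0), ((30.0,), 0.8, 2.0),
                 ((60.0,), 0.7, 6.0), ((60.0,), 0.5, 5.0)]
term_heterogeneous = average_conditional_covariance(heterogeneous)

# One-dimensional population: households at the same income are identical.
homogeneous = [((30.0,), 1.0, 3.0), ((30.0,), 1.0, 3.0), ((60.0,), 0.6, 5.5)]
term_homogeneous = average_conditional_covariance(homogeneous)
```

The second sample reproduces the case in which taxpayers differ only in their labour productivities: there is no variation within income groups, so the term vanishes.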

Let us consider an example.Footnote 14 Suppose each household has two labour productivities \(w^{1}\) and \(w^{2}\) and they earn labour incomes \(z^{1}\) and \(z^{2}\). They derive utility from consuming a numéraire good c, and they obtain disutility \(v(\ell ^{1},\ell ^{2})\) from supplying labour. Furthermore, some households consume a good x. The utility function is:

$$\begin{aligned} U(c,x,z^{1},z^{2},w^{1},w^{2},\gamma )=\frac{c^{1-\sigma }}{1-\sigma } +\gamma \frac{x^{1-\theta }}{1-\theta }-v\left( \frac{z^{1}}{w^{1}},\frac{z^{2}}{w^{2}}\right) . \end{aligned}$$

We assume that the parameter \(\gamma \ge 0\) varies among households and is independent of their labour productivities. This parameter represents a physiological need to consume good x, which could be something like a necessary medical treatment. The parameters \(\sigma >0\) and \(\theta >1\) are the same for all households. They indicate the concavities of the households’ consumption utilities from numéraire good c and good x, respectively. Because \(\theta >1\), the utility from consuming good x is always negative but increasing in x for a person who needs it (\(\gamma >0\)). Now, consider two households with the same labour incomes \(z^{1}\) and \(z^{2}\), where one household needs to consume x (\(\gamma >0\)) and the other does not (\(\gamma =0\)). The household that needs to consume x will never reach the same level of utility as the household that does not need to consume x, no matter how much of x they consume. The first-order conditions for the consumption of c and x yield:

$$\begin{aligned} X=\left( \frac{\gamma C^{\sigma }}{1+t_{x}}\right) ^{\frac{1}{\theta }}. \end{aligned}$$
(20)

Among households with the same incomes, those with a greater need for good x consume more of x and less of the numéraire good c. Because of the concave preferences with respect to the numéraire good, households with a greater need for good x have higher marginal utilities of the numéraire. Since these households achieve lower utility levels, the derivative \(W^{\prime }\) of the concave social welfare function is also larger. Therefore, we find that the covariance between consumption of x and the social marginal utilities of income, conditional on income, is positive (\(\textrm{cov}(b,x|{\varvec{z}})>0\)).Footnote 15 This means that taxpayers consuming more of commodity x are more deserving of redistribution. According to theorem 1, the positive covariance between the social marginal utilities and consumption of commodity x implies that there is a reason to subsidize commodity x in the optimum.
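The mechanism of this example can be traced numerically. The sketch below takes a common disposable income, combines demand equation (20) with the budget constraint, and solves for the consumption bundle by bisection for several values of \(\gamma\). The parameter values and function names are hypothetical, and the snippet tracks only the private marginal utility of the numéraire, \(c^{-\sigma }\), not the welfare weight \(W^{\prime }\).

```python
def demand_x(c, gamma, sigma, theta, t):
    """Demand for good x implied by (20), given numeraire consumption c."""
    return (gamma * c ** sigma / (1.0 + t)) ** (1.0 / theta)


def solve_consumption(y, gamma, sigma, theta, t):
    """Find (c, x) satisfying (20) and the budget c + (1 + t) x = y by bisection.

    Total spending is increasing in c, so the root in [0, y] is unique.
    """
    lo, hi = 0.0, y
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        spending = mid + (1.0 + t) * demand_x(mid, gamma, sigma, theta, t)
        if spending > y:
            hi = mid
        else:
            lo = mid
    c = 0.5 * (lo + hi)
    return c, demand_x(c, gamma, sigma, theta, t)


def marginal_utility_c(c, sigma):
    """Marginal utility of the numeraire for utility c^(1-sigma)/(1-sigma)."""
    return c ** (-sigma)


# Same disposable income, increasing need for good x (hypothetical parameters).
results = {
    gamma: solve_consumption(y=10.0, gamma=gamma, sigma=0.5, theta=2.0, t=0.0)
    for gamma in (0.0, 1.0, 2.0)
}
marginal_utilities = {g: marginal_utility_c(c, 0.5) for g, (c, _) in results.items()}
```

Under these assumptions, a larger \(\gamma\) goes together with more consumption of x, less consumption of the numéraire, and hence a higher marginal utility of the numéraire, which is the channel behind the positive conditional covariance.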

4.2.4 Cross-sectional variation and individual variation of consumption

The second term on the right-hand side of (19) captures, for each labour income, the difference between the cross-sectional variation and the individual variation in the consumption of the good under consideration. To see this, we can follow the exposition of Saez (2002). Recall that the quantity \(\overline{X^{j}}({\varvec{z}})\) denotes the average consumption of the j-th commodity among all households who earn labour incomes \({\varvec{z}}\). If we slightly increase the l-th labour income such that the income bundle increases by a quantity \(\textrm{d}{\varvec{z}}^{l}\), then the term \(\overline{X^{j}}({\varvec{z}}+\textrm{d}{\varvec{z}}^{l})\) represents the average of \(X^{j}\) over all households who choose to earn labour incomes \({\varvec{z}}+\textrm{d}{\varvec{z}}^{l}\). The quantities \(\overline{X^{j}}({\varvec{z}})\) and \(\overline{X^{j}}({\varvec{z}}+\textrm{d}{\varvec{z}}^{l})\) thus concern different households. Consequently, the term \(\partial \overline{X^{j}}/\partial z^{l}({\varvec{z}})\) represents the variation in consumption \(X^{j}\) as we compare different households who choose income bundles that differ only by a slight change in the l-th labour income. In other words, the term concerns the cross-sectional variation of \(X^{j}\) around \({\varvec{z}}\). Note that these households also have different disposable incomes.

The quantity \(\partial X^{cj}(\varvec{\omega },{\varvec{z}})/\partial z^{l}\), on the other hand, concerns the change in consumption of a single household of type \(\varvec{\omega }\) if it is forced to earn a slightly higher l-th labour income \(z^{l}\), thus earning a slightly higher disposable income. It thus concerns the individual variation in the consumption of commodity \(x^{j}\) with labour income \(z^{l}\). The term \(\overline{\partial X^{cj}/\partial z^{l}}({\varvec{z}})\) averages this variation over all households at income bundle \({\varvec{z}}\).

To gain further insight into the second term on the right-hand side of (19), assume for a moment that for any vector \(\varvec{\theta }\) of additional characteristics, a one-to-one mapping exists between the labour productivities \({\varvec{w}}\) and the bundle of labour incomes \({\varvec{z}}\). In other words, there exists a vector-valued function \({\varvec{z}}\mapsto {\varvec{W}}({\varvec{z}},\varvec{\theta })\), which for each vector of additional characteristics \(\varvec{\theta }\) indicates for each income bundle \({\varvec{z}}\), the labour productivities of the households that choose it. Under this assumption, we derive the following identity in appendix 1, for any bundle of incomes \({\varvec{z}}\):

$$\begin{aligned} \frac{\partial \overline{X^{k}}}{\partial z^{l}}({\varvec{z}})-\frac{\overline{\partial X^{ck}}}{\partial z^{l}}({\varvec{z}})=\sum _{m}\overline{\frac{\partial X^{ck}}{\partial w^{m}}\frac{\partial W^{m}}{\partial z^{l}}}({\varvec{z}})+\textrm{cov}\left( X^{k},\frac{\partial \ln g^{\varvec{\theta }|{\varvec{z}}}}{\partial z^{l}}\Bigg |{\varvec{z}}\right) , \end{aligned}$$
(21)

where \(\varvec{\theta }\mapsto g^{\varvec{\theta }|{\varvec{z}}}(\varvec{\theta }|{\varvec{z}})\) indicates the density function of the additional characteristics \(\varvec{\theta }\) conditional on labour incomes \({\varvec{z}}\).

Let us study the first term on the right-hand side of (21).Footnote 16 The quantity \(\partial X^{ck}({\varvec{w}},\varvec{\theta },{\varvec{z}})/\partial w^{m}\) tells us, for given labour incomes \({\varvec{z}}\), how consumption of good \(x^{k}\) changes as the m-th labour productivity changes. There are two reasons why \(x^{k}\) might change. First, changing the m-th labour productivity while keeping the incomes \({\varvec{z}}\) constant amounts to comparing households with different labour supplies but equal incomes. The first term on the right-hand side of (21) then tells us whether households who supply different quantities of labour but earn equal disposable incomes consume different amounts of the good under consideration. This reason for differential commodity taxes was first identified by Christiansen (1984). A second reason why \(x^{k}\) might change as the m-th labour productivity changes is that there is a deterministic relation between the taxpayers’ consumption preferences and their labour productivities. This reason to differentially tax commodities was first identified by Mirrlees (1976).

To gain further intuition for the first term on the right-hand side of (21), let us consider an example. Suppose each household has two labour productivities \(w^{1}\) and \(w^{2}\), one for each spouse, and they earn labour incomes \(z^{1}\) and \(z^{2}\). They derive utility from consuming a numéraire good c, and they obtain disutility from supplying labour. Furthermore, households consume a good x. The utility function is:

$$\begin{aligned} U(c,x,{\varvec{z}},{\varvec{w}},\sigma )=c+\left( x^{\frac{\sigma -1}{\sigma }} -\left( \frac{z^{1}}{w^{1}}\right) ^{\frac{\sigma -1}{\sigma }}\right) ^{\frac{\sigma }{\sigma -1}}-v^{2}\left( \frac{z^{2}}{w^{2}}\right) , \end{aligned}$$

where \(\sigma\) indicates the complementarity between the consumption of x and the first labour supply, and the function \(v^{2}\) represents the disutility from the second labour supply. Assume that \(\sigma\) is independent of the labour productivities. Standard derivations yield the conditional demand for x:

$$\begin{aligned} X^{c}({\varvec{w}},\sigma ,{\varvec{z}})=\left( \frac{1-T_{z^{1}}}{1+t_{x}}\right) ^{\sigma }\frac{z^{1}}{\left( w^{1}\right) ^{1-\sigma }}, \end{aligned}$$

with partial derivatives:

$$\begin{aligned} \frac{\partial X^{c}}{\partial w^{1}}=\left( \sigma -1\right) \left( \frac{1-T_{z^{1}}}{1+t_{x}}\right) ^{\sigma }\frac{z^{1}}{\left( w^{1}\right) ^{2-\sigma }}\text { and }\frac{\partial X^{c}}{\partial w^{2}}=0. \end{aligned}$$

It follows that consumption of good x is complementary with the leisure of the first spouse if \(\sigma >1\), in which case theorem 1 implies that good x should be taxed at higher rates.
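The sign argument can be made explicit with a short worked step. This is only a sketch: it takes the conditional demand stated above as given, writes \(K\) for the factor \(((1-T_{z^{1}})/(1+t_{x}))^{\sigma }\) (a shorthand introduced here, positive and constant in \(w^{1}\) because the incomes are held fixed), and writes \(\ell ^{1}=z^{1}/w^{1}\) for the labour supply of the first spouse (also a shorthand introduced here):

$$\begin{aligned} X^{c}=K\,z^{1}\left( w^{1}\right) ^{\sigma -1}\quad \Rightarrow \quad \frac{\partial X^{c}}{\partial w^{1}}=\left( \sigma -1\right) K\,z^{1}\left( w^{1}\right) ^{\sigma -2},\qquad \frac{\partial \ell ^{1}}{\partial w^{1}}=-\frac{z^{1}}{\left( w^{1}\right) ^{2}}<0. \end{aligned}$$

At a fixed income \(z^{1}\), a higher productivity thus means less labour supplied and more leisure. With \(\sigma >1\) it also means more consumption of x, so that x and leisure move together across households with equal incomes.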

Let us now study the second term on the right-hand side of (21). The term \(\partial \ln g^{\varvec{\theta }|{\varvec{z}}}(\varvec{\theta },{\varvec{z}})/\partial z^{l}\) differs from zero if the conditional density \(g^{\varvec{\theta }|{\varvec{z}}}(\varvec{\theta }|{\varvec{z}})\) of the additional characteristics is different for different labour incomes \(z^{l}\). If this is the case, then the additional characteristics \(\varvec{\theta }\) contain information about the value of the income \(z^{l}\), even if no deterministic relation exists between them.

Assuming that the vector \(\varvec{\theta }\) is composed of taste parameters for the different consumption goods, the term \(\partial \ln g^{\varvec{\theta }|{\varvec{z}}}(\varvec{\theta },{\varvec{z}})/\partial z^{l}\) will differ from zero if, on average around the income bundle under consideration, households with different labour incomes \(z^{l}\) have relatively different tastes. The values of \(\partial \ln g^{\varvec{\theta }|{\varvec{z}}}/\partial z^{l}\) represent the amount of information that the parameters \(\varvec{\theta }\) reveal about the households’ incomes.

Of course, the government cannot directly tax the unobservable parameters \(\varvec{\theta }\). However, if the consumption of commodity \(x^{k}\) is correlated with \(\partial \ln g^{\varvec{\theta }|{\varvec{z}}}/\partial z^{l}\), then also the consumption of this commodity reveals information about the value of labour income \(z^{l}\). It then becomes optimal for the government to levy a differential tax on commodity \(x^{k}\).

Keep in mind that the conditional densities \(g^{\varvec{\theta }|{\varvec{z}}}(\varvec{\theta }|{\varvec{z}})\) must integrate to one over all types who pool at income bundle \({\varvec{z}}\).Footnote 17 Therefore, if \(\partial \ln g^{\varvec{\theta }|{\varvec{z}}}(\varvec{\theta },{\varvec{z}})/\partial z^{l}\) is positive for some values of \(\varvec{\theta }\), it must be negative for some others. If on average \(\partial \ln g^{\varvec{\theta }|{\varvec{z}}}(\varvec{\theta },{\varvec{z}})/\partial z^{l}\) is negative for households with low consumption of commodity \(x^{k}\) and positive for those with high consumption of commodity \(x^{k}\), then the covariance in (21) is positive. This implies that taxing that commodity serves as an indirect way to tax households with a higher labour income \(z^{l}\).
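The sign restriction can be written out in one line. As a sketch, assuming that differentiation and integration can be interchanged (a regularity assumption not stated above), the normalization of the conditional density at every income bundle \({\varvec{z}}\) gives:

$$\begin{aligned} \int \frac{\partial \ln g^{\varvec{\theta }|{\varvec{z}}}}{\partial z^{l}}(\varvec{\theta }|{\varvec{z}})\,g^{\varvec{\theta }|{\varvec{z}}}(\varvec{\theta }|{\varvec{z}})\,\textrm{d}\varvec{\theta }=\int \frac{\partial g^{\varvec{\theta }|{\varvec{z}}}}{\partial z^{l}}(\varvec{\theta }|{\varvec{z}})\,\textrm{d}\varvec{\theta }=\frac{\partial }{\partial z^{l}}\int g^{\varvec{\theta }|{\varvec{z}}}(\varvec{\theta }|{\varvec{z}})\,\textrm{d}\varvec{\theta }=0. \end{aligned}$$

The log-derivative therefore has a conditional mean of zero, so that the covariance in (21) equals the conditional mean of the product \(X^{k}\,\partial \ln g^{\varvec{\theta }|{\varvec{z}}}/\partial z^{l}\).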

To further clarify, consider the following example. Assume that households differ in three dimensions: their labour productivities \(w^{1}\) and \(w^{2}\), and their taste \(\gamma\) for the consumption of good x. Households obtain utility from numéraire good c and from good x, and disutility from earning labour incomes \(z^{1}\) and \(z^{2}\). They maximize a utility function:

$$\begin{aligned} U(c,x,w^{1},w^{2},\gamma ,z^{1},z^{2})=c+\gamma \frac{x^{1-\sigma }}{1-\sigma }-v^{1}\left( \frac{z^{1}}{w^{1}}\right) -v^{2}\left( \frac{z^{2}}{w^{2}}\right) , \end{aligned}$$

where the functions \(v^{1}\) and \(v^{2}\) represent the disutilities of labour supply. Standard derivations yield the demand for x conditional on the labour incomes:

$$\begin{aligned} x^{c}(w^{1},w^{2},\gamma ,z^{1},z^{2})=\left( \frac{\gamma }{1+t_{x}}\right) ^{1/\sigma }. \end{aligned}$$
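The derivation behind this demand is short. As a sketch, assume the household budget constraint \(c=z^{1}+z^{2}-T(z^{1},z^{2})-(1+t_{x})x\), where the tax schedule \(T\) is notation assumed here because the budget constraint is not restated in this section, and assume \(\sigma >0\) and an interior solution. Holding the incomes fixed and substituting for c, the first-order condition with respect to x is:

$$\begin{aligned} \gamma x^{-\sigma }-(1+t_{x})=0\quad \Rightarrow \quad x^{c}=\left( \frac{\gamma }{1+t_{x}}\right) ^{1/\sigma }. \end{aligned}$$

The incomes and productivities drop out because utility is quasi-linear in c and separable in the labour supplies, so that conditional on the incomes, consumption of x varies across households only through \(\gamma\).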

If there is a deterministic relation between the taste parameter \(\gamma\) and either of the productivities \(w^{1}\) and \(w^{2}\), then the first term on the right-hand side of (21) will not be zero. Let us exclude that possibility. Given the separability of preferences between leisure and consumption, one might then think that good x should not be taxed. However, suppose that the taste parameter \(\gamma\) is positively correlated with the first labour income \(z^{1}\). For lower \(\gamma\) values, the density \(g(\gamma |z^{1},z^{2})\) will decrease in \(z^{1}\) for most \(z^{1}\) values, making \(\partial g(\gamma |z^{1},z^{2})/\partial z^{1}<0\). For higher \(\gamma\) values, the density \(g(\gamma |z^{1},z^{2})\) will increase in \(z^{1}\) for most \(z^{1}\) values, making \(\partial g(\gamma |z^{1},z^{2})/\partial z^{1}>0\). Since consumption of commodity x increases with the taste parameter \(\gamma\) conditional on the incomes, we find that conditional consumption of x is positively correlated with \(\partial \ln g(\gamma |z^{1},z^{2})/\partial z^{1}\), which has the same sign as \(\partial g(\gamma |z^{1},z^{2})/\partial z^{1}\) because densities are positive. The covariance in (21) is therefore positive, which creates a reason for a positive tax on that commodity. This rationale for implementing differentiated commodity taxes was first uncovered by Saez (2002).
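A parametric illustration may help. Suppose, purely as an assumption for this sketch, that conditional on the incomes \(\ln \gamma\) is normally distributed with mean \(\mu +\beta z^{1}\) and variance \(s^{2}\), where \(\mu\), \(\beta >0\) and \(s>0\) are hypothetical parameters that do not appear elsewhere in the paper, and that \(\sigma >0\). Using Stein's lemma for the normal distribution in the last step, one obtains:

$$\begin{aligned} \frac{\partial \ln g(\gamma |z^{1},z^{2})}{\partial z^{1}}&=\frac{\beta }{s^{2}}\left( \ln \gamma -\mu -\beta z^{1}\right) ,\\ \textrm{cov}\left( x^{c},\frac{\partial \ln g}{\partial z^{1}}\Bigg |{\varvec{z}}\right)&=\frac{\beta }{s^{2}}\,\textrm{cov}\left( x^{c},\ln \gamma \big |{\varvec{z}}\right) =\frac{\beta }{\sigma }\,\overline{x^{c}}({\varvec{z}})>0. \end{aligned}$$

The log-derivative is negative for tastes below the conditional mean of \(\ln \gamma\) and positive above it. In this sketch the covariance also equals \(\partial \overline{x^{c}}/\partial z^{1}\), in line with (21), since \(x^{c}\) itself does not depend on the incomes and the first term on the right-hand side of (21) is zero.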

We have discussed two potential reasons why the second term on the right-hand side of (19) might not be zero. If there is a difference between the cross-sectional variation and the individual variation in the consumption of the good under consideration, then the consumption pattern reveals information about the individual types beyond what is revealed by the labour incomes alone. Saez (2002) shows that if this is the case, differentiated commodity taxes help mitigate the distortions caused by the tax on labour income. The second term on the right-hand side of (19) lends further support to this intuition: the optimal marginal excess burden resulting from taxing a specific good is proportional to the marginal excess burdens of taxing the labour incomes. Indeed, the larger the marginal excess burdens from taxing the labour incomes, the larger the potential benefits from shifting from labour income taxes to differentiated consumption taxes.

4.2.5 Additional terms

The third term on the right-hand side of (19) captures the possibility that, conditional on the labour incomes, households for whom a higher labour income leads to a higher consumption of the good under consideration have higher compensated responses to taxes on labour income. By shifting the tax burden from that labour income to the considered good, the government can achieve the same redistribution benefits with fewer efficiency losses.

The fourth term on the right-hand side of (19) contains a covariance between the income effects on government revenues and the consumption level of the considered good. This term is present because the social marginal utilities considered in the characterization of the optimum should be net of income effects.

Saez (2002) also obtains similar terms and notes that these third and fourth terms are difficult to measure empirically. These terms do not appear in models where households only differ in their labour productivities.

4.3 Relation to Saez (2002)

Saez (2002) identifies three conditions under which uniform commodity taxation is optimal. He starts from a situation where all commodity tax rates are zero and examines the welfare effects of perturbing one of the tax rates. If the welfare effects are zero, then deviating from uniformity is not optimal. In his investigations, Saez (2002) assumes that there is only one labour income.

I will now explore how the conditions found by Saez (2002) relate to my statement of the full optimality condition in theorem 1. His conditions are as follows, in the notations of the present paper:

  1.

    conditional on each income level z, the marginal social utilities b, and the consumption of the considered good \(x^{j}\) are uncorrelated:

    $$\begin{aligned} \textrm{cov}(b,X^{j}|z)=0; \end{aligned}$$
  2.

    conditional on each income level z, behavioural responses \(\partial Z^{*}(\varvec{\omega })/\partial \tau\) and \(\partial Z(\varvec{\omega })/\partial \rho\) are independent of consumption patterns \(X^{j}\) and \(\partial X^{cj}/\partial z\):Footnote 18

    $$\begin{aligned} \textrm{cov}\left( \frac{\partial Z(\varvec{\omega })}{\partial \rho },X^{j}\Bigg |{\varvec{z}}\right) =0\text { }\textrm{and}\quad \textrm{cov}\left( \frac{\partial Z^{*}(\varvec{\omega })}{\partial \tau ^{l}},\frac{\partial X^{cj}}{\partial z}\Bigg |{\varvec{z}}\right) =0; \end{aligned}$$
  3.

    for any income level, the cross-sectional variation in the consumption of the considered good is equal to the individual variation, as one varies labour income:

    $$\begin{aligned} \frac{\textrm{d}\overline{X^{j}}}{\textrm{d}z}-\frac{\overline{\partial X^{cj}}}{\partial z}=0. \end{aligned}$$

One difference between theorem 1 and the conditions found by Saez (2002) is the number of labour incomes. We see that the different terms in theorem 1 are conditional on the entire bundle of incomes \({\varvec{z}}\), and when we consider the effects of variations in one of the incomes \(z^{l}\), we sum those effects over all incomes.

Under the presumption that assumptions 1–3 found by Saez (2002) hold for all labour incomes, substitute these three assumptions into (19), and substitute definitions (10) and (12) for the marginal excess burdens and the marginal propensities to pay taxes out of additional income. What then remains is the following condition for the optimum:

$$\begin{aligned} \iint _{{\mathcal {Z}}}\sum _{k=1}^{J}t^{k}\overline{\frac{\partial X^{ck*}}{\partial t^{j}}}\textrm{d}G^{\textrm{z}}({\varvec{z}})=&-\sum _{j=1}^{J}t^{j}\sum _{l=1}^{L}\iint _{{\mathcal {Z}}}\textrm{cov}\left( \frac{\partial X^{j*}(\varvec{\omega })}{\partial \tau ^{l}},\frac{\partial X^{cj}}{\partial z^{l}}\Bigg |{\varvec{z}}\right) \textrm{d}G^{\textrm{z}}({\varvec{z}})\\&+\sum _{j=1}^{J}t^{j}\iint _{{\mathcal {Z}}}\textrm{cov}\left( \frac{\partial X^{j}(\varvec{\omega })}{\partial \rho },X^{j}\Bigg |{\varvec{z}}\right) \textrm{d}G^{\textrm{z}}({\varvec{z}}). \end{aligned}$$

This is a linear equation in the commodity tax rates, with solution \({\varvec{t}}={\varvec{0}}\). We thus find that Saez’s (2002) assumptions remain sufficient for uniform taxation to be optimal in the case with multiple optimally taxed labour incomes.

5 Conclusions

I characterized the optimal linear taxation of commodities alongside multiple optimally taxed labour incomes, taking into account that individuals may differ in multiple unobserved characteristics. There are several possible reasons to deviate from uniform tax rates on commodities. A first possible reason is that conditional on labour income, taxpayers who consume more of a particular good are more deserving in the eyes of the government. This can be the case when a strong taste for a particular good is considered socially desirable or if the government wishes to compensate types who typically consume certain goods (for example, to compensate for other injustices). The acceptable size of the distortion from taxing a particular good, conditional on labour income, is then larger if the covariance conditional on income between the social marginal utilities and the consumption of that good is larger.

A second possible reason to differentiate commodity taxes is that on average and conditional on income, the cross-sectional variation of the consumption of some commodity differs from the individual variation as one varies one of the labour incomes. This can happen for two reasons. The first is that around the considered incomes, taxpayers with different incomes have different unobserved characteristics besides their labour productivities, and these characteristics correlate with their consumption patterns. The second is that around the considered incomes, individuals who have the same disposable incomes but supply different amounts of labour tend to consume different quantities of the good in question. In each case, shifting the burden of taxation from the labour incomes to differentiated commodity taxes increases the efficiency of the tax system. The acceptable size of the distortion of taxing a particular good then depends on the size of the difference between the cross-sectional and the individual variations of the consumption of that good along each labour income and the size of the marginal excess burdens of the taxes on those labour incomes.

The formulas in this paper identify which statistics are sufficient to characterize the optimal commodity taxes. Most of these statistics are not yet available in the empirical literature. Moreover, it remains unclear how the social welfare function can capture common ethical intuitions that may lead to differentiated commodity taxes in the optimum. For these reasons, numerical simulations of the optimal commodity tax rates would currently be based on arbitrary parameters. It would be very useful if further research could quantify the needed sufficient statistics for a numerical exercise. Further useful extensions would include moving beyond the unitary decision framework for the households. Such an extension, however, would first require advances in our knowledge on the optimal taxation of labour income when households do not take their decisions as one unit.