The Elasticity of Taxable Income with Respect to Marginal Tax Rates: A Critical Review


NBER WORKING PAPER SERIES

THE ELASTICITY OF TAXABLE INCOME WITH RESPECT TO MARGINAL TAX RATES: A CRITICAL REVIEW

Emmanuel Saez

Joel B. Slemrod

Seth H. Giertz

Working Paper 15012

http://www.nber.org/papers/w15012

NATIONAL BUREAU OF ECONOMIC RESEARCH

1050 Massachusetts Avenue

Cambridge, MA 02138

May 2009

This paper was written for submission to the Journal of Economic Literature. We thank Soren Blomquist, Raj Chetty, Henrik Kleven, Wojciech Kopczuk, Hakan Selin, Jonathan Shaw, editor Roger Gordon, and anonymous referees for helpful comments and discussions, and Jonathan Adams and Caroline Weber for invaluable research assistance. Financial support from NSF Grant SES-0134946 is gratefully acknowledged. The views expressed herein are those of the author(s) and do not necessarily reflect the views of the National Bureau of Economic Research.

NBER working papers are circulated for discussion and comment purposes. They have not been peer-reviewed or been subject to the review by the NBER Board of Directors that accompanies official NBER publications.


JEL No. H24,H31

ABSTRACT

This paper critically surveys the large and growing literature estimating the elasticity of taxable income with respect to marginal tax rates (ETI) using tax return data. First, we provide a theoretical framework showing under what assumptions this elasticity can be used as a sufficient statistic for efficiency and optimal tax analysis. We discuss what other parameters should be estimated when the elasticity is not a sufficient statistic. Second, we discuss conceptually the key issues that arise in the empirical estimation of the elasticity of taxable income using the example of the 1993 top individual income tax rate increase in the United States to illustrate those issues. Third, we provide a critical discussion of most of the taxable income elasticities studies to date, both in the United States and abroad, in light of the theoretical and empirical framework we laid out. Finally, we discuss avenues for future research.

Emmanuel Saez

Department of Economics

University of California, Berkeley

549 Evans Hall #3880

Berkeley, CA 94720

and NBER

saez@econ.berkeley.edu

Joel B. Slemrod

University of Michigan Business School

Room R5396

Ann Arbor, MI 48109-1234

and NBER

jslemrod@umich.edu

Seth H. Giertz

University of Nebraska

Dept. of Economics, CBA 368

P.O. Box 880489

Lincoln, NE 68588-0489

sgiertz2@unl.edu


1 Introduction

The notion of a behavioral elasticity occupies a critical place in the economic analysis of taxation. Graduate textbooks teach that the two central aspects of the public sector, optimal progressivity of the tax-and-transfer system, as well as the optimal size of the public sector, depend (inversely) on the compensated elasticity of labor supply with respect to the marginal tax rate. Indeed, until recently, the closest thing we have had to a central parameter was the labor supply elasticity. In a static model where people value only two commodities – leisure and a composite consumption good – the real wage in terms of the consumption good is the only relative price at issue. This real wage is equal to the amount of goods that can be consumed per hour of leisure foregone (or, equivalently, per hour of labor supplied). At the margin, substitution possibilities, and therefore the excess burden of taxation, can be captured by a compensated labor supply elasticity.

With some exceptions, the profession has settled on a value for this elasticity close to zero for prime-age males, although for married women the responsiveness of labor force participation appears to be significant. Overall, though, the compensated elasticity of labor appears to be fairly small. In models with only a labor-leisure choice, this implies that the efficiency cost of taxing labor income – to redistribute revenue to others or to provide public goods – is bound to be low, as well.

Although evidence of a substantial compensated labor supply elasticity has been hard to find, evidence that taxpayers respond to tax system changes more generally has decidedly not been hard to find. For example, there is compelling evidence in the U.S. that the timing of capital gains realizations reacts strongly to changes in capital gains tax rates. There was a surge in capital gains realizations in 1986, after the U.S. government passed the Tax Reform Act of 1986, which increased tax rates on realizations in 1987 and after (Auerbach, 1988). Dropping the top individual tax rate to below the corporate tax rate in the same Act led to a significant increase in business activity carried out in pass-through, non-corporate form (Auerbach and Slemrod, 1997).

Addressing these other margins of behavioral response is crucial because under some assumptions all responses to taxation are symptomatic of deadweight loss. Taxes trigger a host of behavioral responses designed to minimize the burden on the individual. In the absence of externalities or other market failures, and putting aside income effects, all such responses are sources of inefficiency, whether they take the form of reduced labor supply, increased charitable contributions, increased expenditures for tax professionals, or a different form of business organization, and thus they add to the burden of taxes from society’s perspective. Because in principle the elasticity of taxable income (which we abbreviate from now on using the standard acronym ETI) can capture all of these responses, it holds the promise of more accurately summarizing the marginal efficiency cost of taxation than a narrower measure of taxpayer response such as the labor supply elasticity, and therefore is a worthy topic of investigation.

The new focus raises the possibility that the efficiency cost of taxation is significantly higher than is implied if labor supply is the sole, or principal, margin of behavioral response. Indeed, some of the first empirical estimates of the elasticity of taxable income implied very sizeable responses and therefore a very high marginal efficiency cost of funds. The subsequent literature has found somewhat smaller elasticities, and raised questions about both our ability to identify this key parameter and about the claim that it is a sufficient statistic for doing welfare analysis. Whether the taxable income elasticity is an accurate indicator of the revenue leakage due to behavioral response, the ultimate indicator of efficiency cost absent classical externalities, depends on the situation. For example, if revenue leakage in current-year tax revenue is substantially offset by revenue gain in other years or in other tax bases, the elasticity is a misleading indicator. Similarly, if some of the response involves changes in activities with externalities, then the elasticity is not a sufficient statistic for welfare analysis.

The remainder of the paper is organized as follows. Section 2 presents the theoretical framework underlying the taxable income elasticity concept. Section 3 presents the key identification issues that arise in the empirical estimation of the taxable income elasticity, using the example of the 1993 top tax rate increase in the United States to illustrate those issues. Section 4 reviews empirical studies in light of our conceptual and empirical identification frameworks. Section 5 concludes and discusses the most promising avenues for future research. In appendix A we present a summary of the key U.S. legislated tax changes that have been used in the U.S. literature and in appendix B a brief description of existing U.S. tax return data.

2 Conceptual Framework

2.1 Basic Model

In the standard labor supply model, individuals maximize a utility function u(c, l) where c is disposable income, equal to consumption in a one-period model, and l is labor supply measured by hours of work. Earnings are given by z = wl, where w is the exogenous wage rate. The (linearized) budget constraint is c = wl(1−τ) + E, where τ is the marginal tax rate and E is virtual income.

The taxable income elasticity literature generalizes this model by noting that hours of work are only one component of the behavioral response to income taxation. Individuals can respond to taxation through other margins such as intensity of work, career choices, form and timing of compensation, tax avoidance, or tax evasion. As a result, the individual’s wage rate w might depend on effort and respond to tax rates, and reported taxable income might also differ from wl as individuals might split their gross earnings between taxable cash compensation and non-taxable compensation such as fringe benefits, or even fail to report their full taxable income because of tax evasion.

As shown by Feldstein (1999), a simple, reduced-form way to model all those behavioral responses is to posit that utility depends positively on disposable income (equal to consumption) c and negatively on reported income z (because activities generating income are costly, for example they may require foregoing leisure). Hence, individuals choose (c, z) to maximize a utility function u(c, z) subject to a budget constraint of the form c = (1−τ)z + E. Such maximization generates an individual “reported income” supply function z(1−τ, E) where z depends on the net-of-tax rate 1−τ and virtual income E generated by the tax/transfer system.1 Each individual has a particular reported income supply function reflecting his/her skills, taste for labor, opportunities for avoidance, etc.2

In most of what follows, we assume away income effects so that the income function z does not depend on E and depends only on the net-of-tax rate.3 In the absence of compelling evidence about significant income effects in the case of overall reported income, it seems reasonable to consider the case with no income effects, which simplifies considerably the presentation of efficiency effects. It might seem unintuitive to assume away the effect of changes in exogenous income on (reported taxable) income. However, in the reported income context, E is defined exclusively as virtual income created by the tax/transfer budget constraint and hence is not part of taxable income z. Another difference is that the labor component of z is labor income (wl) rather than labor hours (l); this difference requires us to address the incidence of tax rate changes (i.e., their effect on w), which we do briefly in Section 2.2.5.

The literature on behavioral responses to taxation has attempted to estimate the elasticity of reported incomes with respect to the net-of-tax rate, defined as

$$e = \frac{1-\tau}{z}\,\frac{\partial z}{\partial(1-\tau)}, \tag{1}$$

the percent change in reported income when the net-of-tax rate increases by 1%. With no income effects, this elasticity is equal to both the compensated and uncompensated elasticity. Critically and as shown in Feldstein (1999), this elasticity captures not only the hours of work response but also all other margins of behavioral response to marginal tax rates.

1 This reported income supply function remains valid in the case of non-linear tax schedules as c = (1−τ)z + E is the linearized budget constraint at the utility-maximizing point, just as in the basic labor supply model.

2 We could have posited a more general model in which c = y − τz + E, where y is real income while z is reported income that may differ from real income because of tax evasion and avoidance. Utility would be u(c, y, y−z), increasing in c, decreasing in y (earnings effort), and decreasing in y−z (costs of avoiding or evading taxes). Such a utility function would still generate a reported income supply function of the form z(1−τ, E) and our analysis would go through. We come back to such a more general model in Section 2.3.2.

3 In general, labor supply studies estimate modest income effects (see Blundell and MaCurdy, 1999 for a survey). There is much less empirical evidence on the magnitude of income effects in the reported income literature. Gruber and Saez (2002) estimate both income and substitution effects in the case of reported incomes, and find small and insignificant income effects.

As we discuss later, a number of empirical studies have found that the behavioral response to changes in marginal tax rates is concentrated in the top of the income distribution, with less evidence of any response for the middle and upper-middle income class (see Sections 3 and 4 below).4 In the United States, because of exemptions and tax credits, individual income tax liabilities are very skewed: the top quintile (top percentile) of tax filers remitted 86.3% (39.1%) of all individual income taxes in 2006 (Congressional Budget Office, 2009). Therefore, it is useful to focus on the analysis of the effects of increasing the marginal tax rate on the upper end of the income distribution. Let us therefore assume that incomes in the top bracket, above a given reported income threshold z̄, face a constant marginal tax rate τ.5 We denote by N the number of taxpayers in the top bracket.

As in the conceptual framework just described, we assume that individual incomes reported in the top bracket depend on the net-of-tax rate 1−τ, and we denote by z^m(1−τ) the average income reported by taxpayers in the top bracket, as a function of the net-of-tax rate. The aggregate elasticity of income in the top bracket with respect to the net-of-tax rate is therefore defined as e = [(1−τ)/z^m]∂z^m/∂(1−τ). This aggregate elasticity is equal to the average of the individual elasticities weighted by individual income, so that individuals contribute to the aggregate elasticity in proportion to their incomes.6 Thus, in order to estimate e, most empirical analyses (as we will see below) weight individuals by their income.
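The income-weighting property can be checked numerically. The sketch below is only an illustration: it assumes each taxpayer has a hypothetical isoelastic reported-income function z_i = b_i·(1−τ)^{e_i}, and the three taxpayers, their base incomes, and their elasticities are made-up values rather than estimates from the literature.

```python
# Illustrative check: the aggregate elasticity equals the income-weighted
# average of individual elasticities. The isoelastic form and all numbers
# below are assumptions made for this sketch, not values from the paper.

def reported_income(base, elasticity, net_of_tax):
    """Hypothetical isoelastic supply: z = base * (1 - tau) ** e."""
    return base * net_of_tax ** elasticity


def aggregate_elasticity(taxpayers, tau, step=1e-6):
    """Central-difference estimate of [(1 - tau) / zm] * d zm / d(1 - tau)."""
    n = 1.0 - tau

    def mean_income(net_of_tax):
        total = sum(reported_income(b, e, net_of_tax) for b, e in taxpayers)
        return total / len(taxpayers)

    slope = (mean_income(n + step) - mean_income(n - step)) / (2 * step)
    return n / mean_income(n) * slope


def income_weighted_elasticity(taxpayers, tau):
    """Average of individual elasticities, weighted by reported income."""
    n = 1.0 - tau
    incomes = [reported_income(b, e, n) for b, e in taxpayers]
    weighted = sum(z * e for z, (_, e) in zip(incomes, taxpayers))
    return weighted / sum(incomes)


# (base income, elasticity) pairs: purely hypothetical top-bracket taxpayers.
taxpayers = [(400_000.0, 0.1), (900_000.0, 0.4), (5_000_000.0, 0.8)]
tau = 0.35
agg = aggregate_elasticity(taxpayers, tau)
weighted = income_weighted_elasticity(taxpayers, tau)
simple = sum(e for _, e in taxpayers) / len(taxpayers)
print(agg, weighted, simple)
```

In this made-up example the highest-income taxpayer is also the most elastic, so the aggregate elasticity lies well above the unweighted mean of the individual elasticities, which is why weighting by income matters when estimating e.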

Suppose that the government increases the top tax rate τ by a small amount dτ (with no change in the tax schedule for incomes below z̄). This small tax reform has two effects on tax revenue. First, there is a “mechanical” increase in tax revenue due to the fact that taxpayers face a higher tax rate on their incomes above z̄. The total mechanical effect is

$$dM \equiv N\,[z^m - \bar z]\,d\tau > 0. \tag{2}$$

This mechanical effect is the projected increase in tax revenue, absent any behavioral response. Second, the increase in the tax rate triggers a behavioral response that reduces the average reported income in the top bracket by dz = −e·z^m·dτ/(1−τ). A change dz changes tax revenue by τ dz. Hence, the aggregate change in tax revenue due to the behavioral response is equal to

$$dB \equiv -N\cdot e\cdot z^m\cdot\frac{\tau}{1-\tau}\,d\tau < 0. \tag{3}$$

4 The behavioral response at the low end of the income distribution is for the most part out of the scope of the present paper. The large literature on responses to welfare and income transfer programs targeted toward low incomes has, however, displayed evidence of significant labor supply responses (see, e.g., Meyer and Rosenbaum, 2001, for a recent analysis).

5 For example, in the case of tax year 2008 federal income tax law in the United States, taxable incomes of married couples filing jointly that are above z̄ = $357,700 are taxed at the top marginal tax rate of τ = 0.35.

6 Formally, z^m = [z_1 + .. + z_N]/N and hence e = [(1−τ)/z^m]∂z^m/∂(1−τ) = (1−τ)[∂z_1/∂(1−τ) + .. +

Summing the mechanical and the behavioral effect, we obtain the total change in tax revenue due to the tax change:

$$dR = dM + dB = N\,d\tau\,(z^m - \bar z)\cdot\left[1 - e\cdot\frac{z^m}{z^m - \bar z}\cdot\frac{\tau}{1-\tau}\right]. \tag{4}$$

Let us denote by a the ratio z^m/(z^m − z̄). Note that in general a ≥ 1, and that a = 1 when a single flat tax rate applies to all incomes, so that the top bracket starts at zero, that is, when z̄ = 0. If the top tail of the distribution is Pareto distributed,7 then the parameter a does not vary with z̄ and is exactly equal to the Pareto parameter. As the tails of actual income distributions are very well approximated by Pareto distributions, the coefficient a is extremely stable in the United States for z̄ above $300,000 and equals approximately 1.6 in recent years.8 The parameter a measures the thinness of the top tail of the income distribution: the thicker the tail of the distribution, the larger is z^m relative to z̄, and hence the smaller is a.
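As a small illustration of why a is constant along a Pareto tail, the following sketch evaluates the ratio at several thresholds. It relies on the closed-form conditional mean of an exact Pareto distribution, z^m = z̄·α/(α−1) for α > 1; the function names and the thresholds are chosen here for illustration only.

```python
# Sketch: the ratio a = zm / (zm - zbar) for an exact Pareto tail.
# Uses the closed-form conditional mean zm = zbar * alpha / (alpha - 1),
# valid for alpha > 1. Thresholds and alpha are illustrative assumptions.

def pareto_mean_above(zbar, alpha):
    """Mean income above zbar under an exact Pareto tail with parameter alpha."""
    if alpha <= 1:
        raise ValueError("mean above threshold is infinite unless alpha > 1")
    return zbar * alpha / (alpha - 1.0)


def top_tail_ratio(zm, zbar):
    """a = zm / (zm - zbar); usable with any observed mean and threshold."""
    return zm / (zm - zbar)


for zbar in (300_000.0, 1_000_000.0, 5_000_000.0):
    zm = pareto_mean_above(zbar, alpha=1.6)
    print(zbar, round(zm), round(top_tail_ratio(zm, zbar), 6))
```

The same `top_tail_ratio` helper could be applied to tabulated thresholds and average incomes to estimate a for a given year, without assuming an exact Pareto shape.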

Using the definition of a, we can rewrite the effect of the small reform on tax revenue dR simply as:

$$dR = dM\left[1 - \frac{\tau}{1-\tau}\cdot e\cdot a\right]. \tag{5}$$

Formula (5) shows that the fraction of tax revenue lost through behavioral responses (the second term in the square bracket expression) is a simple function increasing in the tax rate τ, the elasticity e, and the Pareto parameter a. This expression is of primary importance to the welfare analysis of taxation because it is exactly equal to the marginal deadweight burden created by the increase in the tax rate, under the assumptions we have made and that we discuss below. This can be seen as follows: Because of the envelope theorem, the behavioral response to a small tax change dτ creates no additional welfare loss and thus the utility loss (measured in dollar terms) created by the tax increase is exactly equal to the mechanical effect dM.9 However, tax revenue collected is only dR = dM + dB < dM because dB < 0. Thus −dB represents the extra amount lost in utility over and above the tax revenue collected dR. From (5) and dR = dM + dB, the marginal excess burden expressed in terms of extra taxes collected is defined as

$$-\frac{dB}{dR} = \frac{e\cdot a\cdot\tau}{1-\tau-e\cdot a\cdot\tau}. \tag{6}$$

7 A Pareto distribution has a density function of the form f(z) = C/z^{1+α}, where C and α are constant parameters. The parameter α is called the Pareto parameter. In that case z^m = ∫_{z̄}^{∞} z f(z) dz / ∫_{z̄}^{∞} f(z) dz = z̄·α/(α−1) and hence z^m/(z^m − z̄) = α.

8 Saez (2001) provides such an empirical analysis for 1992 and 1993 reported incomes using U.S. tax return data. Piketty and Saez (2003) provide estimates of thresholds z̄ and average incomes z^m corresponding to various fractiles within the top decile of the U.S. income distribution from 1913 to 2006, allowing a straightforward estimation of the parameter a for any year and income threshold.

9 Formally, V(1−τ, E) = max_z u(z(1−τ) + E, z) so that dV = u_c·(−z dτ + dE) = −u_c·(z − z̄)dτ. Therefore, the (money-metric) utility cost of the reform is indeed equal to the mechanical tax increase, individual by individual.

In other words, for each extra dollar of taxes raised, the government imposes an extra cost equal to −dB/dR > 0 on taxpayers. We can also define the “marginal efficiency cost of funds” (MECF) as 1 − dB/dR = (1−τ)/(1−τ−e·a·τ). Those formulas are valid for any tax rate τ and income distribution as long as income effects are assumed away, even if individuals have heterogeneous utility functions and behavioral elasticities.10 The parameters τ and a are relatively straightforward to measure, so that the elasticity parameter e is the central parameter necessary to calculate formulas (5) and (6).

To illustrate these formulas consider the following example. In 2006, for the top 1% income cut-off (corresponding approximately to the top 35% federal income tax bracket in that year), Piketty and Saez (2003) estimate that a = 1.60. For an elasticity estimate of e = 0.5 (corresponding, as we discuss later, to the mid to upper range of the estimates from the literature), the fraction of tax revenue lost through behavioral responses (−dB/dM), should the top tax rate be slightly increased, would be 43.1%, slightly below half of the mechanical (i.e., ignoring behavioral responses) projected increase in tax revenue.11 In terms of marginal excess burden, increasing tax revenue by dR = $1 causes a utility loss (equal to the MECF) of 1/(1−.431) = $1.76 for taxpayers, and hence a marginal excess burden of −dB/dR = $0.76, or 76% of the extra $1 tax collected.
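The arithmetic of this example can be reproduced with a few lines of code. The sketch below simply evaluates formulas (5) and (6) at τ = 0.35, e = 0.5, and a = 1.6; the function names are chosen for this illustration and are not part of any established library.

```python
# Sketch of formulas (5) and (6) for the top bracket, evaluated at the
# illustrative values tau = 0.35, e = 0.5, a = 1.6 used in the text.
# Function names are assumptions made for this example only.

def revenue_loss_fraction(tau, e, a):
    """-dB/dM: share of the mechanical revenue gain lost to behavior."""
    return tau / (1.0 - tau) * e * a


def marginal_excess_burden(tau, e, a):
    """-dB/dR from formula (6); defined only while dR > 0."""
    loss = revenue_loss_fraction(tau, e, a)
    if loss >= 1.0:
        raise ValueError("tax rate is at or above the revenue-maximizing rate")
    return loss / (1.0 - loss)


def mecf(tau, e, a):
    """Marginal efficiency cost of funds, 1 - dB/dR."""
    return 1.0 + marginal_excess_burden(tau, e, a)


tau, e, a = 0.35, 0.5, 1.6
print(f"{revenue_loss_fraction(tau, e, a):.3f}")   # 0.431
print(f"{mecf(tau, e, a):.2f}")                    # 1.76
print(f"{marginal_excess_burden(tau, e, a):.2f}")  # 0.76
```

Varying e in these functions makes it easy to see how sensitive the efficiency cost is to the elasticity estimate, holding τ and a fixed.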

Following the supply-side debates of the early 1980s, much attention has been focused on the revenue-maximizing tax rate. The revenue-maximizing tax rate τ∗ is such that the bracketed expression in equation (5) is exactly zero when τ = τ∗. Rearranging this equation, we obtain the following simple formula for the tax revenue maximizing rate τ∗ for the top bracket:

$$\tau^* = \frac{1}{1 + a\cdot e}. \tag{7}$$

A top tax rate above τ∗ is inefficient because decreasing the tax rate would both increase the utility of the affected taxpayers with income above z̄ and increase government revenue, which can in principle be used to benefit other taxpayers.12 At the tax rate τ∗, the marginal excess burden becomes infinite as raising more tax revenue becomes impossible. Using our previous example with e = 0.5 and a = 1.6, the revenue-maximizing tax rate τ∗ would be 55.6%, not much higher than the combined maximum federal, state, Medicare, and typical sales tax rate in the United States in 2008.

10 In contrast, the Harberger triangle (Harberger, 1964) approximations are only valid for small tax rates. This expression also abstracts from any marginal compliance costs caused by raising rates, and from any marginal administrative costs unless dR is interpreted as revenue net of administrative costs. See Slemrod and Yitzhaki (2002).

11 The fraction would be around 50% if we included average state income tax rates and health insurance payroll taxes in the estimate of τ.

Note that when the tax system has a single tax rate (i.e., when z̄ = 0), the tax revenue maximizing rate becomes the well-known expression τ∗ = 1/(1 + e). As a ≥ 1, the flat-rate revenue-maximizing rate is always larger than the revenue-maximizing rate for high incomes only. This is because increasing just the top tax rate collects extra taxes only on the portion of incomes above the bracket threshold z̄, but produces a behavioral response for high-income taxpayers as large as an across-the-board increase in marginal tax rates.
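To make the comparison between the top-bracket and flat-rate cases concrete, here is a minimal sketch of formula (7) and its special case a = 1. The values e = 0.5 and a = 1.6 are those of the running example; the function names are illustrative.

```python
# Sketch of formula (7) and its flat-tax special case. Values e = 0.5 and
# a = 1.6 come from the running example; function names are assumptions.

def revenue_maximizing_top_rate(e, a):
    """tau* = 1 / (1 + a * e) for a rate applying above a threshold."""
    return 1.0 / (1.0 + a * e)


def revenue_maximizing_flat_rate(e):
    """Special case a = 1 (threshold at zero): tau* = 1 / (1 + e)."""
    return revenue_maximizing_top_rate(e, a=1.0)


top = revenue_maximizing_top_rate(e=0.5, a=1.6)
flat = revenue_maximizing_flat_rate(e=0.5)
print(f"top-bracket tau*: {top:.3f}")   # 0.556
print(f"flat-rate tau*:   {flat:.3f}")  # 0.667

# At tau*, the bracketed term in formula (5) should vanish: a small rate
# increase raises no additional revenue.
bracket = 1.0 - top / (1.0 - top) * 0.5 * 1.6
print(abs(bracket) < 1e-12)
```

With the same elasticity, the flat-rate maximum exceeds the top-bracket maximum, which matches the argument that a top-rate increase collects extra revenue only on income above the threshold.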

Giertz (2009) applies the formulas presented in this section to tax return data from published Statistics of Income (SOI) tables produced by the Internal Revenue Service (IRS) in order to analyze the impact of the potential expiration of the Bush tax cuts. Giertz shows that exactly where the ETI falls within the range found in the literature has significant effects on the efficiency and revenue implications for tax policy. For example, Giertz reports that for ETIs of 0.2, 0.5 and 1.0, behavioral responses would respectively erase 12, 31 and 62% of the mechanical revenue gain. When offsets to payroll and state income taxes are taken into account, these numbers increase by 28%. Likewise, estimates for the marginal cost of public funds (MCF) and the revenue-maximizing rates are quite sensitive to this range of ETIs.

In the basic model we have considered, the ETI e is a sufficient statistic to estimate the efficiency costs of taxation as it is not necessary to estimate the structural parameters of the underlying individual preferences. Such sufficient statistics have been used for welfare and normative analysis in various contexts in the field of public economics in recent years (see Chetty, 2008b for a recent survey). However, it is important to understand the limitations of this approach and the strong assumptions required to apply it, as we show in the next sub-sections.

2.2 Fiscal Externalities and Income Shifting

The analysis has assumed so far that the reduction in incomes due to the tax rate increase has no other effect on tax revenue. This is a reasonable assumption if the reduction in incomes is due to reduced labor supply (and hence an increase in untaxed leisure time), or due to a shift from taxable cash compensation toward untaxed fringe benefits or perquisites (more generous health insurance, better offices, company cars, etc.) or tax evasion. However, in many instances the reduction in reported incomes is due in part to a shift away from individual income toward other forms of taxable income such as corporate income, or deferred compensation that will be taxable to the individual at a later date (see Slemrod, 1998). For example, as discussed in more detail later, Slemrod (1996) and Gordon and Slemrod (2000) show convincingly that part of the surge in top individual incomes after the Tax Reform Act of 1986 in the United States, which reduced individual income tax rates relative to corporate tax rates (see appendix A), was due to a shift of taxable income from the corporate sector toward the individual sector.

12 [...] produce a Pareto improvement, ignoring the possibility that the utility of some individuals enters negatively in the utility functions of others. The optimal income taxation literature following Mirrlees (1971) shows that formula (7) is the optimal top tax rate if the social marginal utility of consumption decreases to zero when income is large (see Saez, 2001).

For a tax change in a given base z, we define a fiscal externality as a change in tax revenue that occurs in any tax base z′ other than z due to the behavioral response of private agents to the tax change in the initial base z. The alternative tax base z′ can be a different tax base in the same time period or the same tax base in a different time period. The notion of fiscal externality is therefore dependent on the scope of the analysis both along the base dimension and the time dimension. In the limit where the analysis encompasses all tax bases and all time periods (and hence focuses on the total present discounted value of tax revenue), there can be by definition no fiscal externalities.

To see the implication of income shifting, assume that a fraction s < 1 of the incomes that disappear from the individual income tax base following the tax rate increase dτ are shifted to other bases and are taxed on average at rate t (< τ). For example, if two-thirds of the reduction in individual reported incomes is due to increased (untaxed) leisure and one-third is due to a shift toward the corporate sector, then s = 1/3 and t would be equal to the effective tax rate on corporate income. Therefore, a behavioral response dz generates a tax revenue change equal to (τ − t·s)dz. As a result, the change in tax revenue due to the behavioral response becomes:

$$dB = -N\cdot e\cdot z^m\cdot\frac{\tau}{1-\tau}\,d\tau + N\cdot e\cdot z^m\cdot\frac{t\cdot s}{1-\tau}\,d\tau. \tag{8}$$

Therefore, formula (5) for the effect of the small reform on total tax revenue becomes:

$$dR = dM + dB = dM\left[1 - \frac{\tau - s\cdot t}{1-\tau}\cdot e\cdot a\right]. \tag{9}$$

The same envelope theorem logic applies for welfare analysis: the income that is shifted to another tax base at the margin does not generate any direct change in welfare because the tax filer is indifferent between reporting marginal income in the individual income tax base vs. the alternative tax base. Therefore, as above, −dB represents the marginal deadweight burden of the individual income tax, and the marginal excess burden expressed in terms of extra taxes collected can be written as

$$-\frac{dB}{dR} = \frac{e\cdot a\cdot(\tau - s\cdot t)}{1-\tau-e\cdot a\cdot(\tau - s\cdot t)}. \tag{10}$$

The revenue-maximizing tax rate (7) becomes:

$$\tau_s^* = \frac{1 + s\cdot t\cdot a\cdot e}{1 + a\cdot e} > \tau^*. \tag{11}$$

If we assume again that a = 1.6, e = .5, τ = 0.35 but that half (s = 0.5) of marginal income disappearing from the individual base is taxed on average at t = 0.3,13 the fraction of revenue lost due to behavioral responses drops from 43% to 25%, and the marginal excess burden (expressed as a percentage of extra taxes raised) decreases from 76% to 32%. The revenue-maximizing tax rate increases from 55.6% to 62.2%.
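The effect of income shifting on these quantities can be sketched by extending the earlier formulas with the parameters s and t. The code below evaluates formulas (9) to (11) at a = 1.6, e = 0.5, τ = 0.35, s = 0.5, and t = 0.3; the function names and keyword defaults are assumptions made for this illustration.

```python
# Sketch of formulas (9)-(11) with income shifting. s is the share of the
# lost individual-tax base that reappears in another base taxed at rate t.
# Parameter values follow the running example; names are illustrative.

def revenue_loss_fraction(tau, e, a, s=0.0, t=0.0):
    """-dB/dM from formula (9); s = 0 recovers formula (5)."""
    return (tau - s * t) / (1.0 - tau) * e * a


def marginal_excess_burden(tau, e, a, s=0.0, t=0.0):
    """-dB/dR from formula (10)."""
    loss = revenue_loss_fraction(tau, e, a, s, t)
    return loss / (1.0 - loss)


def revenue_maximizing_rate(e, a, s=0.0, t=0.0):
    """Formula (11); s = 0 recovers formula (7)."""
    return (1.0 + s * t * a * e) / (1.0 + a * e)


base = dict(tau=0.35, e=0.5, a=1.6)
for label, shift in (("no shifting", {}), ("shifting", dict(s=0.5, t=0.3))):
    loss = revenue_loss_fraction(**base, **shift)
    meb = marginal_excess_burden(**base, **shift)
    peak = revenue_maximizing_rate(base["e"], base["a"], **shift)
    print(f"{label}: loss={loss:.3f} meb={meb:.3f} tau*={peak:.3f}")
```

With shifting, the revenue loss fraction falls to roughly a quarter and the revenue-maximizing rate rises to about 62%, in line with the figures above; the marginal excess burden evaluates to a little under one-third, close to the 32% reported in the text.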

This simple theoretical analysis shows therefore that, in addition to estimating the elasticity e, it is critical to analyze whether the source or destination of changes in reported individual incomes is another tax base. Such an alternative tax base can be a concurrent one or in another time period, as we discuss below. Therefore, two additional parameters, in addition to the taxable income elasticity e, are crucial in the estimation of the tax revenue effects and marginal deadweight burden: (1) the extent to which individual income changes in the first tax base z shift to another form of income that is taxable, characterized by parameter s, and (2) the tax rate t at which the income shifted is taxed. In practice, there are many possibilities for such shifting and measuring empirically all the shifting effects is challenging, especially in the case of shifting across time. The recent literature has addressed several channels for such fiscal externalities.

2.2.1 Individual vs. Corporate Income Tax Base

Most countries tax corporate profits with a separate corporate income tax.14 Unincorporated business profits (such as sole proprietorships or partnerships) are in general taxed directly at the individual level. In the United States, closely held corporations with few shareholders (fewer than 100 currently) can elect to become Subchapter S corporations and be taxed solely at the individual level. Therefore, the choice of business organization (regular corporation taxed by the corporate income tax vs. business entity taxed solely at the individual level) might respond to the relative tax rates on corporate vs. individual income.

13 We show below that s = 0.5 and t = 0.3 are realistic numbers to capture the shift from corporate to individual taxable income following the Tax Reform Act of 1986.

14 Net-of-tax corporate profits are taxed again at the individual level when paid out as dividends to individual shareholders. Many OECD countries alleviate such double taxation of corporate profits by providing tax credits or preferential tax treatment for dividends. If profits are retained in the corporation, they increase the value of the company stock and those profits may, as in the United States, be taxed as realized capital gains when the individual owners eventually sell the stock. In general, the individual level of taxation of corporate profits is lower than the ordinary individual tax on unincorporated businesses so that the combined tax on corporate profits and distributed profits may be lower than the direct individual tax for individuals subject to high marginal individual tax rates.

For example, if the individual income tax rate increases, some businesses taxed at the individual level may choose to incorporate where they would be subjected to the corporate income tax instead.15 In that case, the standard taxable income elasticity might be large and the individual income tax revenue consequences significant. However, corporate income tax revenue will increase and partially offset the loss in revenue on the individual side. It is possible to provide a micro-founded model capturing those effects.16 Because there are heterogeneous fixed costs of switching business form organization, in the aggregate the shifting response to tax rates is smooth, and marginal welfare analysis is still applicable. As a result, the reduced form formula (9) is a sufficient statistic to derive the welfare costs of taxation in that case. Estimating s and t empirically would require knowing the imputed corporate profits of individual shareholders.

This issue was quite significant for analyses of TRA 86 because of the sharp decline (and change in sign!) in the difference between the top personal and corporate tax rates, which created an incentive to shift business income from the corporation tax base to pass-through entities such as partnerships or Subchapter S corporations, so that the business income shows up in the individual income tax base (see appendix A for a description of the TRA 86 changes). This phenomenon was indeed widespread immediately after TRA 86 (documented by Slemrod, 1996, Carroll and Joulfaian, 1997, and Saez, 2004 among others).

2.2.2 Short-term vs. Long-term Responses

If individuals anticipate that a tax increase will happen soon,17 _{they have incentives to} accel-erate taxable income realizations before the tax change takes place.18 As a result, reported taxable income just after the reform will be temporarily depressed. In that case, the tax in-crease has a positive fiscal externality on the pre-reform period which ought to be taken into account in a welfare analysis.

15 Again, to the extent that dividends and capital gains are taxed, shareholders would not entirely escape the individual income tax.

16 Alvaredo and Saez (2008) develop such a model in the case of the Spanish wealth tax, under which stock in closely held companies is excluded from the wealth tax for individuals who own at least 15% of the business and are substantially involved in management.

17 For example, President Clinton was elected in late 1992 on a program to raise top individual tax rates, which was indeed implemented in 1993.

As we will see below, this issue of re-timing is particularly important in the case of realized capital gains19 and stock-option exercises (Goolsbee 2000a). Parcell (1995), Feldstein and Feenberg (1996), as well as Sammartino and Weiner (2007) document the large shift of taxable income into 1992 from 1993 in response to the tax increase on high-income earners promised by President-elect Bill Clinton, and enacted in early 1993.

Conversely, adjusting to a tax change might take time (as individuals might decide to change their career or educational choices or businesses might change their long-term investment decisions), creating a negative fiscal externality in future years. In that case, the short-term response elasticity would underestimate the welfare cost of taxation. Therefore, in both cases, it is preferable to estimate the long-term response to tax changes although, as we discuss below, the long-term response is more difficult to identify empirically. The empirical literature has primarily focused on short-term (1 year) and medium-term (up to 5 years) responses, and is not able to convincingly identify very long-term responses.

The labor supply literature started with a static framework and then developed a dynamic framework to distinguish between responses to temporary changes versus permanent changes in wage rates (MaCurdy, 1981). Although the ETI literature has not explicitly developed such a framework, the same theoretical issues of responses to temporary versus permanent tax changes arise. Because of inter-temporal substitution, and barring adjustment costs, responses to temporary changes will be larger than responses to permanent changes.20 This is an important issue to keep in mind when discussing empirical studies.

The issue of long-term responses is particularly important in the case of capital income, as capital income is the consequence of past savings and investment decisions. For example, a higher top income tax rate might discourage wealth accumulation or dissipate existing fortunes faster. The new long-term wealth distribution equilibrium might not be reached for decades or even generations, which makes it particularly difficult to estimate the long-run elasticity. Estimating the effects on capital would require developing a dynamic model of tax responses, which has not yet been developed in the context of the ETI literature.

2.2.3 Current vs. Deferred Income

19 The most famous example is the U.S. Tax Reform Act of 1986, which increased the top tax rate on realized capital gains from 20% to 28%, and generated a surge in capital gains realizations at the end of 1986 (Auerbach, 1988; Burman, Clausing, and O’Hare, 1994).

20 In the labor supply literature, responses to temporary changes are captured by the Frisch elasticity, which is higher than the compensated elasticity to permanent changes.

If current income tax rates increase but long-term future expected income tax rates do not, individuals might decide to defer some of their incomes, for example, in the form of future pension payments21 (deferred compensation) or future realized capital gains.22 In that case, a current tax increase might have a positive fiscal externality in future years; such a fiscal externality affects the welfare cost of taxation as we described above. A similar issue applies whenever a change in tax rates affects business investment decisions undertaken by individuals. If, for example, a lower tax rate induces sole proprietors or principals in pass-through entities to expand investment, the short-term effect on taxable income may be negative, reflecting the deductible net expenses in the early years of an investment project.

2.2.4 Tax Evasion

Suppose that a tax increase leads to a higher level of tax evasion. In that case, there might be increases in taxes collected on evading taxpayers following audits. This increased audit-generated tax revenue is another form of a positive fiscal externality. In practice, most empirical studies are carried out using tax return data before audits take place, and therefore do not fully capture the revenue consequences. Chetty (2008) makes this point formally and shows that, under risk neutrality assumptions, at the margin an individual is indifferent between evading one dollar more and facing a marginally higher audit rate, and therefore the tax revenue lost due to increased tax evasion is exactly recouped (in expectation) by increased fines collected by the government. As a result, in that case, the elasticity that matters for deadweight burden is not the elasticity of reported income but instead the elasticity of real income.

2.2.5 Other Fiscal Externalities

Changes in reported incomes might also have consequences for bases other than federal income taxes. An obvious example is the case of state income taxes in the United States. If formula (6) is applied to the federal income tax only, it will not capture the (negative) externality on state income tax revenue (as states use virtually the same tax base as the federal government). In that case (ignoring the deductibility of state income taxes for federal tax purposes), we have s = −1 and t is the state income tax rate. Put another way, our original analysis should be based on the total federal plus state income tax rate τ+t.

21 In the United States, individual workers can electively set aside a fraction of their earnings into pension plans (traditional IRAs and 401(k)s) or employers can provide increased retirement contributions at the expense of current compensation. In both cases, those pension contributions are taxed when the money is withdrawn as pension income.

22 For example, companies, on behalf of their shareholders, may decide to reduce dividend payments which are taxed now and retain earnings in order to generate capital gains that are taxed later when the stock is sold.

Changes in reported individual income due to real changes in economic behavior (such as reduced labor supply) can also have consequences for consumption taxes (if, for example, less labor income is accompanied by less consumption). In particular, a broad-based value added tax is economically equivalent to an income tax (with expensing) and therefore should also be included in the tax rate used for welfare computations.

Finally, fiscal externalities may also arise due to classical general equilibrium incidence effects. For example, a reduced tax rate on high incomes might stimulate labor supply of workers in highly paid occupations, and hence could decrease their pre-tax wage rate while increasing the pre-tax wage rates of lower-paid occupations through general equilibrium effects.23 Such incidence effects are effectively transfers from some factors of production (high-skilled labor in our example) to other factors of production (low-skilled labor). If different factors are taxed at different rates (due, for example, to a progressive income tax), then those incidence effects will have fiscal consequences. Conceptually, however, as those incidence effects are transfers, the government can always readjust tax rates on each factor in order to undo those incidence effects at no fiscal cost. Therefore, in a standard competitive model, incidence effects matter neither for the efficiency analysis nor for optimal tax design.24

2.3 Other Issues

2.3.1 Classical Externalities

There are situations where individual responses to taxation may involve classical externalities. Two prominent examples are charitable giving and mortgage interest payments for residential housing, which in the United States and some other countries are deductible from taxable income, a tax treatment which is often justified on the grounds of classical externalities. Contributions to charitable causes create positive externalities when the contributions increase the utility of the beneficiaries of the nonprofit organizations. To the extent that mortgage interest deductions increase home ownership, they arguably create positive externalities in neighborhoods (although the level or even the existence of such a net home ownership externality is debated, see e.g., Glaeser and Shapiro, 2003). Expenditures on such deductible items may rise following a tax increase because their net price is equal to the net-of-tax rate 1−τ when deductions are itemized.25 Increased expenditures on these items will decrease taxable income. Suppose a fraction s of the taxable income response to a tax rate increase dτ is due to higher expenditures on activities such as charitable giving that create an externality with a social marginal value of t dollars per dollar of additional expenditure. In that case, formula (9) applies by just replacing the alternative tax base rate t with the social marginal value of the externality. For example, in the extreme case where all the taxable income response comes from tax expenditures (s = 1) with income before tax expenditures being unresponsive to tax rates, and if t = τ (the social marginal value of tax expenditures externalities is equal to the income tax rate τ) then there is zero marginal excess burden from taxation. (It is a Pigouvian tax.)26 More generally, to the extent that the behavioral response to higher tax rates generates positive externalities, formula (4) will overstate the marginal efficiency cost of taxation.

23 Such effects are extremely difficult to convincingly estimate empirically. Kubik (2004) attempts such an analysis and finds that, controlling for occupation-specific time trends in wage rates, individuals in occupations that experienced large decreases in their median marginal tax rates due to TRA86 received lower pre-tax wages after 1986 as the number of workers and the hours worked in these professions increased.

24 Indeed, Diamond and Mirrlees (1971) showed that optimal tax formulas are the same in a model with fixed prices of factors (with no incidence effects) and in a model with variable prices (with incidence effects).

25 There is a large empirical literature finding significant responses of charitable giving to individual marginal income tax rates. See, for example, Auten, Sieg and Clotfelter (2002), Clotfelter and Schmalbeck (1996), Randolph (1995) and Karlan and List (2007).

Because the bulk of items that are deductible from taxable income in the United States (state and local income taxes, mortgage interest deductions, and charitable giving) may generate fiscal or classical externalities, the elasticity of a broader, pre-deduction, concept of income (such as adjusted gross income in the United States) is of interest in addition to a taxable income elasticity. That is why much conceptual and empirical analysis focuses on adjusted gross income, which is not net of such deductible items, rather than taxable income.

Classical externalities might also arise in agency models where executives set their own pay by expending efforts to influence the board of directors.27 It is conceivable that such pay-setting efforts depend on the level of the top income tax rate and would increase following a top tax rate cut. In such a case, top executive compensation increases come at the expense of shareholders' returns, which produces a negative externality.28 Such an externality would reduce the efficiency costs of taxation (as correcting the externality precisely requires a positive tax in that case).

2.3.2 Changes in the Tax Base Definition and Tax Erosion

26 Saez (2004b) develops a simple optimal tax model to capture those effects.

27 Under perfect information and competition, executives would not be able to set their pay at a different level from their marginal product. In reality, the marginal product of top executives cannot be perfectly observed, which creates scope for influencing pay, as discussed extensively in Bebchuk and Fried (2004).

28 Such externalities would fit into the framework developed by Chetty (2008). Following the analysis of Chetty and Saez (2007), such agency models produce an externality only if the pay contract is not second-best Pareto efficient, e.g., it is set by executives and large shareholders on the board without taking into account the best interests of small shareholders outside the board.

As pointed out by Slemrod (1995) and Slemrod and Kopczuk (2002), how broadly the tax base is defined affects the taxable income elasticity. For example, in general the more tax deductions that are allowed, the higher will be the taxable income elasticity. This implies that the final taxable income elasticity depends not only on individual preferences (as we posited in our basic model in Section 2.1) but also on the tax structure. Therefore, the tax base choice determines in part the taxable income elasticity, and hence the latter can be thought of as a policy choice. The same logic applies to the enforcement of a given tax base, which can particularly affect the behavioral response of avoidance schemes and evasion.

This point is paramount for policy analysis. Suppose that we estimate a large taxable income elasticity because the tax base is set such that there are loopholes making it easy to shelter income from tax (we discuss in detail such examples using U.S. tax reforms below). In the narrow model of Section 2.1, the policy prescription is to have a lower tax rate. However, in a broader context, a much better policy may be to eliminate loopholes in order to reduce the taxable income elasticity and the deadweight burden of taxation. 29

Let us consider a simple example to illustrate this point. As in our basic model, individuals supply effort to earn income $z$. Suppose individuals can, at some cost, shelter part of their income $z$ into another form that might receive preferential tax treatment. Let us denote $w + y = z$, where $y$ is sheltered income and $w$ is unsheltered income. Formally, individuals maximize a utility function of the form $u(c, z, y)$ that is decreasing in $z$ (earning income requires effort) and $y$ (sheltering income is costly). Suppose we start from a comprehensive tax base where $z$ is taxed at rate $\tau$ so that $c = (1-\tau)z + E$ ($E$ denotes a lump-sum transfer). In that case, sheltering income is costly and provides no tax benefit so that individuals choose $y = 0$ and the analysis is the same as in Section 2.1, where the relevant elasticity is the elasticity of total income $z$ with respect to $1-\tau$.

Suppose now that the tax base is eroded by excluding $y$ from taxation. In that case $c = (1-\tau)w + y + E = (1-\tau)z + \tau y + E$. Therefore, individuals will find it profitable to shelter some of their income up to the point where $\tau \cdot u_c = -u_y$. We can define the indirect utility $v(c', w) = \max_y u(c' + y, w + y, y)$, with $c' = (1-\tau)w + E$, and the analysis of Section 2.1 applies using the elasticity of taxable income $w$ with respect to $1-\tau$. Because $w = z - y$ and sheltered income $y$ responds (positively) to the tax rate $\tau$, the elasticity of $w$ is larger than the elasticity of $z$ and hence the deadweight burden of taxation is higher with the narrower base. Intuitively, giving preferential treatment to $y$ induces taxpayers to waste resources to shelter income $y$, which is pure deadweight burden. As a result, starting from the eroded tax base and introducing a small tax $dt > 0$ on $y$ actually reduces the deadweight burden from taxation, showing that the eroded tax base is a suboptimal policy choice.30

29 This possibility is developed in the context of an optimal linear income tax in Slemrod (1994), which draws on the metaphor of Okun (1975) in which revenue leakage is the leak in a bucket that transfers income from the top of the income distribution to the bottom.

30

Therefore, comprehensive tax bases with low elasticities are preferable to narrow bases with large elasticities. Possible legitimate reasons for narrowing the tax base are (1) administrative simplicity (as in the model of Slemrod and Kopczuk, 2002),31 (2) redistributive concerns,32 and (3) externalities such as charitable contributions, as discussed above.33
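The sheltering example of Section 2.3.2 can be illustrated numerically. The sketch below is only an illustration under assumptions that are not in the text: a quasi-linear utility with separable isoelastic costs of earning and of sheltering, so that total income is z = z0·(1−τ)^e and sheltered income is y = y0·τ^k once y is excluded from the base. The function names and all parameter values (z0, y0, e, k, τ) are hypothetical.

```python
import math

def total_income(tau, z0=100.0, e=0.25):
    """Total income under the assumed utility: z = z0 * (1 - tau)**e."""
    return z0 * (1.0 - tau) ** e

def sheltered_income(tau, y0=20.0, k=1.0):
    """Sheltered income when y is untaxed (assumed form): y = y0 * tau**k."""
    return y0 * tau ** k

def taxable_income(tau):
    """Taxable (unsheltered) income w = z - y under the eroded base."""
    return total_income(tau) - sheltered_income(tau)

def elasticity(income_fn, tau, h=1e-6):
    """Numerical elasticity of income_fn with respect to the net-of-tax rate 1 - tau."""
    # Central difference in logs: move the net-of-tax rate up and down by h.
    log_up = math.log(income_fn(tau - h))    # net-of-tax rate is 1 - tau + h
    log_down = math.log(income_fn(tau + h))  # net-of-tax rate is 1 - tau - h
    d_log_net = math.log(1.0 - tau + h) - math.log(1.0 - tau - h)
    return (log_up - log_down) / d_log_net

tau = 0.40  # hypothetical tax rate
e_z = elasticity(total_income, tau)    # elasticity of total income z
e_w = elasticity(taxable_income, tau)  # elasticity of taxable income w
print(f"elasticity of z: {e_z:.3f}, elasticity of w: {e_w:.3f}")
```

Under these assumed functional forms the elasticity of w exceeds that of z, in line with the argument that an eroded base raises the taxable income elasticity; the size of the gap depends entirely on the hypothetical parameters.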

3 Empirical Estimation and Identification Issues

3.1 A Framework to Analyze the Identification Issues

In order to assess the validity of various empirical methods and the key identification issues, it is useful to consider a very basic model of income reporting behavior. Individual $i$ reports income $z_{it}$ and faces a marginal tax rate $\tau_{it} = T'(z_{it})$ in year $t$. We assume that reported income $z_{it}$ responds to marginal tax rates with elasticity $e$ so that $z_{it} = z^0_{it} \cdot (1-\tau_{it})^e$, where $z^0_{it}$ is income reported when the marginal tax rate is zero, which we call potential income.34 Therefore, using logs, we have:

$$\log z_{it} = e \cdot \log(1-\tau_{it}) + \log z^0_{it}. \qquad (12)$$

Note, in light of our previous theoretical discussion, the assumptions that are embedded in this simple model: (1) No income effects (as virtual income $E$ is excluded from specification (12)), (2) The response to tax rates is immediate and permanent (so that short-term and long-term elasticities are identical), (3) The elasticity $e$ is constant over time and uniform across individuals at all levels of income,35 (4) Individuals have perfect knowledge of the tax structure and choose $z_{it}$ after they know the exact realization of potential income $z^0_{it}$. We will come back to these assumptions below.

31 In many practical cases, however, a comprehensive tax base such as a VAT is actually administratively simpler than a complex income tax with many exemptions and a narrower base.

32 Excluding large out-of-pocket health expenditures, as done in the U.S. individual income tax code, could be such an example.

33 The public choice argument that narrow bases constrain Leviathan governments would fall in that category, as a Leviathan government produces a negative externality.

34 A quasi-linear utility function of the form $u(c, z) = c - z^0 \cdot (z/z^0)^{1+1/e}/(1 + 1/e)$ generates such income response functions.

35 This assumption can be relaxed in most cases, but it sometimes has important consequences for identification, as we discuss below.

Even within the context of this simple model, an OLS regression of $\log z_{it}$ on $\log(1-\tau_{it})$ would not identify the elasticity $e$ in the presence of a graduated income tax schedule because $\tau_{it}$ is positively correlated with potential log-income $\log z^0_{it}$; this occurs because the marginal tax rate may increase with realized income $z$. Therefore, it is necessary to find instruments correlated with $\tau_{it}$ but uncorrelated with potential log-income, $\log z^0_{it}$, to identify the elasticity $e$. The recent taxable income elasticity literature has used changes in the tax rate structure created by tax reforms in order to obtain such instruments. Intuitively, in order to isolate the effects of the net-of-tax rate, one would want to compare observed reported incomes after the tax rate change to the incomes that would have been reported had the tax change not taken place. Obviously, the latter are not observed and must be estimated. We describe in this Section the methods that have been proposed and used to overcome this identification issue.
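The failure of OLS under a graduated schedule can be seen in a small simulation. The following is a minimal sketch under assumptions that are not in the text: a two-bracket schedule with a single kink, lognormal potential incomes, and the convention that taxpayers bunching at the kink are assigned the lower rate. All parameter values and function names are hypothetical.

```python
import math
import random

def reported_income(z0, e, kink, tau_low, tau_high):
    """Return (reported income, marginal rate) under an assumed two-bracket schedule."""
    z_low = z0 * (1.0 - tau_low) ** e
    if z_low <= kink:
        return z_low, tau_low      # taxpayer stays in the lower bracket
    z_high = z0 * (1.0 - tau_high) ** e
    if z_high >= kink:
        return z_high, tau_high    # taxpayer locates in the upper bracket
    return kink, tau_low           # taxpayer bunches at the kink (lower rate by convention)

def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x

random.seed(0)
TRUE_E = 0.5  # assumed true elasticity
log_z, log_net = [], []
for _ in range(5000):
    z0 = random.lognormvariate(math.log(100.0), 0.5)  # potential income (assumed distribution)
    z, tau = reported_income(z0, TRUE_E, kink=100.0, tau_low=0.2, tau_high=0.4)
    log_z.append(math.log(z))
    log_net.append(math.log(1.0 - tau))

ols_estimate = ols_slope(log_net, log_z)
print(f"true elasticity: {TRUE_E}, OLS estimate: {ols_estimate:.2f}")
```

Because taxpayers facing the higher rate are exactly those with higher incomes, the OLS coefficient comes out negative even though the true elasticity is positive, which is the identification problem that the reform-based instruments are meant to address.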

3.2 Simple before and after Reform Comparison

The simplest method uses reported incomes before a tax reform as a proxy for reported incomes after the reform (had the reform not taken place). This simple difference estimation method amounts to comparing reported incomes before and after the reform and attributing the change in reported incomes to the changes in tax rates.

Suppose that tax rates increase at time $t = 1$ because of a tax reform. Using repeated cross sections spanning the pre- and post-reform periods, we can run the following 2SLS regression:

$$\log z_{it} = e \cdot \log(1-\tau_{it}) + \varepsilon_{it}, \qquad (13)$$

using the post-reform indicator $1(t \geq 1)$ as an instrument for $\log(1-\tau_{it})$. This regression identifies $e$ if $\varepsilon_{it}$ is uncorrelated with $1(t \geq 1)$. In the context of our simple model (12), this requires that potential log-incomes are not correlated with time. This assumption is very unlikely to hold in practice, as real economic growth creates a direct correlation between income and time. If more than two years of data are available, one could add a linear trend $\beta \cdot t$ in (13) to control for secular growth. However, as growth rates vary year-to-year due to macroeconomic business cycles, the elasticity estimate will be biased if economic growth happens to be different from year $t = 0$ to year $t = 1$ for reasons unrelated to the level of tax rates: we will ascribe to the tax change the impact of an unrelated but coincident change in average incomes.
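The bias from coincident income growth can be made concrete with a little arithmetic. The sketch below assumes that the regression includes an intercept, in which case 2SLS with a single binary instrument reduces to the ratio of the change in mean log income to the change in mean log net-of-tax rate. The true elasticity, the growth rate, and the base income level are hypothetical; the 31% and 39.6% rates are the top statutory rates of the 1993 reform discussed in Section 3.4, applied here to a stylized group that faces exactly those rates.

```python
import math

def before_after_elasticity(mean_log_z_pre, mean_log_z_post, tau_pre, tau_post):
    """Simple-difference estimate: change in mean log income over change in log net-of-tax rate."""
    d_log_net = math.log(1.0 - tau_post) - math.log(1.0 - tau_pre)
    return (mean_log_z_post - mean_log_z_pre) / d_log_net

TRUE_E = 0.4                      # assumed true elasticity
TAU_PRE, TAU_POST = 0.31, 0.396   # stylized pre- and post-reform marginal rates

def simulated_estimate(growth):
    """Estimate obtained when potential log income grows by `growth` between the two years."""
    base = math.log(100.0)  # assumed mean log potential income before the reform
    pre = base + TRUE_E * math.log(1.0 - TAU_PRE)
    post = base + growth + TRUE_E * math.log(1.0 - TAU_POST)
    return before_after_elasticity(pre, post, TAU_PRE, TAU_POST)

no_growth = simulated_estimate(0.0)     # potential income unchanged
with_growth = simulated_estimate(0.03)  # 3 log points of unrelated income growth
print(f"no growth: {no_growth:.3f}, with growth: {with_growth:.3f}")
```

With no growth the ratio returns the assumed elasticity; with positive growth coinciding with a tax increase, part of the behavioral response is masked and the estimate falls below the true value.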

In many contexts, however, tax reforms affect subgroups of the population differentially, and in some cases they leave tax rates essentially unchanged for most of the population. For example, in the United States during the last three decades, the largest absolute changes in tax rates have taken place at the top of the income distribution, with much smaller changes on average in the broad middle. In that context, one can use the group less (or not) affected by the tax change as a control and hence proxy unobserved income changes in the affected group (absent the tax reform) with changes in reported income in the control group. Such methods naturally lead to consideration of difference-in-differences estimation methods discussed in Section 3.4.


3.3 Share Analysis

When the group affected by the tax reform is relatively small, one can simply normalize incomes of the group affected by a tax change by the average income in the population in order to control for macro-economic growth. Indeed, the most dramatic changes in U.S. marginal federal income tax rates have taken place at the top percentile of the income distribution (in appendix A we discuss in more detail the key individual tax changes since 1960). Therefore, and following Feenberg and Poterba (1993) and Slemrod (1996), a natural measure of the evolution of top incomes relative to the average is the change in the share of total income reported by the top percentile.36 Figure 1A displays the average marginal tax rate (weighted by income) faced by the top percentile income earners (scaled on the left y-axis) along with the share of total personal income reported by the top percentile earners (scaled on the right y-axis) from 1960 to 2006.37 The figure shows that indeed the marginal tax rate faced by the top 1% has declined dramatically since 1980. It is striking to note that the share received by the top 1% of income recipients started to increase precisely after 1981, when marginal tax rates started to decline. The timing of the jump in the share of top incomes from 1986 to 1988 corresponds exactly to the sharp drop in the weighted average marginal tax rate from 45% to 29% after the Tax Reform Act of 1986. These correspondences in timing, first noted by Feenberg and Poterba (1993), provide circumstantial but quite compelling evidence that high incomes are indeed responsive to marginal tax rates.

Figure 1B shows the same income share and marginal tax rate series for the next 9% highest income tax filers (i.e., the top decile excluding the top 1% from Panel A). The marginal tax rate follows a different pattern, first increasing from 1960 to 1981 due primarily to bracket creep (as the tax system in this period was not indexed for inflation), followed by a decline until 1988 and relative stability afterwards. In contrast to the top 1%, however, the share of the next 9% in total income is very smooth and trends slightly upward during the entire period. Most importantly, it displays no correlation with the level of the marginal tax rate either in the short run or in the long run. Thus, the comparison of Panel A and Panel B suggests that the behavioral responses of the top 1% are very different from those of the rest of the top decile, and hence that the elasticity e is unlikely to be constant across income groups.

36 In what follows, we always exclude realized capital gains from our income measure as realized capital gains in general receive a tax-favored treatment and there is a large literature analyzing specifically capital gains realization behavior and taxes (see Auerbach 1988 for a survey). See a further discussion of this issue in Section 4.1.

Using the series displayed in Figure 1, one can estimate the elasticity of reported income around a tax reform episode taking place between pre-reform year $t_0$ and post-reform year $t_1$ as follows:

$$e = \frac{\log p_{t_1} - \log p_{t_0}}{\log(1-\tau_{p,t_1}) - \log(1-\tau_{p,t_0})}, \qquad (14)$$

where $p_t$ is the share of income accruing to the top 1% (or the next 9%) earners in year $t$ and $\tau_{p,t}$ is the average marginal tax rate (weighted by income) faced by taxpayers in this income group in year $t$. This method identifies the elasticity if, absent the tax change, the top 1% income share would have remained constant from year $t_0$ to year $t_1$. As shown in Table 1, Panel A, applying this simple method using the series depicted in Figure 1 around the 1981 tax reform by comparing 1981 and 1984 generates an elasticity of 0.60 for the top 1%. Comparing 1986 and 1988 around the Tax Reform Act of 1986 yields a very large elasticity of 1.36 for the top 1%.38 In contrast, column (2) in Table 1 shows that the elasticities for the next 9% are much closer to zero around those two tax episodes. The 1993 tax reform also generates a substantial elasticity of 0.45 for the top 1% when comparing 1992 and 1993. Strikingly, though, comparing 1991 to 1994 yields a negative elasticity for the top 1%, probably due to tax rate endogeneity, as there were no legislated changes between 1991 and 1992 or between 1993 and 1994. Hence, Table 1 shows that the elasticity estimates obtained in this way are sensitive to the specific reform, the income group, as well as the choice of years. We will come back to these important issues later on.
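Formula (14) is simple enough to compute directly. The sketch below is a hypothetical helper, not the authors' code: the function name is invented, the marginal tax rates are the rounded 45% and 29% figures quoted in the text for the Tax Reform Act of 1986, and the income shares are placeholder values chosen only for illustration, so the result should not be read as reproducing Table 1.

```python
import math

def share_elasticity(share_pre, share_post, mtr_pre, mtr_post):
    """Elasticity implied by formula (14): log change in the income share
    divided by the log change in the net-of-tax rate."""
    d_log_share = math.log(share_post) - math.log(share_pre)
    d_log_net = math.log(1.0 - mtr_post) - math.log(1.0 - mtr_pre)
    return d_log_share / d_log_net

# Marginal tax rates: rounded figures quoted in the text for TRA 86.
# Income shares: hypothetical placeholders, NOT the values behind Table 1.
e_hat = share_elasticity(share_pre=0.10, share_post=0.13, mtr_pre=0.45, mtr_post=0.29)
print(f"implied elasticity: {e_hat:.2f}")
```

The calculation attributes the entire change in the share to the tax change, which is exactly the identifying assumption stated above: absent the reform, the share would have stayed constant.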

A natural way to estimate the elasticity e using the full time-series evidence is to run a simple time-series regression of the form:

$$\log p_t = e \cdot \log(1-\tau_{p,t}) + \varepsilon_t. \qquad (15)$$

As reported in Table 1, such a regression generates a very high estimate of the elasticity $e$ of 1.71 for the top 1%.39 However, this is an unbiased estimate only if, absent any marginal tax rate changes, the top 1% income share series would have remained constant or moved in a way that is uncorrelated with the evolution of marginal tax rates. But it is entirely possible that inequality changed over time for reasons unrelated to tax changes: the secular increase in income concentration in the United States since the 1960s was probably not entirely driven by changes in the top tax rates, hence biasing upward the estimate of $e$.40,41 For example, Figure 1 shows that there was a sharp increase in the top 1% income share from 1994 to 2000 in spite of little change in the marginal tax rate faced by the top 1%, which suggests that changes in marginal tax rates are not the sole determinant of the evolution of top incomes (at least in the short run).42

38 Goolsbee (2000b) also obtained such a large elasticity using TRA 1986 and a similar approach.

39 The estimated elasticity for the next 9% is very small ($e = 0.01$), and not significantly different from zero.

40 See Katz and Autor (1999) for a comprehensive analysis of wage inequality in the United States since 1960.

41 Reverse causality is also a possibility. If incomes of the already affluent increase, the group might have more political influence and success in lobbying the government to cut top tax rates.

It is possible to add controls for various factors affecting income concentration through channels other than tax rates in regression (15), as in Slemrod (1996). Unfortunately, we do not have a precise understanding of what those factors might be. An agnostic solution is to add time trends to (15). As shown in Table 1, such time trends substantially reduce the estimated elasticity, although it remains significant and above 0.5. The key problem is that we do not know exactly what time-trend specifications are necessary to control for non-tax related changes, and adding too many time controls necessarily destroys the time-series identification. It could be fruitful to extend this framework to a multi-country time-series analysis. In that case, general time trends will not destroy identification, although it is possible that inequality changes differentially across countries (for non-tax related reasons), in which case country-specific time trends would be required and a similar identification problem would arise. Atkinson and Leigh (2004) and Roine, Vlachos, and Waldenstrom (2008) propose first steps in that direction. The macro-economic literature has recently used cross-country time-series analysis to analyze the effects of tax rates on aggregate labor supply (see e.g., Ohanian, Raffo, and Rogerson, 2008) but has not examined directly effects on reported income, let alone reported income by income groups.

3.4 Difference-in-Differences Methods

Most of the recent literature has used micro-based regressions using “difference-in-differences” (DD) methods43 in which changes in reported income of a treatment group (experiencing a tax change) are compared to changes for a “control” group (which does not experience the same, or any, tax change).44 It is useful to point out from the start that the concept of treatment and control groups in the ETI context is substantially different from the ideal randomized experiment where a randomly selected treatment group would be assigned to a specific tax rate. Indeed, in the ETI context, control and treatment groups are almost always defined by income, which creates several identification issues.

42 Slemrod and Bakija (2001) call the behavior of reported taxable income over this period a “non-event study.”

43 For earlier reviews of this literature, see Slemrod (1998) and Giertz (2004).

44 Note that share analysis is conceptually related to the DD method as share analysis compares the evolution of incomes in a given quantile (the numerator of the share is the treatment group) relative to the full population (the denominator of the share is implicitly the control group).

In order to illustrate the identification issues that arise with those various methods, we will examine the 1993 tax reform in the United States that introduced two new income tax brackets, raising rates for those at the upper end of the top tax bracket from 31% (in 1992 and before) to 36% or 39.6% (in 1993 and after), and enacted only minor other changes (see appendix A for details). Figure 1 shows that the average marginal tax rate for the top 1% increased sharply from 1992 to 1993 but that the marginal tax rate for the next 9% was not affected. Our empirical analysis is based on the Treasury panel of tax returns described in Giertz (2008c). As we discuss in appendix B, those panel data are created by linking the large annual tax return data stratified by income used by U.S. government agencies. Therefore, the data include a very large number of top 1% taxpayers.

3.4.1 Repeated Cross-Section Analysis

Let us denote by $T$ the group affected by the tax change (the top 1% in our example) and by $C$ a group not affected by the reform (the next 9% in our example). We denote by $t_0$ the pre-reform year and by $t_1$ the post-reform year. Generalizing our initial specification (13), we can estimate the 2SLS regression:

$$\log z_{it} = e \cdot \log(1-\tau_{it}) + \alpha \cdot 1(t = t_1) + \beta \cdot 1(i \in T) + \varepsilon_{it}, \qquad (16)$$

on a repeated cross-section sample including both the treatment and control groups and including the year $t_0$ and year $t_1$ samples, and using as an instrument for $\log(1-\tau_{it})$ the post-reform and treatment group interaction $1(t = t_1) \cdot 1(i \in T)$.

Although we refer in this section to income tax rate schedule changes as a treatment, they certainly do not represent a classical treatment in which a random selection of taxpayers is presented with a changed tax rate schedule, while a control group of taxpayers is not so subject. In fact, in any given year all taxpayers of the same filing status face the same rate schedule. When the rates applicable at certain income levels change more substantially than the rates at other levels of income, however, some taxpayers are more likely to face large changes in the applicable marginal tax rate than other taxpayers. Two problems arise. The first is that when the likely magnitude of the tax rate change is correlated with income, any non-tax-related changes in taxable income that vary systematically by income group will need to be disentangled from the effect on taxable incomes of the rate changes. Second, non-tax-related changes in income may affect which marginal tax rate is applicable to a taxpayer in a given year. Because the presence of transitory income implies mean reversion, some of this year-to-year change will be systematic; this will be especially problematic in periods when tax rates converge, or diverge, because the mean reversion in taxable income will be correlated (negatively or positively, respectively) with the expected effect of the tax rate changes themselves.

We run regression (16) weighted by income $z_{it}$ (in order to give proportionally more weight to high-income taxpayers, as their response contributes proportionately more to the aggregate elasticity, as discussed in Section 2.1). The 2SLS estimate is a classical difference-in-differences estimate equal to:

$$e = \frac{[E(\log z_{it_1} \mid T) - E(\log z_{it_0} \mid T)] - [E(\log z_{it_1} \mid C) - E(\log z_{it_0} \mid C)]}{[E(\log(1-\tau_{it_1}) \mid T) - E(\log(1-\tau_{it_0}) \mid T)] - [E(\log(1-\tau_{it_1}) \mid C) - E(\log(1-\tau_{it_0}) \mid C)]}. \tag{17}$$

Thus, the elasticity estimate is the ratio of the difference-in-differences in log incomes (the pre- to post-reform change in the treatment group minus the same change in the control group) to the same difference-in-differences in log net-of-tax rates (with all expectations weighted by income).
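To make the mechanics of formula (17) concrete, the following is a minimal sketch of the income-weighted difference-in-differences ratio, applied to synthetic data. The function name `dd_elasticity`, the 28% control-group rate, the group and year effects, and the assumed elasticity of 0.4 are illustrative assumptions and are not taken from the Treasury panel or from Table 2; only the 31% and 39.6% rates come from the text.

```python
import numpy as np

def dd_elasticity(log_z, log_ntr, treated, post, weights):
    """Weighted difference-in-differences ratio in the spirit of formula (17).

    log_z: log income; log_ntr: log net-of-tax rate, log(1 - tau);
    treated/post: boolean indicators; weights: e.g. income z.
    """
    log_z = np.asarray(log_z, dtype=float)
    log_ntr = np.asarray(log_ntr, dtype=float)
    weights = np.asarray(weights, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    post = np.asarray(post, dtype=bool)

    def cell_mean(x, group_mask, period_mask):
        m = group_mask & period_mask
        return np.average(x[m], weights=weights[m])

    def dd(x):
        # (post minus pre) in the treatment group, minus the same in the control group
        change_t = cell_mean(x, treated, post) - cell_mean(x, treated, ~post)
        change_c = cell_mean(x, ~treated, post) - cell_mean(x, ~treated, ~post)
        return change_t - change_c

    return dd(log_z) / dd(log_ntr)

# Synthetic, noise-free illustration (all numbers below are hypothetical).
true_e = 0.4
individual_shifts = np.array([-0.1, 0.0, 0.1])  # same heterogeneity in every cell
rows = []
for treated_flag, group_effect in ((True, 2.0), (False, 0.0)):
    for post_flag, year_effect in ((False, 0.0), (True, 0.03)):
        if treated_flag:
            tau = 0.396 if post_flag else 0.31  # top rate before/after 1993
        else:
            tau = 0.28                          # assumed unchanged control rate
        ln_ntr = np.log(1.0 - tau)
        for u in individual_shifts:
            ln_z = true_e * ln_ntr + group_effect + year_effect + u
            rows.append((ln_z, ln_ntr, treated_flag, post_flag))

log_z, log_ntr, treated, post = map(np.array, zip(*rows))
estimate = dd_elasticity(log_z, log_ntr, treated, post, weights=np.exp(log_z))
print(estimate)  # recovers the assumed elasticity up to floating-point error
```

Because the synthetic data satisfy parallel trends by construction, the ratio returns the assumed elasticity; with real data the same computation inherits every identification problem discussed in this section.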

Using data from 1992 (pre-reform) and 1993 (post-reform), we define the treatment group as the top 1% and the control group as the next 9% (90th percentile to 99th percentile). Note that being in the treatment group depends on the taxpayer’s behavior. Table 2, Panel A shows that the elasticity estimate is around one, which reflects the fact that the top 1% incomes decreased sharply from 1992 to 1993 while the next 9% incomes remained stable as shown in Figure 1. However, comparing 1991 to 1994 generates a negative elasticity, as the top 1% incomes increased faster than the next 9% incomes from 1991 to 1994 (Figure 1). The sign switch mirrors the results of Table 1, Column 1.

As is standard in the case of DD estimation, formula (17) will yield an unbiased estimate of the elasticity e if the parallel trend assumption holds: absent the tax change, the numerator would have been zero, i.e., log-income changes pre- to post-reform would have been the same in the treatment and control groups. In the case of our example, that means that the top 1% incomes would have grown at the same rate as the next 9% incomes absent the tax change. Such an assumption can be tested using pre-reform years or post-reform years in order to construct placebo difference-in-differences estimates. As is clear from Figure 1, the top 1% incomes increase sharply from 1994 to 2000 relative to average incomes while the share of the next 9% income is almost flat. Therefore, the DD identification assumption is clearly violated in the post-reform period.
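A placebo check of this kind amounts to computing the numerator of formula (17) over a pair of years with no tax change and asking whether it is close to zero. The sketch below is one possible way to organize that computation; the function `placebo_dd`, the dictionary layout, and the growth rates are hypothetical and chosen only to contrast a parallel-trend case with a diverging-trend case.

```python
def placebo_dd(mean_log_income, year0, year1):
    """Difference-in-differences in mean log income between two years.

    mean_log_income maps (group, year) -> mean log income, with groups "T" and "C".
    Over a period with no tax change, parallel trends imply a value near zero.
    """
    change_t = mean_log_income[("T", year1)] - mean_log_income[("T", year0)]
    change_c = mean_log_income[("C", year1)] - mean_log_income[("C", year0)]
    return change_t - change_c

years = range(1994, 2001)

# Hypothetical case 1: both groups grow by 2 log points per year.
parallel = {
    (group, year): base + 0.02 * (year - 1994)
    for group, base in (("T", 13.0), ("C", 11.5))
    for year in years
}

# Hypothetical case 2: the treatment group grows faster than the control group.
diverging = {("T", year): 13.0 + 0.05 * (year - 1994) for year in years}
diverging.update({("C", year): 11.5 + 0.01 * (year - 1994) for year in years})

placebo_parallel = placebo_dd(parallel, 1994, 2000)
placebo_diverging = placebo_dd(diverging, 1994, 2000)
print(placebo_parallel, placebo_diverging)
```

A placebo value far from zero, as in the second case, signals that a DD estimate over a window of similar length would attribute differential income growth to the tax change.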

In cases where the parallel trend assumption does not hold, we can generalize equation (16) by pooling together several pre-reform years and post-reform years and running the following 2SLS regression (assuming the tax change takes place in year $\bar{t}$):

$$\log z_{it} = e \cdot \log(1-\tau_{it}) + \alpha \cdot 1(t \geq \bar{t}) + \beta \cdot 1(i \in T) + \gamma_C \cdot t + \gamma_T \cdot t \cdot 1(i \in T) + \varepsilon_{it}, \tag{18}$$

where we have added separate time trends for the control and treatment groups and where the instrument is the post-reform and treatment interaction $1(t \geq \bar{t}) \cdot 1(i \in T)$. As shown in Table 2, Panel A, with no time trends the regression produces a negative elasticity of e = −0.404 (0.089) because the top 1% incomes increase faster than next 9% incomes over the period 1991 to 1997 in spite of the top tax rate increase. Adding time trends generates a very large and positive elasticity of e = 1.329 (0.107). This large elasticity can be explained using the evidence from Figure 1: from 1991 to 1997, the top 1% incomes increase relative to the next 9%. However, from 1992 to 1993, top 1% incomes fall overall and relative to the next 9% incomes exactly at the time of the tax change. Hence, the pooled regression (18) assumes that this reversal is due to a large immediate and permanent elasticity of reported income with respect to tax rates. We discuss below the issue of short-term vs. long-term responses, which is central to this particular tax episode. Column (2) in Table 2 shows that those repeated cross-section estimates are not sensitive to broadening the control group from the next 9% to the next 49% as incomes for both the next 9% and the next 49% move together, exhibiting very slow growth over the period.
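The role of the group-specific trends in regression (18) can be illustrated with a small simulation. The sketch below is not a replication of Table 2: it uses unweighted, noise-free group-by-year cells, an assumed true elasticity of 0.5, an assumed 28% control-group rate, hypothetical trend coefficients, and an added constant term. The helper `iv_first_coef` implements the standard exactly identified IV formula, which coincides with 2SLS when there is one instrument per endogenous regressor.

```python
import numpy as np

def iv_first_coef(y, X, Z):
    """Exactly identified IV: solve (Z'X) b = Z'y; return the coefficient on X[:, 0]."""
    return np.linalg.solve(Z.T @ X, Z.T @ y)[0]

true_e, reform_year = 0.5, 1993           # assumed values for the illustration
years = np.arange(1991, 1998)

# One cell per group and year: treatment-group cells first, then control-group cells.
t = np.tile(years - years[0], 2).astype(float)
treated = np.repeat([1.0, 0.0], len(years))
post = np.tile((years >= reform_year).astype(float), 2)

# Top rate rises from 31% to 39.6%; the control rate is held at a hypothetical 28%.
tau = np.where(treated == 1, np.where(post == 1, 0.396, 0.31), 0.28)
log_ntr = np.log(1.0 - tau)

# Incomes follow (18), with the treatment group on a steeper trend (hypothetical).
log_z = true_e * log_ntr + 2.0 * treated + 0.01 * t + 0.05 * t * treated

const = np.ones_like(t)
instrument = post * treated               # 1(t >= t_bar) * 1(i in T)

def estimate(with_trends):
    controls = [const, post, treated]
    if with_trends:
        controls += [t, t * treated]      # gamma_C * t and gamma_T * t * 1(i in T)
    X = np.column_stack([log_ntr] + controls)
    Z = np.column_stack([instrument] + controls)
    return iv_first_coef(log_z, X, Z)

e_no_trends = estimate(with_trends=False)
e_with_trends = estimate(with_trends=True)
print(e_no_trends, e_with_trends)
```

In this constructed example the specification without trends yields a negative estimate, because the faster income growth of the treatment group is attributed to the tax increase, while the specification with trends recovers the assumed elasticity. The simulated trends are linear by construction, so the sketch says nothing about whether linear trends are adequate in the actual 1991 to 1997 data.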

Finally, note that if the control group faces a tax change, DD estimates will consistently estimate the elasticity only if the elasticities are the same for the two groups. To see this, refer back to equation (17). Suppose that the control group experiences a change in tax rates that is half the size of the tax rate change for the treatment group, so that $E(\log(1-\tau_{it_1}) \mid C) - E(\log(1-\tau_{it_0}) \mid C) = 0.5 \cdot [E(\log(1-\tau_{it_1}) \mid T) - E(\log(1-\tau_{it_0}) \mid T)]$. Assume further that the DD identification assumption holds, but that the elasticity in the control group is zero while the elasticity is $e_T > 0$ in the treatment group. In that case, we have $E(\log z_{it_1} \mid T) - E(\log z_{it_0} \mid T) = e_T \cdot [E(\log(1-\tau_{it_1}) \mid T) - E(\log(1-\tau_{it_0}) \mid T)]$ and $E(\log z_{it_1} \mid C) - E(\log z_{it_0} \mid C) = 0$, and hence formula (17) leads to $e = 2 \cdot e_T$: the estimated elasticity is twice as large as the true elasticity in the treatment group. This example might actually apply around TRA 1986 as we have seen in Table 1 that, based on share elasticities, the elasticity around that episode may be large for the top 1% but close to zero for the next 9% (and the next 9% experiences a tax rate cut that is about half of the tax rate cut for the top 1% from 1986 to 1988). This source of bias partly explains why Feldstein (1995) obtained such large elasticities around TRA 1986, a point made originally by Navratil (1995b), as we discuss in Section 4.1.2.
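The factor of two can be traced term by term through formula (17). As shorthand that is not part of the original notation, let $\Delta_T$ denote the change in the mean log net-of-tax rate in the treatment group and let $\kappa$ denote the control group's tax change as a fraction of the treatment group's change ($\kappa = 0.5$ in the example above).

```latex
\begin{align*}
\Delta_T &\equiv E(\log(1-\tau_{it_1}) \mid T) - E(\log(1-\tau_{it_0}) \mid T), \\
\text{numerator of (17)} &= e_T \cdot \Delta_T - 0 = e_T \cdot \Delta_T, \\
\text{denominator of (17)} &= \Delta_T - \kappa \cdot \Delta_T = (1-\kappa) \cdot \Delta_T, \\
e &= \frac{e_T \cdot \Delta_T}{(1-\kappa) \cdot \Delta_T} = \frac{e_T}{1-\kappa}
   = 2 \cdot e_T \quad \text{when } \kappa = 0.5 .
\end{align*}
```

Under these assumptions (zero elasticity in the control group and parallel trends), the upward bias grows as the control group's tax change approaches that of the treatment group, and it vanishes when $\kappa = 0$.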

3.4.2 Panel Analysis

Following the influential analysis of Feldstein (1995), the great majority of empirical studies have used panel data to estimate the elasticity of reported taxable income with respect to net-of-tax rates. With panel data, we can define the treatment group T as the top 1% of income earners and the control group C as the next 9% of tax filers based on income in pre-reform year $t_0$, and follow those tax filers into post-reform year $t_1$. We can then run the 2SLS panel regression:

$$\log\left(\frac{z_{it_1}}{z_{it_0}}\right) = e \cdot \log\left(\frac{1-\tau_{it_1}}{1-\tau_{it_0}}\right) + \varepsilon_{it}, \tag{19}$$