Appendix to Chapter 1
A.5 Additional econometric details
$$p_h(t) = \frac{\alpha(t)/L_h(t) + R_h(t)/L_h(t)}{[\alpha(t) + \beta(t)]/L_h(t) + 1},$$
and we see that the terms other than $R_h(t)/L_h(t)$ vanish since $L_h(t) \xrightarrow{a.s.} \infty$. Applying the fact that $R_h(t)/L_h(t) \xrightarrow{a.s.} p_h$ then completes the proof. If, on the other hand, the second part of condition (iii) holds, then write
In this appendix, I provide additional detail on the event study estimator used in this paper. The first subsection provides a proof of Lemma 2. The second subsection discusses the estimator’s small-sample properties.
A.5.1 Proof of Lemma 2
Before proving Lemma 2, it will be helpful to establish an additional lemma, which is essentially a law of large numbers for hierarchical settings like the current one.
Lemma 4. Let {(Gi, Zi)}i∈N be an i.i.d. sequence of random tuples where Gi ∈ N and Zi is a
denotes expectation with respect to the population marginal distribution of Zij.
Proof. Since E[|Si(Z)|] > 0, the left-hand-side of (A.2) is well-defined with probability approaching
one, so we can safely focus on this case. We can re-write the left-hand-side of (A.2) as follows:
Applying a standard law of large numbers to each term on the right-hand-side then implies that the right-hand-side converges to $E^{*}[|S_i(Z)|]^{-1} E^{*}\big[\sum_{j \in S_i(Z)} h(Z_{ij})\big]$.
The population marginal distribution of Zij satisfies
P(Zij ∈ A) = 1
Applying Lemma 4 separately to each group sum in the definition of $\hat{\Delta}_q$ and pointwise to $\hat{W}^{WTOT}$ demonstrates that each converges to its population counterpart. The result follows.
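Lemma 4 is described above as a law of large numbers for hierarchical settings. The following simulation is a minimal sketch of that idea, not the lemma itself: it assumes, purely for illustration, that group sizes G_i are uniform on {1, 2, 3}, that Z_ij equals G_i plus standard normal noise, that h is the identity, and that S_i contains every member of group i. Under these assumptions the pooled mean should approach E[sum_j Z_ij] / E[G_i] = 7/3, while the unweighted mean of group means should approach E[G_i] = 2.

```python
import random


def simulate(num_groups, seed=0):
    """Return (pooled mean, mean of group means) of h(Z_ij) = Z_ij.

    All distributional choices here are hypothetical and only for illustration.
    """
    rng = random.Random(seed)
    pooled_sum = 0.0
    pooled_count = 0
    group_means = []
    for _ in range(num_groups):
        g = rng.randint(1, 3)  # hypothetical group size G_i, uniform on {1, 2, 3}
        z = [g + rng.gauss(0.0, 1.0) for _ in range(g)]  # Z_ij = G_i + noise
        pooled_sum += sum(z)
        pooled_count += g
        group_means.append(sum(z) / g)
    return pooled_sum / pooled_count, sum(group_means) / num_groups


pooled, unweighted = simulate(20000)
# Ratio-of-expectations target: E[G_i^2] / E[G_i] = (14/3) / 2 = 7/3
target = 7 / 3
print(pooled, unweighted, target)
```

The gap between the two limits arises because larger groups contribute more observations to the pooled mean, which is the feature that a hierarchical law of large numbers has to account for.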
A.5.2 Estimator small sample properties
Lemma 2 establishes that the event study estimator used herein is consistent, but the proof relies on an argument that the relevant means are consistent for the desired quantities for each tuple (e, x) ∈ C. In the current application, however, the number of providers with events can be small or zero for many (e, x) tuples, which raises the question of whether the asymptotic results provide a good guide to the estimator’s properties in practice. In this appendix, I show that, under plausible conditions, the estimator is conditionally unbiased for a weighted average of conditional average causal effects (see Imbens and Wooldridge (2009) for a discussion of such estimands). Although this causal effect may differ from the weighted average causal effect described in Lemma 1, this result provides reassurance regarding the estimator’s small-sample properties.
Formally, we are interested in the properties of the estimator for any number of sampled providers M. The operator $E_M[\cdot]$ will denote expectation with respect to the joint distribution of an M-provider random sample. I establish the small-sample properties of the estimator under the following condition:
Condition CT0 (Finite sample common trends). For all provider sample sizes M, all event time and characteristic tuples (e, x) ∈ C, and all times-since-event q, r ∈ H, the following two conditions hold:
Condition CT0 replaces Condition CT. This new condition states that, conditional on the realized estimate of the weighting function and the set of cells containing a positive number of deliveries, the counterfactual trend for event units is the same as the realized trend for non-event units.
Condition CT is not sufficient to ensure that the estimator has good finite-sample properties for two reasons. The first is that the birth-level marginal distribution in any finite sample may differ from the population marginal distribution used to state Condition CT, so Condition CT may not directly apply in finite samples; this will occur if providers’ level and time pattern of volume are predictive of the potential outcomes for their associated deliveries.² The second is that, in finite samples, the desired weighting function W(e, x) may be estimated with error. To the extent that the error in W(e, x) covaries with the deviation from common trends in any particular sample, bias can result. Once again, the most plausible source of such a correlation is correlation between providers’ level and time pattern of volume and the potential outcomes for their associated deliveries.
² To see why this is the case, it may be helpful to consider a simple numerical example. Consider a cluster-sampling setting with two types of units. Type A units have 1 sub-unit and mean µA, while type B units have 9 sub-units and mean µB. Assuming both types of units are equally likely to be drawn, it is easy to see that the population marginal mean is (1/10)µA + (9/10)µB. Under these assumptions, however, the mean of the sub-unit marginal distribution for a sample consisting of one provider is (1/2)µA + (1/2)µB. As the number of sampled providers increases, the marginal distribution will converge toward the population marginal distribution.
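The numerical example in the footnote can be checked exactly. The sketch below assumes the footnote's setup (type A units with 1 sub-unit, type B units with 9, each type drawn with probability 1/2) and computes the expected share of sub-units that belong to type B units in a sample of M units; the expected sub-unit mean is then that share times µB plus the remainder times µA. The function name and its arguments are hypothetical.

```python
from fractions import Fraction
from math import comb


def expected_weight_on_b(num_units, size_a=1, size_b=9):
    """Expected share of sub-units from type B units in a sample of num_units.

    Each sampled unit is type A or type B with probability 1/2, as in the footnote.
    """
    total = Fraction(0)
    for k in range(num_units + 1):  # k = number of type B units drawn
        prob = Fraction(comb(num_units, k), 2 ** num_units)
        sub_units_b = size_b * k
        sub_units_all = size_a * (num_units - k) + sub_units_b
        total += prob * Fraction(sub_units_b, sub_units_all)
    return total


for m in (1, 2, 10, 200):
    print(m, float(expected_weight_on_b(m)))
```

With one sampled unit the weight on µB is 1/2, as the footnote states, and the weight moves toward the population value of 9/10 as the number of sampled units grows.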
With Condition CT0 in place, the following lemma can be proved using arguments parallel to those used to prove Lemma 1.
Lemma 5. If conditions NPE and CT0 hold, then
$$E_M\big[\hat{\Delta}_q - \hat{\Delta}_r \,\big|\, \{(B(e', x'), \hat{W}(e', x'))\}_{(e', x') \in C}\big] = \sum \cdots$$
The first part of Lemma 5 demonstrates that the difference-in-difference estimator is (conditionally) unbiased for a weighted average of expected conditional sample average treatment effects on the treated.³ This estimand is closely related to, but distinct from, the estimand in Lemmas 1 and 2. In practice, the most important difference is that different (e, x) cells are weighted by the estimated version of W(e, x), rather than the true version; for choices of weights like $W^{WTOT}$, the present estimand will therefore afford more weight to cells that happen to have a large number of “treated” births in the realized sample. The second part of Lemma 5 indicates that, as in the large-sample case, we should expect the difference between event and non-event units to be constant prior to event occurrence. Thus, as in the large-sample case, the research design permits a direct test of the required common trends assumption.
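To illustrate how estimated weights shift the estimand toward cells with many treated births in the realized sample, the sketch below assumes, purely for illustration, that each cell's weight is proportional to its count of treated births. This is a hypothetical stand-in and not necessarily the paper's definition of the weighting function; the cell counts and effects are made-up numbers.

```python
def weighted_average_effect(cells):
    """Weighted average of cell-level effects.

    cells: list of (treated_births, cell_effect) pairs. Weights proportional
    to treated_births are an assumption made for this illustration only.
    """
    total = sum(n for n, _ in cells)
    if total == 0:
        raise ValueError("no cell contains treated births")
    return sum(n * effect for n, effect in cells) / total


# Two hypothetical samples with the same cell effects but different counts.
sample_one = [(10, 1.0), (30, 2.0)]
sample_two = [(30, 1.0), (10, 2.0)]
print(weighted_average_effect(sample_one))  # 1.75
print(weighted_average_effect(sample_two))  # 1.25
```

The cell-level effects are identical across the two samples, yet the weighted average differs because the realized distribution of treated births across cells differs.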