WORKING PAPER 2002:20
Dynamically assigned
treatments: duration models,
binary treatment models,
and panel data models
Jaap H. Abbring
Gerard J. van den Berg
wp 2002-20.qxd 02-11-11 14.10 Sida 1
The Institute for Labour Market Policy Evaluation (IFAU) is a research
insti-tute under the Swedish Ministry of Industry, Employment and
Communica-tions, situated in Uppsala. IFAU’s objective is to promote, support and carry
out: evaluations of the effects of labour market policies, studies of the
function-ing of the labour market and evaluations of the labour market effects of
meas-ures within the educational system. Besides research, IFAU also works on:
spreading knowledge about the activities of the institute through publications,
seminars, courses, workshops and conferences; creating a library of Swedish
evaluational studies; influencing the collection of data and making data easily
available to researchers all over the country.
IFAU also provides funding for research projects within its areas of interest.
There are two fixed dates for applications every year: April 1 and November 1.
Since the researchers at IFAU are mainly economists, researchers from other
disciplines are encouraged to apply for funding.
IFAU is run by a Director-General. The authority has a traditional board,
con-sisting of a chairman, the Director-General and eight other members. The tasks
of the board are, among other things, to make decisions about external grants
and give its views on the activities at IFAU. Reference groups including
repre-sentatives for employers and employees as well as the ministries and
authori-ties concerned are also connected to the institute.
Postal address: P.O. Box 513, 751 20 Uppsala
Visiting address: Kyrkogårdsgatan 6, Uppsala
Phone: +46 18 471 70 70
Fax: +46 18 471 70 71
[email protected]
www.ifau.se
Papers published in the Working Paper Series should, according to the IFAU policy,
have been discussed at seminars held at IFAU and at least one other academic forum,
and have been read by one external and one internal referee. They need not, however,
have undergone the standard scrutiny for publication in a scientific journal. The
pur-pose of the Working Paper Series is to provide a factual basis for public policy and the
public policy discussion.
Duration Models, Binary Treatment Models, and Panel Data Models
Jaap H. Abbring Gerard J. van den Berg
y
August 30, 2002
Abstract
Often, the moment of a treatment and the moment at which the out-comeofinterestoccursarerealizationsofstochasticprocesseswith depen-dent unobserveddeterminants. Notably,bothtreatment and outcome are characterizedbythemomenttheyoccur.Wecomparedierentmethodsof inferenceofthetreatmenteect,andwearguethatthetimingofthe treat-ment relative totheoutcomeconveys usefulinformationon thetreatment eect,whichis discardedinbinary treatment frameworks.
Department of Economics, Free University Amsterdam, and Department of Economics, UniversityCollegeLondon
y
Departmentof Economics,Free UniversityAmsterdam, DeBoelelaan1105, NL{1081HV Amsterdam,TheNetherlands,TinbergenInstitute,IFAU-Uppsala,andCEPR.
Keywords: program evaluation, treatment eects, bivariate duration analysis, selection bias,hazardrate,unobservedheterogeneity,xedeects,randomeects.
JELcodes:C14,C31,C41.
Thanks to participants at the Tenth Panel Data Conference in Berlin, 2002, in particu-lartoourdiscussantBoHonore,forhelpful comments.
In virtually every real-life treatment situation,both the treatment and the out-comeof interestare realizedatspecicpointsin time. Typicalexamples include the eect of trainingprograms orpunitive benetsreductions on unemployment durations,theeect ofthehiringofreplacementworkers onstrikedurations,and the eect of promotions on tenure. An extensive literature on the evaluation of social programs and treatment eects exists, but in general this literature does not address nor exploit the specic informationon the timing of events (see, for instance,Heckman, LaLondeandSmith,1999,foranoverview ofthisliterature). Abbringand Vanden Berg(2002) (henceforth AVdB) advanceonthis litera-ture by explicitly considering the evaluation of treatment eects in a \duration variables"context.Considerasubjectinacertainstate.Afteracertainstochastic amount of time, the subject leaves this state. The subject may receive a treat-ment at some stochastic moment before it leaves the state. The parameter of interest is the eect of the treatment on the exit rate out of the state. AVdB adopt an explicit general model framework for the distribution of the durations untiltreatment and outcome. (In fact,they present the modelinterms of coun-terfactual variables, but in the present paper we suppress this for expositional convenience.) It allows the duration variablesto be dependent by way of depen-dent unobserved determinants, with each single duration having its own Mixed Proportional Hazard (MPH) model specication. In addition to this, a causal eect of the realized treatment works on the exit rate out of the current state from the moment the treatment is realized onwards. The MPH model is by far themost populardurationmodel(see Vanden Berg,2001,for asurvey).Models tting into the AVdB model framework have been estimated by, for example, Cardand Sullivan(1988),Gritz (1993),Lillard(1993), Lillardand Panis (1996), Bonnal, Fougere and Serandon (1997), Abbring, Van den Berg and Van Ours (1997),and Vanden Berg,Vander Klaauwand VanOurs (2004).
AVdB demonstrate thattheir baselinemodel, includingthe causal treatment eect, is non-parametrically identied from single-spell duration data, that is, from a random sample of subjects owing into the state of interest and having subsequently been followed over time. This result has a number of notable as-pects. First, it does not require exclusion restrictions on observed covariates, so the data need not contain a variable that aects the treatment assignment but doesnot aect the outcome of interest other than by way of the treatment. Ex-clusion restrictions are often diÆcult to justify. If a variable is observed by the analyst then itis oftenalsoobservable tothe individuals underconsideration. If
maybe subjecttotreatment,thenhe takeshisvalue ofthe variableintoaccount todeterminehis optimalstrategy, andthis strategy aects the rate atwhich the individual leaves the state of interest. Indeed, if the individual knows that the variable is an important determinant of the treatment assignment process then he may have a strong incentive to inquire the actual value of the variable. Sec-ondly,theresult doesnotrequire parametricfunctionalformassumptionsonthe bivariateprobabilitydistributionoftheunobserved heterogeneitytermsoronthe durationdependence, the covariate eects, and the treatment eect. Obviously, thisisadesirableproperty. Thirdly,asmentionedabove,the AVdBmodelallows for selectioneects by way of unobserved determinants aectingboth the treat-ment assignment and the outcome. In other words, it is not necessary to make aconditionalindependence assumptionstating thatthe data are able tocapture all systematic determinants of the process of treatment assignment so that the remaining observed variation inthe treatment assignment is independent of the determinantsoftheoutcome ofinterest.Inapplications,suchanassumptionmay be diÆcult to justify, for example if the treatment assignment is carried out by caseworkers whousediscretionarypower,taking individualcharacteristicsofthe subject intoaccount that are unobserved to the analyst.
Standard methods of treatment evaluation often rely on exclusion restric-tions, parametric functional form assumptions on the joint distribution of the \errorterms"inthe model,orconditionalindependenceassumptions,toidentify the treatment eect. In this sense, treatment evaluation with the AVdB model comparesfavorably tothosemethods. Theaimof the presentpaperistoprovide abetter understanding of this. More precisely,we examinewhich informationor variation in the data enables identication of the treatment eect in the AVdB model framework, by comparing the model specication and the data to those usedinstandard methodsof treatment evaluationinthe presence ofselection ef-fects. Specically,we makea comparisonto latent variablemethodswith binary treatment indicators and to panel data methods (see e.g. Wooldridge, 2002, for atextbook overview).
The paper is organized as follows. In Section 2 we present the AVdB model framework. Section 3 makes the comparison to latent variablemethods. Section 4makes the comparison topanel data methods. Section5 concludes.
cally assigned treatments
Considerasubjectinacertainstate.Afteracertainamountoftime,the subject leaves this state. The subject may receive a treatment at some moment before itleaves the state. We are interested in the determinantsof the event of leaving the state, so the latter event is the event of interest, and the durationuntil this event is the variable of interest. To x thoughts, consider an individual who is unemployed and who moves to employment ata certain point of time, and who may receive a training at some point during his spell of unemployment. For the sakeofconvenience,weusetheterm\individual"ingeneraltodenotethesubject. We normalize the point of time at which the individual enters the state to zero. The durations T
m
and T p
measure the duration until the event of interest and the duration until treatment, respectively. The populationthat we consider concernsthein owintothestate,andtheunconditionalprobabilitydistributions thataredened belowaredistributionsinthein owintothe state.Whetherthis is the in ow at a xed point of calendar time or the total in ow over time (or the in owatanother rangeof in owdates) depends onthe application athand.
The two durations are random variables. We use t m
and t p
to denote their realizations. We assume that, for a given individual in the population, the du-rationvariablesare absolutelycontinuous random variables.We assume that all individual dierences in the joint distribution of T
m ;T
p
can be characterized by explanatoryvariablesX;V,whereX isobserved andV isunobserved tothe ana-lyst.Ofcourse,the jointdistributionof T
m ;T
p
jX;V canbeexpressed intermsof the distributions of T p jX =x;V and T m jT p = t p
;X =x;V. The latter distribu-tionsare inturncharacterizedby theirhazardrates
p (tjx;V)and m (tjt p ;x;V), respectively. 1
Asnotedintheintroduction,weareinterestedinthecausaleectoftreatment on the exit out of the current state. The treatment and the exit are character-ized by the moments at which they occur, and we are interested in the eect of the realization of T
p
on the distribution of T m
. To proceed, we assume that, 1
For a nonnegative random (duration) variable T, the hazard rate is dened as (t) = lim
dt#0
Pr(T 2 [t;t+dt)jT t)=dt. Somewhat loosely, this is the rate at which the spell is completedattgiventhatithasnotbeencompletedbefore,asafunctionoft.Itprovidesafull characterizationofthedistributionofT(seeLancaster,1990,andVandenBerg,2001).Consider thedistributionof aduration variableconditionalonsomeother variables.It iscustomaryto useavertical\conditioningline"within theargumentofahazardratein ordertodistinguish between(ontheleft-handside)thevalueofthedurationvariable atwhich thehazardrateis evaluated, and(ontheright-hand side)thevariablesthatareconditionedupon.
m p the realization of T
p
aects the shape of the hazard of T m
from t p
onwards, in a deterministicway. This implies that the causal eect is captured by the eect of t p on m (tjt p ;x;V) for t > t p
. It is ruled out that t p aects m (tjt p ;x;V) on t2 [0;t p
]. The latter could be called a \no anticipation" assumption, because it basically means that the individual's behavior before the moment of treatment does not depend on the future realization of the moment of treatment. AVdB show that such anassumption is crucialfor identication.
2 LetV :=(V m ;V p ) 0
bea(21)-vectorofunobservedcovariates.LetT p ?? V m jx;V p , implyingthat p (tjx;V)= p (tjx;V p ).Furthermore, letT m ?? V p jt p ;x;V m , so that m (tjt p ;x;V) = m (tjt p ;x;V m
). Somewhat loosely, one may say that V p
(V m
) captures the unobserved determinants ofT
p (T
m
).Nowletusturn tothe speci-cationsofthe hazardrates
m (tjt p ;x;V m )and p (tjx;V p
).Weadoptthe following modelframework, Model 1. p (tjx;V p ) = p (t) p (x)V p (1) m (tjt p ;x;V m ) = m (t) m (x)Æ(tjt p ;x) I(t>tp) V m (2) 2
Inreality,thereisoftennoinformationavailableonthedegreetowhichanactualtreatment is anticipated. Even if some anticipation cannot be ruled out, there is virtually never any information on the moment at which the individual receivesinformation on the moment of treatmentunless themomentoftreatmentis fullypredictableat anindividuallevel.Thefact thatarealizationoftheeventofinterestcouldbeduetotheanticipationofafuturetreatment hashauntedtheempiricalliteratureontreatmenteects.Manystandardtreatmentevaluation studies suer from a potential bias due to anticipatory eects. This includes studies using \dierence-in-dierences"methodswhereone\dierence"concernsacomparisonbetween pre-andpost-treatmentcircumstances(seeHeckman,LaLondeandSmith,1999,foranoverview). Nowsupposethatthedeterminantsofthestochastic processoftreatmentassignmentaect theindividual'sexit rateoutofthestateof interestbefore theactualrealizationof the treat-ment.Then thetreatment programis said to havean ex ante eect on exit out of the state of interest. Such an eect is to beexpected in well-establishedprograms. The ex ante eect shouldnotbeconfusedwithanticipationoftherealization oftheprocessoftreatment assign-ment,becausein thelattercasethe individualknowsthestochasticoutcome ratherthan the determinantsoftheprocess.Theexante eectcanbecontrastedtotheexposteectof treat-ment,whichistheeectofarealizedtreatmentontheindividualexitrate{thisisofcoursethe eectwefocusoninthispaper.Wedonotdealwithexanteeectsoftreatment.Identication wouldrequireadditionalinformation,suchasstrongfunctional-formassumptions,instruments foracomparisonofaworldwithatreatmentprogramto aworldwithouttheprogram,orthe impositionofaneconomic-theoreticstructureonthemodel.
otherwise.
Apart fromthe term involving Æ(tjt p
;x), the above hazard rates have Mixed ProportionalHazard(MPH)specications.Thefunction
i
(t)iscalledthe \base-linehazard"sinceitgivestheshapeofthehazardrate
i
foranygivenindividual. Thehazardrate issaid tobedurationdependent ifitsvaluechanges overt. Pos-itive(negative)durationdependence means that
i
(t)increases (decreases). The term
i
(x) is called the \systematic part" of the hazard. In applied duration analysis, it is common to specify
i
(x)=exp (x 0
i
), so that the hazard function is multiplicative in all separate elements of x. In biostatistics, is often called the treatment eect if x captures whether the subject has received a treatment at the beginning of the spell, but we avoid such confusing terminology. Finally, the term V
i
is called the \unobserved heterogeneity term". (AVdB also analyze alternativemodelspecications,notablywith Ædependingont;x;and onathird unobserved heterogeneity term, say V
Æ .) ThetermÆ(tjt p ;x) I(t>tp)
capturesthetreatmenteect.Thenotationusedhere requiressomediscussion.First,notethattherearealternativebutobservationally equivalent ways of capturingthis eect. Forexample,one may suppress I(t>t
p ) and redene Æ(:) such that it is zero if t t
p
. This is however less attractive from an expositional point of view. Forexample, in our setup a constant treat-ment eect is relatively easy to capture. The function Æ(tjt
p
;x) is not identied on t 2 [0;t
p
], so by not restricting its values on this interval we create an un-interesting identication problem. In the sequel, it is silently understood that all statements concerning Æ(tjt
p
;x), including identication statements, concern Æ(tjt p ;x) onf(t;t p ;x) 2[0;1) 2 X :t >t p
g. It is useful to separate t from the other arguments of Æ(:) by way of avertical \conditioningline", because we will occasionallyintegrate overt.
Clearly, treatment is ineective if and only if Æ(tjt p
;x) 1. Now suppose Æ(tjt
p
;x) is equal to a constant larger than one. If T p
is realized then the level of the individual exit rate out of the current state increases by a xed amount. This stochastically reduces the remaining duration in that state, in comparison tothe casewherethe treatmentisgivenatalaterpointoftime.More ingeneral, we allow the treatment eect to vary with the moment of treatment t
p
, with x, andwith theelapsedtimet inthecurrent state.Asaresult,the individualeect may alsovary with the time t t
p
since (the onset of) treatment.
Forconvenience, wemakeanumberofnormalizationsandregularity assump-tions onthe determinants of the model(see AVdB). Notably, the individual val-ues of x are taken to be time-invariant. For our results it is useful to point out
atesaway,by assumingthat thereisonlyasinglepointoftime atwhichpossible treatment and outcome may occur. Time-varying covariates are potentiallyvery useful for the identicationof durationmodels (see Honore,1991, and Heckman and Taber, 1994).
AmoresubstantiveassumptionisthatX isindependentfromV m
;V p
.Wealso assumethat E(V
i
)<1. This is acommon assumption in the analysis of single-spellduration data with MPH-type models (see Heckman and Taber, 1994, and Van den Berg, 2001,for surveys).
It isuseful tophrase the problemof the identicationof the treatment eect inthepresenceof\selectivity",inthecontextofourModel1.First,notethatthe datatypicallyprovideobservationsonrealizationsof T
m
andx. Inaddition,if T p iscompletedbeforetherealizationt
m
thenwealsoobservetherealizationt p
, oth-erwise we merely observe that T
p
exceeds t m
. Now consider the (sub)population of individuals with a given value of x. The individuals who are observed to re-ceive a treatment at a date t
p
are a non-random subset from this population. The most important reason for this is that the distribution of V
p
among them doesnot equal the corresponding populationdistribution,becausemost individ-uals with high values of V
p
have already had the treatment before. IfV p
and V m are dependent, then by implicationthe distribution of V
m
amongthem does not equalthe corresponding populationdistribution either.A second reasonfor why the individuals who are observed to receive a treatment at a date t
p
are a non-random subset is that, in order to observe the fact that treatment occurs at t
p , the individual should not have left the state of interest before t
p
. Because of all this,the treatmenteect cannotbe inferredfromadirect comparisonof realized durations t
m
of these individuals to the realized durations of other individuals. If the individuals with a treatment at t
p
have relatively short durations then this can befor two reasons: (1) the individualtreatment eect ispositive,or(2) these individuals have relatively high values of V
m
and would have found a job relatively fast anyway. The second relation is called a spurious relation as it is merely due to the limited observability of the set of explanatory variables. This relationisreferredtoas\selectivity".IfV
m andV
p
are independentthenI(t>t p
) isan\ordinary"exogenous time-varyingcovariatefor T
m
,and one may inferthe treatment eect froma univariatedurationanalysis based onthe distribution of T m jt p ;x;V m
mixed over the distribution of V m
. However, in general there is no reason toassume independence of V
m
and V p
, and if this possible dependence is ignoredthen the estimate of the treatmenteect may be inconsistent.
restric-p p m p m
properties) depend on their determinants. More specically, the model speci-cationis nonparametric in the sense that we donot make parametric functional formassumptionsontheprobabilitydistributionoftheunobserved heterogeneity terms,the baselinehazards, the systematichazards, and the treatmenteect.
Wedonot imposethatthereareobservedexplanatoryvariablesthatdoaect T
p
butdonotaect T m
otherthanbywayoft p
.Inotherwords,wedonotexclude elementsof x from
m
(x)that are included in p
(x).
We now examineModel1 froma number of dierent angles.First, the prob-abilityPr (T m >t m ;T p >t p jx) can beexpressed as Pr(T m >t;T p >t p jx)= Z 1 0 Z 1 0 exp( m (x)v m [ m (minft;t p g)+I(t >t p )(tjt p ;x)] p (x)v p p (t p ))dG(v m ;v p ) with i (t):= Z t 0 i ()d (tjt p ;x):= Z t t p m ()Æ(jt p ;x)d The joint density of T
m ;T
p
jx follows from dierentiation with respect tot m
and t
p
.Notethat Model1and the aboveincludeaspecication ofthe distributionof T p for T p >T m
, but this specication is immaterial,as it does not play any role inthe paper orindeed inany empiricalanalysis.
To make comparisons to other models, it is useful to rewrite Model 1 as a regression-typemodel.It iswell known that the integrated hazardof a duration distribution has an exponential distribution with parameter 1 (see e.g. Ridder, 1990, Horowitz,1999). From this it follows that we can write
log p (T p )= log p (x) logV p +" p (3) log[ m (minfT m ;t p g)+I(T m >t p )(T m jt p ;x)]= log m (x) logV m +" m (4) where" p and" m
haveanExtreme Value{TypeI(EV1)probabilitydistribution. This distributiondoes not have any unknown parameters; itsdensity equals
f(" i )=e " i e e " i ; 1<" i <1
p termsontheright-handside,whereV
p and "
p
arerandomvariables.Equation(4) states that the random variable T
m jt
p
;x is distributed as the sum of the terms onthe right-handside, where V
m and "
m
are randomvariables.Thereholds that " m ?? " p ; and (" m ;" p ) ?? (X;V m ;V p ) but V p and V m
are allowed to be dependent. It is important to stress that the fact that "
p
and " m
have a fully specied distribution does not mean that we make a parametric functional form assumption on the distributions of T
p jx;V p and T m jt p ;x;V m
. This is because the left-hand sides of the above regression-type equations specify unknown transformations of the dependent variables: the integratedbaselinehazard
p forT
p
,andageneralizedintegratedbaselinehazard (including treatment eects depending ont
p
) for T m
. A regression equation for, say,T p jx;V p statesthatT p equals 1 p (exp ( log p (x) logV p +" p )),where 1 p (:) isthe inverse of p (:).
The fact that we specify the assignment of treatment by way of specifying the hazard rate of a durationdistribution (rather than by way of specifying the individual realization of the duration variable) implies that there is a random component inthe assignment that is by denition independent of all other vari-ables. This randomcomponent isrepresented by the term "
p
inthe \regression" equation (3) for T
p
. One may interpret this random component in terms of the randomnessof the outcome of T
p at tjT
p
t that remainsif the hazardrate at t isspecied.Consider anindividual who has not yet been given atreatment and who has not yet left the state of interest, at time t. Basically, in a small time interval[t;t+dt), the probability oftreatment is
p (tjx;V
p
)dt, and the probabil-ity of no treatment is 1
p (tjx;V
p
)dt. This is aBernoulli trial. Given the value of
p (tjx;V
p
)dt, itsoutcome iscompletely random.In practice, such randomness may re ect behavior of the institution that supplies or imposes the treatments, orit may re ectpurely randomexternal shocks.
Recallthatthedataideallyprovidei.i.d.observationsonrealizationsofT m ;I(T m > T p );andT p I(T m >T p
),givenx.ThelattertermindicatesthatifT p
iscompleted before the realization of T
m
then we also observe the realization of T p
, whereas otherwise we merely observe that the realization of T
p
exceeds the realization of T
m
.Toanalyze theidenticationweassumethatthe data actuallyprovideexact knowledgeof the whole jointdistribution of T
m ;I(T m >T p ); andT p I(T m >T p ), given x.
Identiability is a property of the mapping from the model determinants
i ;
i
m m p p m p
unique mapping from the domain to the data. The modelis non-parametrically identied if this mapping has an inverse, i.e. if for given data there is a unique set of functions
i ;
i
(i=m;p),Æ and Gin the domain that isable to generate these data (ofcourse, these data must bein the image of the mapping).
Notethatthis approachtreatsthe individual-specicunobserved heterogene-itytermsasrealizationsofrandomvariables.Intermsofpaneldataanalysis,this means that V
m ;V
p
are treated as \random eects" when estimating the model with, for example, Maximum Likelihood.An alternative approach treats the in-dividual values of V
m ;V
p
as unknown individual-specic parameters (or, equiva-lently, as\incidental"parameters or \xed eects").
AVdB demonstrate that Model 1 (as characterized by the functions m , p , m , p
, Æ, and G) is non-parametrically identied from single-spell data. The result implies that the treatment eect is identied without exclusion restric-tionsorparametricfunctionalformrestrictionsonthedistributionofunobserved heterogeneity.
We nowturn tothecase wherethedata covermultiplespellsofanindividual inthestate ofinterest,i.e.if the dataare \multiple-spell"data.Weassumethat anindividualhasaxedvalueofV
m ;V
p
.Foragivenindividual,thedierentspells provide independent drawings from the jointdistribution of the T
m ;I(T m >T p ); and T p I(T m >T p ), given x;V m and V p
. The extension to more than two spells is trivial. Also note that the setup includes cases in which physically dierent individualsshare the same value of V
m ;V
p
and weobserve one duration for each of these individuals.Such a group of individualsis usually called astratum.
Since V m
and V p
are unobserved, the durationvariables given x are not inde-pendent across spells. In fact, any stochastic dependence across spells can only be due to the presence of heterogeneity. It is ruled out by assumption that real-izations of T
m or T
p
in one spell aect the distributions of durations in another spell.This may be a strong assumption in some applications. For example, par-ticipation in a training program for the unemployed may have an eect on the durations of future unemployment spells.
Compared to Model 1, the AVdB model for multiple spells is much more exible. It allows for interaction between t and x in the individualhazard rates, sothat the MPHstructure isrelaxed substantially.It alsoallowsfor dependence of V
m ;V
p
in the in ow on x, and it does not require E(V) < 1 anymore. It is allowed that the individual hazard rates in the second spell depend on t;x in a dierent way than they do in the rst spell. The size of the treatment eect may also be dierent across the two spells. The individual values of x in
still requires that the treatment eect and the unobservables aect the hazard rates multiplicatively. AVdB demonstrate that this modelis non-parametrically identied from multiple-spelldata.
3 Comparison to latent variable models with bi-nary treatment assignment
We start this section by considering identication of a treatment eect in stan-dard latent variable models with binary treatment assignment (see Maddala, 1983,andWooldridge,2002,foroverviews).Werestrictattentiontothe relations between the mean ofthe endogenousvariablesontheone hand and the explana-toryvariableson the other. This is inline with the spirit inwhich these models are interpreted (see e.g. Heckman (1990) for identication results based on full information). Consider Y =x 0 Y Y +x 0 Y Æ 0 I(Z >0)+" Y Z =x 0 Z Z +" Z (5) where Y, Z, " Y and " Z
are continuous random variables,Z > 0 indicates treat-ment, I(Z >0)isa binarytreatmentindicator, and x
0 Y
Æ 0
is the treatment eect. Weassumethat inferenceisbasedonarandomsampleof subjectswith informa-tion on Y, x
Y , x
Z
and I(Z >0) for each subject. Furthermore, x :=(x Y ;x Z ) is independent of " Y ;" Z
. We take the parameter Z
to be identied from the data on I(Z > 0) and x
Z
. It is generally acknowledged that either exclusion restric-tions(statingthat somecovariatesinx
Z
withnon-zero parametersin Z
are not includedin x
Y
) orparametricfunctional-formassumptionsonthe joint distribu-tion of ("
Y ;"
Z
) are required for identication of the treatment-eect parameter Æ
0
.Toillustratethis,suppose one aimstoidentifyÆ 0
fromthe dierencebetween the means of [YjZ > 0;x] and [YjZ 0;x], as a function of x := x
Y ;x Z . We have that E(YjZ >0;x) E(YjZ 0;x)= x 0 Y Æ 0 +E(" Y j" Z > x 0 Z Z ;x Z ) E(" Y j" Z x 0 Z Z ;x Z ) (6) Ifwe donot makeparametric functionalformassumptions onthe joint distribu-tionof"
Y ;"
Z
(6) can be linearin x Z
Z
. Ifinadditionwedonot makeanexclusion restriction then obviouslyÆ
0
is not identied from this linear expression.
Nowsuppose thatwedonotobserveY butonlyI(Y >0).Thisisthe\binary treatment { binary outcome" specication. For example, Z > 0 may indicate whether an unemployed individual has participated in a training program, and Y > 0 may indicate whether he has found ajob. The binary specication for Y eectivelyreduces theinformationin the data,soidenticationneedsat least as many assumptions as above(see Cameron and Heckman, 1998, for results).
Before we contrast the identication results, it should be noted that using latent variable models with binary treatment assignment or discrete-time panel data models when Y is in fact a (function of a) duration variable leads to a numberofintractablepracticalproblems.First,itrequiresaggregationovertime. Often, it is recorded whether a treatment occurs in a baseline period, and it is recorded whether the outcome (i.e. the duration variable) is realized in the subsequent period. Then, the question arises what to do when the treatment and the outcome occur in the same period. In addition, it is not clear how to deal with observations that are right-censored before the end of the observation window. It is common that a spell in a state of interest can end in dierent ways. Usually, only a subset of these are deemed interesting. For example, an unemployment spellcan end becauseof atransitiontowork, but alsobecauseof a transition into education, militaryservice, etcetera. A study of the transition rate to work may treat transitions toother destinations as independently right-censoring the durationuntil work. The treatment eect estimate may be biased ifsuchobservationsare discarded orifthey are treatedasobservations ofexit to work.
Now letus consider the similaritiesand dierences between the above model and the duration model of Section 2 in its regression representation (3-4), and thecorrespondingidenticationresults.Toshapethoughts,onemay,intheabove latent variable model, interpret Y as logT
m and Z > 0 as T p < T m . Of course, alternativeinterpretationsarepossible,andeachof themisimperfect.Moreover, ifModel1is thetrue modelthen ingeneraltheparameter Æ
0
inthe abovemodel specication isa complicatedfunction of all parameters of Model1.
The mostnotablesimilaritybetween the modelsconcerns the proportionality assumptions in Model 1 and the additivity assumption in regression equations. Both impose some \smoothness" by excluding certain interactions of time and explanatoryvariablesattheindividuallevel.TheMPHspecicationdoesimpose more structure than a regression specication, because the former decomposes the \error term"into two terms.
modelandthedatausedforModel1concerns thefactthatthelatterincorporate thetimingofthetreatmentwhereastheformerdonot.Atthesametime,themost fundamental dierence between the identication result for the latent variable model and the identication result for Model 1 concerns the fact that the latter does not need exclusion restrictions while the latter even allows the treatment eect tohave atime dimension.This suggests that the timingof events provides potentially very useful informationonthe treatment eect.
We shed more light on this by returning to Model 1. A single observation is equivalent to an observation of a realization of minfT
m ;T p g;I(T m > T p ) and T m I(T m > T p
) given x. The pair minfT m ;T p g;I(T m > T p
) is called the identi-ed minimum of T
m ;T
p
. The data and the model can be decomposed into two parts:the identied minimumgiven x,and the durationfromT
p untilT m if T p is the rst to be realized. Let us examine the part of the modelthat species the distribution of this identied minimum given x. This is in fact the well-known competing risksmodelwith dependent risks, wherethe dependence runs by way ofrelatedunobserved heterogeneityterms (seeHeckmanand Honore, 1989, Lan-caster, 1990). Notethat the specication of the competing risksmodeldoes not depend onÆ,whichisintuitivelyobvious. Heckmanand Honore(1989)showthat under anumberof assumptions, the dependent competing risks modelis identi-edfromsingle-spelldataontheidentiedminimumandx.Soifthedataprovide exact knowledgeof the jointdistribution ofthe identied minimum given xthen m ; p ; m ; p
;andGcanbededuced. Ofcourse,weobservemore thanthe iden-tied minimum and x, namely T
m I(T
m > T
p
). The latter, which is intimately linked to the time distance between the moment of outcome and the moment of treatment, can be used to identify Æ(tjt
p
;x). Intuitively, one can compare the actualdistribution of this time distance tothe distributionthat would prevail if Æ=1.The latter distribution isidentied fromthe competing risks model.
To clarify this further, we compare the observable exit rates m
att of those who are treated exactlyatt tothose who are not yet treatedat t:
log m (tjT p =t;x) log m (tjT p >t;x) Thereholds that
m (tjT p =t;x)= m (t) m (x) Æ(tjt;x) E(V m jT m t;T p =t;x) m (tjT p >t;x)= m (t) m (x) E(V m jT m t;T p >t;x)
log m (tjT p =t;x) log m (tjT p >t;x)= (7) logÆ(tjt;x)+logE(V
m jT m t;T p =t;x) logE(V m jT m t;T p >t;x) The selection eect logE(V
m jT m t;T p = t;x) logE(V m jT m t;T p > t;x) is identiedfromthecompetingriskspartofthemodel,thatis,frominformationon events in [0;t). In words, we know the average \quality" (V
m
) among treatment andcontrolgroupsatt fromthecompetingrisks partof themodel. This enables identication of Æ(tjt;x). A given treatment works and onlyworksfrom the day the subject is exposed to its possible eects, whereas unobserved heterogeneity aects the observed exit rate out of the current state fromday 1.
3
The above results based on the partitioning of model and data reinforce the conclusionsfromthecomparisonbetweenthebinarytreatmentapproachandthe durationmodel approach. First of all, the timing of events conveys useful infor-mation on the treatment eect. Secondly, the MPH specication is important, because this is assumed for the identicationof the competing risks model (see alsoAbbring and Vanden Berg, 2001a,for intuition behind the identicationof the competing risks model).
4 Comparison to panel data models Consider the dynamic panel data model
W t =x 0 W;t W +x 0 W;t Æ 0 I(Z t >0)+V W + W;t Z t =x 0 Z ;t Z +V Z + Z ;t ; (8)
where the index t denotes time. We deliberately introduce new notation W t
for theoutcome becausebelowwe adopttwodierentinterpretationsofW
t
interms of duration outcomes. Inference is based on a random sample of subjects with informationonW t , W t 1 , x W;t , x W;t 1 ,x Z ;t , x Z ;t 1 ,I(Z t >0)and I(Z t 1 >0)for each subject. We take
j;t
to be i.i.d. across time and across j = W;Z, so that 3
In most econometric evaluation approaches, a treatment eect can only be identied if someofthevariationin theassignmentoftreatmentisnotfullycollinearwiththevariationin theother determinants ofthe outcomeof interest.In ourframework,thisis taken careofby therandomcomponent"
p
in the\regression"equation(3) forT p . Thevariation in " p aects T m
runs by way of the relation between V W and V Z . We take E[ j;t jx j ] = 0, where forsake ofbrevity weuse x
j :=(x j;t ;x j;t 1 ).Wecan treatV W as axed eect by taking rst dierences of W t across time, E[W t W t 1 jZ t 1 0;Z t >0;x Y ;x Z ]=(x W;t x W;t 1 ) 0 W + x 0 W;t Æ 0 : (9) Clearly, Æ 0 isidentied. Nowsupposethat
W
variesovertime.Theequivalentoftheright-handsideof equation(9)equalsx 0 W;t W;t x 0 W;t 1 W;t 1 +x 0 W;t Æ 0 .Asaresult,Æ 0 isunidentied fromE[W t W t 1 jZ t 1 0;Z t >0;x W ;x Z ].A\dierence-in-dierences"however gives E[W t W t 1 jZ t 1 0;Z t >0;x W ;x Z ] E[W t W t 1 jZ t 1 0;Z t 0;x W ;x Z ]=x 0 W;t Æ 0 : (10) so Æ 0
is again identied. Note that this does not require the specication of a modelequationfor the treatmentassignment Z
t . Moreover, Z t and X W;t may be dependent, and the method even works in the absence of observed covariates. Also,clearly, Æ
0
can beallowed todepend on t.
We can translate the dynamic panel data model towards our model in two ways.Wecaneither(i)letW
t
beasurvivalindicatorforasingleoutcomeduration T m and set W t = 1 () T m
<t, or (ii) relate each W t
to a separate outcome duration, say T
m;t .
Option(i)suggestsalinkbetweenoursingle-spelldurationmodelanda multi-period panel data model. In fact, because a duration variable has a time unit, single-spelldurationmodels areoftenthoughttobesimilartomulti-periodpanel datamodels.However, this apparentsimilarityisdeceptive.Inthe caseof (i),we onlyobserveonedurationoutcomeperindividual,soW
t 1
=1=)W t
=1,which isnotnecessarilysointhegeneralcaseofspecication(8).Thelatterspecication allows for more variation in W
t and W t 1 given Z t 1 ;Z t ;x W ;x Z . This variation distinguishespaneldatamodelsfromour durationmodel. Asaresult,we cannot apply (9)and, in particular, wecannot dodierence-in-dierences likein(10).
Nevertheless, the resultsat the end ofthe previous section(see equation (7)) suggest that an approach in the spirit of the dierence-in-dierences approach might be possible, provided that one can usefully compare dierent sets of in-dividuals across time, and provided that the treatment assignment process is specied. In Abbring and Van den Berg (2001b) we develop such an approach.
dierences.
The point ofdeparture inAbbring and Vanden Berg(2001b) isthe observa-tion that dierence-in-dierences amounts to an examinationof a specic inter-action term: the extent to which the outcome over time diers between treated andcontrols.AbbringandVandenBerg(2001b)focus onthe lograteatwhicha treatmentisgiven conditionalonthe momentof exit(log
p (tjx;t m )witht<t m ) andexaminewhether this behaves dierently asafunctionoft whent
m
is dier-ent.Intuitively,if Æ>1thenmanyofthose who\die"att
m received atreatment shortlybeforet m ,so p (tjx;t m
)willtend toincrease shortly beforet m ,relativeto p (tjx;t m )forlargert m
.Thisamountstoexaminingtheinteractiontermbetween themomentoftreatmenttandtheoutcomet
m
inlog p
(tjx;t m
).Itturnsoutthat thisapproachallowsone todistinguishbetween acausaltreatmenteect and se-lectivity.Intuitively, iftreatmentand outcome aretypicallyrealizedvery quickly after each other, no matter what the values of the other outcome determinants are,thenthisisevidenceofapositivecausaltreatmenteect.The selectioneect doesnot giverise to thesame typeof quick succession ofevents.The xvariables play norole here. However, Æ is assumed to beconstant over t and t
p .
This result again illustrates the usefulness of the information in the timing of events to assess the treatment eect. Both in the panel data approach and in ourdurationmodelapproach,the treatmenteect works fromaspecic pointof time onwards, whereas the selection eect works at all points of time in a more permanentway. Inboth approaches, separabilityassumptions,ruling outcertain interaction eects of the determinants of the individual outcome of interest, are needed.Inthepaneldataapproach,additivityoftreatmenteectandunobserved heterogeneityintheoutcome equation(8) iscrucial.The previoussectionargued that in the duration model approach the additivity of the determinants of the individual log outcome hazard rate log
m (tjt p ;x;V m ) (equation (2)) is crucial. The results in this section so far emphasize that what is particularly crucial is theadditivityoftreatmenteectandunobserved heterogeneityinthisloghazard rate(althoughthisbyitselfseemstobeinsuÆcienttoidentifythewholeduration model includingthe way the treatmenteects varies with t and t
p ).
Theseseparabilityassumptionsattheindividuallevelenableanempirical dis-tinction between the treatment eect, that works from a specic point of time, andtheselectioneectthatworksatallpointsoftime.Thedurationapproachis muchmoreinvolvedbecauseofthedynamicnatureofselectivityinduration anal-ysis:as timeproceeds, the compositionof the survivors changes,so the selection eect changes.
towards our duration model framework. Let Z t
indicate whether a treatment is given during the t-th spell. This interpretation allows for a straightforward comparison with our multiple-spell duration model. In both cases we observe multiple outcomes for an individual with the same unobserved covariates. This suggeststhatwecanremovetheroleofunobservedheterogeneityinmultiple-spell data by some sort of conditional likelihood orrst-dierencing approach. AVdB develop ananalogueof the xed-eectpanel data estimatorbased on(10).Both estimators do not require any observed explanatory variables or a specication of the treatmentassignment process.
5 Conclusions
Variation in the durationuntil treatment relative to the durationuntil the out-comeof interest conveys useful informationon the causaltreatmenteect inthe presence of selectioneects. This informationis discarded ina binary treatment framework.Analysisofthedurationvariablesallowsforinferenceoncausal treat-menteects if novalidinstrumentsareavailableand ifconditionalindependence assumptions can not be justied. With single-spell duration data, this works as follows. If treatment and outcome are typically realized very quickly after each other, no matter what the values of the other outcome determinants are, then thisistaken asevidenceofapositivecausaltreatmenteect. Theselectioneect doesnot give rise tothe same type of quick succession ofevents.
Wemakeanumberofqualications.First,itispivotalthatindividualsdonot anticipate the realization of the moment of treatment, because then the treat-ment works from a moment in time that precedes the actual participation. It is obvious that this would lead to incorrect inference. See AVdB for an extensive methodologicalanalysis of this issue. Secondly, the information in the timing of events isuseful forinference onhowthe causaltreatment eect varies with time andwith the time since treatment. This provides potentiallyvery useful insights into the workings of the treatment, and this is important from a policy point of view. Thirdly, it is an important topic for further research to investigate to what extent the identication results for the single-spell duration model frame-workare robustwithrespect toseparabilityassumptionsembeddedinthe model framework.
The resultslead tosome suggestionsfor empiricalwork. First,itisuseful not to discard information on the timing of the treatment, and, in particular, not to round-o such information into a binary treatment indicator. Secondly, it is
Abbring, J.H. and G.J. van den Berg (2001a),\The identiability of the mixed proportionalhazards competingrisksmodel",Workingpaper, Free University Amsterdam, Amsterdam.
Abbring,J.H. andG.J.vanden Berg(2001b),\A simpleprocedureforinference ontreatmenteectsindurationmodels",Workingpaper,FreeUniversity Am-sterdam, Amsterdam.
Abbring, J.H. and G.J. van den Berg (2002), \The non-parametric identica-tionof treatmenteects indurationmodels", Working paper, Free University Amsterdam, Amsterdam.
Abbring, J.H., G.J. van den Berg, and J.C. van Ours (1997), \The eect of unemploymentinsurance sanctions onthetransitionrate fromunemployment toemployment", Working paper, TinbergenInstitute, Amsterdam.
Bonnal, L., D. Fougere, and A. Serandon (1997), \Evaluating the impact of French employment policies on individual labour market histories", Review of Economic Studies, 64,683{713.
Cameron, S.V. and J.J. Heckman (1998), \Life cycle schooling and dynamic se-lectionbias:ModelsandevidenceforvecohortsofAmericanmales",Journal of Political Economy,106, 262{333.
Card, D. and D. Sullivan (1988), \Measuring the eect of subsidized training programs on movements in and out of employment", Econometrica, 56, 497{ 530.
Gritz, R.M. (1993), \The impact of training on the frequency and duration of employment",Journal of Econometrics, 57, 21{51.
Heckman, J.J. (1990),\Varieties of selectionbias", AmericanEconomicReview, 80(supplement), 313{318.
Heckman, J.J. and B.E. Honore (1989), \The identiability of the competing risksmodel", Biometrika, 76,325{330.
Heckman,J.J.,R.J.LaLonde,andJ.A.Smith(1999),\Theeconomicsand econo-metricsof activelabormarket programs",in O.Ashenfelterand D. Card, ed-itors,Handbook of Labor Economics,Volume III, North-Holland,Amsterdam. Heckman, J.J. and C.R. Taber (1994), \Econometric mixture models and more generalmodels for unobservables induration analysis",StatisticalMethods in Medical Research,3, 279{302.
spells or time-varying covariates", Working paper, Northwestern University, Evanston.
Horowitz,J.L.(1999),\Semiparametricestimationofaproportionalhazardmodel with unobserved heterogeneity", Econometrica,67,1001{1029.
Lancaster, T. (1990),The Econometric Analysis of Transition Data, Cambridge University Press, Cambridge.
Lillard, L.A. (1993), \Simultaneous equations for hazards", Journal of Econo-metrics, 56,189{217.
Lillard, L.A. and C.W.A. Panis (1996), \Marital status and mortality: The role of health",Demography, 33, 313{327.
Maddala, G.S. (1983), Limited-dependent and qualitative variables in economet-rics,Cambridge University Press, Cambridge.
Ridder,G. (1990),\The non-parametricidenticationof generalizedaccelerated failure-timemodels",Review of Economic Studies,57,167{182.
Van den Berg, G.J. (2001), \Duration models: Specication, identication, and multiple durations", in J.J. Heckman and E. Leamer, editors, Handbook of Econometrics,Volume V,North Holland,Amsterdam.
Van den Berg, G.J., B. van der Klaauw, and J.C. van Ours (2004), \Punitive sanctions and the transition rate from welfare to work", Journal of Labor Economics,22,forthcoming.
Wooldridge,J.(2002),Econometricanalysisofcrosssectionandpaneldata,MIT press, Cambridge.