This section discusses the empirical framework for modeling the impact of health and lifestyle habits on entries into and exits out of disability.
6.4. Methodology 175
6.4.1 Mixed proportional hazard model
The focus of this study is on the hazard rates corresponding to self-employed workers’ entries into and exits out of disability. Hazard rates are particularly suitable for dealing with censored and truncated data. We are interested in the incidence-of-disability rate and the recovery-from-disability rate, which we will henceforth refer to as the ‘incidence rate’ and the ‘recovery rate’. The incidence rate is based on self-employed workers’ duration until the first disability, while the recovery rate is derived from the disability durations. Both durations are potentially right-censored. We are mainly interested in the way these two hazard rates are affected by risk factors related to health and lifestyle habits.
The incidence rate reflects the risk of becoming disabled at time t, conditional on no disability until time t. We use a mixed proportional hazard (MPH) model to specify the incidence rate. The MPH model has been shown to be particularly suitable for modelling durations in economics and has been widely used in other studies in the field (Van den Berg, 2001). The MPH model is of the form
$$\lambda_D(t \mid X, v) = \lambda_0^D(t) \exp(X'\beta_D + v), \tag{6.1}$$
where $X$ is a $K_D$-dimensional vector of observed covariates (where $D$ stands for ‘disability’), $\beta_D$ a vector of coefficients of the same dimension, and $\lambda_0^D(\cdot)$ the baseline hazard. Moreover, $v$ reflects individual-specific unobserved heterogeneity, which can be interpreted as a function of unobserved explanatory variables (Van den Berg, 2001).
Similarly, the recovery rate reflects the rate of recovery at time t, conditional on no recovery until time t. We also use an MPH model to specify the recovery rate:
$$\lambda_R(t \mid Z, w) = \lambda_0^R(t) \exp(Z'\beta_R + w), \tag{6.2}$$
where $Z$ is a $K_R$-dimensional vector of observed covariates (where $R$ stands for ‘recovery’), which may (but need not) contain covariates different from those in $X$, $\beta_R$ a vector of coefficients of the same dimension, and $\lambda_0^R(\cdot)$ the baseline hazard. Moreover, $w$ reflects individual-specific unobserved heterogeneity.
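To make the proportional structure of Equations (6.1) and (6.2) concrete, the following minimal sketch evaluates an MPH hazard for a given baseline. The function name `mph_hazard`, the constant example baseline, and all numerical values are illustrative assumptions and are not taken from this study.

```python
import math
from typing import Callable, Sequence


def mph_hazard(t: float,
               x: Sequence[float],
               beta: Sequence[float],
               frailty: float,
               baseline: Callable[[float], float]) -> float:
    """Evaluate lambda(t | x, v) = lambda_0(t) * exp(x'beta + v)."""
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same dimension")
    linear_index = sum(xk * bk for xk, bk in zip(x, beta))
    return baseline(t) * math.exp(linear_index + frailty)


# Hypothetical inputs: a constant baseline hazard and two covariates.
def constant_baseline(t: float) -> float:
    return 0.02


beta = [0.5, -0.3]
h_exposed = mph_hazard(3.0, [1.0, 2.0], beta, 0.1, constant_baseline)
h_reference = mph_hazard(3.0, [0.0, 2.0], beta, 0.1, constant_baseline)

# Under proportionality, a unit change in the first covariate multiplies the
# hazard by exp(beta[0]), whatever the baseline, the frailty, or the time point.
hazard_ratio = h_exposed / h_reference
```

The ratio of the two hazards depends only on the coefficient of the covariate that differs, which is what allows the coefficients to be read as log hazard ratios.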
Throughout, we account for unobserved individual-specific heterogeneity to capture any omitted variables related to e.g. education, risk aversion, and individual workplace heterogeneity. Unobserved heterogeneity, when not taken into account, affects the shape of the baseline hazard function and may lead to a downwards bias in the estimated model coefficients and duration dependence; see e.g. Kalbfleisch and Prentice (2002). More details about unobserved heterogeneity will be provided in Section 6.4.2.
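The effect on duration dependence can be illustrated numerically. The sketch below assumes a hypothetical population with two unobserved risk groups, each with a constant individual hazard; the group rates and shares are invented for illustration. No individual hazard changes over time, yet the population hazard declines because high-risk individuals exit first. This is the spurious negative duration dependence that arises when heterogeneity is ignored.

```python
import math


def marginal_hazard(t, rates, probs):
    """Population hazard at time t for a discrete mixture of constant hazards.

    Group i has individual hazard rates[i] (constant in t) and initial
    population share probs[i].
    """
    survivors = [p * math.exp(-r * t) for r, p in zip(rates, probs)]
    density = [r * s for r, s in zip(rates, survivors)]
    return sum(density) / sum(survivors)


# Hypothetical two-group population: half low-risk, half high-risk.
rates = [0.05, 0.50]
probs = [0.5, 0.5]

# Population hazard evaluated at t = 0, 5, 10, 15, 20.
hazard_path = [marginal_hazard(t, rates, probs) for t in range(0, 21, 5)]
```

The path starts at the population average of the two rates and falls toward the low-risk rate as the share of high-risk survivors shrinks.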
Chapter 6. Health, Lifestyle and Disability Transitions of Self-Employed Workers
These models are identified under the assumptions listed in Van den Berg (2001); see also Honoré (1993) for MPH models. In the single-spell case, these assumptions include the requirements that there is at least one continuous covariate and that the unobserved heterogeneity is independent of the covariates.
Health and lifestyle conditions are likely to impact differently on the hazard rates associated with different disorders. For example, poor mental health is likely to have a large impact on the occurrence of mental problems and a less substantial effect on physical impairments such as fractures. We therefore estimate a competing risks version of the MPH model for disability incidence, distinguishing between the different disorders causing the disability (e.g., Markussen et al., 2011). Specifically, we adopt an independent competing risks approach. The hazard rate corresponding to the m-th competing risk equals
$$\lambda_C^m(t \mid X, u) = \lambda_0^m(t) \exp(X'\beta_C^m + u), \tag{6.3}$$
where $\beta_C^m$ is a vector of coefficients of dimension $K_D$, $\lambda_0^m(\cdot)$ the baseline hazard and $u$ the individual-specific unobserved heterogeneity.
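As a hedged illustration of Equation (6.3), the sketch below evaluates cause-specific hazards for two hypothetical disorder categories. It relies on the standard result that, conditional on the covariates and the heterogeneity terms, the overall hazard under independent competing risks is the sum of the cause-specific hazards. The category labels, coefficients, and baselines are invented for illustration.

```python
import math


def cause_specific_hazards(t, x, betas, frailties, baselines):
    """Return the hazard of each competing risk m at time t.

    betas[m], frailties[m] and baselines[m] all belong to risk m.
    """
    hazards = {}
    for m in betas:
        index = sum(xk * bk for xk, bk in zip(x, betas[m]))
        hazards[m] = baselines[m](t) * math.exp(index + frailties[m])
    return hazards


# Hypothetical setup: the first covariate indicates poor mental health and
# has a larger coefficient for mental than for physical disorders.
betas = {"mental": [0.8, 0.1], "physical": [0.1, 0.4]}
frailties = {"mental": 0.0, "physical": 0.0}
baselines = {"mental": lambda t: 0.010, "physical": lambda t: 0.030}

hazards = cause_specific_hazards(2.0, [1.0, 0.0], betas, frailties, baselines)
overall_hazard = sum(hazards.values())
# Share of the overall hazard at time t attributable to each cause.
cause_shares = {m: h / overall_hazard for m, h in hazards.items()}
```

Allowing a separate coefficient vector per risk is what lets the same covariate have a large effect on one disorder and a small effect on another.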
6.4.2 Unobserved heterogeneity
We follow the standard approach in the economic literature to model dependence among multiple durations and assume that, conditional on the observed covariates and the unobserved heterogeneity, a policyholder’s duration until the first claim is independent of any subsequent disability spells (Van den Berg, 2001, Section 8.1). Similarly, we assume that, conditional on the observed covariates and the individual-specific unobserved heterogeneity, multiple disability durations corresponding to the same policyholder are independent. Conditional on the observed covariates only, however, durations of the same policyholder are dependent due to the related unobserved determinants. Note that the relation between the durations of the same policyholder is spurious to the extent that it only follows from the unobserved heterogeneity. Durations of different policyholders are independent.
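In terms of densities, the conditional independence assumption for the disability durations can be sketched as follows. The notation is introduced only for this illustration: $t_1, \dots, t_J$ denote the $J$ disability durations of one policyholder, $f(\cdot \mid Z, w)$ the conditional density of a single duration, and $G$ the distribution of $w$; the covariates are treated as constant across spells for simplicity.

```latex
f(t_1, \dots, t_J \mid Z, w) = \prod_{j=1}^{J} f(t_j \mid Z, w),
\qquad
f(t_1, \dots, t_J \mid Z) = \int \prod_{j=1}^{J} f(t_j \mid Z, w) \, dG(w).
```

The second expression does not factorize in general, which is why durations of the same policyholder are dependent once $w$ is integrated out.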
Throughout, we assume a Weibull baseline hazard (allowing for constant, increasing, or decreasing duration dependence) and a discrete Heckman-Singer frailty distribution with an endogenous number of mass points (Heckman and Singer, 1984). Heckman-Singer frailty is a non-parametric way of modeling unobserved heterogeneity. In the incidence model, a Heckman-Singer frailty distribution with $N$ mass points $v_1, \dots, v_N$ satisfies $\mathbb{P}(v = v_i) = p_i$ for $i = 1, \dots, N$, where $\sum_{i=1}^{N} p_i = 1$ and the appropriate value of $N$ is determined by the Heckman-Singer procedure. Later we will run robustness checks using partial likelihood and gamma frailty, while leaving the baseline hazard unspecified.
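For concreteness, one common parameterization of these two ingredients is sketched below for the incidence model. The shape parameter $\alpha$ and the absorption of the Weibull scale into the mass points are assumptions of this sketch; the parameterization used in the estimation may differ.

```latex
\lambda_0^D(t) = \alpha t^{\alpha - 1}, \quad \alpha > 0,
\qquad
S_D(t \mid X) = \sum_{i=1}^{N} p_i \exp\!\bigl( -t^{\alpha} \exp(X'\beta_D + v_i) \bigr).
```

Here $\alpha = 1$ gives a constant baseline hazard, $\alpha > 1$ an increasing one, and $\alpha < 1$ a decreasing one, while $S_D(t \mid X)$ is the marginal probability of no disability up to time $t$ once the frailty has been summed out over its mass points.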
We estimate the hazard rates for disability incidence and recovery separately using marginal maximum likelihood (ML), resulting in estimates of $\beta_D$, $\beta_R$, $\lambda_0^D(\cdot)$, $\lambda_0^R(\cdot)$ and the marginal frailty probability distributions $f(v)$ and $g(w)$. For the independent competing risks approach we estimate separate MPH models for the various competing risks, yielding estimates of $\beta_C^m$, $\lambda_0^m(\cdot)$ and the frailty distributions $h_m(u)$, for each $m$. The appendix provides explicit expressions for all log-likelihood functions estimated in this study.
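As an illustration of the kind of objective that marginal ML maximizes, the sketch below computes the log-likelihood of right-censored single-spell data under a Weibull baseline and a mass-point frailty distribution. It is not the log-likelihood from the appendix: it covers only the single-spell case, assumes the Weibull parameterization $\lambda_0(t) = \alpha t^{\alpha-1}$, and uses a hypothetical function name and invented data.

```python
import math


def weibull_hs_loglik(durations, events, covariates, beta, alpha,
                      mass_points, probs):
    """Marginal log-likelihood of single-spell data under a Weibull MPH model
    with discrete (mass-point) frailty.

    events[n] is 1 if spell n ended in the event and 0 if it is right-censored.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("mass-point probabilities must sum to one")
    loglik = 0.0
    for t, d, x in zip(durations, events, covariates):
        index = sum(xk * bk for xk, bk in zip(x, beta))
        contribution = 0.0
        for v, p in zip(mass_points, probs):
            scale = math.exp(index + v)
            hazard = alpha * t ** (alpha - 1.0) * scale
            survival = math.exp(-(t ** alpha) * scale)
            # Completed spells contribute the density (hazard * survival),
            # censored spells contribute only the survival probability.
            contribution += p * (hazard ** d) * survival
        loglik += math.log(contribution)
    return loglik


# Hypothetical data: three spells, the last one right-censored.
durations = [1.5, 4.0, 6.0]
events = [1, 1, 0]
covariates = [[1.0], [0.0], [1.0]]
value = weibull_hs_loglik(durations, events, covariates, beta=[0.3],
                          alpha=1.2, mass_points=[-3.0, -1.5],
                          probs=[0.7, 0.3])
```

In an actual estimation this function would be maximized numerically over the coefficients, the shape parameter, and the mass points and their probabilities.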
Additional efficiency can be achieved by joint ML estimation of the incidence and recovery rates as specified in Equations (6.1) and (6.2), resulting in a multivariate MPH model (Van den Berg, 2001). Joint estimation requires explicit assumptions about the specific form of the joint frailty probability distribution k(v, w). For example, one can assume two shared frailty terms in both the incidence and recovery rates, combined with a two-factor loading (Van den Berg, 2001). Because the benefit of additional efficiency due to joint estimation is offset by the need for additional distributional assumptions and the substantial increase in computational complexity, we confine our main analysis to marginal ML estimation of the incidence and recovery rates.
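One way to write such a specification, given here only as a hedged sketch, uses two independent latent factors $\xi_1$ and $\xi_2$ with loadings $a_1, a_2, b_1, b_2$. All of these symbols are assumptions of this sketch and are not part of the model estimated in this study.

```latex
v = a_1 \xi_1 + a_2 \xi_2,
\qquad
w = b_1 \xi_1 + b_2 \xi_2 .
```

The joint distribution $k(v, w)$ is then induced by the distribution of $(\xi_1, \xi_2)$ and the loadings, and $v$ and $w$ are dependent whenever both load on a common factor.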
6.4.3 Selection effects
Our insurance portfolio is potentially subject to health-related selection into self-employment (see e.g. Rietveld et al., 2013), adverse selection, and risk selection by the insurance company. Furthermore, the policyholders are relatively young, with an average age of 35 years upon buying income insurance. Because the insurance portfolio consists entirely of relatively young self-employed with disability insurance, it is not possible to disentangle, analyze or control for the aforementioned selection effects. Our analysis is conditional on the selection, which is a common feature of studies using insurance data (e.g. Spierdijk et al., 2009; Spierdijk and Koning, 2014). A conditional analysis like ours is still relevant from an insurance perspective, because it can contribute to more effective criteria for risk selection and underwriting, the development of risk-based insurance premiums for income insurance as described in Spierdijk and Koning (2014), prevention of disability among self-employed, and optimization of their return-to-work process.
We do, however, control for selection with respect to the generosity of the insurance contract. Our sample contains policyholders with different types of insurance contracts. The contracts may differ in terms of full/limited coverage, the length of the deferment period and the level of the replacement income. Since the different types of income insurance contracts are not randomly assigned to self-employed, significant unobserved differences could arise between policyholders with different income insurance contracts (Hill et al., 2013). For example, self-employed with different deferment periods may also differ in terms of unobserved risk factors (Cox and Gustavson, 1995). Individuals choosing a short deferment period are possibly more risk averse than those opting for a longer waiting time (Spierdijk et al., 2009). Pooling policyholders with long and short deferment periods could therefore bias the estimation results. Throughout, we account for selection on the basis of policyholders’ unobserved risk factors by estimating separate MPH models for different subgroups of policyholders, such as those with limited coverage or a specific deferment period.
Similarly, we estimate separate MPH models for self-employed with different lifestyle habits (such as smokers and non-smokers), for men and women, and for self-employed belonging to a specific occupational class. Estimating separate MPH models for different subgroups allows for group-specific unobserved heterogeneity and additionally permits the observed risk factors to impact differently on the incidence and recovery rates of different groups.
Another form of selection arises when policyholders leave the sample because they do not want to continue their insurance policy (lapsing) or because they die. We deal with this form of selection by marking the relevant durations as censored.
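The censoring rule can be sketched as follows for the duration until the first disability. The record layout, the function name, and the dates are hypothetical; the sketch only illustrates that lapse and death end the observed duration without an observed event.

```python
from datetime import date
from typing import Optional, Tuple


def incidence_duration(start: date,
                       first_disability: Optional[date],
                       lapse: Optional[date],
                       death: Optional[date],
                       end_of_observation: date) -> Tuple[int, int]:
    """Return (duration in days, event indicator) for one policyholder.

    The event indicator is 1 if the first disability is observed and 0 if the
    duration is right-censored by lapse, death, or the end of observation.
    """
    censoring_dates = [d for d in (lapse, death, end_of_observation)
                       if d is not None]
    censor = min(censoring_dates)
    if first_disability is not None and first_disability <= censor:
        return (first_disability - start).days, 1
    return (censor - start).days, 0


# Hypothetical policyholder who lapses before any disability occurs.
lapsed = incidence_duration(date(2005, 1, 1), None, date(2007, 1, 1), None,
                            date(2010, 12, 31))
# Hypothetical policyholder whose first disability is observed.
disabled = incidence_duration(date(2005, 1, 1), date(2005, 3, 2), None, None,
                              date(2010, 12, 31))
```

The resulting pairs of duration and event indicator are the inputs that a likelihood for right-censored durations requires.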