
2.3 Data and Method

2.3.2 The econometric model

Several previous studies of the probability of being in a given labour market state at a certain point in time (state probabilities) have used the binary-outcome probit model (for example, Cappellari et al. (2005) for older British workers, Akhtar and Shahnaz (2005) for youth unemployment in Pakistan, and Bell and Blanchflower (2015) for youth unemployment in Greece). This chapter instead uses the multinomial logit model, which allows the effects of the independent variables to differ across more than two discrete outcomes. These outcomes are based upon the individual’s current labour market status in a given year, i.e. at the interview date in each survey wave. As discussed previously, current labour market status is disaggregated into four mutually exclusive categories: employment, education, unemployment, and inactivity. The first two outcomes are classed as Non-NEET, while the latter two are NEET. However, inactivity due to retirement is omitted from our analysis, since it applies only to adults at retirement age.

In modelling the probability of being NEET or Non-NEET, let us define an individual’s labour market state as j = 0, …, 3, where j = 0 if the current labour market status is ‘employment’, j = 1 if the individual is in education (including government training programmes), j = 2 if the individual is unemployed, and j = 3 if the individual is inactive. Individual i (i = 1, …, n), observed in survey wave t (t = 1, …, 23), is characterised by a latent probability of being in labour market state j, which is a function of a vector of covariates X and a vector of unknown coefficients α to be estimated, such that

$$
P_{ijt} = \Pr[y_{it} = j] = F_{jt}(X_{it}, \alpha) \tag{2.1}
$$

where $y_{it}$ is the random variable describing the labour market state of individual i at wave t, and $X_{it}$ is a vector of covariates (independent variables) consisting of personal and household characteristics. The functional form of $F_{jt}$ should be such that the probabilities lie between 0 and 1 and sum to one over j (Cameron and Trivedi, 2005, p. 496).

³³ More specifically, Scotland and Wales boost samples were included since BHPS Wave 9, while a …

The probability, p, of the i-th individual being in state j at time t can thus be written as

$$
p_{ijt} = \Pr[y_{it} = j] = \frac{\exp(\alpha_j X_{ijt})}{\sum_{k=0}^{3} \exp(\alpha_k X_{ikt})} \tag{2.2}
$$

where the subscripts j and k denote the alternatives, in this case the individual’s labour market state, which takes the value 0 for employment, 1 for education (or training), 2 for unemployment, and 3 for inactivity. Since the probabilities must sum to one, the parameters are not all identified and a normalisation is needed. Here, employment (j = 0) is set as the base category, so that the coefficients for this state are normalised to zero ($\alpha_0 = 0$). The independent variables (X) include age, gender, educational background, ethnicity, marital status, health condition, housing type, housing tenure, number of own children, region of residence, and different business cycle periods.
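As a minimal sketch of Equation (2.2) with the base-category coefficients fixed at zero, the following Python function computes the four state probabilities for a single person-wave observation. It assumes the covariates are individual-specific, so the same vector enters every state's index, as in Equation (2.1). The function name, the three covariates, and all coefficient values are hypothetical and chosen only for illustration; they are not estimates from this chapter.

```python
import math

# State coding from the text: 0 = employment (base), 1 = education,
# 2 = unemployment, 3 = inactivity.
STATES = ("employment", "education", "unemployment", "inactivity")


def state_probabilities(x, alpha):
    """Return [p_0, ..., p_3] for one individual-wave observation.

    x     : list of covariate values (first entry 1.0 for the intercept).
    alpha : coefficient vectors for states 1..3 only; the base state 0
            is normalised to a zero vector, as in the text.
    """
    # Linear index alpha_j . x for each state; the base state's index is 0.
    indices = [0.0] + [sum(a * v for a, v in zip(a_j, x)) for a_j in alpha]
    # Subtract the maximum before exponentiating for numerical stability;
    # this leaves the ratios in Equation (2.2) unchanged.
    m = max(indices)
    expo = [math.exp(z - m) for z in indices]
    total = sum(expo)
    return [e / total for e in expo]


# Hypothetical inputs: intercept, age in years, female dummy.
x_example = [1.0, 20.0, 1.0]
alpha_example = [
    [2.0, -0.15, 0.10],    # education relative to employment
    [-1.0, -0.02, -0.20],  # unemployment relative to employment
    [-2.0, 0.01, 0.60],    # inactivity relative to employment
]
probs = state_probabilities(x_example, alpha_example)
for name, p in zip(STATES, probs):
    print(f"{name}: {p:.3f}")
```

Because the base state's index is fixed at zero, each ratio p_j / p_0 equals exp(α_j X), which is why the estimated coefficients are read relative to employment.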

At this stage, we neglect the dynamic nature of labour market transitions between waves and focus only on the probability of being observed in a labour market state at a given point in time. Moreover, in this chapter we do not yet take into account the start and end dates of each labour market status, except for data checking purposes as discussed in the previous section; hence, the duration of each labour market state is ignored.³⁴ As a consequence, not only do we allow the same individual to appear more than once in our sample, but a person’s current economic activity or labour market status in wave t could also be the same as his or her status in the previous wave(s). When fitting the multinomial logit regression model, we use the relevant cross-sectional weight for each year. In addition, to account for repeated observations of the same person over time, the standard errors are cluster-robust, with clustering on the individual’s unique identification number (pid).

³⁴ The time duration estimation will be examined in the last empirical chapter using the survival analysis …
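The weighting and clustering described above can be written down as follows. This is a sketch of one standard formulation, not necessarily the exact estimator used by the software: the symbols $w_{it}$ (cross-sectional weight), $T_i$ (the set of waves in which individual i is observed), $\hat{s}_i$, and $\hat{A}$ are notation introduced here for illustration, and statistical packages usually apply an additional finite-sample correction factor.

```latex
% Weighted pseudo-log-likelihood over all person-wave observations
\log L(\alpha) = \sum_{i=1}^{n} \sum_{t \in T_i} w_{it}
    \sum_{j=0}^{3} \mathbf{1}[y_{it} = j] \, \log p_{ijt}(\alpha)

% Cluster-robust (sandwich) variance, clustering on the individual (pid)
\widehat{V}(\hat{\alpha}) = \hat{A}^{-1}
    \Big( \sum_{i=1}^{n} \hat{s}_i \hat{s}_i' \Big) \hat{A}^{-1},
\qquad
\hat{s}_i = \sum_{t \in T_i} w_{it} \,
    \frac{\partial \log p_{i y_{it} t}}{\partial \alpha} \Big|_{\hat{\alpha}},
\qquad
\hat{A} = - \frac{\partial^2 \log L}{\partial \alpha \, \partial \alpha'} \Big|_{\hat{\alpha}}
```

Summing the score contributions within each individual before forming the outer product is what allows the errors of the same person to be correlated across waves.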

To answer our research questions, the estimations are first conducted separately for males and females, to compare the impacts of the independent variables between genders, and then by age group and by region, to observe whether there are significant age-related and regional differences in the impact of recessions on the individual’s NEET probability. In the estimations by gender and age group, region enters as a categorical variable with 12 categories. The estimations by region are conducted for the northern versus the southern regions of the UK, with gender and age now included as dummy variables.

2.3.2.1 Results interpretation: marginal effects

All results from our multinomial logit models are presented as marginal effects (evaluated at the covariates’ mean values). In the multinomial logit model, the marginal effect gives the effect on the j-th probability of a one-unit change in a regressor that takes the same value across all alternatives (Cameron and Trivedi, 2005, p. 502). Cappellari et al. (2005) explain that marginal effects measure the change (relative to a reference case) in the probability of being in a given state resulting from having a certain characteristic. For example, suppose we would like to know “what is the effect on the probability of being unemployed if the number of own children increases by one child?”. Following Cameron and Trivedi (2005), we can answer this question by differentiating Equation (2.2) to obtain

$$
\frac{\partial p_{ijt}}{\partial X_{it}} = p_{ijt}\,(\alpha_j - \bar{\alpha}_{it}) \tag{2.3}
$$

where $\bar{\alpha}_{it} = \sum_{k} p_{ikt}\,\alpha_k$ is a probability-weighted average of the $\alpha_k$. By estimating the marginal effects, we can also obtain values for the dependent variable’s base category. Thus, we are able to interpret directly the impact of a given regressor on the probability of being in a particular state, which in our case is being employed, in education (or training), unemployed, or inactive.
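To illustrate Equation (2.3), the sketch below computes the marginal effect of one regressor on all four state probabilities, including the base category whose coefficients are normalised to zero. It assumes individual-specific covariates evaluated at hypothetical mean values; the function names and all numbers are illustrative assumptions, not results from this chapter.

```python
import math


def mnl_probs(x, alpha):
    """Equation (2.2) with the base state's coefficients set to zero."""
    indices = [0.0] + [sum(a * v for a, v in zip(a_j, x)) for a_j in alpha]
    m = max(indices)
    expo = [math.exp(z - m) for z in indices]
    total = sum(expo)
    return [e / total for e in expo]


def marginal_effects(x, alpha, r):
    """Marginal effect of regressor r on each state probability (Eq. 2.3)."""
    p = mnl_probs(x, alpha)
    # Coefficient on regressor r in every state, including the base (zero).
    coefs = [0.0] + [a_j[r] for a_j in alpha]
    # Probability-weighted average of the coefficients.
    alpha_bar = sum(p_k * a_k for p_k, a_k in zip(p, coefs))
    return [p_j * (a_j - alpha_bar) for p_j, a_j in zip(p, coefs)]


# Hypothetical values: intercept and number of own children (sample mean).
x_mean = [1.0, 0.4]
alpha_hat = [
    [0.5, -0.30],   # education
    [-1.5, 0.20],   # unemployment
    [-1.0, 0.45],   # inactivity
]
effects = marginal_effects(x_mean, alpha_hat, r=1)
for j, me in enumerate(effects):
    print(f"state {j}: {me:+.4f}")
```

The four effects sum to zero, since any increase in one state's probability must be offset by decreases elsewhere. The base category has a non-zero marginal effect even though its coefficient is zero, which is why marginal effects are easier to interpret than the raw coefficients.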

2.3.2.2 Likelihood Ratio Test

Using the multinomial logit model rather than a binary logit allows us to distinguish the impacts of the independent variables on the different states within NEET and Non-NEET, i.e. being employed or in education for Non-NEET and being unemployed or inactive for NEET. Cramer and Ridder (1991) provide a test of whether a subset of states in a multinomial logit model can be treated as a single state or whether the states differ significantly from one another. They argue that introducing a new distinction within state j, for example, leads to an extended model with (J + 1) states, where j1 and j2 are the two new states substituted for j. They show that if the new distinction is arbitrary and irrelevant, the new model is again a multinomial logit, in which j1 and j2 have the same regressor coefficients as their parent state and differ only in their intercepts. Therefore, testing whether states can be pooled amounts to testing the equality of their logit regressor coefficients, apart from the intercept (Cramer and Ridder, 1991, p. 269). To do this, we only need to apply the likelihood ratio test.

Following Cramer and Ridder (1991, pp. 269-270), if we have a multinomial logit model with (J + 1) outcomes or states and define the two states that are candidates for pooling as j1 and j2, then the null hypothesis is that they have the same regressor coefficients other than the intercept. That is,

$$
\alpha_{j_1} = \alpha_{j_2} = \alpha_j \tag{2.4}
$$

and the test statistic for this hypothesis is

$$
LR = 2\,\{\log \hat{L} - \log \hat{L}_R\} \tag{2.5}
$$

where $\log \hat{L}$ is the maximum log-likelihood of the original model and $\log \hat{L}_R$ is the maximum log-likelihood when the estimates are constrained to satisfy (2.4). The value of $\log \hat{L}$ is readily available once the original model with (J + 1) states has been estimated, whereas $\log \hat{L}_R$ requires a further step: it is derived from the unconstrained estimation of the pooled model with only J states. The restricted maximum log-likelihood is given by

$$
\log \hat{L}_R = n_{j_1} \log n_{j_1} + n_{j_2} \log n_{j_2} + \dots + n_{j_k} \log n_{j_k} - n_j \log n_j + \log \hat{L}_p \tag{2.6}
$$

where $\log \hat{L}_p$ is the unconstrained maximum log-likelihood of the pooled model with only J states and n denotes the number of sample observations in each state, such that $n_{j_1} + n_{j_2} + \dots + n_{j_k} = n_j$. Under the null hypothesis given in (2.4), the likelihood ratio (LR) statistic asymptotically follows a chi-square distribution with degrees of freedom equal to the number of restrictions implied by the null; the null hypothesis is rejected when LR exceeds the corresponding critical value.
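The arithmetic of Equations (2.5) and (2.6) can be sketched in a few lines of Python. The function names and all numerical inputs below are hypothetical, chosen only to show the calculation. The degrees-of-freedom rule used here, one restriction per non-intercept coefficient for each additional pooled state, is our reading of "the number of restrictions implied by (2.4)" and should be treated as an assumption.

```python
import math


def restricted_loglik(loglik_pooled, substate_counts):
    """Restricted maximum log-likelihood, following Equation (2.6).

    loglik_pooled   : unconstrained maximum log-likelihood of the pooled
                      model with J states.
    substate_counts : observations in each candidate state (n_j1, n_j2, ...);
                      their sum is n_j. Each count must be positive.
    """
    n_j = sum(substate_counts)
    split_term = sum(n * math.log(n) for n in substate_counts)
    return split_term - n_j * math.log(n_j) + loglik_pooled


def lr_pooling_test(loglik_full, loglik_pooled, substate_counts, n_slopes):
    """Return the LR statistic of Equation (2.5) and its degrees of freedom."""
    loglik_r = restricted_loglik(loglik_pooled, substate_counts)
    lr = 2.0 * (loglik_full - loglik_r)
    # Assumed rule: each additional pooled state equates n_slopes
    # non-intercept coefficients with those of the first state.
    df = (len(substate_counts) - 1) * n_slopes
    return lr, df


# Hypothetical numbers for pooling unemployment and inactivity into "NEET".
lr, df = lr_pooling_test(
    loglik_full=-4750.0,     # model with the two states kept separate
    loglik_pooled=-4100.0,   # model with the two states merged
    substate_counts=[600, 400],
    n_slopes=10,
)
print(f"LR = {lr:.2f} on {df} degrees of freedom")
```

The statistic is then compared with the chi-square critical value for the given degrees of freedom; a value above it indicates that the two states should not be pooled.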

In document An analysis of the determinants and scarring effects of economic inactivity and unemployment in the UK (Page 60-64)