
Chapter 4: Model Structure and Analytical Issues

4.5 Estimation Methods

There is a range of methods which may be applied to a panel dataset with a system of equations such as that outlined above. The approach to the system of equations and endogeneity is considered first, the approach to the panel structure second, and the weighting of estimates third.

4.5.1 Allowing for Endogeneity

While systems of equations with high levels of endogeneity can be modelled with a full information maximum likelihood (FIML) approach, as was done by Connelly (1999), 3SLS approaches are close equivalents to FIML (Greene 2003, p. 407) and are more readily specified and calculated. Further, while 3SLS is more efficient than system 2SLS, it is less robust to specification error (Greene 2003, p. 413; Wooldridge 2002, p. 198), so in the interests of flexibility and robustness to specification error, the preferred approach was to model equation by equation using 2SLS. In particular, with the highly interleaved set of variables required in general practice, it was not appropriate to use the same instrument sets in all situations, and estimating each equation separately using 2SLS allowed more targeted specification of individual equations (Wooldridge 2002, p. 237).
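The two stages of single-equation 2SLS can be illustrated with a minimal sketch on simulated data. Everything in it is an assumption made for illustration and is not drawn from the study: the variable names (z, u, x, y), the true coefficient of 2.0, the sample size and the data-generating process. The regressor x is endogenous because it shares the error term u with the outcome, while the instrument z affects x but is unrelated to u.

```python
import random


def ols(x, y):
    """Return (intercept, slope) from a simple regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data-generating process (illustrative values only).
random.seed(0)
N = 5000
TRUE_BETA = 2.0
z = [random.gauss(0, 1) for _ in range(N)]  # instrument
u = [random.gauss(0, 1) for _ in range(N)]  # structural error
# x depends on the instrument and on u, which makes it endogenous.
x = [0.8 * zi + ui + 0.5 * random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [TRUE_BETA * xi + ui for xi, ui in zip(x, u)]

# Naive OLS is biased upwards because x is correlated with u.
_, beta_ols = ols(x, y)

# Stage 1: project the endogenous regressor on the instrument.
a1, b1 = ols(z, x)
x_hat = [a1 + b1 * zi for zi in z]

# Stage 2: regress the outcome on the fitted values from stage 1.
_, beta_2sls = ols(x_hat, y)

print(f"OLS slope:  {beta_ols:.3f}")
print(f"2SLS slope: {beta_2sls:.3f}")
```

The sketch shows the point estimate only. Standard errors taken from a naive second-stage regression would be incorrect, which is one reason dedicated instrumental variable routines are used in practice.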

When 2SLS models exhibit high levels of heteroskedasticity, it is more efficient to use generalised method of moments (GMM) estimators (Wooldridge 2002, p. 193; Baum, Schaffer et al. 2003). Preliminary analysis showed high levels of heteroskedasticity, so GMM modelling has been used for all instrumental variable estimation.
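For reference, the relationship between the two estimators can be written in standard textbook notation. The symbols are assumptions introduced here and are not the thesis's own notation: y is the dependent variable, X the matrix of regressors (including the endogenous ones), Z the matrix of instruments, n the number of observations, z_i the i-th row of Z, and \hat{u}_i the residuals from a first-step consistent estimator such as 2SLS.

```latex
\begin{align}
\hat{\beta}_{2SLS} &= \left( X' P_Z X \right)^{-1} X' P_Z y,
  \qquad P_Z = Z (Z'Z)^{-1} Z' \\
\hat{\beta}_{GMM} &= \left( X' Z \hat{S}^{-1} Z' X \right)^{-1} X' Z \hat{S}^{-1} Z' y,
  \qquad \hat{S} = \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i^{2} \, z_i z_i'
\end{align}
```

Under homoskedasticity, \hat{S} is proportional to Z'Z/n and the GMM estimator reduces to 2SLS. With heteroskedastic errors, the weighting matrix \hat{S}^{-1} gives less weight to noisier moment conditions, which is the source of the efficiency gain when the equation is over-identified.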

4.5.2 Panel Modelling

A major advantage of the study dataset was that it permitted the use of panel analytical techniques which should eliminate that part of the unobserved heterogeneity which was fixed over time. There has been little use of panel analysis with aggregate data in the study of physician markets internationally, except in examining mortality (Ruhm 2005), and none in Australia.

Early studies on the nature of the physician market, including studies of SID (Evans 1973), were based on observations regarding changes in the behaviour of the market over time, and there has been considerable research into SID using natural experiments, with varied results (e.g. Nassiri & Rochaix 2006). International research with aggregate data from multiple time periods (Fuchs 1978; Cromwell & Mitchell 1986), however, has generally pooled the data, and either ignored the time dimension or used time-trend variables.

While there are numerous international studies of demand and of supply which have used panel techniques (Delattre & Dormont 2003; Pohlmeier & Volker 1995; Baltagi, Bratberg et al. 2005), these were generally at the individual physician level rather than using aggregated data.

The main question with the panel analysis was whether fixed or random effects were appropriate. There are a number of unobserved factors which may have influenced the demand equation, including perceived quality of care, use of tobacco and alcohol by the population, and clustering of GPs in transport hubs or where accommodation for their surgeries can be obtained at reasonable prices. As most of these were likely to be correlated with the socio-economic status of the area, which was included in the demand equation, a fixed effects modelling approach was appropriate (Verbeek 2000, p. 319).

all equations. Fixed effects modelling has the disadvantage that the effects of variables which are of interest in understanding the behaviour of the GP market but are constant over time (e.g. state indicators and urban/rural indicators) are eliminated along with the other fixed effects.
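The fixed effects (within) transformation and its side effect on time-constant variables can be illustrated with a minimal sketch. The panel below is entirely hypothetical: three areas observed over three periods, an area effect (alpha) that is deliberately correlated with the regressor x, a time-constant rural indicator, and a true slope of 2.0. None of these values come from the study data.

```python
def slope(x, y):
    """OLS slope from a simple regression of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def demean(values):
    """Subtract the group mean (the within transformation for one area)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


TRUE_BETA = 2.0

# Hypothetical panel: the unobserved area effect falls as x rises across areas.
panel = {
    "A": {"x": [1.0, 2.0, 3.0], "alpha": 10.0, "rural": 1.0},
    "B": {"x": [4.0, 5.0, 6.0], "alpha": 0.0, "rural": 0.0},
    "C": {"x": [7.0, 8.0, 9.0], "alpha": -10.0, "rural": 0.0},
}

pooled_x, pooled_y, within_x, within_y, within_rural = [], [], [], [], []
for area in panel.values():
    y = [TRUE_BETA * xi + area["alpha"] for xi in area["x"]]
    pooled_x += area["x"]
    pooled_y += y
    within_x += demean(area["x"])
    within_y += demean(y)
    # A time-constant variable has no within-area variation left.
    within_rural += demean([area["rural"]] * len(y))

pooled_slope = slope(pooled_x, pooled_y)  # contaminated by the area effects
within_slope = slope(within_x, within_y)  # area effects removed

print(pooled_slope, within_slope, within_rural)
```

In this constructed example the pooled slope has the wrong sign, while the within estimator recovers the true slope. The demeaned rural indicator is zero for every observation, so its coefficient cannot be estimated, which is the disadvantage noted above.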

The main analysis in this thesis is based on panel data. The initial set of equations is also re-estimated with cross-sectional data for 2001 to allow those fixed effects which can be measured to be examined, and to permit comparison with previous analyses.

4.5.3 Weighting of the Estimation

The SLAs which form the basis of the data structure ranged in size from approximately 1,000 to approximately 190,000 people, and from 0.66 to 312 FWE–headcount GPs. This raises the question of whether the estimation should be weighted to allow for size effects of the SLAs.

The general literature on weighting was indeterminate, and most of the literature in the modelling of GP markets is simply silent on the topic. Kmenta (1986, p. 368) and Baum (2006, p. 153) suggested weighting by population size due to concerns with heteroskedasticity, concerns with the relative importance of the larger and smaller areas, and concerns that small areas would become points with undue influence. Dickens (1990) showed that weighting may magnify the problems it was designed to overcome if the individuals in a group share a common unobserved characteristic. The use of fixed effects estimation meant that common characteristics across a group were eliminated, so this issue should not arise.

In the context of panel analysis with grouped data, Deaton (1985) used a weight based on cohort size to correct for heteroskedasticity, as did Gardes, Duncan et al. (2005), although their actual weight was more complex. Dumouchel & Duncan (1983) noted that where estimated coefficients differ across strata with different sampling weights, one solution was to estimate separately for each stratum without weights and take the weighted sum of the coefficients.

Use of GMM estimation should accommodate heteroskedasticity issues. However, the relative importance of areas of different size will lead to bias if relationships between the variables differ between larger and smaller SLAs. This is likely, as SLAs with large populations tend to be in metropolitan areas and the SLAs with smaller populations in rural areas.

To test the effects, the basic demand equation to be presented in Table 6.1, in unweighted form and including only the cross-sectional data for 2001, was estimated separately for the SLAs above and below the median size of 12,000 people. Table 4.3 shows that there are significant differences in behaviour between the small and large areas, suggesting that it is necessary to either weight for the size of the SLA, or to include interaction terms for the regional effects.

Table 4.3: Demand coefficients in small and large SLAs (unweighted, 2001)

                                                 Coefficient [52]
                               Number of SLAs    Net fee              GP density
Small SLAs (<12,000 people)    413               -0.125 (0.103)       0.019 (0.097)
Large SLAs (>12,000 people)    413               -0.309*** (0.042)    0.914*** (0.156)

While it was possible to estimate regional interaction terms, the effects were likely to encompass more than just the principal variables. Estimating them would therefore require a large number of interactions, many of which would be endogenous and would require instrumentation. Further, while SLA size is highly correlated with region, it is not the same as region. Weights have therefore been used.

Population size was the appropriate weight for the demand equation as the dependent variable was services per capita. Other equations (e.g. services per GP) used different denominators, so to ensure correct averages could be predicted, the denominator of the relevant dependent variable was used as the weight in each equation.
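The reason for weighting by the denominator of the dependent variable can be shown with a small arithmetic sketch. The three areas and their service counts below are hypothetical and chosen only to span a size range similar to that of the SLAs. Weighting each area's per capita rate by its population reproduces the aggregate rate (total services divided by total population), whereas the unweighted mean does not.

```python
# Hypothetical areas: (population, total GP services); illustrative values only.
areas = [(1_000, 3_000), (12_000, 60_000), (190_000, 1_140_000)]

populations = [pop for pop, _ in areas]
# Dependent variable in the demand equation: services per capita.
rates = [services / pop for pop, services in areas]

# Unweighted mean treats the smallest and largest areas as equally important.
unweighted_mean = sum(rates) / len(rates)

# Population-weighted mean uses the denominator of the rate as the weight.
weighted_mean = sum(p * r for p, r in zip(populations, rates)) / sum(populations)

# Aggregate rate across all areas combined.
aggregate_rate = sum(services for _, services in areas) / sum(populations)

print(f"unweighted mean of rates: {unweighted_mean:.3f}")
print(f"population-weighted mean: {weighted_mean:.3f}")
print(f"aggregate rate:           {aggregate_rate:.3f}")
```

The same logic applies to an equation whose dependent variable is services per GP: weighting by the number of GPs, rather than by population, is what makes the weighted average match the corresponding aggregate.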

4.5.4 Summary

Following these considerations, the approach to estimation of the structural equations was to estimate each equation of interest separately at the SLA level using fixed effects panel modelling with GMM (2SLS). Modelling was undertaken in Stata using the IVREG2 and XTIVREG2 packages. In estimating each equation, observations were weighted according to the size of the SLA.