MULTILEVEL MODELLING FOR THE ECONOMETRIC ANALYSIS

This appendix is written as a companion to chapter 5; its purpose is to provide the reader with a broader understanding of multilevel modelling as applied to the econometric study of bed occupancy and costs. The techniques have been described in more detail elsewhere (Goldstein, 1995) and with particular reference to the health service (Rice and Leyland, 1996) and health economics (Rice and Jones, 1997). This appendix is limited to an explanation of those issues relevant to chapter 5, drawing examples from that chapter.

The uses of multilevel modelling

The hierarchy evident in chapter 5 is that of repeated measures made on individual hospitals. At the lowest level we have one financial year of data for each hospital, with up to five observations per hospital (corresponding to the five financial years from 1991/92 to 1995/96 inclusive). The response variable in each model is a measure of cost per case, be this total cost per case or direct cost per case. At the top level of the two-level model are the 26 hospitals included in the analysis. If the hospital is denoted by the subscript i and the year by the subscript j, then the response variable is the cost per case for the jth year in the ith hospital, $y_{ij}$ (i = 1,…,26; j = 1,…,5). We are attempting to explain differences in costs – both from one year to another and from one hospital to another – using available information. This information may relate to the hospital over the entire period; an example of such a variable is teaching status, which remains unchanged from one year to the next. Alternatively it may vary from year to year within the hospital; an example of such a variable is average annual bed occupancy.

There are a total of 127 data points (five observations on each of 25 hospitals and only two observations on one hospital). There are a number of possible approaches to the analysis of these data; we consider four: ordinary least squares, aggregation, a fixed effects model and a multilevel model. The ordinary least squares (OLS) approach is to analyse the data as they stand. This, however, ignores the hierarchy in the data, namely that the repeated measures are made on the same hospitals; the analysis would be just as if a different 26 hospitals had been sampled each year. We would expect there to be some relationship between the cost per case for a particular hospital for two different years; a hospital with high costs for one year is more likely to have high costs every year, which may reflect something as simple as differences in its accounting procedures. The point is that $y_{ij}$ and $y_{ij'}$ cannot be assumed to be independent of each other, whereas independence of observations is a key assumption of OLS regression techniques.

Furthermore, consider the variables which are “measured” at (or relate to) the hospital level. We have 26 observations relating to teaching status, one made on each hospital (with a 1 indicating that it is a teaching hospital, a 0 that it is not). An OLS regression model acts as though there are 127 such observations rather than 26, since there is no notion that the same hospitals are appearing more than once in the data. In practice this leads to an underestimation of the standard errors of the regression coefficients of such hospital level variables – a problem known as “misestimated precision”. This means that such hospital level variables are likely to be thought to have a significant relationship with the response variable when, in fact, they do not.

An approach to overcome the problem of misestimated precision is to aggregate the five years into a single observation for each hospital. This means taking an average (or a weighted average) of all of the variables – response and explanatory – so that all variables are then measured at the level of hospital rather than year. In essence, the $y_{ij}$ are condensed into a set of $y_{i\cdot}$ which represent the experience of hospitals over the five year period from 1991 to 1996. However, the data now comprise just 26 observations and so there is a considerable loss of power (and information) with such a large reduction in the number of degrees of freedom. What is more, it is no longer possible to look at changes over time; if the relationship between cost per case and bed occupancy is what is of paramount interest then it is important to check that this relationship has remained constant over the period rather than just reporting the mean relationship.
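The aggregation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hospital labels and cost figures are invented rather than taken from chapter 5, and `aggregate_by_hospital` is a hypothetical helper name.

```python
from collections import defaultdict


def aggregate_by_hospital(records):
    """Collapse (hospital, year, cost) rows into one mean cost per hospital."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for hospital, _year, cost in records:
        totals[hospital] += cost
        counts[hospital] += 1
    return {hospital: totals[hospital] / counts[hospital] for hospital in totals}


# Invented data: two hospitals, three years each (not from chapter 5).
records = [
    ("A", "1993/94", 700.0), ("A", "1994/95", 750.0), ("A", "1995/96", 800.0),
    ("B", "1993/94", 500.0), ("B", "1994/95", 500.0), ("B", "1995/96", 530.0),
]
hospital_means = aggregate_by_hospital(records)
```

Six year-level rows collapse to two hospital-level values, and the rise in cost over time within hospital A is no longer visible in the aggregated data.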

The third possibility mentioned above is to fit a fixed effects model. This essentially adds a dummy variable for each hospital to the regression equation as an explanatory variable. The dummy relating to the first hospital, for example, would take the value 1 corresponding to responses where i = 1 and would take the value 0 otherwise. This then estimates a mean for each hospital rather than an overall mean; consequently it overcomes the problems of non-independence and misestimated precision. However, there is a cost; the use of 26 separate means rather than one mean for all hospitals requires the addition of 25 explanatory variables and the loss of 25 degrees of freedom. Investigating whether the effect of any explanatory variable varied across hospitals – for example, whether the relationship between bed occupancy and cost was constant across all hospitals or if it varied between them in terms of the change in cost per case arising from a single percentage point increase in bed occupancy – would again require the addition of 25 terms, one for each hospital dummy multiplied by the variable in question. Such models become unwieldy very quickly.
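The construction of the hospital dummy variables can be sketched as follows. This is a minimal sketch under stated assumptions: the data are taken to be balanced (five rows for every hospital, unlike the 127 data points actually analysed), the last hospital is arbitrarily chosen as the baseline, and `hospital_dummies` is a hypothetical helper name.

```python
def hospital_dummies(hospital_ids, baseline):
    """Return one 0/1 indicator column per non-baseline hospital."""
    levels = sorted(set(hospital_ids) - {baseline})
    return {
        level: [1 if h == level else 0 for h in hospital_ids]
        for level in levels
    }


# 26 hypothetical hospital labels with five yearly rows each (balanced for simplicity).
hospital_ids = [i for i in range(1, 27) for _ in range(5)]

# Hospital 26 is the baseline (an arbitrary choice), leaving 25 dummy columns.
dummies = hospital_dummies(hospital_ids, baseline=26)
```

Each additional coefficient that is allowed to differ by hospital would need a further 25 columns of this kind, which is why such models grow so quickly.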

Multilevel models differ from fixed effects models in that, rather than model the effect of each hospital explicitly, they model the variation separately at each level. The separate means for each hospital can then be estimated from these variances. They have the advantage over the fixed effects models that the modelling of 26 hospital means requires the addition of one parameter – the between hospital variance – as opposed to the addition of 25 explanatory variables. The distributional assumptions made about hospital means – that they can effectively be seen as being draws from a distribution of an infinite number of hospital means with a given variance – result in hospitals “borrowing strength” from each other. In the fixed effects model the estimate for each hospital’s mean is made from data for that hospital alone; in a multilevel model, data from all hospitals are utilised.
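One way to see “borrowing strength” is through the standard shrinkage estimate for a simple variance components model with no explanatory variables, in which a hospital's raw mean is pulled towards the overall mean by a weight that depends on the two variances and on the number of observations for that hospital. The sketch below assumes the variances are known; all numbers are invented, and `shrunken_mean` is a hypothetical name rather than anything used in chapter 5.

```python
def shrunken_mean(hospital_mean, overall_mean, n_obs, var_between, var_within):
    """Shrink a hospital's raw mean towards the overall mean.

    weight = var_between / (var_between + var_within / n_obs)
    A weight near 1 trusts the hospital's own data; near 0 trusts the overall mean.
    """
    weight = var_between / (var_between + var_within / n_obs)
    return overall_mean + weight * (hospital_mean - overall_mean)


# Invented values: raw hospital mean 900, overall mean 700,
# between-hospital variance 100, within-hospital (year-level) variance 500.
five_years = shrunken_mean(900.0, 700.0, n_obs=5, var_between=100.0, var_within=500.0)
two_years = shrunken_mean(900.0, 700.0, n_obs=2, var_between=100.0, var_within=500.0)
```

Under these invented values the hospital with only two years of data is pulled further towards the overall mean than the hospital with five, because less information is available about it.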

The fixed part

The fixed part of a multilevel model looks very much like any other regression model in that it may contain an intercept or constant, a number of explanatory variables which may be continuous or categorical, and interactions between these terms. Consider, for example, model A presented in table 5.3. The estimated coefficient of the constant, 765.5, represents the estimated cost per case (in pounds) when all of the other terms in the regression equation are zero. The variables 1991/92 to 1994/95 are dummy variables indicating the financial year; as such they are all zero for the financial year 1995/96 (this is called the baseline year). Similarly, the variables LGMTH and MSH are dummy variables indicating whether the hospital is a large teaching hospital or a mixed specialty hospital respectively. The baseline hospital type is the general hospital with some teaching units. AOR represents the average occupancy rate, expressed in terms of the difference in percentage points from the average of 84%. CRUDE LOS denotes the mean length of stay and is centred around an average of 6.2 days. The last two terms are interactions between the hospital type and length of stay, and are therefore both zero when either the hospital considered is a general hospital with some teaching units or when the hospital’s unadjusted length of stay is 6.2 days. The figure of £765.5 therefore refers to the predicted direct cost per case in the medical specialties of a general hospital with some teaching units, in the 1995/96 financial year, which has an average occupancy rate of 84% and a mean length of stay of 6.2 days. Predicted costs for different types of hospital, different financial years, varying occupancy rates and with differing mean lengths of stay can be obtained by adding multiples of the relevant variables to this baseline figure.

Accompanying the parameter estimates in table 5.3 are their standard errors, which indicate the precision of each estimate. Thus, whilst the cost per case for 1991/92 was estimated to be £182.9 lower than in 1995/96 for hospitals with the same occupancy and the same mean length of stay in the two years, we can be reasonably confident that costs were between £236.1 and £129.7 lower, an interval of £53.2 either side of the estimate.
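The arithmetic behind these figures can be checked with a short sketch. It assumes that the 1991/92 estimate comes from the same model as the constant of 765.5, and that the quoted interval is a conventional normal-theory 95% interval (the estimate plus or minus 1.96 standard errors); neither assumption is stated in the text, so the implied standard error is indicative only.

```python
constant = 765.5                 # baseline cost per case in pounds, from the text
effect_1991_92 = -182.9          # 1991/92 relative to 1995/96 in pounds, from the text
lower, upper = -236.1, -129.7    # interval quoted in the text

# Predicted cost for a baseline-type hospital in 1991/92, at 84% occupancy
# and a 6.2 day mean length of stay (all other terms are zero).
predicted_1991_92 = constant + effect_1991_92

# The interval is symmetric about the estimate.
midpoint = (lower + upper) / 2
half_width = (upper - lower) / 2

# ASSUMPTION: a 95% normal-theory interval, so half_width = 1.96 * standard error.
implied_se = half_width / 1.96
```

Under the assumptions above, the predicted cost per case works out at about £582.6 and the implied standard error at roughly £27.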

The random part

An OLS regression model uses a set of predictor or explanatory variables measured at the year level for each hospital, $\{x_{0ij}, x_{1ij}, \ldots, x_{Pij}\}$, to estimate associated regression coefficients $\{\beta_0, \beta_1, \ldots, \beta_P\}$ and thereby obtain a predicted cost $\hat{y}_{ij}$.
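Written out, the OLS prediction described in this paragraph takes the usual linear form. The sketch below assumes the conventional choice $x_{0ij} = 1$ for every observation, so that $\beta_0$ plays the role of the constant, and uses hats to mark estimated quantities; neither convention is spelled out in the text above.

```latex
\hat{y}_{ij}
  = \hat{\beta}_0 x_{0ij} + \hat{\beta}_1 x_{1ij} + \cdots + \hat{\beta}_P x_{Pij}
  = \sum_{p=0}^{P} \hat{\beta}_p x_{pij},
\qquad i = 1,\ldots,26;\; j = 1,\ldots,5 .
```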
