Multiple Public Service Performance Indicators: Toward an Integrated Statistical Approach

(1)

Multiple Public Service Performance

Indicators: Toward an Integrated

Statistical Approach

Stephen Martin Peter C. Smith University of York

ABSTRACT

Public service organizations usually produce multiple outputs, measured on different scales, giving rise to a suite of performance indicators. The traditional approach to statistical analysis of organizational performance has been to develop a separate regression model for each performance indicator. This piecemeal approach, the article argues, may discard valuable information, as it ignores potentially important relationships between individual performance measures. We therefore propose modeling an organization’s performance measures simultaneously, using the methods of seemingly unrelated regressions. The approach implicitly introduces a latent organizational variable into the regressions and may therefore economize on the need to assemble explicit measures of organizational characteristics. The method is illustrated using an example from English public hospitals.

In most industrialized nations, measuring the performance of public services has assumed central political importance, as governments come under increased pressure to reduce taxation and ensure that tax revenues are spent cost-effectively. To that end, governments have put in place extensive systems for measuring the performance of public service organizations, with a view to improving accountability thorough improved political and managerial scrutiny of those organizations (Bird 2004).

Yet only a few years ago such performance data were sparse, selective, and slow to emerge. Now the revolution in information technology is rapidly leading to a situation in which public services are overwhelmed with indicators of activity and performance. In this new world of data overload, an emerging challenge is to turn the data into meaningful messages that are useful for informing both managerial decisions and democratic debate. Without technologies to address this difﬁculty, there is a risk that the superabundance of data will be used ineffectively.

In interpreting performance data, one of the most pressing concerns is often that public service organizations operate in different circumstances, and therefore direct We would like to thank the participants at the Cardiff Seminar, the referees, and our colleagues Katharina Hauck and Andrew Street. Smith is funded by ESRC grant R000271253. Address correspondence to Peter C. Smith at [email protected].

doi:10.1093/jopart/mui036

Advance Access publication on March 2, 2005

ªThe Author 2005. Published by Oxford University Press on behalf of the Journal of Public Administration Research

at Rutgers University on August 12, 2010

http://jpart.oxfordjournals.org

(2)

comparisons using crude performance data can be seriously misleading. Some mechanism is therefore required to adjust performance measures for the relative difﬁculty of attainment. A rudimentary, commonly adopted approach entails regressing performance measures against a variety of factors that are thought to affect performance, such as input resources and environmental context (Smith 1990). Typically, each performance measure will be modeled separately. This piecemeal approach—with the separate estimation of regression models for each indicator—implicitly assumes that there is no formal connection between the various regressions. The error terms from the regressions are then often used as indicators of organizational performance—that is, they indicate the extent of unexplained over- or underperformance after the appropriate mitigating factors (as reﬂected in the regressors) have been taken into account.

Researchers have also sought to apply more advanced statistical models to the new data resources (Heinrich and Lynn 2001). For example, important statistical advances have been made in the application of multilevel (hierarchical) methods and panel data methods to performance data (Goldstein and Spiegelhalter 1996). The most sustained effort has been in the ﬁeld of productivity analysis, where the methodology and application of nonparametric methods (most notably, data envelopment analysis) and parametric approaches (such as stochastic frontier analysis) have grown apace (Coelli, Rao, and Battese 1998).

The methods of productivity analysis vary in underlying assumptions and estimation methods, but their common purpose is to model the frontier of feasible performance. They then seek to offer a single measure of an organization’s relative efficiency by comparing its performance with the estimated frontier (Kumbhaker and Lovell 2003; Thanassoulis 2001). Such models clearly have great relevance to modeling public service performance. In particular, data envelopment analysis is able to accommodate multiple inputs and multiple outputs, and it has been deployed in countless empirical contexts (Hollingsworth 2003; Gattoufi, Oral, and Reisman 2004). Moreover, the underlying theory has been developed to a high level of technical refinement, and numerous methodological innovations have been reported.

However, a growing literature challenges the underpinnings of traditional pro-ductivity analysis (Smith and Street, forthcoming; Stone 2002). First, there are questions as to whether a single composite measure of organizational performance is manage-rially helpful. However, setting this philosophical concern to one side, there are also profound concerns with whether—in the absence of good knowledge of the production process—it is feasible to develop a convincing theoretical model of productivity. Results can be highly sensitive to modeling assumptions. In particular, a great deal often rests on how much of the unexplained variation in performance is attributed to inefﬁciency and how much to random noise or unmeasured environmental inﬂuences on perfor-mance. This leads on to the largely unresolved issue of what ‘‘uncontrollable’’ factors should be included in any empirical model as partial explanations for variations in performance.

This article offers an alternative perspective on modeling organizational performance. We believe that the dominant interest of public service managers is in indicators of performance in speciﬁc service areas, rather than aggregate measures of organizational performance. We therefore focus on individual indicators of performance. However, in contrast to the usual piecemeal regression approach mentioned above, our approach acknowledges the possibility that there may exist important relationships between

(3)

individual performance measures, and therefore it seeks to model the indicators simultaneously, using the methods of seemingly unrelated regressions.

The next section discusses in broad terms the general model of public service production underlying performance indicators and the traditional approach to analysis based on ordinary least squares (OLS) regression methods. We then introduce the methodology of seemingly unrelated regressions (SUR), which generalizes the estimation of systems of equations when (as with public service performance measures) the error terms are likely to be correlated. We illustrate the method with an example that uses three performance measures for public hospitals in England, and we conclude by discussing the general relevance of SUR methods to the analysis of public service performance.

THE UNDERLYING PRODUCTION MODEL

To illustrate our principles, consider a very general production process with just two individual indicators of organizational performance, as illustrated in ﬁgure 1. If all public service organizations are operating in identical environments, and using identical outputs, the frontier of feasible production can be illustrated by a single curve, such as FF. Then, all observations will lie on or inside this frontier. However, if—as will usually be the case— organizations vary in environment or resources used, the frontier will shift. For example, the frontier F2F2might indicate a revised frontier for a set of organizations operating in a more adverse environment. A reduction in resources might operate in the same way, implying (say) that the feasible mix of performance achieved by organization at point A might be reduced to point A2, if such a reduction were implemented.

Under this view, variations in the observed performance of two organizations might arise from ﬁve sources: environmental factors, resource levels, efﬁciency, substitution, and data quality. We consider these in turn:

1. The organizations might be operating in different environments, leading to variations in the feasible levels of performance. For example, schools might have different calibers of

Figure 1

The Production Possibility Frontier with Two Performance Indicators

(4)

student intake, or police departments might operate in different socioeconomic circum-stances. Environmental inﬂuences on performance are often the most poorly understood and poorly measured aspects of the public service production process. As environmental circumstances improve, so we would expect to observe improvements in all performance measures (albeit to varying extents). Therefore, such variations will in general give rise to a positive correlation between performance measures.

2. The organizations might be devoting different levels of resource inputs to the services under scrutiny. Variations in resources act in a similar way to variations in environmental factors to alter the capacity of the organization to secure good performance, but they are often better understood and measured. Improvements in resources potentially increase the capacity for performance in all dimensions, and so should also give rise to a positive correlation between performance measures.

3. Overall organizational efficiency is intrinsically unmeasurable, and yet performance indicators seek to offer insights into this elusive concept. Conventional productivity models focus attention on variations in overall efficiency, yet they run into a difficulty because it is impossible to distinguish between organizational effects caused by unmeasured resource or environmental variations and organizational effects caused by efficiency variations. Again, efficiency should be positively correlated with performance in all dimensions and therefore contribute to a positive correlation between performance measures. 4. If organizations are fully efficient, improved performance on one indicator can be secured

only at the expense of a worse performance on others, as the organization moves round the efficient frontier. For example, in figure 1, an efficient organization A can improve performance on indicator 1 only by reducing attainment on indicator 2 (moving, say, to point B). In contrast to (1) through (3) above, this substitution effect implies a negative correlation between performance measures.

5. Imperfections in data quality are inherent to all public services. These might affect relative measured performance in a variety of ways. For example, if the performance measures are of the form ‘‘attainment per head of population,’’ then an overestimate of population would adversely affect performance in all domains, leading to a positive correlation between performance measures. If, on the other hand, the performance measures are expressed (say) in the form ‘‘attainment in domain X per employee in domain X,’’ then an imprecise allocation of employees between the different domains of performance might lead to a negative correlation between performance measures. Data imperfections could therefore contribute either positively or negatively to correlation measures.

There are therefore numerous reasons why performance on one indicator might be correlated positively or negatively with performance on another. Of course, if we could identify and measure the factors listed under (1) through (5) above, we could model performance on any one indicator with some confidence. Instead, in many circumstances we have available a limited set of covariates with which we can adjust performance measures. In particular, for example, there usually exist measures that can serve to adjust performance measures to account for differences in physical or financial inputs. However, as discussed above, these measures are often crude and imprecise, and—in the case of environmental factors—highly contested. Furthermore, by definition, there are no straightforward measures of organizational efficiency.

(5)

We therefore seek to move beyond the piecemeal modeling of individual performance indicators and explicitly model covariance between indicators, without placing impossible demands on measurement instruments or modeling methodology. We believe that simultaneous modeling of performance measures is potentially important because:

it economizes on the need for detailed modeling of individual performance measures;

it economizes on the need to measure factors that affect performance across all performance measures, such as environmental factors;

it can reduce the very large conﬁdence intervals observed in single equation models, caused in part because of omitted or poorly measured explanatory variables; and

the more sensitive modeling of interactions may lead to different inferences about the level of an organization’s performance on speciﬁc indicators.

In short, the deployment of a more integrated model of multiple performance indicators can secure marked reductions in standard errors, and accordingly more secure performance rankings, without recourse to additional data or the highly questionable aggregation of performance indicators implicit in traditional productivity models (Stone 2002).

STATISTICAL METHODOLOGY

Scrutinized in their crude form, measures of public service performance rarely transmit meaningful messages about individual organizational attainment. Instead, some form of adjustment is often needed to account for the relative difﬁculty of the environment in which the organization operates. For example, school attainment is now routinely measured by the ‘‘value added’’ to a cohort of pupils rather than by crude examination results, thereby implicitly adjusting for the difﬁculty of the challenge confronting the school (Goldstein 1995). In health care, a highly developed ‘‘risk adjustment’’ industry has developed to make adjustments for variations in the health status of patients being treated by a doctor or hospital (Iezzoni 1997).

Such methods have undoubtedly led to major advances in the measurement of comparative performance. However, they are heavily reliant on good quality data, and they are in general most successful when considering in isolation a single operational objective (such as educational attainment at age sixteen, or health outcomes after coronary artery bypass surgery) where the production process is reasonably well understood. They are less useful when scrutinizing more strategic organizational objectives.

When the objective is less well understood, a typical approach to scrutinizing performance measures is to develop a regression model in which some measure of performance is the dependent variable, and various resource and environmental factors are entered as potential explanatory variables. The residuals from this model (unexplained variation) are then used as an indicator of the relative performance of the public service organizations. The simplest approach (corrected ordinary least squares regression) estimates the statistical model using ordinary least squares methods, and the entire residual is attributed to inefﬁciency. In contrast, methods such as stochastic frontier analysis partition the residual into an inefﬁciency component and a random element. However, the principles of a single regression remain unaltered (Kumbhaker and Lovell 2003).

(6)

Where—as is usually the case—a public service organization has multiple objectives, a series of independent regressions may be undertaken for a range of performance measures. The explanatory variables used in each of these regressions will usually vary, depending on what are considered to be the relevant causes of performance. For example, for schools two important measures of attainment might be examination results at age sixteen and the truancy rate. The determinants of these might in part be common, schoolwide factors (level of ﬁnancing, socioeconomic conditions), and in part factors speciﬁc to the performance measure (say, prior examination attainment in the case of exam results, or proximity to fast food outlets for truancy).

As discussed in the previous section, the fundamental insight of this article is that in many circumstances such individual regressions, or more precisely the error terms from each regression, will be linked. For example, there might be some unobservable or poorly measured variable, such as inefﬁciency, that has been omitted from the regressor set.

The essence of our approach is to model such covariances by incorporating a latent variable, which can be thought of as an implicit, unmeasured ‘‘organizational’’ effect on performance across all indicators. It can be defined as any influence on overall organi-zational performance, whether or not it is within the direct control of the organization. Each of the five factors discussed above might contribute to the organizational effect, which arises from a jumble of influences on measured performance. Of course, if we are able to introduce carefully justified covariates in the equations for specific indicators, this might reduce the observed covariance between performance measures. However, many determinants of organizational performance are not measured, or poorly measured, and so in general we cannot be sure that all the relevant covariates are available.

If the error terms across the individual performance indicator regressions are linked, the use of OLS to estimate separately each regression is inefﬁcient because it fails to utilize the information present in the cross-regression error correlations. In other words, the OLS estimator no longer offers the most efﬁcient estimates of the standard errors, although OLS remains a consistent estimator. To avoid this loss of information, the seemingly unrelated regression (SUR) estimator can be employed (Zellner 1962). This is a method of estimating systems of regressions in which the parameters for all equations are determined in a single procedure (Greene 2000).

To improve the precision of the parameter estimates, SUR estimation transforms the errors so that they all have the same variance and are uncorrelated. This transformation is then applied to the other variables in each equation, and OLS is applied to these transformed variables. This procedure—formally known as joint generalized least squares estimation—offers more precise parameter estimates than single equation least squares because it incorporates the additional information provided by the correlation between the individual equation errors.

We have shown elsewhere that SUR methods can dramatically alter parameter estimates, especially when the different equations use different explanatory variables (Martin et al. 2003). As a consequence, it is also likely that inferences about relative performance of organizations might be altered substantially when the method is deployed. Some results combining SUR methods with multilevel modeling have been reported for English health authorities (Hauck and Street 2003), and Bailey and Hewson (2004) use generalized estimation methods to model very speciﬁc data generation processes associated with trafﬁc safety indicators. However there has hitherto been surprisingly little use made of generalized regression methods in the analysis of public service

(7)

performance. Yet, there is obvious prima facie relevance of methods to estimate systems of equations with correlated disturbance terms when analyzing organizations that produce diverse multiple outputs.

AN EXAMPLE FROM HEALTH CARE

It is widely acknowledged that health care performance is multidimensional in nature, implying that performance measurement will require a number of different instruments to capture each dimension of attainment (Smith 2002a). In the United Kingdom—as in most other Organization for Economic Co-operation and Development (OECD) countries— health care is considered a public service, and performance indicators are now widely used to compare performance across different health care agencies and to encourage those with relatively poor results to improve their performance. For example, public acute hospitals are annually awarded star ratings based on their performance against about twenty-ﬁve indicators (Healthcare Commission 2004).

To illustrate the principles involved with SUR estimation, we employ a data set for 135 acute hospitals within the English National Health Service (NHS). These hospitals are publicly owned and have governing boards appointed by the national government. However, to some extent they operate in a pseudo-market, in the sense that they must effectively compete for business from the public purchasers of health care (known as primary care trusts). The hospitals are subject to a stringent performance management regime that seeks to ensure that national objectives, such as reducing waiting times for care, expenditure control, and patient safety, are adhered to (Smith 2002b).

We attempt to model those factors that are associated with three important aspects of performance, measured by the following indicators:

a measure of clinical quality (the readmission rate, deﬁned as the proportion of people discharged from the hospital who are subsequently readmitted as emergencies in connection with the same episode of care);

a measure of inpatient access to health care (the mean waiting time in days for admission for nonemergency surgery); and

a measure of hospital efﬁciency (the average length of stay).

We have available a wide range of potential explanatory factors, as summarized in table 1, each of which is predicted to affect the capacity of the hospital to pursue the objectives captured by the three performance measures. These data have been assembled by researchers at the Centre for Health Economics (Jacobs 2001), and they have been used in a series of studies of hospital behavior (Martin and Smith 2003). The variables have been divided into five broad groups: measures of supply volume, quality indicators, demand shifters, case mix indicators, and other supply shifters. There are three measures of supply volume (two for outpatients and one for inpatients) and eleven quality indicators. There are five demand shifters, and three of these variables are based on measures of competing resources (family practitioner availability, a Herfindahl index of hospital competition, and the local availability of private hospital beds). There are six indicators of surgical complexity (or resource intensity) and three further variables that affect a hospital’s supply capability.

(8)

Table 1

Variables in Hospital Database

Variable name Variable description

(a) volume of supply

GP_REFERRALS The number of general practitioner

referrals seen per head of catchment population, quarterly, averaged over the year, 1995–2001

ALL_REFERRALS The number of referrals seen (all

sources) per head of catchment population, quarterly, averaged over the year, 1995–2001

ADMISSIONS The number of waiting list

admissions per head of catchment population, quarterly, averaged over the year, 1995–2001 (b) quality measures

MORTALITY Brian Jarman’s mortality index based on

5 years’ data

PI_STARS Number of stars (Trust performance

rating, 2000 and 2001)

SICK_RATE Amount of time lost through

absence as a proportion of staff time available for directly employed NHS staff, 1999–2001

AHP_VACANCY Allied health professionals vacancy

rate (three-month vacancies expressed as a percentage of three-month medical workforce census vacancies plus staff in post), 1999–2001

NURSE_VACANCY Qualiﬁed nursing, midwifery, and health

visitors vacancy rate (three-month vacancies expressed as a percentage of three-month medical workforce census vacancies plus staff in post), 1999–2001

CONS_VACANCY Consultants vacancy rate

(three-month vacancies expressed as a percentage of three-month vacancies plus staff in post), 1999–2001

DELAY_DIS_PC Percentage of patients whose

discharge from hospital was delayed, 2001

READMISSION_RATE Number of emergency readmissions

that occur within 28 days of discharge per 10,000 admissions, 1995–1998

DEATH_NE_SURGERY Rate of death in hospital within 30

days of surgery, nonemergency admissions, 1995–1998

Continued

(9)

Table 1(continued)

Variables in Hospital Database

Variable name Variable description

DEATH_E_SURGERY Rate of death in hospital within 30

days of surgery, emergency admissions, 1995–1998

DAYCASES_PC Number of day cases divided by

number of elective admissions, 1995–2000

(c) demand shifters

NEED Need for health care of the

hospital’s catchment population (the so-called York index; see Smith et al. 1994)

GP_AVAILABILITY Number of whole time equivalent

general practitioners per head of population, 1999

HERFINDAHL15 Index of local NHS competition,

which equals 1 if there are no other acute providers within a 15 mile radius, 1995–96 data

PRIVATE_BEDS Availability of local private beds

relative to NHS beds, 1999

WAITING_TIME Mean waiting time for inpatient

admission, quarterly, averaged over the year, 1995–2001 (d) case mix indicators

LENGTH_OF_STAY Average length of stay, 1994–2000

(occupied bed days divided by number of admissions)

TRANSFERS_IN Proportion of spells that involve

a transfer in from another hospital, 1994–1997

TRANSFERS_OUT Proportion of spells that involve

a transfer out to another hospital, 1994–1997

PROP_ADMISS_60þ Proportion of patient admissions

over age 60, 1994–2000

HRG_INDEX Index of case mix complexity,

1994–1997

RESEARCH_SPEND Proportion of total revenue spent

on research, 1994–1997 (e) other supply shifters

POP_PER_BED Population per bed (both weighted

for distance); a measure of local supply and/or competition, 1999

OCCUPANCY_RATE Bed occupancy rate, 1996–2000

(occupied beds divided by available beds)

BEDS_PER_HEAD Average daily number of available

beds divided by population served by Trust, 1994–2000

(10)

The regression models reported below are based mainly on data relating to the ﬁnancial year 1999–2000, although where relevant data were not available we have sought to preserve the integrity of the sample by using data for adjacent years. This process enables us to include many more variables in the analysis and to estimate models over a much larger sample than would otherwise have been possible, and it is unlikely to affect results materially.

In order to identify which factors are associated with performance, each of the three indicators is first regressed against the potential explanatory variables in the database using OLS methods. Because of the large number of potential regressors, a stepwise procedure was employed to identify those variables that are significant at the 5 percent level. We acknowledge that such heuristic methods are not ideal, and future work would with benefit examine model-building principles within the context of SUR. Indeed, there is a risk that misspecification of explanatory variables may contribute unwittingly to the error covariance structure. However, in this article the focus is mainly on the integration of the regressions secured through SUR, rather than equation specification. The results of applying the stepwise procedure to each of the three performance indicators outlined above are shown in equations 1, 3, and 5 of table 2.

Four factors signiﬁcantly affect the emergency readmission rate. There are two indicators of case mix (the need for health care and the case mix complexity, as measured by the health care resource group [HRG] index), indicating that hospitals that admit more severe cases will tend to have a higher readmission rate. In addition, hospitals facing more competition from other public providers (as measured by Herﬁndahl index of competition) tend to have lower readmission rates, and competition from the private sector appears to be associated with higher public readmission rates.

With regard to the waiting time for elective admission, short waits are associated with high-need areas and the presence of higher levels of family practitioners per head of population (perhaps because such physicians perform minor surgery, thus reducing the demand facing local hospitals). There is also evidence that longer waits are associated with high nurse vacancy rates. Hospitals that find it difficult to recruit and retain nurses are likely to suffer operational difficulties for a number of reasons, and they may be spending relatively large sums on expensive agency nursing staff, both of which adversely affect the supply of health care.

The stepwise procedure identified nine significant factors that are associated with the average length of stay in the hospital. The most highly statistically significant is the number of beds per head of catchment population, suggesting that hospitals with more beds tend to have longer lengths of stay (one reason for this might be that there is less pressure on resources in such hospitals). There is also a positive association between the bed occupancy rate and length of stay, perhaps with long stays contributing to high occupancy rates. Case mix complexity has the anticipated impact on length of stay with positive coefficients on both the HRG index and the proportion of admissions who are over sixty years old. The positive coefficient on the proportion of revenue spent on research might also reflect the impact of case mix on length of stay. The negative coefficient on the proportion of admissions who are transferred from another hospital might reflect increased efficiency at those hospitals that accept more complex cases and/or the availability more appropriate facilities for more complex cases. The negative coefficient on the need for health care might reflect relatively better provision of postdischarge (residential) care facilities in more disadvantaged areas, and the negative coefficient on the Herfindahl index

(11)

Dependent variable Readmission rate Waiting time Length of stay

Estimation technique OLS SUR OLS SUR OLS SUR

Equation 1 2 3 4 5 6 AHP_Vacancy_Rate 0.0355(0.0127) 0.0349(0.0119) Nurse_Vacancy_Rate 0.0430(0.0195) 0.0446(0.0192) Readmission_Rate Need_for_Health_Care 1.3496(0.2197) 1.2975(0.2139) 0.4949(0.2208) 0.4923(0.2174) 0.7672(0.1469) 0.7630(0.1398) GP_Availability 0.5031(0.2896) 0.5095(0.2845) Herﬁndahl15 0.0443(0.0187) 0.0399(0.0182) 0.0621(0.0128) 0.0621(0.0122) Private_Beds 0.0474(0.0285) 0.0379(0.0275) Transfers_In 0.0773(0.0208) 0.0714(0.0195) Prop_Admiss_60þ 0.2291(0.0614) 0.2285(0.0576) HRG_Index 0.3456(0.1226) 0.3560(0.1202) 0.6883(0.1130) 0.6731(0.1074) Research_Spend 0.0144(0.0060) 0.0146(0.0056) Occupancy_Rate 0.6681(0.1471) 0.6992(0.1380) Beds_per_Head 0.5168(0.0511) 0.5261(0.0479)

AdjustedR2 0.331 n/a 0.083 n/a 0.736 n/a

Ramsey RESET

test:F5 1.07 n/a 1.07 n/a 1.31 n/a

Note:Estimated standard errors are in parentheses.

(12)

implies that as competition from other public providers increases, so the length of stay increases.

Ramsey’s reset test reveals no evidence of misspeciﬁcation in any of the three OLS equations. Yet, notwithstanding the extensive suite of variables described in table 1, many of the variables available are only poorly measured, and we do not have measures of some potentially important inﬂuences on measured performance, such as the local demand for emergency treatment.

To illustrate this issue, table 3 reports the correlation matrix for the residuals from the three estimated OLS equations. Note that the three sets of residuals are all positively correlated. Although a test of the independence of the three sets of residuals cannot reject the null hypothesis of independence, a test of the independence of the residuals from the readmission rate and length of stay equations alone leads to the rejection of the null at the 5 percent level (x2(1)55.766,p50.0163). This implies that these two sets of residuals are signiﬁcantly positively correlated and that there is some unobserved factor that boosts both the readmission rate and length of stay but which has not been included in the model. As noted above, this effect might arise from a jumble of inﬂuences.

We therefore re-estimated all three regressions using the SUR estimator, which utilizes the information present in the cross-regression error correlations, and the results from this re-estimation are presented as equations 2, 4, and 6 in table 2. Although there are changes to most of the parameter (coefﬁcient and standard error) estimates, these changes are in this example modest, in part reﬂecting the relatively low correlations between the OLS error terms. However, in another study we have found correlations of the order of 0.64 between the errors of separately estimated OLS equations of the demand for and supply of elective surgery. The SUR estimation has a correspondingly larger impact on parameter estimates and standard errors, leading to important changes in policy inferences. For example, the elasticity of demand for routine surgery with respect to waiting time changes from0.189 to0.035, with potentially important implications for policymakers who are seeking to reduce waiting times (Martin et al. 2003).

To understand why SUR estimation can affect such estimates, it is useful to recall how the SUR estimator works. It estimates each regression by OLS and uses the residuals to estimate the error variances both for each equation and across equations. The errors are then transformed so that they all have the same variance and are uncorrelated. The other variables are then subjected to the same transformation, and OLS estimation is applied to these transformed variables. The SUR estimator therefore ‘‘purges’’ the errors of their cross-equation correlation, and the transformation that achieves this is also applied to the other variables in the model. If the unobservable (omitted) factor driving the correlated errors is also correlated with other variables in the model, then it is to be expected that the purging transformation will also affect the estimated coefﬁcient on these other variables.

Table 3

Correlation Matrix of OLS Residuals

Readmission_rate Waiting_time Length_of_stay

Readmission rate 1.0000

Waiting_time 0.0212 1.0000

Length_of_stay 0.2067 0.0918 1.0000

Note:Breusch-Pagan test of independence of residuals: chi2(3)56.964,p50.0731.

(13)

For example, SUR estimation reduces the coefficient on the need for health care variable (a measure of social disadvantage) from 1.3496 to 1.2975 in equations 1 and 2 in table 2. One plausible interpretation of this result might be that need and overall inefficiency levels are positively correlated (hospitals that serve disadvantaged areas have lower levels of efficiency). Therefore, when the SUR estimator replaces OLS, the SUR transformation purges the need variable of its correlation with inefficiency, and the resulting SUR coefficient reflects a pure need effect on readmissions, rather than a combined need and inefficiency effect.

DISCUSSION

In this article we have sought to highlight the need to move beyond piecemeal modeling of measures of the performance of public service organizations. Hitherto, research effort in this domain has concentrated on constructing estimates of overall organizational efﬁciency, using the methods of productivity analysis. Though the construction of such composite measures of performance might in principle offer a useful tool for regulators, the results are often highly sensitive to technical choices, and they have rarely been deployed by regulators in earnest.

Moreover, we believe that from a managerial perspective it is likely to be more useful to focus on individual performance measures in order to identify where the greatest scope for improvement lies. In this article we argue that the analysis of individual performance indicators is often hampered by poor understanding of the exogenous factors that affect performance and poor measurement of those factors. The methods outlined here represent a promising technology for reducing the importance of this problem by exploiting information that arises from a series of regression models of organizational performance. The method in effect assumes a latent organization-wide variable that is unmeasured, but which to a greater or lesser extent affects all performance measures. It is captured by the covariance matrix of residuals from the individual equations, and it is likely to reflect factors such as unmeasured environmental influences, unmeasured resource character-istics, overall organizational efficiency, and data imperfections.

The SUR estimation results in reduced standard errors and therefore more secure models of the impact of external factors on performance. Although in the example used here the effect of deploying SUR methods is modest, this might reflect the extensive data set we had available. Other applications have resulted in large changes in estimated coefficients (Martin et al. 2003). Indeed SUR methods are likely to yield the greatest gains where ‘‘general purpose’’ organizations (such as local governments) are seeking to deliver a heterogeneous set of services to diverse user groups. In these circumstances, SUR methods may allow us to introduce radically different covariates for different dimensions of performance. We can thereby adjust for factors that are specific to those individual dimensions of performance, without needing to measure explicitly all the organization-wide factors that affect all dimensions of performance.

The use of SUR methods may also lead to changed inferences about the performance of individual organizations on speciﬁc measures, as indicated by the residual from the SUR equation. The SUR residuals represent unexplained deviations in performance from expected levels, after adjusting for covariates and purging the regression of organization-wide effects. This adjustment process therefore means that the SUR residuals adjust for the

(14)

jumble of five unmeasured influences discussed in our introductory section: resource levels, environmental factors, efficiency, substitution, and data quality.

While it is almost certainly helpful to seek to model most of these effects, it is worth noting that—if there is an overall organizational efficiency effect that influences attainment on all the performance measures—the SUR residuals will not capture this effect. In other words, the residuals indicate relative attainment on the performance measure after allowing for variations in overall organizational efficiency. The method may therefore be most helpful in enabling organizations to target the areas of performance that are the most urgent priorities for improvement rather than helping regulators develop overall measures of organizational attainment.

We do not claim that SUR methods offer a panacea for the problem of analyzing public service performance. Rather, they offer another promising tool for gaining insights into the determinants of performance and identifying the level of attainment of individual organizations, to be set alongside other areas of statistical inquiry, such as multilevel modeling, panel data methods, and productivity models. None of these approaches can alone answer the questions posed by politicians, regulators, managers, service users, and the general public. However, used carefully in conjunction, they offer great potential for enhancing our understanding of public services.

REFERENCES

Bailey, T., and P. Hewson. 2004. Simultaneous modelling of public service performance indicators. Journal of the Royal Statistical Society, Series A: Statistics in Society167 (3): 501–17. Bird, S. 2004. Performance monitoring in the public services.Journal of the Royal Statistical Society,

Series A: Statistics in Society,167 (3): 381–83.

Coelli, T., D. Rao, and G. Battese. 1998.An introduction to efﬁciency and productivity analysis. Boston: Kluwer.

Gattouﬁ, S., M. Oral, and A. Reisman. 2004. Data envelopment analysis literature: A bibliography update (1951–2001).Socio-Economic Planning Sciences38:159–229.

Goldstein, H. 1995.Multilevel statistical models.London: Edward Arnold.

Goldstein, H., and D. J. Spiegelhalter. 1996. League tables and their limitations: Statistical issues in comparisons of institutional performance.Journal of the Royal Statistical Society, Series A: Statistics in Society159:385–409.

Greene, W. H. 2000.Econometric analysis.Upper Saddle River, NJ: Prentice-Hall.

Hauck, K., and A. Street. 2003.A multivariate analysis of health care performance.York, UK: Centre for Health Economics, University of York.

Healthcare Commission. 2004.NHS performance ratings 2003/2004.London: Healthcare Commission. Heinrich, C., and L. Lynn. 2001. Means and ends: A comparative study of empirical methods for

investigating governance and performance. Journal of Public Administration Research and Theory 11 (1): 109–38.

Hollingsworth, B. 2003. Non-parametric and parametric applications measuring efﬁciency in health care. Health Care Management Science6 (4): 203–16.

Iezzoni, L. 1997.Risk adjustment for measuring healthcare outcomes.2d ed. Chicago: Health Administration Press.

Jacobs, R. 2001. Alternative methods to measure hospital efﬁciency: Data envelopment analysis and stochastic frontier analysis.Health Care Management Science4 (2): 103–15.

Kumbhaker, S., and C. Lovell. 2003.Stochastic frontier analysis.Cambridge, UK: Cambridge University Press.

Martin, S., R. Jacobs, N. Rice, and P. Smith. 2003.Waiting times for elective surgery: A hospital-based approach.York, UK: Centre for Health Economics, University of York.

(15)

Martin, S., and P. C. Smith. 2003. Using panel methods to model waiting times for National Health Service surgery.Journal of the Royal Statistical Society, Series A: Statistics in Society166: 369–87.

Smith, P. 1990. The use of performance indicators in the public sector.Journal of the Royal Statistical Society, Series A: Statistics in Society153:53–72.

———. 2002a. Measuring up: Improving health systems performance in OECD countries. Paris: Organisation for Economic Co-operation and Development.

———. 2002b. Performance management in British health care: Will it deliver?Health Affairs21 (3): 103–15.

Smith, P., T. A. Sheldon, R. A. Carr-Hill, S. Martin, S. Peacock, and G. Hardman. 1994. Allocating resources to health authorities: Results and policy implications of small area analysis of use of inpatient services.British Medical Journal309:1050–4.

Smith, P., and A. Street. Forthcoming. Measuring the efﬁciency of public services: The limits of analysis.Journal of the Royal Statistical Society, Series A: Statistics in Society.

Stone, M. 2002. How not to measure the efﬁciency of public services (and how one might).Journal of the Royal Statistical Society, Series A: Statistics in Society165 (3): 405–22.

Thanassoulis, E. 2001.Introduction to the theory and application of data envelopment analysis. Boston: Kluwer.

Zellner, A. 1962. An efﬁcient method of estimating seemingly unrelated regressions and tests for aggregation bias.Journal of the American Statistical Society57 (298): 348–68.