3 Modelling Issues
3.1 Reasons for including overseas data
At the outset it is important to note that the purpose of the modelling contained in Economic Insights (2014) was not to undertake international benchmarking. Rather, the objective was to supplement the observations available to improve the precision with which key parameters can be estimated. More precise parameter estimates increase confidence in model accuracy and allow more accurate accounting for output change in forecasts of future opex productivity growth. This objective has determined the modelling strategy adopted. Several of the DNSPs’
consultants’ reports have misinterpreted the purpose of the study and made comments regarding comparisons of results across countries which are incorrect and incompatible with the modelling approach adopted. Before addressing these comments, we first review the characteristics of the Australian DNSP RIN data and the results of models using only the Australian data.
3.1.1 Characteristics of the Australian DNSP RIN data
As noted in Economic Insights (2014, p.28), after a careful analysis of the economic benchmarking RIN data we concluded that there was insufficient variation in the DNSP RIN data set to allow us to reliably estimate even a simple version of an opex cost function model (eg a Cobb–Douglas LSE model with three output variables and two operating environment variables) using only the Australian data. A sample data plot is reproduced in figure 3.1 and provides an illustration of the limitations of the data from an econometric perspective. In essence, the time series pattern of the data is quite similar across the 13 DNSPs. Hence, in
this case, there is little additional data variation supplied by moving from a cross–sectional data set of 13 observations to a panel data set of 104 observations. As a consequence, while the Australian DNSP RIN data are perfectly capable of supporting index number–based economic benchmarking methods, more complex econometric models require a larger data set with more than what is essentially 13 different observations to allow estimation with a high degree of confidence. Unless the RIN data are supplemented with additional cross–
sectional observations, the ‘implicit’ degrees of freedom would be near zero or even negative in some cases.
The impact of this characteristic of the RIN data can be further illustrated by comparing the parameter estimates of a simple cost function using the full RIN database with 104 observations with the parameter estimates of a similar regression using only the 13 cross–
sectional observations obtained by averaging the 8 years of data for each DNSP. The estimates are presented in table 3.1.
Figure 3.1 Real Opex per Customer for Australian DNSPs, 2006–2013
The coefficients for the output and operating environment factor parameters can be seen to be very similar in table 3.1 indicating that little additional explanatory power is obtained from including additional time–series data. Hence, there are likely to be inadequate degrees of freedom in the Australian panel data to form robust parameter estimates. This could manifest itself as a lack of significance for the included output parameters or, alternatively, as spurious precision for additional operating environment factors if included as the limited variation is quickly (but inappropriately) explained (as is the case in CEPA 2015a,b) – see Halcoussis (2005, Cht13) for a discussion of the misleading results small samples are prone to producing.
Rather, additional cross–sectional observations are required to introduce more variation into the data and obtain more robust estimates.
Table 3.1 Opex cost function estimates using full RIN sample and DNSP averages
Parameter Full RIN sample RIN DNSP averages sample
Log(customer numbers) 0.4908 0.5089
Log(circuit length) 0.4118 0.4173
Log(ratcheted max demand) 0.0826 0.0527
Log(share of underground) 0.1818 0.1921
Log(time) 0.0227
Constant –33.4415 12.0370
R squared 0.8764 0.8900
No of observations 104 13
The difficulties of using only the RIN data for econometric studies is noted in two of the DNSPs’ consultants’ reports. PEGR (2015, p.37) observe:
‘In Australia, standardized data are available for only thirteen distributors for the 2006–2013 sample period. A sample with only 104 observations greatly limits the sophistication of econometric cost models that can be developed: it may be impossible to obtain accurate estimates of the cost impact of certain business conditions, or to properly model scale and scope economies.’
Based on using the RIN data only, CEPA (2015a, p.24) also observes:
‘I have been unable to estimate SFA models on a robust and consistent basis. In most cases the SFA models will not converge and results are not produced. In cases where the models do converge a small change to the specification (i.e.
additional variable included) would result in non–convergence, thus indicating the lack of robustness in the models.’
CEPA (2015b, p.22) also notes ‘I found that the modelling was very sensitive to the inclusion of operating environment variables’.
Furthermore, given the characteristics of the RIN data, it is simply not possible to compare model ‘slope’ coefficients across the three jurisdictions with any degree of confidence (as CEPA 2015a, p.ix, and FE 2015a, p.24, claim to do).
This is also the reason that FE (2015a, p.75) obtains opex efficiency scores of 100 per cent for 10 of the 13 Australian DNSPs in its data envelopment analysis (DEA) application – there is simply not enough cross–sectional observations to sensibly support this technique using only the RIN data. It is, therefore, rather perplexing that FE (2015a, p.101) goes on to recommend that the AER ‘discard the international data from its sample’, ‘rely only on Australian data’
and ‘rely on most recent evidence from 2013’. This again reflects FE’s apparent lack of familiarity with the Australian electricity distribution industry.
3.1.2 Australian DNSP RIN data econometric results
We initially examined the scope to estimate an opex cost function using only the AER’s economic benchmarking RIN data on the 13 Australian DNSPs over the available 8 year
period (104 observations in total). However, this produced econometric estimates that were relatively unstable. We tried both Cobb–Douglas and translog functional forms using both SFA and LSE methods and tried a range of different sets of regressor variables. We observed that small changes in variable sets (and methods and functional forms) could have a substantial effect on the output elasticity estimates obtained and, in some cases, on the subsequent efficiency measures derived from these models. Translog models were found to violate monotonicity conditions (the requirement that an increase in any output involves an increase in cost) for nearly all observations.
Table 3.2 SFA Cobb–Douglas opex frontier estimates using RIN dataset
Variable Coefficient Standard error t–ratio
ln(DistributionTrans) –0.018 0.120 –0.150
ln(Custnum) 0.559 0.332 1.680
ln(CircLen) 0.124 0.183 0.680
ln(ShareUGC) –0.162 0.164 –0.990
ln(ShareSingleStage) –0.421 0.117 –3.610
ln(CMoS): 0.043 0.061 0.700
Year 0.046 0.009 5.120
Constant –81.588 17.810 –4.58
Variance parameters:
Mu –0.585 2.322 –0.250
SigmaU squared 0.300 0.649
SigmaV squared 0.007 0.001
LLF 83.103
Figure 3.2 SFA Cobb–Douglas opex efficiency scores using RIN dataset
0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1
PCR CIT END TND JEN AND UED SAP ENX ESS ERG ACT AGD
Opex CE Lower Bound Upper Bound
The most comprehensive opex cost function model we derived from using the RIN data only involved two outputs (customer numbers and circuit length), three operating environment factors (share of underground, share of single stage transformation and minutes off supply),
distribution transformer MVA as the quantity of capital (to allow for opex/capital interaction) and a time trend. Estimating a Cobb–Douglas opex cost function using the SFA method produced the estimates shown in table 3.2 and the efficiency scores and confidence intervals shown in figure 3.2.
From table 3.2 we see that the estimated coefficients are all of the expected sign but the significance levels of the parameter estimates are relatively low. The spread and ranking of opex efficiency scores in figure 3.2 are relatively similar to those reported in Economic Insights (2014) using the three country database. The one exception is Endeavour Energy which ranks third best in the RIN data only model compared to ninth best using the broader data set.
Table 3.3 SFA Cobb–Douglas cost frontier estimates using RIN data only and preferred 3–country specification
Variable Coefficient Standard error t–ratio
ln(Custnum) 1.106 0.230 4.810
ln(CircLen) 0.126 0.087 1.450
ln(RMDemand) –0.164 0.240 –0.690
ln(ShareUGC) –0.030 0.088 –0.340
Year 0.034 0.005 6.260
Constant –55.976 10.849 –5.160
Variance parameters:
Mu –0.283 1.402 –0.200
SigmaU squared 0.326 0.523
SigmaV squared 0.008 0.001
LLF 78.190
Figure 3.3 Preferred 3–output SFA Cobb–Douglas efficiency scores – RIN only data versus 3 country data
0.0 0.2 0.4 0.6 0.8 1.0
ACT AGD CIT END ENX ERG ESS JEN PCR SAP AND TND UED
RIN data only 3 country database
To further illustrate the impact of using Australian only data versus data from the three countries, in table 3.3 we present regression results for our preferred three–country CD SFA model using only the RIN data. It does not perform well in terms of the signs and magnitudes of the estimated coefficients as can be seen in table 3.3, but it does produce very similar efficiency scores as can be seen in figure 3.3.
In response to the question raised in PEGR (2015, pp.65–66) – and contrary to the assertions in FE (2015a, p.xi), CEPA (2015a, pp.16–18) and Huegin (2015c, pp.13,15) – the results in Economic Insights (2014) are thus consistent with those obtained using Australian RIN data only and are not ‘driven’ by the larger number of observations from the overseas jurisdictions.
Rather, the addition of the international data simply improves the precision of the parameter estimates, as intended, while providing only minor refinements to the efficiency scores.
Finally, it is important to recognise that the characteristics of the Australian RIN data make any econometric model estimated using only the RIN data insufficiently robust to support regulatory decisions. In particular, Huegin’s (2015a, p.13) claim that international data are introduced because ‘the imperative is to facilitate a specific model type’ (meaning SFA) is incorrect. Rather, more data are required to facilitate the robust estimation of any type of econometric model.