
7.1 Overview

7.01 In this chapter, we apply the described methodology to the Belgian gas DSO data for 2008-2010 in order to settle three critical questions:

1) What model dimension is necessary to explain average cost in the sector?
2) Which variable specifications provide the highest information content?
3) Which environmental and/or structural variables should be validated a second time for the actual DEA runs?

7.2 Data

Controllable costs

7.02 Following the discussion above, we take as input the total costs less the following deductions:

1) Taxes, fees and direct charges to public authorities (excluding fines)
2) Costs for public service obligations (PSO: protected clients, etc.)
3) Transfers from earlier years

The resulting cost is denoted TOTEX (or dTotex.std below).

7.03 To estimate certain decompositions, we also include the invested capital, expressed as the average annual regulatory asset base (RAB) and here denoted Capital (Cap.ave). Opex (Opex.std) may occur in other decompositions.

Operators

7.04 The dataset includes data for 17 gas distribution system operators (DSOs) in an unbalanced panel for 2008-2010. Behind these DSOs are five operators (Eandis, Ores, Infrax, Tecteo and BNO). An overview of the organizational data is given in Table 7-1 below.

Table 7-1 DSO for GAS, Belgium 2010 (Scope: 1 = Gas only, 2 = Gas and electricity).

DSO          Energy  Region  Regime  Operator  Scope
GASELWEST    GAS     F       IM      Eandis    2
IDEG         GAS     W       IM      Ores      2
IGAO-IMEA    GAS     F       IM      Eandis    2
IGH          GAS     W       IM      Ores      1
IMEWO        GAS     F       IM      Eandis    2
InfraxWest   GAS     F       IP      Infrax    2
InterEnerga  GAS     F       IP      Infrax    2
INTERGEM     GAS     F       IM      Eandis    2
INTERLUX     GAS     W       IM      Ores      2
IVEG         GAS     F       IP      Infrax    2
IVEKA        GAS     F       IM      Eandis    2
IVERLEK      GAS     F       IM      Eandis    2
RESA         GAS     W       IP      TECTEO    2
SEDILEC      GAS     W       IM      Ores      2
SIBELGA      GAS     B       IM      BNO       2
SIBELGAS     GAS     F       IM      Eandis    2
SIMOGEL      GAS     W       IM      Ores      2

7.3 Variables

7.05 The variable collection includes the input defined above (Totex) as well as a set of output parameters (called Y-variables), which fall into the following groups:

1) Connections (total and per pressure level)
2) Pipeline length (total and per pressure level)
3) Energy transported (total, transit, per pressure level)
4) Compressor stations (#)

7.06 In addition to the given groups, tests were made with constructed variables that may be informative, such as weighted total assets and density. The results for these tests are given below.

7.4 OLS stage – model size

7.07 The initial phase in the model specification investigates the complexity of the cost function in terms of how many variables are necessary to capture the variance in average costs. In this exercise, the optimal path is found between model misspecification (too many variables leading to imprecise estimates and erroneous signs and significance of the chosen terms, multicollinearity) and model bias (too few variables chosen, the estimate is skewed in some

7.08 The Mallows' Cp test performs a sequential search through the groups of Y-variables that are eligible for the cost function and determines the statistically optimal specification for each model size. As seen in Figure 7-1 below, the optimal size is normally found at the first intersection with the red line (where the number of parameters is equal to the Cp metric). This result indicates that the fit of the data is high and that a model with 3-4 variables should be sufficient from an average cost viewpoint.
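The selection rule can be illustrated with a minimal sketch. It assumes the common definition Cp = SSE_p / s2 - n + 2p, where s2 is the residual variance of the largest candidate model and p counts the intercept. The residual sums of squares are hypothetical values chosen for illustration, not the report's data, and the function name is likewise an assumption.

```python
def mallows_cp(sse_p, sigma2_full, n, p):
    """Mallows' Cp for a candidate model with p parameters (intercept included)."""
    return sse_p / sigma2_full - n + 2 * p


# Hypothetical SSE of the best model of each size (illustration only).
n = 51  # sample size, set to mirror the 51 observations implied by the regression output
best_sse = {2: 3.0e15, 3: 1.2e15, 4: 8.3e14, 5: 8.2e14,
            6: 8.1e14, 7: 8.0e14, 8: 7.95e14}

# The error variance is estimated from the largest candidate model.
p_full = max(best_sse)
sigma2_full = best_sse[p_full] / (n - p_full)

cp = {p: mallows_cp(sse, sigma2_full, n, p) for p, sse in sorted(best_sse.items())}

# "First intersection with the Cp = p line": the smallest size with Cp <= p.
first_ok = next(p for p in sorted(cp) if cp[p] <= p)

for p in sorted(cp):
    print(p, round(cp[p], 2))
print("smallest adequate size (parameters incl. intercept):", first_ok)
```

With these illustrative numbers the rule stops at four parameters, i.e. three variables plus an intercept. By construction, the largest model always satisfies Cp = p.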

Figure 7-1 Model size (#parameters) vs model fit (Cp), gas.

7.09 The model size is determined by a trade-off between the number of parameters (the risk of specification errors) and the model fit (the risk of model bias). As seen in Figure 7-2, the adjusted R2 of the model mirrors the Mallows' Cp. The explanatory value increases rapidly up to three variables, where it reaches a very high level of 97.5%. From four parameters upwards, adding parameters in fact decreases model fit, lowers prediction precision and reduces the possibility of studying the signs and significance of any given parameter.

Figure 7-2 Model fit (adjusted R2) as function of model size (#parameters), gas.
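The penalty for extra parameters can be checked with the standard adjusted R2 formula, 1 - (1 - R2)(n - 1)/(n - k - 1). The sketch below applies it to two R2 values quoted in the regression summaries of section 7.5: the three-variable model and the seven-variable 2005 reference model. The sample size n = 51 is inferred from the reported degrees of freedom, and the function name is an assumption.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k regressors plus an intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


n = 51  # inferred: 47 residual degrees of freedom + 4 estimated parameters

three_var = adjusted_r2(0.9771, n, 3)  # maximum explanatory model
seven_var = adjusted_r2(0.9780, n, 7)  # 2005 reference model

print(round(three_var, 4), round(seven_var, 4))
```

The larger model has the higher raw R2 but the lower adjusted R2, which is the effect described in paragraph 7.09.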

7.5 Recursive regression – optimal model

Reference: Model gas 2005

7.10 For reference and comparison, we show the results for the previous gas model from 2005. As noted, it is severely overparametrized for the technology, with only two variables significant. The model is clearly affected by multicollinearity and lacks precision.

Residuals:
      Min        1Q    Median        3Q       Max
-10395652  -1905674    174262   2111820  10377940

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)        -2.187e+06  1.172e+06  -1.866  0.06894 .
yConnections.lp     1.459e+02  4.205e+01   3.471  0.00119 **
yPipelines.lp       1.066e+03  6.456e+02   1.651  0.10597
yPipelines.mp       2.617e+03  3.918e+03   0.668  0.50785
yEnergy.transp.lp   1.282e-03  2.177e-03   0.589  0.55915
yStations.tot       1.271e+04  4.556e+03   2.789  0.00785 **
yConnections.mp     2.717e+02  2.950e+02   0.921  0.36218
yEnergy.transp.mp  -5.078e-04  7.363e-04  -0.690  0.49406
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4297000 on 43 degrees of freedom
Multiple R-squared: 0.978, Adjusted R-squared: 0.9745
F-statistic: 273.7 on 7 and 43 DF, p-value: < 2.2e-16

Maximum explanatory model

7.11 The initial results yield a model with low-pressure outputs and the total number of stations. The model is powerful from a statistical viewpoint, but needs a correction for completeness in a cost-function estimation.

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)      -2.368e+06  1.102e+06  -2.149 0.036786 *
yStations.tot     1.554e+04  1.960e+03   7.927 3.28e-10 ***
yConnections.lp   1.716e+02  6.943e+00  24.710  < 2e-16 ***
yPipelines.lp     1.294e+03  3.079e+02   4.203 0.000117 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4195000 on 47 degrees of freedom
Multiple R-squared: 0.9771, Adjusted R-squared: 0.9757
F-statistic: 669.5 on 3 and 47 DF, p-value: < 2.2e-16

(Intercept)      condition index = 1         Good, multicollinearity not a problem
yConnections.lp  condition index = 3.660543  Good, multicollinearity not a problem
yPipelines.lp    condition index = 4.061041  Good, multicollinearity not a problem
yStations.tot    condition index = 5.135854  Good, multicollinearity not a problem

7.12 The model has a very low level of collinearity, as seen from the variance inflation factors.

Variance inflation factors

  yStations.tot  yConnections.lp  yPipelines.lp
       1.398406         2.032462       1.768117
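To read these factors, recall the standard definition VIF_j = 1 / (1 - R2_j), where R2_j is the R2 obtained when regressor j is regressed on the other regressors. The sketch below inverts that relation for the three reported values. The threshold of 10 is a common rule of thumb and is an assumption here, not a criterion stated in this report.

```python
def vif(aux_r2):
    """Variance inflation factor from the auxiliary R-squared of one regressor."""
    return 1.0 / (1.0 - aux_r2)


def aux_r2_from_vif(v):
    """Share of a regressor's variance explained by the other regressors."""
    return 1.0 - 1.0 / v


# VIF values as reported for the maximum explanatory model.
reported = {
    "yStations.tot": 1.398406,
    "yConnections.lp": 2.032462,
    "yPipelines.lp": 1.768117,
}

shared = {name: aux_r2_from_vif(v) for name, v in reported.items()}

RULE_OF_THUMB = 10.0  # assumed cut-off, commonly used in practice
for name, v in reported.items():
    print(name, round(shared[name], 3), "ok" if v < RULE_OF_THUMB else "check")
```

Even the highest factor corresponds to roughly half of that regressor's variance being shared with the others, which is far from the levels usually regarded as problematic.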

Variant 1: Adjusted pipeline length

7.13 The model resulting from the econometric stage cannot be used directly, as the final model for regulatory application must comply with the criteria stated earlier. In particular, the cost function in a regulatory efficiency model must be complete and monotone in the dimensions chosen. For example, selecting only low-pressure capacity and service provision (the last two parameters above) may be statistically correct for the sample, but it cannot be used in a forward-looking and adaptive scenario in which firms may expand medium-pressure services in proportions that differ from the current ones. Omitting this dimension would leave the model incomplete and lead to a dynamically inconsistent result. For this reason, we include in the model the optimal combination of connections across pressure levels as well as a measure of low-pressure-equivalent pipeline length that includes the medium-pressure assets.

7.14 The weighted pipeline length indicator is developed to maximize the likelihood of the observations, i.e. the fit with the data in the overall model. The determination of the parameter is shown in Figure 7-3 below. A model is tested containing a single input, the weighted sum of MP and LP pipeline length. The horizontal axis shows the scaling parameter between MP and LP pipelines. The information value, measured by the Akaike Information Criterion (AIC), is calculated for each model and the lowest value is recorded; the AIC should be minimized. The blue line in the figure is the difference between the AIC for a given scaling constant and the minimal value. For the given data, the optimal coefficient was determined to be 4.090909, which means that 1 km of medium-pressure pipeline has about four times the Totex impact of 1 km of low-pressure pipeline. This scaling parameter can be revalidated whenever technical progress is observed, or regularly before dynamic runs.

Figure 7-3 Weighting coefficient MP/LP pipelines, gas data 2008-2010.
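The grid search behind such a curve can be sketched as follows. This is an illustration on synthetic data, not the report's procedure or dataset: the pipeline lengths, cost coefficients, noise level, grid step and parameter count are all assumptions, and the Gaussian AIC is computed up to an additive constant as n·ln(SSE/n) + 2k.

```python
import math
import random


def fit_sse(x, y):
    """Residual sum of squares of a simple OLS fit y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))


def aic(sse, n, k):
    """Gaussian AIC up to an additive constant."""
    return n * math.log(sse / n) + 2 * k


# Synthetic sample (assumption): true MP/LP weight set to 4.0.
rng = random.Random(0)
n = 51
lp = [rng.uniform(500, 5000) for _ in range(n)]  # km of LP pipeline
mp = [rng.uniform(50, 800) for _ in range(n)]    # km of MP pipeline
cost = [1000.0 * (l + 4.0 * m) + rng.gauss(0, 20000) for l, m in zip(lp, mp)]

# Evaluate the AIC on a grid of candidate weights 0.00, 0.25, ..., 10.00.
weights = [i * 0.25 for i in range(41)]
aics = [aic(fit_sse([l + w * m for l, m in zip(lp, mp)], cost), n, 3)
        for w in weights]

best_aic = min(aics)
delta = [a - best_aic for a in aics]  # the quantity plotted against the weight
best_w = weights[aics.index(best_aic)]
print("AIC-minimizing weight:", best_w)
```

The `delta` list corresponds to the difference curve described in paragraph 7.14: it is zero at the selected weight and positive elsewhere.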

7.15 The resulting model now has a new output: total adjusted pipeline length. As seen below, the results are further improved because the weighted variable brings in additional information. All variables have the right sign and are highly significant, and the adjusted R2 is 97.6%. When the new variable is included in the set of eligible Y-variables, the model below is returned as the optimal explanatory model and is thus validated.

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)         -2.091e+06  1.081e+06  -1.934   0.0592 .
zConnections.tot     1.687e+02  6.828e+00  24.707  < 2e-16 ***
zPipelines.adj.tot   8.869e+02  1.996e+02   4.443 5.37e-05 ***
yStations.tot        1.207e+04  2.190e+03   5.514 1.45e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4157000 on 47 degrees of freedom
Multiple R-squared: 0.9775, Adjusted R-squared: 0.9761
F-statistic: 682.1 on 3 and 47 DF, p-value: < 2.2e-16

7.16 The final model shows no problematic multicollinearity, as seen from the variance inflation factors (VIF).

Variance inflation factors

  zConnections.tot  zPipelines.adj.tot  yStations.tot
          2.035902            2.597437       1.777491
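As a worked example of how the Variant 1 model is read, the sketch below evaluates the estimated average cost function for a single operator. The coefficients and the MP/LP weight are the values reported above. The operator's size (connections, pipeline lengths, stations) is entirely hypothetical, and the function names are assumptions.

```python
MP_LP_WEIGHT = 4.090909  # reported scaling parameter between MP and LP pipelines

# Reported OLS coefficients of the Variant 1 model.
COEF = {
    "intercept": -2.091e06,
    "connections": 1.687e02,    # zConnections.tot
    "pipelines_adj": 8.869e02,  # zPipelines.adj.tot
    "stations": 1.207e04,       # yStations.tot
}


def adjusted_pipeline_km(lp_km, mp_km, weight=MP_LP_WEIGHT):
    """Low-pressure-equivalent pipeline length."""
    return lp_km + weight * mp_km


def predicted_totex(connections, lp_km, mp_km, stations):
    """Average-cost prediction (EUR) from the reported coefficients."""
    return (COEF["intercept"]
            + COEF["connections"] * connections
            + COEF["pipelines_adj"] * adjusted_pipeline_km(lp_km, mp_km)
            + COEF["stations"] * stations)


# Hypothetical operator, for illustration only.
adj_km = adjusted_pipeline_km(lp_km=2000, mp_km=300)
totex = predicted_totex(connections=100_000, lp_km=2000, mp_km=300, stations=100)
print(round(adj_km, 1), round(totex))
```

The prediction is an average-cost benchmark from the regression stage only; it is not an efficiency score, which the report derives in the later DEA runs.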

Variant 2: weighted connection points

7.17 A weighted aggregation across pressure levels, similar to that for pipelines, is not useful for connections, as shown in the regression below. The MP connections do not add explanatory power to the model, and any weighting different from zero would dilute the value of the model. It is therefore decided to retain the total number of connections.

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)         -2.083e+06  1.092e+06  -1.907   0.0627 .
yConnections.lp      1.675e+02  8.147e+00  20.564  < 2e-16 ***
yConnections.mp      2.463e+02  2.866e+02   0.860   0.3945
zPipelines.adj.tot   9.030e+02  2.102e+02   4.297 8.89e-05 ***
yStations.tot        1.211e+04  2.217e+03   5.465 1.82e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4198000 on 46 degrees of freedom
Multiple R-squared: 0.9776, Adjusted R-squared: 0.9756
F-statistic: 501.5 on 4 and 46 DF, p-value: < 2.2e-16

7.6 Outlier analysis

7.18 The model retained above is tested for possible outliers that may distort the results and the specification. In line with Agrell and Bogetoft (2007), we primarily use Cook’s distance for this analysis. The analysis is documented in Figure 7-4 below. In terms of Cook’s distance, no outlier is detected. For completeness, however, we re-estimate the model with a robust regression to validate the results above. The results are given below and indicate that the parameters are indeed valid and significant and that the model is sound. This does not contradict a normal outlier detection in the non-parametric phase.

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)         -1.628e+06  6.361e+05  -2.559 0.013782 *
zConnections.tot     1.721e+02  6.666e+00  25.814  < 2e-16 ***
zPipelines.adj.tot   9.030e+02  2.260e+02   3.996 0.000226 ***
yStations.tot        9.567e+03  3.504e+03   2.730 0.008875 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Robust residual standard error: 2593000

Figure 7-4 Regression diagnostics best OLS model: Cook's distance, no outlier detected (outside red limits 0.5). Gas data 2008-2010.
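For readers unfamiliar with the diagnostic, here is a minimal sketch of Cook's distance for a one-regressor OLS model. It uses the textbook formula D_i = e_i^2 / (p·s^2) · h_i / (1 - h_i)^2 and the 0.5 limit mentioned in the figure caption. The data are made up, with one deliberately distorted observation, and the single-regressor setting is a simplification of the three-variable model used in the report.

```python
def cooks_distances(x, y):
    """Cook's distance for each observation of a simple OLS fit y = a + b*x."""
    n, p = len(x), 2  # p = number of estimated coefficients
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [b - intercept - slope * a for a, b in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance
    distances = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - mx) ** 2 / sxx    # leverage of the observation
        distances.append(e * e / (p * s2) * h / (1.0 - h) ** 2)
    return distances


# Made-up data: roughly y = 2x + 1, except the last point (x = 20),
# which combines high leverage with a large deviation from the trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8, 19.1, 20.0]

d = cooks_distances(x, y)
flagged = [i for i, v in enumerate(d) if v > 0.5]
print("flagged observations:", flagged)
```

In this toy sample only the distorted observation exceeds the limit. In the report's data, no observation does.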

7.7 Validation of Z-variables

7.19 Once the model is chosen, we test its residuals against some Z-variables to determine whether these are likely candidates for DEA second-stage analysis or bias correction.

Density

7.20 Density, measured as the ratio of pipelines per connection (or its inverse), has no impact on the residuals, nor on the residuals normalized by total expenditure (relative cost difference).

lm(formula = bmlm$residuals/dTotex.std ~ I(yPipelines.lp/yConnections.lp)[dsoin])

Residuals:
     Min       1Q   Median       3Q      Max
-0.37368 -0.09874 -0.01810  0.04846  0.73726

Coefficients:
                                         Estimate Std. Error t value Pr(>|t|)
(Intercept)                               0.04287    0.06834   0.627    0.533
I(yPipelines.lp/yConnections.lp)[dsoin]  -1.04981    3.17687  -0.330    0.742

Residual standard error: 0.2229 on 49 degrees of freedom
Multiple R-squared: 0.002224, Adjusted R-squared: -0.01814
F-statistic: 0.1092 on 1 and 49 DF, p-value: 0.7425

Age impact

7.21 The impact of age (either opex effects or capex effects due to book value depreciation) is investigated through a standard second-stage analysis, in which a set of age variables is tested against the residuals of the best OLS model above. As seen, the result for gas is not significant.

Call:
lm(formula = bmlm$residuals ~ zAge.creg[dsoin])

Residuals:
     Min       1Q   Median       3Q      Max
-8963455 -1539840  -305108   999163 10626876

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       6070724    4330079   1.402    0.170
zAge.creg[dsoin]  -323774     250557  -1.292    0.205

Residual standard error: 3827000 on 34 degrees of freedom
  (15 observations deleted due to missingness)
Multiple R-squared: 0.04681, Adjusted R-squared: 0.01878
F-statistic: 1.67 on 1 and 34 DF, p-value: 0.205

7.22 However, the average age of the assets is significant in explaining the average cost residuals relative to Totex. The sign is as expected, which indicates that asset age should be validated in the second-stage process for the non-parametric runs so as to avoid a bias towards older assets in the capex.

Call:
lm(formula = bmlm$residuals/dTotex.std ~ zAge.creg[dsoin])

Residuals:
     Min       1Q   Median       3Q      Max
-0.47365 -0.08882  0.00489  0.07319  0.35306

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       1.22597    0.18688   6.560 1.63e-07 ***
zAge.creg[dsoin] -0.06985    0.01081  -6.459 2.20e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1652 on 34 degrees of freedom
  (15 observations deleted due to missingness)
Multiple R-squared: 0.551, Adjusted R-squared: 0.5378
F-statistic: 41.72 on 1 and 34 DF, p-value: 2.196e-07
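The second-stage tests in this section all follow the same pattern: regress the (relative) residuals of the retained model on one Z-variable and inspect the slope, its t-value and the R2. The sketch below shows that pattern on made-up numbers. The ages, the residuals and the function name are assumptions for illustration and do not reproduce the report's data.

```python
import math


def second_stage(z, resid):
    """Simple OLS of residuals on one Z-variable; returns slope, t-value and R2."""
    n = len(z)
    mz, mr = sum(z) / n, sum(resid) / n
    szz = sum((a - mz) ** 2 for a in z)
    slope = sum((a - mz) * (b - mr) for a, b in zip(z, resid)) / szz
    intercept = mr - slope * mz
    sse = sum((b - intercept - slope * a) ** 2 for a, b in zip(z, resid))
    sst = sum((b - mr) ** 2 for b in resid)
    se_slope = math.sqrt(sse / (n - 2) / szz)  # standard error of the slope
    return slope, slope / se_slope, 1.0 - sse / sst


# Made-up sample: relative residuals that fall with average asset age.
age = [10, 12, 14, 16, 18, 20, 22, 24]
noise = [0.05, -0.05, 0.05, -0.05, 0.05, -0.05, 0.05, -0.05]
rel_resid = [1.2 - 0.07 * a + e for a, e in zip(age, noise)]

slope, t_value, r2 = second_stage(age, rel_resid)
print(round(slope, 4), round(t_value, 2), round(r2, 3))
```

A clearly negative slope with a large absolute t-value is the kind of result that leads the report to carry asset age forward to the second-stage validation of the DEA runs.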

Economies of scale

7.23 The potential scale effects are not validated here, as the linearity of the model is confirmed for statistical purposes. The scale assumption is important in the efficiency estimation where it is validated formally.

Economies of scope

7.24 The economies of scope cannot be tested as only a single DSO operates a gas-only concession.

7.8 Model specification results

7.25 We conclude that we have found a minimal average cost function, given in Table 7-2, with correct signs and significant parameters, devoid of multicollinearity and consistent with the criteria for regulatory benchmarking. The model is also shown to be robust to outlier analysis and independent of density, scope and scale effects. The age effects are weakly indicated and will be verified during the second-stage analysis of the DEA runs.

Table 7-2 Parameters in the final gas model.

Parameter           Category            Definition
xTotex.std          Input               Total controllable expenditure (EUR)
yPipelines.adj.tot  Service provision   Total weighted pipeline length (km)
yStations.tot       Capacity provision  Total number of pressure stations (#)
yConnections.tot    Service provision   Total number of MP+LP EAN connections
