Chapter 4. Results for Anglian Water Case Study
4.5 Results from the Regression Model Development
The dependent variable, SFR (single-family residential) water use, represents the average of the observations for each household in each case study. For the AWS case study, monthly observations over 2012-2015 for 63 households were averaged for each household. It should be stressed that a ‘household’ is defined by the water meter associated with a single family’s billing unit.
The daily PCC over the examined period for the participating households in the first case study is illustrated in Figure 4.27. Median single-family water use (white dot) increases during the summer months. The violin plots, which illustrate the density of water demand, show stable demand in the winter months and a wider spread in July and August. A violin plot visualises the distribution of the data and its probability density. It combines a box plot with a density plot that is rotated and mirrored on each side to show the shape of the distribution. The thick black bar in the centre represents the interquartile range, the thin black line extending from it represents the 95% confidence interval, and the white dot is the median.
Figure 4.27 Seasonality of daily per capita water demand by month (2012-2015) for the participating households in the AWS case study (n=63)
Water consumption data for the participating households were accompanied by information on several aspects, such as demographics, the number and kind of water fixtures in the house and the frequency of use of water-consuming appliances, as mentioned earlier. These data were extracted from answers to a survey conducted on the day the water efficiency devices were installed in each property. Table 4.1 contains the data that were provided for each household via the questionnaire answers and were used in the regression analysis as independent variables.
After the exclusion of flats and the removal of outliers, the dataset was reduced to 63 properties. Daily per capita water consumption (PCC) in its log-transformed form was used as the dependent variable in all linear regression models.
Many researchers use stepwise regression. However, stepwise methods rely on the computer selecting variables based on purely mathematical criteria, which takes important methodological decisions out of the hands of the researcher. There is also a danger of over-fitting or under-fitting the model. Hierarchical regression was therefore preferred for this research. In hierarchical (blockwise entry) regression, independent variables are chosen based on past work and the researcher decides the order in which the predictors are entered into the model. After several iterations, the model that best describes the relationship between per capita water consumption and its predictors is discussed below. The model includes ACORN class, number of people in the household, number of water fixtures in the house and number of water butts as independent variables.
4.5.1 Model Summary
For the model, R squared is .206, which means that the independent variables explain 20.6% of the variation in log per capita daily water consumption. The F-ratio in Table 4.7 represents the ratio of the improvement in prediction that results from fitting the model to the inaccuracy that still exists in the model. The F-ratio for the model is 5.275 (p = .003), which means that the model significantly improved our ability to predict the outcome variable compared with not fitting the model. The VIF (Variance Inflation Factor; see Table A18) values in Table 4.7 are all well below 10 and the Tolerance statistics are well above 0.2; therefore, there is no indication of problematic collinearity in the data. The average VIF value is (1.023 + 1.045 + 1.058)/3 = 1.042, which is very close to 1 and confirms that collinearity is not a problem.
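The summary statistics quoted above can be cross-checked by hand. The following is a minimal sketch that uses only the rounded values reported in Table 4.7 to recompute the F-ratio, the adjusted R squared and the average VIF. The variable names are illustrative assumptions and are not part of the original analysis.

```python
# Values as reported in Table 4.7 (rounded to three decimals in the source).
r_squared = 0.206
df1, df2 = 3, 61            # model and residual degrees of freedom
vifs = [1.023, 1.045, 1.058]

# F-ratio: explained variation per model df over unexplained variation per residual df.
f_ratio = (r_squared / df1) / ((1 - r_squared) / df2)

# Adjusted R squared written in terms of degrees of freedom (total df = df1 + df2).
adj_r_squared = 1 - (1 - r_squared) * (df1 + df2) / df2

mean_vif = sum(vifs) / len(vifs)

print(f"F = {f_ratio:.3f}, adjusted R^2 = {adj_r_squared:.3f}, mean VIF = {mean_vif:.3f}")
```

Because the inputs are rounded, small discrepancies in the last decimal place would not be surprising in a check of this kind.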
Table 4.7 Output of the linear model

Model Summary(b)
                                                                     Change Statistics
Model  R        R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change  Durbin-Watson
1      .454(a)  .206      .167               .317                        .206             5.275     3    61   .003           2.517

a. Predictors: (Constant), number of water butts, number of residents, Acorn class
b. Dependent Variable: log per capita consumption

Coefficients (Model 1)
                       Unstandardized Coefficients                95.0% Confidence Interval for B  Correlations                Collinearity Statistics
                       B      Std. Error  t       Sig.  Lower Bound  Upper Bound                   Zero-order  Partial  Part   Tolerance  VIF
(Constant)             5.230  .137        38.085  .000  4.955        5.504
number of residents    -.093  .032        -2.882  .005  -.157        -.028                         -.284       -.346    -.329  .977       1.023
Acorn class            -.087  .033        -2.664  .010  -.152        -.022                         -.244       -.323    -.304  .957       1.045
number of water butts  -.065  .031        -2.094  .040  -.128        -.003                         -.145       -.259    -.239  .945       1.058

4.5.2 Model Parameters
The specific model is defined as:
• Number of water butts (b=-.062)1
This value indicates that as the number of water butts increases, water consumption decreases. The significance of this variable is a rare finding in the literature. One would suppose that a household with many water butts has a large garden to water, so a positive relationship with water consumption would be expected. However, the significant negative effect of the number of water butts in this analysis might imply that members of households that possess water butts are well informed about water issues and behave more conservatively in their water use.
1 Since the dependent variable, per capita daily water consumption, has been log-transformed, the coefficients (b) of independent variables that are not log-transformed can be calculated as exp(b)-1.
• Acorn category (b=-0.083)
This variable is a proxy for social status, income and property value. Moving from ACORN class 1 to ACORN class 5, income, social status and property value decrease. Hence, this value indicates that per capita consumption decreases as ACORN class changes from 1 to 5. This can be explained by the fact that ACORN class 1 is mainly represented by spacious, affluent households. Members of the first ACORN classes might have more water-intensive possessions in their homes, such as big gardens or swimming pools, and may have more guests and entertainment. On the other hand, the last ACORN classes (4 & 5) are represented by smaller and poorer homes.
• Number of people in the household (b=-0.089)
This value shows that the more people living in a house, the less the consumption per person. This is a finding well documented in the literature.
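The b values quoted in this section can be related to the unstandardized coefficients in Table 4.7 through the exp(b)-1 conversion given in the footnote. The following is a minimal sketch of that conversion, assuming the log transformation is the natural logarithm (as the use of exp in the footnote suggests). The function and variable names are illustrative, and because the table coefficients are rounded, the third decimal may differ slightly from the values in the text.

```python
import math

# Unstandardized coefficients B from Table 4.7 (dependent variable: log PCC).
coefficients = {
    "number of residents": -0.093,
    "Acorn class": -0.087,
    "number of water butts": -0.065,
}


def proportional_effect(b):
    """Proportional change in PCC for a one-unit increase in the predictor."""
    return math.exp(b) - 1


effects = {name: proportional_effect(b) for name, b in coefficients.items()}

for name, effect in effects.items():
    print(f"{name}: {effect:+.3f} ({effect * 100:+.1f}% per unit)")
```

Read this way, each additional resident is associated with roughly a 9% lower per capita consumption, holding the other predictors constant.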
4.5.3 Variables that Were Not Significant in Predicting Water Consumption and Were Excluded from the Model
• Tenancy type (-)
This variable was far from significant in all models that were explored. The negative sign of its regression coefficient indicates that tenants tend to use less water than those who own their homes.
• Shower frequency (+)
The positive sign denotes that households whose members shower 8 or more times per week consume more water than households whose members shower 7 or fewer times per week.
• Dishwasher ownership (+)
The positive sign denotes that households with dishwashers consume more water than households without one. The significance of this variable was marginally over .05.
• Dishwasher loads per week (+)
The positive sign denotes that the more dishwasher loads per week, the more water is consumed per person. This variable was far from significant.
• Age of washing machine (-)
The negative sign of the coefficient for this variable shows that houses with older washing machines consume more water per person than houses with newer ones. This variable was far from significant.
• Car washing (-)
Households that own cars and wash them at their premises tend to have larger per capita consumption than households that do not.
4.5.4 Model Diagnostics
After fitting a regression model, it is crucial to determine whether all assumptions related to regression have been met. Any violation of these assumptions can cast doubt on the validity of the conclusions drawn.
The assumptions of linear regression are:
1. Linearity: The relationship between the outcome variable and the explanatory variables is linear.
2. Homoscedasticity: The variance of the data points about the line of means is the same for each explanatory variable.
3. Independence: The explanatory variables are independent of each other.
4. Normality: The individual data points of the outcome variable for each of the
Figure 4.28 Predicted values vs. standardized residuals plot
Model evaluation relied to a great degree on graphical measures, as they reveal more about the distribution of the data and the weaknesses of the model. The presence of heteroscedasticity in the data can be assessed through various tests and data visualizations, the majority of which involve inspection of the residuals (Field, 2013). Figure 4.28 shows a scatterplot for the Anglian Water dataset in which standardized predicted values are plotted against standardized residuals. The graph does not show any sign of heteroscedasticity for the sample, since the points do not ‘fan out’ in a funnel shape. As for the independence of residuals, the Durbin-Watson test indicates no significant problem of correlation between errors, since the value of the statistic (2.517) is reasonably close to 2. Neither dataset presented problems with non-normality after per capita consumption was log-transformed.
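For reference, the Durbin-Watson statistic mentioned above is the sum of squared differences between successive residuals divided by the sum of squared residuals. The following is a minimal sketch of that calculation. The function name and the residual sequences are hypothetical and are not taken from the case study data.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residual sequences for illustration only.
alternating = [1.0, -1.0, 1.0, -1.0]   # sign flips every step: negative autocorrelation
constant = [1.0, 1.0, 1.0, 1.0]        # no change between steps: positive autocorrelation

print(durbin_watson(alternating))  # 3.0
print(durbin_watson(constant))     # 0.0
```

The statistic ranges from 0 to 4, with values near 2 indicating little first-order correlation between adjacent residuals, which is the reading applied to the reported value of 2.517.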