
6.4 Forecasting water demand and water savings in individual households

6.4.1 Method: forecasting dependencies between metered use and demand variables

Information on individual household demand variables was collected during the household survey interviews. Sofiyska Voda then retrieved metered water demand data using the personal information that interviewees provided in the social survey. Figure 6.12 shows the variables and links included in the probabilistic layer of the forecasting model, covering indoor and outdoor demand variables. A section of the spreadsheet that was used to compute conditional probabilities for the model, showing household survey responses, is also shown.

The water company was able to identify accounts for only 40 households out of a possible 200 interviewees who provided their names and addresses. This was a flaw in the research design, specifically in the expectation that the water company could cross-reference the household survey data with metered water demand data using the interviewees' names and addresses. Future surveys would make use of water company account numbers, pre-selected into classes according to pre-specified metered demand ranges.

[Figure 6.12 panels: indoor demand variables; outdoor demand variables]

Figure 6.12. Structure of the probabilistic layer of the forecasting model showing indoor and outdoor water demand variables.

The interviewee’s water company account number would need to be clearly visible on the household survey, which the water company would send to citizens by post. This would ensure that the water company could easily cross-reference returned household surveys with its household metered data, rather than relying on householders’ names and addresses, which turned out to be unreliable criteria for accurately identifying water company accounts.

The quantity of metered data that could be linked to completed surveys was insufficient for developing the forecasting model and led to unacceptable confidence intervals (±30%, p ≤ 0.05). It was determined, however, that information on household demand variables already collected by Sofiyska Voda could be used to build a dataset, on which structural learning could then be performed to derive conditional probabilities for the household demand forecasting model.

A study into the causes of variable household demand and potential water saving measures and their impacts (WDM Procedure 6 Report, Sofiyska Voda, 2004) provided the majority of information for developing the dataset. The study was completed in 2004 by researchers at the University of Architecture, Civil Engineering and Geodesy (UACEG) on behalf of Sofiyska Voda, as a condition of the EU-ISPA concession agreement. The findings in the report that were relevant to developing the household demand forecasting model are described in Appendix O. Based on these findings, a dataset of metered demand for the 540 social survey responses was developed in a corresponding column in the spreadsheet containing the household survey data.

Structural learning was then applied to this dataset to derive conditional probabilities for the demand forecasting model described below.
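The parameter-estimation step can be illustrated with a minimal sketch that estimates a conditional probability table by relative frequency. It does not reproduce the structural learning procedure or the software used in this study. The records, state labels, and the function name `conditional_probabilities` are hypothetical, and the sketch assumes that metered demand is the conditioning (hypothesis) node and occupancy is a dependent demand variable.

```python
from collections import Counter, defaultdict

# Hypothetical survey records: (occupancy class, metered demand range).
# These values are invented for illustration; they are not survey data.
records = [
    ("1-2", "low"), ("1-2", "low"), ("1-2", "medium"),
    ("3-4", "medium"), ("3-4", "medium"), ("3-4", "high"),
    ("5+", "high"), ("5+", "high"),
]


def conditional_probabilities(rows):
    """Estimate P(variable state | demand range) by relative frequency."""
    counts = defaultdict(Counter)
    for variable_state, demand_range in rows:
        counts[demand_range][variable_state] += 1
    return {
        demand_range: {
            state: n / sum(state_counts.values())
            for state, n in state_counts.items()
        }
        for demand_range, state_counts in counts.items()
    }


cpt = conditional_probabilities(records)
for demand_range, row in cpt.items():
    print(demand_range, row)
```

Each row of the resulting table sums to one. Rows built from only a handful of records are unreliable, which is the sampling concern raised in the results below.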

6.4.2 Results

The Bn in Figure 6.13 shows the resting state conditional probabilities for the demand forecasting model based on the dataset. From a research perspective, the resting state conditional probabilities are of interest because they reveal how the data in the random sample is distributed between states. Furthermore, by instantiating the model the user can update the distribution for different groupings, for example comparing occupancy distributions in households with and without water saving WCs or in different household groups. The dialogue box in Figure 6.13 shows the relative strength of data dependencies between metered water demand and indoor demand variables for the dataset.

Figure 6.13. Bayesian network of metered household demand variables, with no evidence, showing only significant links. The dialogue box shows the relative strength of data dependencies.
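The effect of instantiating the model can be approximated directly on the data, as in the sketch below: the resting state distribution uses all records, and entering evidence corresponds to restricting the records to those that match it. This is a simplification of Bayesian network inference, shown only to illustrate the comparison described above. The records, variable names, and the function name `distribution` are hypothetical.

```python
from collections import Counter

# Hypothetical household records; values are invented for illustration.
households = [
    {"occupancy": "1-2", "wc": "water-saving"},
    {"occupancy": "1-2", "wc": "standard"},
    {"occupancy": "3-4", "wc": "water-saving"},
    {"occupancy": "3-4", "wc": "standard"},
    {"occupancy": "3-4", "wc": "standard"},
    {"occupancy": "5+", "wc": "standard"},
]


def distribution(rows, variable, evidence=None):
    """Distribution of `variable` over the rows that match `evidence`."""
    evidence = evidence or {}
    matching = [
        r for r in rows
        if all(r[key] == value for key, value in evidence.items())
    ]
    counts = Counter(r[variable] for r in matching)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}


# Resting state: no evidence entered.
resting = distribution(households, "occupancy")
# Instantiated: evidence entered on the WC type.
with_saving_wc = distribution(households, "occupancy", {"wc": "water-saving"})
with_standard_wc = distribution(households, "occupancy", {"wc": "standard"})

print(resting)
print(with_saving_wc)
print(with_standard_wc)
```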

As the number of demand variables increases, so does the parent space for the hypothesis node. As mentioned in Chapter 4, reducing the parent space (determined by the number of child nodes and their states) is desirable to conserve data collection resources. Once the parent space has been reduced (i.e. through parameter sensitivity analysis), a sampling approach can be designed to achieve an equal number of households in each metered household demand range and so reach the required significance levels. As the number of variables (i.e. child nodes) increases, achieving the ideal sample becomes increasingly complex (Rossi et al., 1993).
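The growth described above can be made concrete with a short sketch. It assumes that the size of the space is the number of joint state combinations of the demand variables, that is, the product of their state counts; this reading of "parent space" is an assumption. The variable names and state counts are hypothetical.

```python
from math import prod

# Hypothetical demand variables and their number of states.
state_counts = {
    "occupancy": 3,
    "wc_type": 2,
    "washing_machine": 2,
    "garden": 2,
}


def combination_count(counts):
    """Number of joint state combinations across all variables."""
    return prod(counts.values())


before = combination_count(state_counts)

# Adding one more three-state variable triples the number of combinations.
after = combination_count({**state_counts, "shower_frequency": 3})

print(before, after)
```

Because the count grows multiplicatively, removing even one low-sensitivity variable reduces the number of combinations that the sample has to cover by a factor equal to that variable's number of states.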

The node labelled metered household demand in Figure 6.13 contains eight states. This implies that a sample of 480, equally distributed among the eight states, would achieve the required significance levels (i.e. a sample of 60 out of a population of 100,000 gives 95% confidence of ±9.2%). The distribution of household survey responses between different metered demand ranges, as shown in the monitor windows for the node ‘metered water demand’ in Figure 6.14, was not equal. This was partly due to the snowball sampling approach used. The resting state conditional probabilities show that only the first three states in the node labelled metered household demand contain sufficient data to achieve 95% confidence of ±9.2%, and further data collection would be required to achieve the required significance levels for all states in the model. Alternatively, missing data can be provided using expert knowledge; in this way Bns provide a method for combining data to address sampling problems.
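The sample allocation above can be checked with a short sketch that compares responses per demand range against the per-state quota of 60. The quota and the eight states come from the text. The per-state counts are invented for illustration (chosen only so that they total 540 and the first three states meet the quota), and the state labels are hypothetical.

```python
REQUIRED_PER_STATE = 60  # per-state sample quoted in the text
N_STATES = 8             # states in the metered household demand node

target_total = REQUIRED_PER_STATE * N_STATES

# Hypothetical response counts per metered demand range (not survey data).
responses = {
    "range_1": 210, "range_2": 150, "range_3": 90, "range_4": 40,
    "range_5": 25, "range_6": 15, "range_7": 7, "range_8": 3,
}

# States that already meet the quota.
sufficient = [s for s, n in responses.items() if n >= REQUIRED_PER_STATE]

# Additional households needed in each under-sampled state.
shortfall = {
    s: REQUIRED_PER_STATE - n
    for s, n in responses.items()
    if n < REQUIRED_PER_STATE
}

print("target sample:", target_total)
print("states meeting quota:", sufficient)
print("additional households needed:", sum(shortfall.values()))
```

A total response count above the target does not guarantee adequate coverage: with these illustrative counts the sample exceeds 480 overall, yet five of the eight states remain under-sampled.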

The above model was applied by informed practitioners during the end-user evaluation, described in Chapter 6, where they used the model to forecast demand and compared the results with a small dataset for which actual metered data had been collected.