Data collection to support supply / demand forecasting and economic evaluation
4.2 Hydrological data are used to develop a water availability forecasting model
4.2.1 Method
The graph below (Figure 4.4) shows monthly volumes and inflows for the Iskar reservoir between 1967 and 2000. It shows how the reservoir acts as an equaliser, compensating for large fluctuations in monthly inflow. The delayed response of reservoir levels to periods of low water availability (inflow) lasting several years is clearly visible. On average, volumes respond to inflows with a delay of approximately 18 months, although this delay is influenced by release volumes, which are stipulated by the Ministry of Environment and Waters but may be adapted depending on the conditions (flood or drought).
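The 18-month delay can be estimated in several ways. As one simple sketch, assuming two monthly series of equal length and that a lagged Pearson correlation is an adequate proxy for the reservoir's response, one can search for the lag that maximises the correlation between inflow at month t and volume at month t + lag. The function names and the synthetic series are illustrative only and are not the Iskar record; a real reservoir integrates inflow minus releases, so its correlation peak will be far less sharp than in this toy case.

```python
import random
from statistics import mean


def lagged_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag], lag in months."""
    xs, ys = x[: len(x) - lag], y[lag:]
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)


def best_lag(inflow, volume, max_lag):
    """Lag (months) at which volume correlates most strongly with earlier inflow."""
    return max(range(max_lag + 1),
               key=lambda k: lagged_correlation(inflow, volume, k))


# Synthetic series (NOT Iskar data): volume simply repeats inflow 18 months later.
random.seed(1)
inflow = [random.uniform(10, 100) for _ in range(400)]
volume = [50.0] * 18 + inflow[:-18]
print(best_lag(inflow, volume, max_lag=36))  # prints 18
```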
Figure 4.4. Iskar dam inflow and volume for the period 1966 to 2000.
An understanding of the reservoir's response time, and of the conditions leading up to the 1994-1995 water crisis, informed the choice of time step for the forecasting model. For demonstration purposes, three components of the water balance were used as water availability indicators: current reservoir level, average monthly reservoir volume over the previous 12 months, and average monthly inflow volume over the previous 12 months. Additional environmental indicators, such as average winter snow cover, could be added to the model if the modelling approach were adopted for decision support in the Upper Iskar.
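The three indicators can be derived from monthly records with trailing averages. The sketch below assumes that "the previous 12 months" excludes the current month and that months without a full year of history are dropped; the function and field names are hypothetical.

```python
def trailing_mean(series, window=12):
    """Mean of the `window` months before month t; None until a full window exists."""
    return [
        sum(series[t - window:t]) / window if t >= window else None
        for t in range(len(series))
    ]


def indicators(volume, inflow, window=12):
    """One row per month with the three water availability indicators."""
    vol_avg = trailing_mean(volume, window)
    inf_avg = trailing_mean(inflow, window)
    return [
        {"current_volume": v, "volume_12m_avg": va, "inflow_12m_avg": ia}
        for v, va, ia in zip(volume, vol_avg, inf_avg)
        if va is not None  # drop months without 12 months of history
    ]


rows = indicators(volume=list(range(24)), inflow=[1.0] * 24)
print(rows[0])  # {'current_volume': 12, 'volume_12m_avg': 5.5, 'inflow_12m_avg': 1.0}
```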
To calculate the conditional probabilities, the data for each node were placed in a separate spreadsheet column. The column containing reservoir volumes for the 18-month forecast node was offset by 18 months relative to the columns containing the data for the three indicator nodes, so that each row paired the indicator values for a given month with the reservoir volume 18 months later. From this table the Hugin software computed the conditional probabilities for the forecasts using all the water balance data from 1966 to 1999.
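The offsetting step and the counting behind the conditional probabilities can be sketched as follows. This is not Hugin's implementation: it assumes the indicator and volume values have already been discretized into states, estimates each CPT column by plain relative frequencies, and falls back to a uniform distribution where a parent configuration has no cases (compare the uniform 0.25 columns in Figure 4.7). The names are hypothetical, and the usage example uses a two-month horizon only to keep the data short.

```python
from collections import Counter
from itertools import product


def build_cases(indicator_states, volume_states, horizon=18):
    """Pair the indicator states at month t with the volume state at month t + horizon."""
    return [
        (indicator_states[t], volume_states[t + horizon])
        for t in range(len(indicator_states) - horizon)
    ]


def learn_cpt(cases, parent_domains, child_states):
    """Relative-frequency CPT plus experience count for every parent configuration."""
    counts = {cfg: Counter() for cfg in product(*parent_domains)}
    for parents, child in cases:
        counts[parents][child] += 1
    cpt, experience = {}, {}
    for cfg, c in counts.items():
        n = sum(c.values())
        experience[cfg] = n
        if n == 0:
            # No cases for this configuration: uniform distribution, zero experience.
            cpt[cfg] = {s: 1 / len(child_states) for s in child_states}
        else:
            cpt[cfg] = {s: c[s] / n for s in child_states}
    return cpt, experience


# Tiny synthetic example with two parents and a 2-month horizon for brevity.
hl = ["High", "Low"]
ind = [("High", "High"), ("High", "Low"), ("Low", "Low"), ("High", "High"), ("Low", "Low")]
vol = ["High", "High", "Low", "High", "Low"]
cases = build_cases(ind, vol, horizon=2)
cpt, experience = learn_cpt(cases, [hl, hl], hl)
print(experience)
# {('High', 'High'): 1, ('High', 'Low'): 1, ('Low', 'High'): 0, ('Low', 'Low'): 1}
```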
4.2.2 Results
Parameter sensitivity analysis can be used to identify variables in a probabilistic network whose change in state has a large or small impact on the probability distribution of a hypothesis variable. Sensitivity analysis was performed using a diagnostic model developed from the forecasting data, presented in Appendix G. It revealed that, of the indicators analysed (total supply, release volumes, current inflow, current reservoir volumes, average monthly inflow over the previous 12 months and average monthly reservoir volumes), three could explain 80% of the variance in the forecast reservoir volumes. The structure and resting-state conditional probabilities of the forecasting model using these three indicators are shown in Figure 4.5, below.
Figure 4.4 data source: Bulgarian Academy of Sciences, 2007.
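One-way parameter sensitivity analysis rests on a general result for Bayesian networks: a posterior probability of interest is a ratio of two linear functions of any single CPT parameter. The sketch below illustrates this on a hypothetical two-node network H -> E, computing P(H=1 | E=1) as a function of theta = P(E=1 | H=1) together with its derivative (the sensitivity value); a large sensitivity value marks a parameter worth estimating or eliciting carefully. It is not the diagnostic model of Appendix G, and the numbers are arbitrary.

```python
def posterior(p_h, theta, q):
    """P(H=1 | E=1) in a two-node network H -> E.

    p_h   = P(H=1)
    theta = P(E=1 | H=1), the parameter being varied
    q     = P(E=1 | H=0)
    """
    return theta * p_h / (theta * p_h + q * (1 - p_h))


def sensitivity_value(p_h, theta, q):
    """Analytic derivative of the posterior with respect to theta."""
    b = q * (1 - p_h)
    return p_h * b / (theta * p_h + b) ** 2


# Vary theta and watch the posterior respond; p_h and q are held fixed.
for theta in (0.2, 0.5, 0.8):
    print(theta,
          round(posterior(0.3, theta, 0.2), 3),
          round(sensitivity_value(0.3, theta, 0.2), 3))
```

Because the posterior is a linear fraction of theta, its shape is fixed by a handful of constants, which is what makes one-way sensitivity analysis cheap to compute.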
Figure 4.5. Iskar dam forecasting sub-model
The number of parent variables and their states, which Jensen (2001) refers to as the parent space, influences the significance of findings in a Bayesian network model. For example, five parent variables with three states each already give a parent space of 3^5 = 243 configurations. The section of the conditional probability and experience tables for the node '18 month forecasted reservoir volume' shown in Figure 4.6 (below) demonstrates how the number of states in a model influences the significance of findings.
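The parent space grows multiplicatively, which is why a fixed historical record is spread thinly across CPT columns. A small sketch of the arithmetic follows; the function names are hypothetical, and n_cases = 400 is an illustrative sample size, not the number of months actually used.

```python
from math import prod


def parent_space(state_counts):
    """Number of parent configurations (CPT columns): product of parent state counts."""
    return prod(state_counts)


def mean_cases_per_column(n_cases, state_counts):
    """Average number of cases per CPT column if the cases were spread evenly."""
    return n_cases / parent_space(state_counts)


print(parent_space([3] * 5))    # 243: five parents with three states each
print(parent_space([4, 2, 2]))  # 16 columns, as in the four-state table of Figure 4.7
print(parent_space([2, 2, 2]))  # 8 columns, as in the two-state table
print(mean_cases_per_column(400, [3] * 5))  # hypothetical 400 cases
```

In practice cases are not spread evenly, so some columns receive far fewer than the average and some receive none.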
Figure 4.6. Conditional probability table for the node Forecast (18 months) Iskar reservoir showing experience counts in the bottom rows
A problem that is immediately apparent in Figure 4.6 is the zero counts in some of the columns of the row labelled ‘experience’. The experience table records how many observations have been made so far for each combination of parent states. A zero count might therefore indicate that a particular combination of states is rare or extremely unlikely.
Alternatively, zero counts may be due to a limited sampling period that does not cover all scenarios. It is conceivable that changing the data range of a state will alter the experience counts in that state and in others. At the same time, reducing the number of states in a Bayesian network can result in a loss of detail, which Jensen (2001) refers to as second-order uncertainty, as opposed to first-order uncertainty, which refers to the significance of data dependencies expressed as experience counts for each possible instantiation of the model. To demonstrate second-order uncertainty, compare the two conditional probability tables in Figure 4.7, below.
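The effect of discretization on experience counts can be illustrated with a toy example. The readings below are synthetic (volume and inflow expressed as fractions of their range) and the cut points are arbitrary assumptions; the point is only that the same data leave some parent configurations with zero experience under four volume states but none under two, at the cost of merging Medium into High and Critical into Low.

```python
from collections import Counter
from itertools import product


def discretize(value, cuts, labels):
    """Map a value to a state; `cuts` ascend and len(labels) == len(cuts) + 1."""
    for cut, label in zip(cuts, labels):
        if value < cut:
            return label
    return labels[-1]


def zero_experience(readings, vol_cuts, vol_labels, inflow_cuts, inflow_labels):
    """Parent configurations (volume state, inflow state) that no reading falls into."""
    counts = Counter(
        (discretize(v, vol_cuts, vol_labels), discretize(i, inflow_cuts, inflow_labels))
        for v, i in readings
    )
    return [cfg for cfg in product(vol_labels, inflow_labels) if counts[cfg] == 0]


# Synthetic (volume, inflow) readings as fractions of their range; NOT Iskar data.
readings = [(0.9, 0.8), (0.7, 0.6), (0.6, 0.7), (0.4, 0.3), (0.3, 0.2),
            (0.1, 0.2), (0.8, 0.9), (0.2, 0.4), (0.6, 0.3), (0.4, 0.6)]
inflow_cuts, inflow_labels = [0.5], ["Low", "High"]

four = zero_experience(readings, [0.25, 0.5, 0.75], ["Critical", "Low", "Medium", "High"],
                       inflow_cuts, inflow_labels)
two = zero_experience(readings, [0.5], ["Low", "High"], inflow_cuts, inflow_labels)
print(four)  # [('Critical', 'High'), ('High', 'Low')]
print(two)   # []
```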
Four-state version (current and forecast reservoir volume: High, Medium, Low, Critical):

Current month volume         High                     Medium                   Low                      Critical
Inflow (12 month average)    High        Low          High        Low          High        Low          High        Low
Volume (12 month average)    High  Low   High  Low    High  Low   High  Low    High  Low   High  Low    High  Low   High  Low
Forecast High                0.41  0.32  0.16  0.32   0.34  0.55  0.37  0.29   0.09  0.01  0.01  0.25   0.25  0.25  0.25  0.48
Forecast Medium              0.57  0.14  0.68  0.35   0.34  0.11  0.21  0.01   0.31  0.01  0.01  0.25   0.25  0.25  0.25  0.44
Forecast Low                 0.01  0.45  0.15  0.12   0.31  0.27  0.21  0.41   0.57  0.97  0.97  0.25   0.25  0.25  0.25  0.01
Forecast Critical            0.01  0.09  0.01  0.21   0.01  0.07  0.21  0.29   0.03  0.01  0.01  0.25   0.25  0.25  0.25  0.07

Two-state version (current and forecast reservoir volume: High, Low):

Current month volume         High                         Low
Inflow (12 month average)    High          Low            High          Low
Volume (12 month average)    High   Low    High   Low     High   Low    High   Low
Forecast High                0.83   0.56   0.71   0.485   0.45   0.26   0.26   0.71
Forecast Low                 0.17   0.44   0.29   0.515   0.55   0.74   0.74   0.29
Figure 4.7. Two conditional probability tables developed from the same forecasting data, but with a different number of states for the current and forecast reservoir volumes.
When the conditional probabilities and data dependencies of a Bayesian network are computed from data using the NPC algorithm, the experience counts, and therefore the significance of model outputs, decrease as the number of node states increases. The result is an increase in first-order uncertainty. When structural learning is used to construct models from large data sets, the model developer therefore aims to balance these two types of uncertainty and to work within the limits of the available data by choosing the most efficient discretization intervals.
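One widely used way to keep every state populated is equal-frequency (quantile) discretization, as opposed to equal-width intervals. The section does not state which discretization method was used for the Iskar model, so the sketch below is a generic illustration with synthetic, skewed values and hypothetical function names. Balancing each variable on its own does not guarantee that every joint parent configuration is populated.

```python
def equal_width_cuts(values, n_states):
    """Cut points dividing [min, max] into n_states intervals of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_states
    return [lo + k * step for k in range(1, n_states)]


def equal_frequency_cuts(values, n_states):
    """Cut points placing (roughly) the same number of values in each state."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[k * n // n_states] for k in range(1, n_states)]


def bin_counts(values, cuts):
    """Cases per state; a value equal to a cut point goes to the upper state."""
    counts = [0] * (len(cuts) + 1)
    for v in values:
        counts[sum(v >= c for c in cuts)] += 1
    return counts


# Synthetic, skewed values (e.g. a few very wet months); NOT Iskar data.
values = [1, 2, 3, 4, 5, 6, 50, 100]
print(bin_counts(values, equal_width_cuts(values, 4)))      # [6, 1, 0, 1]
print(bin_counts(values, equal_frequency_cuts(values, 4)))  # [2, 2, 2, 2]
```

Equal-width intervals leave one state empty here, which would show up as zero experience counts; the quantile cut points trade interpretability of the thresholds for evenly populated states.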
Alternatively, if knowledge elicitation is used, where experts or the model developer insert conditional probabilities manually, parameter sensitivity analysis is a useful tool for identifying the variables that are most influential on the posterior probability of a hypothesis given evidence, so that data collection and knowledge elicitation resources can be focused on them. In this case, experience tables can be added to the CPT and filled in by the expert to represent their confidence in their beliefs, allowing some measure of first-order uncertainty to be included in the model. Hugin also allows conditional probabilities and experience counts computed using the NPC algorithm to be updated with expert knowledge during structural learning.
This approach offers a possible solution where models constructed from historical data result in low or zero experience counts.
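A common way to interpret an expert-supplied experience count is as an equivalent sample size: the expert's distribution is treated as if it had been observed that many times, and historical cases are then added as further counts (a Dirichlet-style pseudo-count update). The sketch below assumes this interpretation for a single CPT column; it is not Hugin's API, and the function name and numbers are hypothetical.

```python
def combine(expert_probs, expert_experience, data_counts):
    """Pseudo-count (Dirichlet-style) update of one CPT column.

    expert_probs      -- state -> probability elicited from the expert
    expert_experience -- number of cases the expert's belief is taken to be worth
                         (assumed > 0, or data_counts must be non-empty)
    data_counts       -- state -> number of historical cases observed
    Returns the updated distribution and the new experience count.
    """
    total = expert_experience + sum(data_counts.values())
    probs = {
        s: (expert_experience * p + data_counts.get(s, 0)) / total
        for s, p in expert_probs.items()
    }
    return probs, total


expert = {"High": 0.1, "Medium": 0.3, "Low": 0.4, "Critical": 0.2}
# A column with zero historical experience keeps the expert's distribution.
print(combine(expert, 10, {}))
# Ten historical cases, all 'Low', pull the column towards Low.
print(combine(expert, 10, {"Low": 10}))
```

The larger the expert's experience count, the more historical evidence is needed to move the column away from the elicited distribution.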
The following section describes data collection and model development for the first decision made by the Ministry of Energy (MoE), the water regulator, regarding the domestic customer water pricing strategy.