3.2 Time Series Analysis
3.2.2 Autocorrelation Analysis
The previous analysis shows a lack of weekly and daily seasonality in the demand time series. This section therefore examines whether the demand exhibits any other behaviour that is not a weekly or daily pattern. To find correlations or patterns between the time series points, the Partial Autocorrelation Function (PACF) was calculated over 500 time lags, as shown in Figure 3-10. The PACF can help to find intraday and longer-term correlations that occur repeatedly. The PACF plot in Figure 3-10 shows the relationship between the RTG crane demand time series, DL(n), and its lagged values for up to 500 half-hourly lags. The PACF can be defined as follows:
PACF(i) = corr(DL(n), DL(n − i) | DL(n − 1), ..., DL(n − i + 1)). (3-8)
In general, the PACF in Equation 3-8 gives the direct correlation, corr, between two points of the series without the effect of the lags in between [107]. The PACF plot therefore shows the correlation between the time series and its lagged version at lag i after removing the effects of the intermediate lags (1, 2, ..., i − 1). A PACF coefficient is considered significant if its magnitude exceeds a critical threshold. The confidence interval lines for the PACF plot in Figure 3-10 have a critical threshold of ±1.96/√n, where n is the number of observations [107]; hence, more observations reduce the size of the critical value. The values of the partial autocorrelation sequence outside the ±0.088 confidence boundary occur at lags 1 to 5; these are all below 0.25 except for the first lag, which is above 0.5. The PACF plot shows a cut-off after lag 5, with further significant lags between lag 450 and lag 500. The distribution of the significant lags does not show a clear pattern or seasonality, in contrast to LV demand, which typically shows significant lags at multiples of 48. The significant lags between lag 450 and lag 500, as shown in Figure 3-10, are scattered: there is no large main spike that decays after a few lags or is followed by a damped wave, which would indicate a moving average term. These lags are difficult to interpret and are likely random, an artefact of the small size of the time series. The early significant lags, in contrast, could be due to crane driver tasks that take more than a single time step to complete. Hence, the distribution of the significant lags in the PACF plot does not show clear autocorrelation behaviour for the RTG crane demand time series. To investigate the relationship between DL(n) and DL(n − i) at the small lags i = {1, 2, 3, 4} in more detail, a linear regression model is considered.
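The PACF estimate and the ±1.96/√n significance threshold described above can be sketched in a few lines. This is a minimal illustration and not the procedure used to produce Figure 3-10: the function name pacf_ols, the regression-based estimator, and the synthetic AR(1) series standing in for the crane demand are all assumptions made for this example.

```python
import numpy as np


def pacf_ols(x, max_lag):
    """Estimate the PACF at lags 1..max_lag by successive least-squares fits.

    For each lag i, x(n) is regressed on x(n-1), ..., x(n-i) plus an
    intercept; the coefficient on x(n-i) is the PACF estimate at lag i.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    out = np.empty(max_lag)
    for i in range(1, max_lag + 1):
        # Column k holds the series shifted back by k + 1 time steps.
        lags = np.column_stack([x[i - k - 1 : n - k - 1] for k in range(i)])
        design = np.column_stack([np.ones(n - i), lags])
        coef, *_ = np.linalg.lstsq(design, x[i:], rcond=None)
        out[i - 1] = coef[-1]
    return out


# Synthetic stand-in for the demand series (an assumption, not thesis data):
# an AR(1) process with coefficient 0.5.
rng = np.random.default_rng(0)
n = 2000
noise = rng.standard_normal(n)
demand = np.empty(n)
demand[0] = noise[0]
for t in range(1, n):
    demand[t] = 0.5 * demand[t - 1] + noise[t]

max_lag = 20
pacf = pacf_ols(demand, max_lag)

# Critical value for the 95% confidence band, as in the text: 1.96 / sqrt(n).
threshold = 1.96 / np.sqrt(n)
significant = [i + 1 for i, v in enumerate(pacf) if abs(v) > threshold]
print("threshold:", round(threshold, 4))
print("significant lags:", significant)
```

For an AR(1) series the PACF is expected to cut off after lag 1, which is the kind of cut-off behaviour the text looks for after lag 5 in the crane data. The successive-regression estimator is chosen here because it follows the definition directly; it becomes slow for hundreds of lags, where a recursive method would be preferable.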
The regression examines the linear relationship between the variables in order to find a line of best fit. In this section, the coefficient of determination, R-squared (R²), is calculated to measure how well a linear model of the lagged crane demand fits the current demand, as described by Equation 3-9.
D̂L(n) = a + b DL(n − i); i = {1, 2, 3, 4}. (3-9)
Table 3-3 summarises the equation's parameters (a, b ∈ ℝ) and the R² value for the relationship between D̂L(n) and the demand time series at lag i, DL(n − i). The results show that the highest R² value is 0.16; in other words, the linear model explains only 16% of the load variability. The R² decreases gradually from 0.16 to about 0.05 as the lag i increases from 1 to 4. Due to the low R² values, the linear model based on the correlation between current and historical crane demand is not an effective estimator for forecasting the crane's demand behaviour. This shows that other factors must be considered to understand the true data volatility, as will be discussed in Section 3.3. In addition, the lack of large correlations at lag 48 (daily patterns) or lag 336 (weekly patterns) in this section's analysis supports the conclusion that there is no clear sign of any daily or weekly pattern in the time series.
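The lagged regression of Equation 3-9 and its R² can be sketched as follows. This is an illustrative example under stated assumptions rather than the thesis code: the helper name lagged_fit is hypothetical, and the series is synthetic (an AR(1) process with arbitrary parameters), so the printed numbers are not the values in Table 3-3.

```python
import numpy as np


def lagged_fit(x, lag):
    """Fit x(n) = a + b * x(n - lag) by least squares; return (a, b, r2)."""
    x = np.asarray(x, dtype=float)
    past, current = x[:-lag], x[lag:]
    # np.polyfit returns coefficients from the highest degree down: slope, intercept.
    b, a = np.polyfit(past, current, 1)
    residuals = current - (a + b * past)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((current - current.mean()) ** 2)
    return a, b, 1.0 - ss_res / ss_tot


# Synthetic stand-in for the half-hourly demand (an assumption, not thesis data).
rng = np.random.default_rng(1)
n = 2000
demand = np.empty(n)
demand[0] = 20.0
for t in range(1, n):
    demand[t] = 12.0 + 0.4 * demand[t - 1] + 5.0 * rng.standard_normal()

# One fit per lag i = 1, 2, 3, 4, mirroring the rows of a table such as Table 3-3.
results = {lag: lagged_fit(demand, lag) for lag in (1, 2, 3, 4)}
for lag, (a, b, r2) in results.items():
    print(f"lag {lag}: a = {a:.2f}, b = {b:.2f}, R2 = {100 * r2:.2f}%")
```

A low R² at every lag, as reported in the text, means that most of the variance in the current demand is left unexplained by any single lagged value.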
Table 3-3: The linear equation parameters and R-squared for the relationship between the current and lagged crane demand.
Correlated variables     Parameter a   Parameter b   R²
DL(n) vs DL(n − 1)       12.62         0.40          16.06%
DL(n) vs DL(n − 2)       14.96         0.29          8.37%
DL(n) vs DL(n − 3)       15.76         0.25          6.32%
DL(n) vs DL(n − 4)       16.54         0.21          4.59%
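As a consistency check on Table 3-3, recall that for a simple linear regression with an intercept, R² equals the squared sample correlation between the two variables, and the slope equals that correlation scaled by the ratio of the standard deviations. Under the assumption (not stated in the text) that the current and lagged demand have nearly equal variance, the slope b should be close to the correlation, so b² should be close to R². The symbols r_i, s_0 and s_i are introduced here for illustration only: r_i is the sample correlation between DL(n) and DL(n − i), and s_0 and s_i are the sample standard deviations of DL(n) and DL(n − i).

```latex
\begin{align}
R^2 &= r_i^2, \qquad b = r_i \,\frac{s_0}{s_i} \approx r_i \quad \text{when } s_0 \approx s_i, \\
0.40^2 &= 0.1600, \quad 0.29^2 = 0.0841, \quad 0.25^2 = 0.0625, \quad 0.21^2 = 0.0441 .
\end{align}
```

These squares agree with the tabulated R² values of 16.06%, 8.37%, 6.32% and 4.59% to within the two-decimal rounding of b.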
Figure 3-10: PACF plot for the RTG crane demand time series for 500 time lags.
Overall, the collected data set for the electric RTG crane demand depicts random and volatile behaviour. The time series analysis shows that the crane demand does not have a clear half-hourly, daily or weekly seasonality. This makes forecasting the crane demand more difficult than forecasting, say, LV demand. The non-smooth behaviour of the RTG crane demand is mainly due to human and work-environment factors during crane and port operation. Work activity inside ports depends mainly on the volatile arrival and movement of shipments [6] [20]. For example, a port may have many ships berthed at the same time, which requires increased crane activity. The terms volatile and stochastic are defined here as describing variables that change rapidly with low regularity; these terms are used throughout the thesis to qualitatively describe the difficulty of predicting the RTG crane demand. In the following sections, the analysis of the RTG crane demand time series patterns, demand characteristics and correlation with external variables is presented.