In this paper, we propose two novel EANN variants for TSF that are fully automatic and can be used by non-specialized users to perform multi-step ahead time series forecasts, since no a priori knowledge about the analyzed time series is assumed. In contrast with the majority of EANN works for TSF, the proposed EANN variants use EDA as the search engine under two design strategies: Sparsely connected EANN (SEANN) and Time lag selection EANN (TEANN). Both strategies perform simultaneous feature and model selection for TSF, although with a different emphasis. SEANN puts more effort into model selection by explicitly defining whether a connection exists; time lag deletion only occurs when an input has no connections. TEANN enforces feature selection, explicitly defining in the chromosome which time lags are used, while ANN structure selection is made only in terms of the number of input and hidden nodes. These strategies are addressed separately in order to measure the individual contribution of each when compared with the fully connected EDA EANN [35]. Moreover, we also compare all EANN methods with the popular ARIMA methodology and three recently proposed machine learning methods: Random Forest (RF), Echo State Network (ESN) and Support Vector Machine (SVM). The experiments were performed using several real-world time series from distinct domains, and the distinct forecasting approaches were compared under both forecasting and computational performance measurements. The paper is organized as follows. First, Section 2 describes the EANN approaches. Next, in Section 3 we present the experimental setup and analyze the obtained results. Finally, we conclude the paper in Section 4.
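The TEANN idea of encoding candidate time lags in the chromosome can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `fitness` function below uses a linear least-squares model as a stand-in for the evolved ANN, and all names are hypothetical.

```python
import numpy as np

def lagged_matrix(series, lags):
    """Build a design matrix from the selected time lags."""
    max_lag = max(lags)
    X = np.column_stack([series[max_lag - l : len(series) - l] for l in lags])
    y = series[max_lag:]
    return X, y

def fitness(series, lag_mask, max_lag=12):
    """Score a TEANN-style chromosome: a binary mask over candidate lags.
    A linear model stands in for the evolved ANN in this sketch."""
    lags = [l + 1 for l in range(max_lag) if lag_mask[l]]
    if not lags:
        return np.inf                     # no inputs selected: invalid
    X, y = lagged_matrix(series, lags)
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return np.mean(resid ** 2)            # lower is fitter

rng = np.random.default_rng(0)
series = np.sin(np.arange(200) * 2 * np.pi / 12) + 0.1 * rng.standard_normal(200)
mask = np.zeros(12, dtype=bool)
mask[0] = mask[11] = True                 # chromosome selecting lags 1 and 12
print(fitness(series, mask))
```

An EDA or GA would then evolve the mask, performing feature (lag) selection as a by-product of the search.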


problem using a type of wavelet neural networks. The basic building block of the neural network models is a ridge-type function. The training of such a network is a nonlinear optimization problem. Evolutionary algorithms (EAs), including the genetic algorithm (GA) and particle swarm optimization (PSO), together with a new gradient-free algorithm (called coordinate dictionary search optimization, CDSO), are used to train the network models. An example of real wind speed data modelling and prediction is provided to show the performance of the proposed networks trained by these three optimization algorithms.
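The ridge-type building block can be sketched as a sum of wavelet activations applied to scalar projections w_j · x + b_j. This is a schematic forward pass only, assuming a Mexican-hat wavelet; the paper's exact wavelet and the GA/PSO/CDSO training loops are not reproduced, and all names are illustrative.

```python
import numpy as np

def mexican_hat(t):
    # "Mexican hat" wavelet, one common choice of ridge activation
    return (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

def wavelet_ridge_net(x, W, b, c):
    """y = sum_j c_j * psi(w_j . x + b_j): a ridge-type wavelet network."""
    return mexican_hat(x @ W.T + b) @ c

rng = np.random.default_rng(1)
W = rng.standard_normal((5, 3))   # 5 ridge units, 3 inputs
b = rng.standard_normal(5)
c = rng.standard_normal(5)
x = rng.standard_normal((10, 3))  # batch of 10 input vectors
out = wavelet_ridge_net(x, W, b, c)
print(out.shape)
```

Training would then treat (W, b, c) as the decision variables of the nonlinear optimization problem the EAs are applied to.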


GAs are machine learning procedures which derive their behavior from the process of evolution in nature and are used to solve complicated optimization problems (Goldberg, 1989; Michalewicz, 1996). In nature, individuals that better fit the environment have a higher probability of surviving and transferring their chromosomes to their descendants, compared to individuals with poor fitness, which most probably will become extinct. Following the same idea, GAs are iterative stochastic methodologies that start with a random population of possible solutions (chromosomes). The fitness of each chromosome is measured by computing the corresponding value of a carefully chosen fitness function. Then, a new generation is produced by giving higher probabilities of surviving to the individuals with the best fitness values. As the algorithm proceeds, the members of the population are gradually improved. This parallel searching mechanism is the main advantage of GAs, since they cannot easily get trapped in local minima. In order to better emulate the way nature behaves, some genetic "operators" are added to the algorithms, such as the mutation operator, where some genes of each individual are altered randomly, and the crossover operator, where new individuals are born from a random combination of the old ones.
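The cycle described above — fitness-proportional survival, crossover, and mutation — can be sketched in a few lines. This is a generic textbook GA run on a toy "OneMax" fitness (maximize the number of 1-bits), not any specific system from the text; parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def fitness(pop):
    # Toy fitness: maximise the number of 1-bits ("OneMax")
    return pop.sum(axis=1)

def genetic_algorithm(n_bits=30, pop_size=40, generations=60,
                      p_crossover=0.9, p_mutation=0.02):
    pop = rng.integers(0, 2, size=(pop_size, n_bits))
    for _ in range(generations):
        fit = fitness(pop)
        # Fitness-proportional (roulette-wheel) selection
        probs = fit / fit.sum()
        parents = pop[rng.choice(pop_size, size=pop_size, p=probs)]
        # One-point crossover between consecutive parent pairs
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            if rng.random() < p_crossover:
                cut = rng.integers(1, n_bits)
                children[i, cut:] = parents[i + 1, cut:]
                children[i + 1, cut:] = parents[i, cut:]
        # Bit-flip mutation
        flip = rng.random(children.shape) < p_mutation
        children[flip] = 1 - children[flip]
        pop = children
    return pop[np.argmax(fitness(pop))]

best = genetic_algorithm()
print(fitness(best[None, :])[0])
```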

In this paper, we proposed an evolutionary artificial neural network engine to evolve a fitness-weighted n-fold cross-validation artificial neural network ensemble scheme for time series forecasting. To combine the n ANN outputs into a single response, we explored four distinct combination functions. Experiments were held with six time series with different characteristics and from different domains. As the main outcome of this work, we show that the fitness-weighted n-fold ensemble improves the accuracy of the forecasts, outperforming both the unweighted n-fold ensemble and the simpler holdout validation (0-fold) EANN. Also, as a compromise between accuracy and computational cost, based on the presented results, we advise the use of a 4-fold ANN ensemble that is evolved using weighted cross-validation and that uses a rank-based combination method to build the final forecasts. Moreover, when compared with a classical method such as Holt-Winters, competitive forecasting results were achieved by the proposed approach, showing that it can be an interesting alternative. In future work, we intend to use the EANN engine to evolve ensembles of sparsely connected ANNs [10]. We also intend to apply a similar approach to evolve ensembles of other base learners, such as support vector machines [26].
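One plausible reading of a rank-based combination function is to weight each fold's forecast by its fitness rank. The sketch below is an assumption about such a scheme, not the paper's exact definition; the values are made up.

```python
import numpy as np

def rank_combine(forecasts, fitnesses):
    """Combine n forecasts with weights proportional to fitness rank
    (the best model gets rank n, the worst gets rank 1)."""
    order = np.argsort(np.argsort(fitnesses))   # 0 = worst, n-1 = best
    weights = (order + 1) / (order + 1).sum()
    return forecasts @ weights

forecasts = np.array([102.0, 98.0, 110.0, 100.0])  # 4-fold ensemble outputs
fitnesses = np.array([0.9, 0.7, 0.2, 0.8])         # higher = fitter
combined = rank_combine(forecasts, fitnesses)
print(combined)  # weights 0.4, 0.2, 0.1, 0.3 give 101.4
```

Rank-based weights are less sensitive to the absolute scale of the fitness values than fitness-proportional weights would be.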


Using a hybrid model, or combining several models, has become a common practice to improve forecasting accuracy since the well-known M-competition [23], in which combinations of forecasts from more than one model often led to improved forecasting performance. The literature on this topic has expanded dramatically since the early work of Reid [32] and Bates and Granger [1]. Clemen [6] provided a comprehensive review and annotated bibliography in this area. The basic idea of model combination in forecasting is to use each model's unique features to capture different patterns in the data. Both theoretical and empirical findings suggest that combining different methods can be an effective and efficient way to improve forecasts [22,28,29,40]. In neural network forecasting research, a number of combining schemes have been proposed. Wedding and Cios [39] described a combining methodology using radial basis function networks and the Box–Jenkins models. Luxhoj et al. [21] presented a hybrid econometric and ANN approach for sales forecasting. Pelikan et al. [30] and Ginzburg and Horn [13] proposed to combine several feedforward neural networks to improve time series forecasting accuracy.


they learn to solve a certain problem. The number of hidden layers, as well as the number of nodes in these layers, are among the several hyperparameters which have to be set and which usually depend on the particular problem. With an increasing number of hidden nodes, artificial neural networks can tackle more complex data, but they are more difficult to train and more prone to overfitting. In that case, a neural network appropriately learns the data it is trained on but cannot generalise to previously unseen data, called the testing set. The output layer, connected to the last hidden layer, produces an output vector for each input vector. The number of nodes in the output layer equals the length of the output vector.
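The role of depth and width as hyperparameters can be made concrete with a toy forward pass, where `layer_sizes` is the hyperparameter being chosen. This is a schematic sketch: the weights are random (untrained), and tanh is applied at every layer, including the output, purely for brevity.

```python
import numpy as np

def mlp_forward(x, layer_sizes, rng):
    """Forward pass of a fully connected network; the length of
    layer_sizes sets the depth, its entries set the width."""
    h = x
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.standard_normal((n_in, n_out)) * 0.1
        b = np.zeros(n_out)
        h = np.tanh(h @ W + b)
    return h

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))            # 4 samples, 8 input nodes
out = mlp_forward(x, [8, 16, 16, 3], rng)  # two hidden layers of 16 nodes
print(out.shape)                           # one 3-vector per input vector
```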


While a plain **neural** network (Chapter 3), also known as a feedforward **neural** network, is the basis of all subsets of **neural** **networks** and is useful in many cases of machine learning, it fails in **forecasting** sequential data. This is because a plain **neural** network is aware only of the information that is present in that input of the network. Recurrent **neural** **networks** (RNNs) were invented to solve this problem and are used in a number of different machine learning tasks, such as predicting a word in a sentence, **time**-**series** **forecasting** and speech recognition – all being cases where plain feedforward **neural** **networks** cannot succeed as well. (Banerjee 2018.)
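The difference can be seen in a minimal Elman-style recurrence: a hidden state carries information across time steps, something a plain feedforward pass has no mechanism for. A schematic sketch with random, untrained weights:

```python
import numpy as np

def rnn_forward(xs, Wx, Wh, b):
    """Unroll a simple (Elman) RNN: the hidden state h carries
    information from earlier steps forward through time."""
    h = np.zeros(Wh.shape[0])
    for x in xs:                       # one update per time step
        h = np.tanh(Wx @ x + Wh @ h + b)
    return h                           # summary of the whole sequence

rng = np.random.default_rng(0)
Wx = rng.standard_normal((8, 3)) * 0.3   # input -> hidden
Wh = rng.standard_normal((8, 8)) * 0.3   # hidden -> hidden (the recurrence)
b = np.zeros(8)
seq = rng.standard_normal((20, 3))       # 20 time steps, 3 features each
h_final = rnn_forward(seq, Wx, Wh, b)
print(h_final.shape)
```

A feedforward network applied to the same data would see each of the 20 time steps in isolation, with no state linking them.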


In this paper, experiments were realized on a real-world TS forecasting problem with the new non-standard QNN learning scheme. The results show that QNN performance was better than that obtained with the ANN constructed in another work [Ferreira, 2006] for some measures, and better than linear algebraic ARIMA models overall, notably for the DJIA series. This is encouraging, since stock market prediction is a difficult problem.


There exist many methods for pruning a network; see for example Fine (1999, Chapter 6) for an informative account. In this paper we apply a technique called "Bayesian regularization", as described in MacKay (1992a). The aim of Bayesian regularization is twofold. First, it is intended to facilitate maximum likelihood estimation by penalizing the log-likelihood in case some of the parameter estimates become excessively large. Second, the method is used to find a parsimonious model within a possibly very large model. In order to describe the former aim in more detail, suppose that the estimation problem is "ill-posed" in the sense that the likelihood function is very flat in several directions of the parameter space. This is not uncommon in large neural network models, and it makes numerical maximization of the likelihood difficult. Besides, the maximum likelihood value may be strongly dependent on a small number of data points. An appropriate prior distribution on the parameters acts as a "regularizer" that imposes smoothness and makes estimation easier. For example, the prior distribution may be defined such that it shrinks the parameters, or some linear combinations of them, towards zero. Information in the time series is used to find the "optimal" amount of shrinkage. Furthermore, a set of smaller models nested within the large original model is defined. The algorithm allows one to choose one of these sub-models and thus reduce the size of the neural network. This requires determining prior probabilities for the models in the set and finding the one with the highest posterior probability. Bayesian regularization can be applied to feedforward neural networks of type (3), as discussed in MacKay (1992b). In this context, the set of eligible AR-NN models does not usually contain models with a linear unit, and we adhere to that practice here. In our case, the largest model has nine hidden units (q = 9 in (3)), and the maximum lag p equals six.
We apply the Levenberg-Marquardt optimization algorithm in conjunction with Bayesian regularization, as proposed in Foresee and Hagan (1997).
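The first aim — shrinking parameters towards zero via a prior — reduces, in its simplest linear form, to ridge (weight-decay) regression: a zero-mean Gaussian prior on the weights is equivalent to adding an L2 penalty to the negative log-likelihood. A minimal sketch with a linear model standing in for the paper's neural network, and an arbitrary fixed penalty in place of the evidence-based choice of shrinkage:

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Penalised least squares: minimising ||y - Xw||^2 + alpha*||w||^2
    has the closed form (X'X + alpha*I)^{-1} X'y, which shrinks the
    weights towards zero (the 'regularizer' described above)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
y = X[:, 0] * 2.0 + 0.1 * rng.standard_normal(50)  # only feature 0 matters
w_mle = ridge_fit(X, y, alpha=0.0)     # flat prior: plain maximum likelihood
w_reg = ridge_fit(X, y, alpha=10.0)    # Gaussian prior: shrinkage
print(np.linalg.norm(w_mle), np.linalg.norm(w_reg))
```

In the Bayesian regularization of MacKay (1992a), the amount of shrinkage (here the fixed `alpha`) is itself inferred from the data.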


More noticeably, differences and discrepancies in the design of all CI competitions become evident, which seriously impair their contribution. As a concession to the resources required to run a competition, both the forecasting and CI competitions each employed only one hold-out set, and hence a single time series origin. However, while all competitions in the forecasting domain have used representative sample sizes of hundreds or even thousands of time series in order to derive robust results, CI competitions have mostly evaluated accuracies on a single time series only. The few competitions which evaluated multiple time series, such as the Santa Fe and predictive uncertainty competitions, did so for distinct domains, with only one series per category, again limiting any generalisation of their findings. Had the same algorithm been used across multiple similar series, datasets or competitions, it would have allowed somewhat more reliable and insightful results to be obtained. Instead, the same authors applied different methodologies for each dataset, even within a given competition, thus leading to distinctly different models and preventing any comparisons. Also, none of the CI competitions compare their results with established benchmark methods, whether naïve methods (i.e. a random walk), simple statistical benchmarks which are used in the application domain (e.g., ES methods), or non-statistical methods in the same family of algorithms (e.g., a simple NN with default parameters to compete against a more sophisticated architecture). We therefore conclude that the recommendations on the design of empirical evaluations developed in forecasting have been ignored by the CI community.
Makridakis and Hibon's (2000) original criticism holds: just like the theoretical statisticians before them, NN researchers have concentrated their efforts on building more sophisticated models, with no regard to either the assessment of their accuracy or objective empirical verification, successfully ignoring the strong empirical evidence of the M-competitions and the ground rules they laid out on how to assess forecasting competitions. This substantially limits the


Abstract: This study discusses the capability of the Box-Jenkins methodology compared with the Neural Network method in time series forecasting. Five complex time series were modelled using the backpropagation Neural Network method and compared with the standard Box-Jenkins model. The analysis shows that for seasonal time series data, the two methods produce comparable results. However, for erratic time series, the Box-Jenkins method produces poorer results than the Neural Network. The results also show that Neural Networks are robust, produce good long-term forecasts, and can be an alternative method for forecasting.

N. Kourentzes and S. F. Crone (2009). Modelling deterministic seasonality with artificial neural networks for time series forecasting. Working Paper. Lancaster: Lancaster University.
N. Kourentzes and S. F. Crone (2009). Forecasting with neural networks: from low to high frequency time series. Working Paper. Lancaster: Lancaster University.


Abstract: Fuzzy time series approaches are commonly used for the analysis of real-life time series whose observations include uncertainty. Because fuzzy time series forecasting methods do not require many of the constraints of classic time series approaches, interest in these methods is increasing. The fuzzy time series forecasting methods in the literature focus on models based on fuzzy autoregressive (AR) variables. In models where classic time series methods are used, there are not only autoregressive variables but also moving average (MA) variables. However, in the fuzzy time series forecasting methods proposed in the literature, MA variables are used in only two studies. In this study, by defining a new first-order fuzzy time series forecasting model which includes not only fuzzy AR variables but also MA variables, an analysis algorithm based on artificial neural networks is proposed. The newly proposed method is applied to the Istanbul Stock Exchange (IMKB) national 100 index time series, gold prices of the Central Bank of the Republic of Turkey, and two simulated chaotic time series, and is compared with the other methods in the literature with regard to forecasting performance.


However, although the above analysis indicates good accuracy in one-step ahead prediction using a six-neuron NN, it is not clear that the obtained neural model can reproduce the dynamics of the Lorenz system. Figure 3 illustrates this fact by showing the evolution of two different NNs; in the first case, the neural system converges to a periodic trajectory (Fig. 3(a)), whereas in the second case it converges to a fixed point (Fig. 3(b)), neither of them resembling the chaotic behavior of the Lorenz model. As we have seen in this example, an interesting result obtained when training NNs with a low number of parameters is that the resulting orbits may not behave as the original chaotic system, but instead resemble some unstable periodic orbits embedded in the chaotic system. This fact may be caused by the simpler dynamics associated with unstable periodic orbits, and will be the scope of a future paper (see [16] for an introduction to unstable periodic orbits and their role in the topology of chaotic attractors).
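The distinction between one-step accuracy and reproducing the dynamics can be demonstrated by iterating a fitted one-step predictor in closed loop (free-run). In this sketch a linear map stands in for the trained NN, and the integration step size is an arbitrary choice; like the networks in Fig. 3, the free-run trajectory of such a simple model generally fails to reproduce the chaotic attractor, typically decaying towards a fixed point or diverging.

```python
import numpy as np

def lorenz_orbit(n, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Euler-integrate the Lorenz system to generate data."""
    xs = np.empty((n, 3))
    x, y, z = 1.0, 1.0, 1.0
    for i in range(n):
        x, y, z = (x + dt * sigma * (y - x),
                   y + dt * (x * (rho - z) - y),
                   z + dt * (x * y - beta * z))
        xs[i] = x, y, z
    return xs

data = lorenz_orbit(5000)
# One-step predictor: least-squares linear map (stand-in for the NN)
A, *_ = np.linalg.lstsq(data[:-1], data[1:], rcond=None)
one_step_err = np.mean((data[:-1] @ A - data[1:]) ** 2)

# Free-run: feed the model its own output and watch the dynamics
state = data[-1].copy()
for _ in range(500):
    state = state @ A
print(one_step_err, state)
```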


However, in testing it was noted that, as is the case with the mathematical expressions, in practice the NARMA network models NLMA data better (in terms of forecasting-capacity measures) than the ARNN network. This indicates that this network can be a good candidate for modelling nonlinear data containing moving average components, but it requires detailed study, and so a new research question arises: from the theoretical point of view, what considerations must the recurrent NARMA(0, q) network satisfy so that it can properly predict nonlinear time series containing inherent moving average components?

7. Conclusion


First, the time series figures are plotted on a graph. The points are joined by straight lines. We get fluctuating straight lines, through which an average straight line is drawn. This method is, however, inaccurate, since different persons may fit different trend lines to the same set of data.
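The observer dependence disappears if the trend line is fitted by least squares instead of freehand. A minimal sketch on made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(12, dtype=float)                 # time index (e.g. months)
y = 3.0 + 0.5 * t + 0.3 * rng.standard_normal(12)  # series with a trend
slope, intercept = np.polyfit(t, y, 1)         # objective least-squares fit
trend = intercept + slope * t                  # the fitted trend line
print(round(slope, 2), round(intercept, 2))
```

Every analyst fitting this line to the same data obtains the same slope and intercept, unlike the freehand method.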


(Freund and Schapire, 1997) with Wagging (Bauer and Kohavi, 1999) (a variant of bagging (Breiman, 1996)). The idea is to use AdaBoost as the individual learning algorithm for Wagging. In a posterior work, Webb and Zheng (2004) claim that Multiboosting and other similar approaches provide a better trade-off between the error of the individual members of the ensemble and the diversity among them. Yu et al. (2007) presented an approach for combining a number of regression ensembles, dubbed Cocktail Ensemble, using the ambiguity decomposition. Wu et al. (2001) propose E-GASEN (Genetic Algorithm based Selective Ensemble method), a neural network ensemble. Essentially, this approach combines several GASEN ensembles using a simple average. A GASEN ensemble works by initially building a set of neural network models; afterwards, a genetic algorithm is run to prune the ensemble. In 2000, Pfahringer (2000) won the well-known KDD Cup competition with a combination of bagging and boosting ("bagged boosting"). EasyEnsemble and BalanceCascade (Liu et al., 2009) are another two approaches that combine bagging with boosting, focusing on imbalanced learning problems.
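For reference, the bagging idea underlying several of these methods — fit the same base learner on bootstrap resamples of the training set and average the predictions — can be sketched as follows. A linear least-squares fit stands in for the base learner, and all names and data are illustrative.

```python
import numpy as np

def bagged_predict(X, y, x_new, n_models=25, seed=0):
    """Bootstrap aggregation ("bagging"): average the predictions of
    the same base learner fitted on bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(X)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)          # resample with replacement
        w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        preds.append(x_new @ w)
    return np.mean(preds, axis=0)

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.standard_normal(100)
x_new = np.array([[1.0, 1.0, 1.0, 1.0]])
pred = bagged_predict(X, y, x_new)
print(pred)
```

Boosting, by contrast, fits its base learners sequentially, reweighting the data towards previous errors; the hybrids above combine both mechanisms.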


ARMA models are only appropriate for stationary time series. Looking at Graph 7a, our inflation time series appears rather stationary. Nevertheless, the unit root test yields an ADF test statistic of −2.604243, which is larger than the critical values. This result suggests that our time series has to be differenced. Graph 7c shows that the time series does not display any seasonality, hence seasonal adjustment is not necessary. Some experts would also inspect the time series plot looking for outliers (i.e. extreme values) in the data that are due either to coding mistakes or to extraordinary events (e.g. a stock market crash, economic crises, etc.). They might then replace outliers by local averages.
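The differencing step suggested by the ADF result can be illustrated on a simulated unit-root process. In practice a proper ADF test (e.g. the one available in statsmodels) would replace the crude half-sample comparison used here, which is only a visual aid:

```python
import numpy as np

rng = np.random.default_rng(0)
# A random walk has a unit root: ARMA should not be fit to its level
walk = np.cumsum(rng.standard_normal(500))
diff = np.diff(walk)            # first difference, as the ADF result suggests

# Crude check: the level wanders, the differenced series does not
for name, s in (("level", walk), ("differenced", diff)):
    half = len(s) // 2
    print(name, "half-means:",
          round(s[:half].mean(), 2), round(s[half:].mean(), 2))
```

After one round of differencing, an ARMA model can be fit to `diff`, which amounts to an ARIMA model for the original level.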



In this paper, we focus on learning and forecasting the trends in time series via neural networks. This involves learning different aspects of the data. On the one hand, the trend variation of a time series is a sequence of historical trends carrying the long-term contextual information of the time series, and it naturally affects the evolution of the following trend. On the other hand, the recent raw data points of the time series [Wang et al., 2011], which represent its local behaviour, affect the evolution of the following trend as well, and have particular predictive power for abruptly changing trends. For instance, in Figure 2(b), trends 1, 2 and 3 present a continuous upward pattern corresponding to the time series before the prediction time instant. When we aim to predict the subsequent trend of the time series, the previous three successive upward trends suggest a probable increasing trend afterwards. However, the local data points around the end of the third trend, as shown in Figure 2(a), e.g., the data points in the red circle, indicate that the time series could stabilize and even decrease. The true data after the third trend indeed present a decreasing trend, indicated by the blue dotted segment. In this case, the subsequent trend depends more on the local data points. Therefore, it is highly desirable to develop a systematic way to model such hidden and complementary dependencies in time series.
