In this paper, we propose two novel EANN variants for TSF that are fully automatic and can be used by non-specialized users to perform multi-step-ahead time series forecasts, since no a priori knowledge is assumed about the analyzed time series. In contrast with the majority of EANN works for TSF, the proposed EANN variants use EDA as the search engine under two design strategies: Sparsely-connected EANN (SEANN) and Time lag selection EANN (TEANN). Both strategies perform a simultaneous feature and model selection for TSF, although with a different emphasis. SEANN puts more effort into model selection by explicitly defining whether a connection exists, and time lag deletion only occurs when an input has no connections. TEANN enforces feature selection, explicitly defining in the chromosome which time lags are used, while ANN structure selection is made only in terms of the number of input and hidden nodes. These strategies are addressed separately in order to measure the contribution of each when compared with the fully connected EDA EANN. Moreover, we also compare all EANN methods with the popular ARIMA methodology and three recently proposed machine learning methods: Random Forest (RF), Echo State Network (ESN) and Support Vector Machine (SVM). The experiments were performed using several real-world time series from distinct domains, and the distinct forecasting approaches were compared under both forecasting and computational performance measurements. The paper is organized as follows. First, Section 2 describes the EANN approaches. Next, in Section 3 we present the experimental setup and analyze the obtained results. Finally, we conclude the paper in Section 4.
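The time lag selection idea behind TEANN can be illustrated with a minimal sketch (not the authors' implementation; the function name and mask encoding are assumptions for illustration): a binary mask over candidate lags decides which past values feed the network.

```python
# Illustrative sketch of a TEANN-style chromosome: a binary mask over the
# candidate time lags selects which past values are used as ANN inputs.
def lagged_inputs(series, lag_mask):
    """Build (inputs, target) pairs using only the lags switched on in lag_mask.

    lag_mask[i] == 1 means lag i+1 (i.e. y[t-(i+1)]) is used as an input.
    """
    max_lag = len(lag_mask)
    pairs = []
    for t in range(max_lag, len(series)):
        x = [series[t - (i + 1)] for i, on in enumerate(lag_mask) if on]
        pairs.append((x, series[t]))
    return pairs

series = [10, 12, 13, 12, 15, 16, 18, 17]
# Chromosome selecting lags 1 and 3 out of a maximum of 4 considered lags.
X_y = lagged_inputs(series, lag_mask=[1, 0, 1, 0])
```

Under this encoding, mutating the mask changes the feature set, while a separate part of the chromosome would govern the number of hidden nodes.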
problem using a type of wavelet neural network. The basic building block of the neural network models is a ridge-type function. The training of such a network is a nonlinear optimization problem. Evolutionary algorithms (EAs), including the genetic algorithm (GA) and particle swarm optimization (PSO), together with a new gradient-free algorithm (called coordinate dictionary search optimization, CDSO), are used to train the network models. An example of real wind speed data modelling and prediction is provided to show the performance of the proposed networks trained by these three optimization algorithms.
GAs are machine learning procedures, which derive their behavior from the process of evolution in nature and are used to solve complicated optimization problems (Goldberg, 1989; Michalewicz, 1996). In nature, individuals that better fit the environment have a higher probability of surviving and transferring their chromosomes to their descendants, compared to individuals with poor fitness, which will most probably become extinct. Following the same idea, GAs are iterative stochastic methodologies that start with a random population of possible solutions (chromosomes). The fitness of each chromosome is measured by computing the corresponding value of a carefully chosen fitness function. Then, a new generation is produced by giving a higher probability of surviving to the individuals with the best fitness values. As the algorithm proceeds, the members of the population are gradually improved. This parallel searching mechanism is the main advantage of GAs, since they cannot easily get trapped in local minima. In order to better emulate the way nature behaves, some genetic "operators" are added to the algorithms, such as the mutation operator, where some members of each individual are altered randomly, and the crossover operator, where new individuals are born from a random combination of old ones.
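The loop described above can be sketched in a few lines. This is a minimal textbook GA on a toy problem (maximising the number of ones in a bit-string), not the procedure of any particular paper; population size, mutation rate and tournament selection are illustrative choices.

```python
import random

random.seed(0)

# Minimal GA sketch: tournament selection, one-point crossover, bit-flip
# mutation, applied to the toy problem of maximising the number of ones.
def evolve(fitness, n_bits=16, pop_size=20, generations=40, p_mut=0.05):
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: fitter chromosomes win a two-way tournament more often.
        parents = [max(random.sample(pop, 2), key=fitness) for _ in range(pop_size)]
        nxt = []
        for a, b in zip(parents[::2], parents[1::2]):
            cut = random.randrange(1, n_bits)              # one-point crossover
            for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
                # Mutation: each gene flips with small probability p_mut.
                nxt.append([1 - g if random.random() < p_mut else g for g in child])
        pop = nxt
    return max(pop, key=fitness)

best = evolve(sum)    # fitness = count of ones in the chromosome
```

The same skeleton carries over to EANN design, where the chromosome encodes network structure and the fitness function measures forecasting error on a validation set.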
In this paper, we proposed an evolutionary artificial neural network engine to evolve a fitness-weighted n-fold cross-validation artificial neural network ensemble scheme for time series forecasting. To combine the n ANN outputs into a single response, we explored four distinct combination functions. Experiments were held with six time series, with different characteristics and from different domains. As the main outcome of this work, we show that the fitness-weighted n-fold ensemble improves the accuracy of the forecasts, outperforming both the unweighted n-fold ensemble and the simpler holdout validation (0-fold) EANN. Also, as a compromise between accuracy and computational cost, based on the presented results, we advise the use of a 4-fold ANN ensemble that is evolved using weighted cross-validation and that uses a rank-based combination method to build the final forecasts. Moreover, when compared with a classical method like Holt-Winters, competitive forecasting results were achieved by the proposed approach, showing that it can be an interesting alternative. In future work, we intend to use the EANN engine to evolve ensembles of sparsely-connected ANNs. We also intend to apply a similar approach to evolve ensembles of other base learners, such as support vector machines.
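The fitness-weighted combination idea can be sketched as follows. This is a hedged illustration, not the paper's exact combination function: each fold-trained network is weighted by the inverse of its validation error, and the ensemble forecast is the weighted average.

```python
# Sketch of a fitness-weighted ensemble combination: networks with lower
# validation error receive proportionally higher weight in the final forecast.
def weighted_ensemble(forecasts, val_errors):
    weights = [1.0 / e for e in val_errors]      # lower error -> higher weight
    total = sum(weights)
    weights = [w / total for w in weights]       # normalise to sum to 1
    return sum(w * f for w, f in zip(weights, forecasts))

# Four fold-models predict the next value; the second is the most accurate
# on validation data and therefore dominates the combination.
pred = weighted_ensemble(forecasts=[101.0, 99.0, 103.0, 98.0],
                         val_errors=[2.0, 0.5, 4.0, 1.0])
```

A rank-based variant, as advised in the text, would replace the inverse-error weights with weights derived from each model's validation rank.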
Using a hybrid model or combining several models has become a common practice to improve forecasting accuracy since the well-known M-competition, in which a combination of forecasts from more than one model often leads to improved forecasting performance. The literature on this topic has expanded dramatically since the early work of Reid and of Bates and Granger. Clemen provided a comprehensive review and annotated bibliography in this area. The basic idea of model combination in forecasting is to use each model's unique feature to capture different patterns in the data. Both theoretical and empirical findings suggest that combining different methods can be an effective and efficient way to improve forecasts [22,28,29,40]. In neural network forecasting research, a number of combining schemes have been proposed. Wedding and Cios described a combining methodology using radial basis function networks and the Box–Jenkins models. Luxhoj et al. presented a hybrid econometric and ANN approach for sales forecasting. Pelikan et al. and Ginzburg and Horn proposed to combine several feedforward neural networks to improve time series forecasting accuracy.
they learn to solve a certain problem. The number of hidden layers, as well as the number of nodes in these layers, is one of the several hyperparameters which have to be set and which usually depend on the particular problem. With an increasing number of hidden nodes, artificial neural networks can tackle more complex data, but they are more difficult to train and more prone to overfitting. In this case, a neural network appropriately learns the data it is trained on but cannot generalise to previously unseen data, called the testing set. The output layer, connected to the last hidden layer, produces an output vector for each input vector. The number of nodes in the output layer equals the length of the output vector.
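The layer-size bookkeeping can be made concrete with a minimal forward pass (an illustrative sketch with random, untrained weights; layer sizes are arbitrary): the output layer has exactly as many nodes as the output vector is long.

```python
import math
import random

random.seed(1)

# Minimal feedforward sketch: one fully connected tanh layer with random
# (untrained) weights, applied twice to form hidden and output layers.
def dense(x, n_out):
    w = [[random.uniform(-1, 1) for _ in x] for _ in range(n_out)]
    b = [random.uniform(-1, 1) for _ in range(n_out)]
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(w, b)]

x = [0.2, 0.5, 0.1]        # 3 input nodes
h = dense(x, n_out=8)      # one hidden layer with 8 nodes
y = dense(h, n_out=2)      # output layer: 2 nodes -> output vector of length 2
```

Increasing `n_out` of the hidden layer adds capacity, at the cost of more weights to train, which is exactly the overfitting trade-off discussed above.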
While a plain neural network (Chapter 3), also known as a feedforward neural network, is the basis of all subsets of neural networks and is useful in many cases of machine learning, it fails at forecasting sequential data. This is because a plain neural network is aware only of the information present in the current input to the network. Recurrent neural networks (RNNs) were invented to solve this problem and are used in a number of different machine learning tasks, such as predicting a word in a sentence, time series forecasting and speech recognition, all being cases where plain feedforward neural networks cannot succeed as well. (Banerjee 2018.)
In this paper, experiments were performed on a real-world TS forecasting problem with the new non-standard QNN learning scheme. The results show that the QNN performance was better than that obtained with the ANN constructed in another work [Ferreira, 2006] for some measures, and better overall than linear algebraic ARIMA models, notably for the DJIA series. This is encouraging, since stock market prediction is a difficult problem.
There exist many methods for pruning a network; see for example Fine (1999, Chapter 6) for an informative account. In this paper we apply a technique called "Bayesian regularization", as described in MacKay (1992a). The aim of Bayesian regularization is twofold. First, it is intended to facilitate maximum likelihood estimation by penalizing the log-likelihood in case some of the parameter estimates become excessively large. Second, the method is used to find a parsimonious model within a possibly very large model. In order to describe the former aim in more detail, suppose that the estimation problem is "ill-posed" in the sense that the likelihood function is very flat in several directions of the parameter space. This is not uncommon in large neural network models, and it makes numerical maximization of the likelihood difficult. Besides, the maximum likelihood value may be strongly dependent on a small number of data points. An appropriate prior distribution on the parameters acts as a "regularizer" that imposes smoothness and makes estimation easier. For example, the prior distribution may be defined such that it shrinks the parameters, or some linear combinations of them, towards zero. Information in the time series is used to find the "optimal" amount of shrinkage. Furthermore, a set of smaller models nested within the large original model is defined. The algorithm makes it possible to choose one of these sub-models and thus reduce the size of the neural network. This requires determining prior probabilities for the models in the set and finding the one with the highest posterior probability. Bayesian regularization can be applied to feedforward neural networks of type (3), as discussed in MacKay (1992b). In this context, the set of eligible AR-NN models does not usually contain models with a linear unit, and we adhere to that practice here. In our case, the largest model has nine hidden units (q = 9 in (3)), and the maximum lag p equals six.
We apply the Levenberg-Marquardt optimization algorithm in conjunction with Bayesian regularization as proposed in Foresee and Hagan (1997).
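The objective being minimised can be sketched as a penalised sum of squares in the spirit of MacKay (1992a); the alpha and beta values below are purely illustrative (in the full method they are re-estimated from the data rather than hand-tuned).

```python
# Sketch of the regularised training objective: the data-fit term (sum of
# squared errors) is penalised by the sum of squared weights, shrinking the
# parameters toward zero, which is the "regularizer" role of the prior.
def regularised_objective(errors, weights, alpha, beta):
    sse = sum(e * e for e in errors)     # data-fit (log-likelihood) term
    ssw = sum(w * w for w in weights)    # penalty on excessively large weights
    return beta * sse + alpha * ssw

small = regularised_objective([0.1, -0.2], [0.3, -0.1], alpha=0.5, beta=1.0)
large = regularised_objective([0.1, -0.2], [3.0, -4.0], alpha=0.5, beta=1.0)
# Identical fit, but the large-weight solution scores much worse.
```

This is what makes the ill-posed directions of the likelihood surface tractable: along flat directions, the penalty term dominates and pulls the estimate toward zero.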
More noticeably, differences and discrepancies in the design of all CI competitions become evident, which seriously impair their contribution. As a concession to the resources required to run a competition, both the forecasting and CI competitions each employed only one hold-out set, and hence a single time series origin. However, while all competitions in the forecasting domain have used representative sample sizes of hundreds or even thousands of time series in order to derive robust results, CI competitions have mostly evaluated accuracies on a single time series only. The few competitions which evaluated multiple time series, such as the Santa Fe and predictive uncertainty competitions, did so for distinct domains, with only one series per category, again limiting any generalisation of their findings. Had the same algorithm been used across multiple similar series, datasets or competitions, it would have allowed somewhat more reliable and insightful results to be obtained. Instead, the same authors applied different methodologies for each dataset, even within a given competition, thus leading to distinctly different models and preventing any comparisons. Also, none of the CI competitions compare the results with established benchmark methods, whether naïve methods (i.e. a random walk), simple statistical benchmarks which are used in the application domain (e.g., ES methods), or non-statistical methods in the same family of algorithms (e.g., a simple NN with default parameters to compete against a more sophisticated architecture). We therefore conclude that the recommendations on the design of empirical evaluations developed in forecasting have been ignored by the CI community.
Makridakis and Hibon's (2000) original criticism holds: just like theoretical statisticians before them, NN researchers have concentrated their efforts on building more sophisticated models, with no regard to either the assessment of their accuracy or objective empirical verifications, successfully ignoring the strong empirical evidence of the M-competitions and the ground rules they have laid out on how to assess forecasting competitions. This substantially limits the
Abstract This study discusses the capability of the Box-Jenkins methodology when compared with the Neural Network method in time series forecasting. Five complex time series were modelled using the backpropagation Neural Network method and compared with standard Box-Jenkins models. The analysis shows that, for seasonal time series data, both methods produce comparable results. However, for erratic time series, the Box-Jenkins method produces poorer results than the Neural Network. These results also show that the Neural Network is robust, produces good long-range forecasts, and can be an alternative method for forecasting.
N. Kourentzes and S. F. Crone (2009). Modelling deterministic seasonality with artificial neural networks for time series forecasting. Working Paper. Lancaster: Lancaster University. N. Kourentzes and S. F. Crone (2009). Forecasting with neural networks: from low to high frequency time series. Working Paper. Lancaster: Lancaster University.
Abstract Fuzzy time series approaches are commonly used for the analysis of real-life time series whose observations include uncertainty. Because fuzzy time series forecasting methods do not require many of the constraints of classic time series approaches, interest in these methods is increasing. The fuzzy time series forecasting methods in the literature focus on models connected to fuzzy autoregressive (AR) variables. In models in which classic time series methods are used, there are not only autoregressive variables but also moving average (MA) variables. However, among the fuzzy time series forecasting methods proposed in the literature, MA variables are used in only two studies. In this study, by defining a new first-order fuzzy time series forecasting model which includes not only fuzzy AR variables but also MA variables, an analysis algorithm based on artificial neural networks is proposed. The newly proposed method is applied to the Istanbul Stock Exchange (IMKB) national 100 index time series, the gold prices of the Central Bank of the Republic of Turkey and two simulated chaotic time series, and is compared with the other methods in the literature with regard to forecasting performance.
However, although the above analysis indicates good accuracy in one-step-ahead prediction using a six-neuron NN, it is not clear that the obtained neural model can reproduce the dynamics of the Lorenz system. Figure 3 illustrates this fact by showing the evolution of two different NNs; in the first case, the neural system converges to a periodic trajectory (Fig. 3(a)), whereas in the second case it converges to a fixed point (Fig. 3(b)), neither of them resembling the chaotic behavior of the Lorenz model. As we have seen in this example, an interesting result obtained when training NNs with a low number of parameters is that the resulting orbits may not behave as the original chaotic system, but resemble some unstable periodic orbits embedded in the chaotic system. This fact may be caused by the simpler dynamics associated with unstable periodic orbits, and will be the scope of a future paper (see  for an introduction to unstable periodic orbits and their role in the topology of chaotic attractors).
However, in testing it was noted that, as is the case with the mathematical expressions, in practice the NARMA network models NLMA better (from the point of view of forecasting capacity measures) than the ARNN network. This indicates that this network can be a good candidate for modelling nonlinear data containing moving average components, but it requires detailed study, and so a new research question arises: from the theoretical point of view, what considerations must the recurrent NARMA(0, q) network satisfy so that it can properly predict nonlinear time series containing inherent moving average components?

7. Conclusion
First, the time series figures are plotted on a graph. The points are joined by straight lines. We get fluctuating straight lines, through which an average straight line is drawn. This method is, however, inaccurate, since different persons may fit different trend lines to the same set of data.
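The subjectivity described above disappears if the trend line is fitted by ordinary least squares instead of by eye; every analyst applying the same formula obtains the same line. A minimal sketch:

```python
# Ordinary least squares fit of a straight trend line y = a + b*t, which
# replaces the freehand "average straight line" with a unique, reproducible one.
def trend_line(ys):
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    b = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
         / sum((t - t_mean) ** 2 for t in range(n)))     # slope
    a = y_mean - b * t_mean                              # intercept
    return a, b

a, b = trend_line([2.0, 4.0, 6.0, 8.0])   # perfectly linear series
```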
(Freund and Schapire, 1997) with Wagging (Bauer and Kohavi, 1999) (a variant of bagging (Breiman, 1996)). The idea is to use AdaBoost as the individual learning algorithm for Wagging. In a posterior work, Webb and Zheng (2004) claim that MultiBoosting and other similar approaches provide a better trade-off between the error of the individual members of the ensemble and the diversity among them. Yu et al. (2007) presented an approach for combining a number of regression ensembles, dubbed Cocktail Ensemble, using the ambiguity decomposition. Wu et al. (2001) propose E-GASEN (Genetic Algorithm based Selective Ensemble method), a neural network ensemble. Essentially, this approach combines several GASEN ensembles using a simple average. A GASEN ensemble works by initially building a set of neural network models; afterwards, a genetic algorithm is run to prune the ensemble. In 2000, Pfahringer (2000) won the well-known KDD Cup competition with a combination of bagging and boosting ("bagged boosting"). EasyEnsemble and BalanceCascade (Liu et al., 2009) are another two approaches that combine bagging with boosting, focusing on imbalanced learning problems.
ARMA models are only appropriate for stationary time series. Looking at Graph 7a, it appears that our inflation time series is rather stationary. Nevertheless, the unit root test yields an ADF test statistic of −2.604243, which is larger than the critical values. This result suggests that our time series has to be differenced. Graph 7c shows that the time series does not display any seasonality, hence seasonal adjustment is not necessary. Some experts would also inspect the time series plot looking for outliers (i.e. extreme values) in the data that are either due to coding mistakes or due to extraordinary events (e.g. a stock market crash, economic crises, etc.). They might then replace outliers by local averages.
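The differencing step implied by the failed unit root test is simple to sketch. In practice the ADF statistic itself would come from a statistics package (e.g. `statsmodels.tsa.stattools.adfuller`); here only the differencing transformation is shown.

```python
# First differencing removes a unit root / linear trend: y'[t] = y[t] - y[t-1].
def difference(series, d=1):
    """Apply d rounds of first differencing to a list of observations."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

# A series with a linear trend becomes constant (stationary) after one difference.
diffed = difference([1.0, 3.0, 5.0, 7.0, 9.0])
```

Each round of differencing shortens the series by one observation, which is why d is kept as small as the unit root tests allow.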
Forecasting the future based on past data is a key issue in supporting decision making in a wide range of domains, including scientific, industrial, commercial and economic activity areas. In this paper, we address multi-step-ahead Time Series Forecasting (TSF), which is useful to support tactical decisions, such as planning resources, stocks and budgets. As the base learner, we adopt the modern Support Vector Machine (SVM), which often achieves high-quality predictive results and presents theoretical advantages (e.g., optimum learning convergence) over other learners, such as Artificial Neural Networks (ANN). To automate the search for the best SVM forecasting model, we use a recently proposed evolutionary algorithm: the Estimation of Distribution Algorithm (EDA). This search method is used to perform a simultaneous variable (number of inputs) and model (hyperparameter) selection. Using EDA, we propose two Evolutionary SVM (ESVM) variants for TSF, under global (GESVM) and decomposition (DESVM) approaches. The former uses all past patterns to fit the SVM, while the latter first decomposes the original series into trended and stationary components, then uses ESVM to predict each individual component, and finally sums both predictions to get the global response.
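The decomposition approach can be sketched as follows. This is a hedged illustration of the split-forecast-recombine idea, not the DESVM implementation: the linear trend fit and the naive residual forecast below are placeholders for the fitted SVM component models.

```python
# Sketch of a decomposition forecast: split the series into a linear trend and
# a residual component, forecast each one step ahead, then sum the forecasts.
def decompose(series):
    n = len(series)
    t_mean, y_mean = (n - 1) / 2, sum(series) / n
    b = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
         / sum((t - t_mean) ** 2 for t in range(n)))
    a = y_mean - b * t_mean
    trend = [a + b * t for t in range(n)]
    residual = [y - tr for y, tr in zip(series, trend)]
    return trend, residual, a, b

series = [10.0, 12.5, 14.0, 16.5, 18.0]
trend, residual, a, b = decompose(series)

trend_fc = a + b * len(series)   # extrapolate the trend one step ahead
resid_fc = residual[-1]          # naive stand-in for the stationary-model forecast
forecast = trend_fc + resid_fc   # recombine: global response = sum of components
```

In DESVM, each placeholder forecast would instead come from an ESVM fitted to the corresponding component.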
In this paper, we focus on learning and forecasting the trends in timeseries via neuralnetworks. This involves learn- ing different aspects of the data. On the one hand, the trend variation of timeseries is a sequence of historical trends car- rying the long-term contextual information of timeseries and naturally affects the evolution of the following trend. On the other hand, the recent raw data points of timeseries [Wang et al., 2011], which represent the local behaviour of timeseries, affects the evolving of the following trend as well and have particular predictive power for abruptly changing trends. For instance, in Figure 2(b), trend 1, 2 and 3 present a continu- ous upward pattern corresponding to the timeseries before the prediction time instant. Then when we aim at predicting the subsequent trend of timeseries, the previous three succes- sive upward trends outline a probable increasing trend after- wards. However, the local data points around the end of the third trend as is shown in Figure 2(a), e.g., data points in the red circle, indicate that timeseries could stabilize and even decrease. The true data after the third trend indeed present a decreasing trend indicated by the blue dotted segment. In this case, the subsequent trend has more dependency on the local data points. Therefore, it is highly desired to develop a systematic way to model such hidden and complementary dependencies in timeseries.