Develop Tourism Demand Forecasting Model using Artificial Neural Network and ARIMA
Abdul Talib Bon, Chan Seck Tieng
Department of Production and Operations Management, Universiti Tun Hussein Onn Malaysia
Parit Raja, Johor, Malaysia
Department of Production and Operations Management, Universiti Tun Hussein Onn Malaysia
Parit Raja, Johor, Malaysia
Abstract
Tourism demand plays a significant role in development of tourism sector. As Malaysia is a multiracial country with a rich heritage, it consists of 13 states and three federal territories. Malaysia is presenting an average 26.13 millions of tourist arrivals with the average RM69.84 billions of tourist receipts. The performance of Malaysia tourism sector is in prospect with the improvement of every field which is related to tourism sector. Therefore, to boost the improvement and development, the researcher proposes to identify the best forecasting technique for forecast the tourism demand and choose Sabah as the research place in this study because of its motley of cultures. On top of that, Sabah tourism is one of the major contributors to Sabah’s economy. There are two forecasting techniques will be applied which are ANN approach and ARIMA approach. The results will be compared for purpose to identify the best forecasting technique for forecast the tourism demand of Sabah, Malaysia. Besides, the interview conducted in order to comprehend the reasons that influence tourist arrivals in Sabah. Based on the results of the study, the researcher concluded that the Artificial Neural Network is the best forecasting technique for forecast the number of tourist arrivals in Sabah for the future. This information provided an outline for develop tourism policy to increase tourism demand and Sabah’s economic status. From this study, it will effectively give a boost to growth the nation’s economy indirectly
Keywords— Tourism demand, Forecasting Model, ARIMA, ANN, Sabah
I. INTRODUCTION
Tourism is one of the major business sectors and it is representing 7% of the total world exports.
Meanwhile, tourism is also a growing market (Höpken et al., 2017). The tourism industry prospers on information (Benckendorff et al., 2014). Aruna (2013) says that “tourism has proven to be a catalyst for development and these dynamics have turned the industry into a key driver for economic growth”.
There is a lot of effort that worked by the Ministry of Tourism and Culture Malaysia as well as the local citizens in order to boost the tourism sector and nation culture to the worldwide. The government is trying to put greater effort in order to boost the tourism field by increasing the yield per tourist to attract the higher yield segment (Aruna, 2013). The main element that has to concern and take note is the tourism demand of the country. In order to comply the effort, the Tourist Development Corporation of Malaysia (TDC) was founded on 10 August 1972. It was an organization that under the prior Ministry of Trade and Industry.
A few years later, on 20 May 1987, with the establishing of the Ministry of Culture, Arts and Tourism, TDC was relocated to this new ministry; and soon, it through the Malaysia Tourism Promotion Board Act 1992 to become the Malaysia Tourism Promotion Board (MTPM). It is fully focusing on promoting Malaysia nationally and globally (Tourism Malaysia, 2017).
In addition, tourism demand forecasting or predicting is a well-developed research area. Indeed, it has been many studies in the tourism and hospitality field (Li et al., 2016). There are a few forecast
techniques that can be selected to adopt into the prediction. Among the forecast techniques are time series, econometric models, artificial intelligence approaches, and hybrid methods (Li et al., 2016). In this study, the researcher will use to explore the forecasts for the tourism demand in order to develop the tourism demand forecasting model as well as to compare the forecasting outcomes that developed by using the different methods. The two methods that will be used in this study are autoregressive integrated moving average and artificial neural network. The intention of this study is to explore the most accurate model for forecasting the tourism demand in Sabah. In this regard, this will assist to boost the economic status in Sabah in the near future.
II. METHODOLOGY A. Autoregressive Integrated Moving Average (ARIMA)
The Box-Jenkins model namely autoregressive integrated moving average (ARIMA) model is the most common time series technique to be used in the prediction. There are four stages that involved in the model, which consist of identification, estimation, diagnostic checking and forecasting.
Figure 1 The four steps in model-building strategy Source: (John E. & Dean, 2014).
The full model of ARIMA can be written as
where is a differenced series and it may have been differenced more than once. The “predictors”
present on the right side of the equation consist of both lagged values of as well as the lagged errors. The analysts claimed and this donated by ARIMA ( , , ) model, whereas the term integrated means the differences must be integrated to get the original series, where
= order of the autoregressive part;
= degree of the first differencing involved;
= order of the moving average part.
Figure 1 demonstrates the four steps in the Box-Jenkins technique of model-building strategy. Model identification as the first step of the process. Generally, the first step is to determine the types of series- whether the series is stationary or nonstationary. The stationary time series will present to change the fixed level. Meanwhile, the nonstationary time series will indicate an increase or decline over time as well as sample autocorrelations fail to eliminate promptly. The same stationary and invertibility requirements that are used for autoregressive and moving average models apply to this ARIMA model. A th-order autoregressive model (AR model) takes the form
where
= the response (dependent) variable at time = response variable at time lags = coefficients to be estimated
= error term at time , which signifies the effects of variables not explained by the model;
the assumptions about the error term are the same as those for the standard regression model Whereas the th-order moving average model (MA model) takes the form
where
= the response (dependent) variable at time
= the constant mean of the process
= the coefficients to be estimated
= error term at time , which signifies the effects of variables not explained by the model;
the assumptions about the error term are the same as those for the standard regression model = the errors in previous time periods that, at time , are incorporated in the response,
The step followed up is the model estimation as known as parameters estimation. The factors in ARIMA models are predicted by diminishing the total of squares of the fitting errors. The purpose of this step is to acquire the least squares by using a nonlinear least square procedure. The function of a nonlinear least square procedure is to predigest an algorithm by discovers the minimum of the total of squared errors function. When the squared errors figure as well as its standard errors are identified, values can be formed and interpreted in the common method. The equation of nonlinear least square estimation that will interpret in this step is
where
= the forecast values
= the forecast constant mean of the process = the forecast coefficients to be estimated
= the errors in previous time periods that, at time , are incorporated in the response, The residual mean square error as an estimate of the variance of the error, , is computed. The residual mean square error is useful for assessing fit and comparing different models. Also, it functions to compute the forecast error limits. The residual mean square error is defined as below
where
= the residual at time = the number of residuals
= the total number of parameters estimated
The third step is the model checking process with analysing the residual. it must be checked carefully for the adequacy before using the model for conduct the forecasting. The residuals should be random.
In fact, the individual residual autocorrelations should be small and generally within of zero.
The model is inadequate if the significant residual autocorrelations at low lags or seasonal lags.
Therefore, a new or modified model should be selected. In addition, the residual autocorrelations as a group should be consistent with those produced by random errors.
There is an overall check of model adequacy is offered by a chi-square ( ) test based on the Ljung-Box Q statistics. The test statistics Q is
= the residual autocorrelation at lag k = the number of residuals
= the time lags
= the number of time lags to be tested
In case the -value associated with the statistic is small as -value , the model is considered inadequate. Therefore, the analyst should consider a new or modified model and continue the analysis until a satisfactory model has been determined.
The final step in this process is forecasting with the model. Calculating forecasts and prediction intervals is tedious and best left to the computer. The same ARIMA model can be used to generate revised forecasts from another time origin since more data become available. In case the pattern of the series appears to be changing over time, the new data may be used to redo the estimation of the model parameters with re-conduct the entirely model by develop a new model (John E. & Dean, 2014).
B. Artificial Neural Network (ANN)
Artificial neural networks consist the simplest networks with no hidden layers as well as equivalent to linear regression. there is a linear regression with four predictors in the neural network version. The “weights” as the coefficients which attached to the predictors. The prediction is received through a linear combination of the inputs. the weights are chosen in the neural network framework by applying a “learning algorithm” that keep down a “cost function” such as MSE.
Figure 1 A simple neural network equivalent to a linear regression Source: (Rob & George , 2013)
The multilayer feed-forward network involved an intermediate layer with hidden neurons.
Therefore, the neural network becomes non-linear. The multilayer feed-forward network where each layer of nodes obtains the inputs from previous layers. The principle of the theory is the outputs of nodes in one layer are the inputs for the next layer. Whereas the inputs to each node are combined by using a weighted linear combination. Thus, the results are then modified by a nonlinear neuron .
Figure 3. A neural network with four inputs and one hidden layer with three hidden neurons
Source: (Rob & George , 2013) The inputs into hidden neuron in Figure 3 above are linearly combined to give
Whereas the hidden layer, this is the modified using nonlinear function that used in this process.
In order to give the inputs to next layer, the modified applying nonlinear function. This computation will help to reduce the effect of extreme input values (Rob & George , 2013).
III. RESULTSANDDISCUSSIONS
A. Data editing and coding
Sequel to data collection, the data was classified, coded and grouped in batches related to the dates for the preparing of the data for analysis. It is imperative to ensure the editing and coding of the data in order to certify omission, completeness and consistency of the data. Hence, any missing data has been considered as missing values. Coding (either considered pre-coding or post coding) was used to assign numbers for each data, which allow for easy transfer of the data to MATLAB. The coding procedure began with the establishing of a data file in Excel. The data editing procedures were undertaken to detect any errors in data entry, and out-of-range values were corrected by referring to the original data. The essence of this procedure is to enable a clean data could usable for the subsequent analysis.
Figure 4.1 ANN that using in forecasted the tourist arrivals from China to Malaysia
B. Results of China
Results of China Using Artificial Neural Network (ANN)
The researcher chose the data of tourists from China to develop this model dated January 2013 to December 2017 yielding 60 months. The researcher also used Artificial Neural Network to forecast the tourists vsitors in from China to Malaysia. However, the output layer of the ANN is a linear weighted sum. Thus, it is much faster and easier than to run the network as well as it has better result on nonlinear mapping.
The model below shows the actual as well as the forecasted number of tourist arrivals from China to Malaysia by using ANN.
Figure 4. The actual and forecasted movement of tourist arrivals from China to Sabah in the period of 2013 until 2017 using ANN approach
In the above model, the researcher used the number of tourist arrivals from China to Malaysia to forecast the future tourist arrivals to Malaysia in advance. According to the above result, the test performance was 41.3 while the train performance was 8.8 respectively.
Results of China using ARIMA
Figure 5 The actual and forecasted movement of tourist arrivals from China to Sabah in the period of 2013 until 2017 by using ARIMA approach
Combination of the Results of ANN and ARIMA of China
In this section, in order to acquire the most accurate out the forecasted models, the researcher combined the two models and figured out the forecasted models for comparing to each other.
Figure 6. The actual and forecasted movement of tourist arrivals from China to Sabah in the period of 2013 until 2017 using ANN and ARIMA approach
The above figure shows different movement of the two models employed in this research. The actual data is expressed in blue colour meanwhile the forecasted model using ANN is indicated in red colour as well as the grey colour is represented ARIMA model. Axis-Y illustrated the number of tourist arrivals while the axis-X expressed the monthly period started from 2013 to 2017 which yielded 60 months.
Error Analysis
Table 1 The results of the error analysis of MAPE shown in ANN and ARIMA
The table above shows the results of the error analysis by employing an error analysis technique.
From the results of MAPE shows ANN is the most accurate with 0.001296% compare to the ARIMA with 0.10152% Based on the results of MAPE, ANN is the best model with the presentation of the finding of the minimum error.
Results of Norway
Results of Norway Using Artificial Neural Network (ANN)
Figure 7. The actual and forecasted movement of tourist arrivals from Norway to Sabah in the period of 2013 until 2017 using ANN approach
In the above model, the researcher used the data of tourist arrivals from Norway to Malaysia to forecast the future tourists to Malaysia. Axis-Y illustrated the number of tourist arrivals from Norway and the axis-X illustrated the monthly period started from the January of 2013 to December of 2017 which consist of 60 months. According to the above result, the test performance was 38.4 while the train performance was 9.5 respectively.
Results of Norway Using ARIMA
Fig. 8 The actual and forecasted movement of the tourist arrivals from Norway to Sabah in the period of 2013 until 2017 by using ARIMA approach
From the figure 8 above, the blue line illustrates the movement of the actual number of tourist arrivals from Norway to Malaysia meanwhile the red line illustrates the movement of the forecasted values by the ARIMA. The axis-Y represented the number of tourist arrivals of Norway meanwhile the axis-X represented the monthly period starting from 2013 until 2017 which yielded into sixty (60) months.
Combination of the Results of ANN and ARIMA of Norway
Figure 9. The actual and forecasted movement of tourist arrivals from Norway to Sabah in the period of 2013 until 2017 using ANN and ARIMA approach
The above figure shows different movement of the two (2) models employed in this research. The actual data is expressed in blue colour meanwhile the forecasted model using ANN is indicated in red colour as well as the grey colour is represented ARIMA model. Axis-Y illustrated the number of tourist arrivals while the axis-X expressed the monthly period started from 2013 to 2017 which yielded 60 months.
IV. CONCLUSION Error Analysis
Table 2 The results of the error analysis of MAPE shown in ANN and ARIMA
The table above shows the result of the error analysis by employing an error analysis technique. The results of MAPE expressed in ANN was the most accurate with 0.06755% compare with the result of ARIMA with 0.20065% Based on the results of MAPE, the ANN is presenting the most accurate
results with the working of analysis the minimum error. There was no doubt that the ANN is the best forecasting technique with the compare to ARIMA.
Summary of the Identification the Best Forecasting Technique
TABLE 3. The overall of the results of mean absolute percentage error analysis of two countries
Obviously, the results of ANN expressed more accurate results of minimum error compared to the ARIMA error results. The MAPE presented the ANN of tourist arrivals from China was 0.01296%
meanwhile the tourist arrivals from Norway was 0.06755%. On the other hand, the MAPE presented the ARIMA of tourist arrivals from China was 0.10152% as well as the tourist arrivals from Norway was 0.20065%.
Acknowledgment
This research is supported and sponsored by Registrar Office, Universiti Tun Hussein Onn Malaysia, Ministry of Education Malaysia
References
[1] Archana , S., & Prashant, P. (2015). River water quality modelling using artificial neural network technique. International Conference On Water Resources, Coastal And Ocean Engineering (ICWRCOE 2015). 4, pp. 1070-1077. India: Elsevier B. V.
[2] Aruna, P. (2013, August 15). Giving the tourism sector a boost. Retrieved from The Star Online:
https://www.thestar.com.my/news/nation/2013/08/15/giving-the-tourism-sector-a-boost/
[3] Biljana, P. (2017). Predicting tourism demand by A.R.I.M.A. models. Economic Research- Ekonomska Istraživanja, 30(1), 939-950.
[4] Bill, F. (2001, April). Towards a framework for tourism disaster management Author links open overlay panel. Tourism Management, 22(2), 135-147.
[5] Bon,A. T. and Chong Y. L. (2009). The Fundamental on Demand Forecasting in Inventory Management. Australian Journal of Basic and Applied Sciences, 3(4), pp. 3937-3943.
[6] Box, G., & Jenkins, G. (1976). Time series analysis: Forecasting and control. San Freancisco:
Holden-Day Inc.
[7] C.Blankenship, D. (2010). Applied Research and Evaluation Methods in Recreation.
[8] Cenk, Y. U., & Abdurrahman, F. (2017, May 30). Forecasting USDTRY rate by ARIMA method.
Cogent Economics & Finance, 1-11.
[9] Daniel, T., & Saša, S. (2017, June 9). Forecasting Capacity of ARIMA Models; A Study on Croatian Industrial Production and its Sub-sectors. Zagreb International Review of Economics and Business, 20(1), 81-99.
[10] Dudovskiy, J. (2017). Research-methodology. Retrieved from Research- methodology.net:https://research-methodology.net/research-methods/qualitative-
research/interviews/
[11] Dorota, D., & Janusz, D. (2017). Application of arima models in real estate market forecasting.
GIS ODYSSEY 2017 (pp. 117-125). Vattaro, Italy: Geographic Information Systems Conference and Exhibition.
[12] Durbanville. (2012, October 26). What is primary data and secondary data in Statistics and research methods? Retrieved from eNotes: https://www.enotes.com/homework-help/what- primary-data-secondary-data-472774
[13] Freeman, J., & Skapura, D. (1991). Neural network algorithms, applications, and programming techniques. United States of America: Addison-Wesley Publishing Company.
[14] Han, H. C., & I, H. C. (2017, April). Tourism demand forecasting model using neural network.
International Journal of Computer Science & Information Technology (IJCSIT), 9(2), 19-29.
[15] Jennifer C., M. H. (2008). Forecasting Japanese Tourism in Taiwan. International Journal of Culture, Tourism and Hospitality Research, 2(3), 197-216.
[16] John E., H., & Dean, W. (2014). Introduction to forecasting. London: Pearson Education Limited.
[17] Levy, P., & S. Lemeshow (1991). Sampling of Populations: Methods and Applications. New York: John Wiley.
[18] Makridakis, S. (1986). The art and science of forecasting An assessment and future directions.
International Journal of Forecasting, 2(1), 15-39.
[19] Mertens, E. (2016). Measuring the level and uncertainty of trend inflation. Review of Economics and Statistics, 98(5), 950–967.
[20] Michael P., C. (2015, March 27). Are Professional Macroeconomic Forecasters Able To Do Better Than Forecasting Trends? Journal of Money, Credit and Banking, 47(2–3), 349–382.
[21] MY Tourism Data. (2018). Home. Retrieved from MY Tourism Data:
http://mytourismdata.tourism.gov.my/
[22] Peter, B. L. (1996). Against the Gods: The Remarkable Story of Risk. New York: getAbstract.
[23] Priya, C. (2011, November 8). Advantages of Demand fForecast for the Tourism Industry.
Retrieved from ProjectGuru: https://www.projectguru.in/publications/demand-forecast-tourism/
[24] RB, J. (2007). Towards a definition of mixed method research. J.Mix Method Res.
[25] Rob, J., & George , A.s.l. (2013). Neural network models. Retrieved from Forecasting: principles and practice: https://www.otexts.org/fpp/9/3
[26] Rob J., H., & Anne B., K. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 679-688.
[27] Rob, L., & Norman, A. (1999). A neural network model to forecast Japanese demand for travel to Hong Kong. Tourism Management, 20(1), 89-97.
[28] Sabah Tourist. (2010). Visit Sabah. Retrieved from Sabah Tourist:
http://www.sabahtourist.com.my/about-sabah/visit-sabah.html
[29] Shouliang, H., Zhuoshi, H., Jing, S., Beidou, X., & Chaowei, Z. (2013). Using artificial neural network models for eutrophication prediction. International Symposium on Environmental Science and Technology (2013 ISEST). 18, pp. 310-316. Bei Jing: Elsevier B.V.
[30] Song, H., & Li, G. (2008). Tourism demand modelling and forecasting- A review of recent research. Tourism Management, 29(2), 203-220.
[31] Sumru, A., & Çakmaklı, C. (2016). Forecasting inflation using survey expectations and target inflation: Evidence for Brazil and Turkey. International Journal of Forecasting, 32(1), 138-153.
[32] Tourism Malaysia. (2017). Retrieved from Tourism Malaysia:
https://www.tourism.gov.my/about-us/about-tourism-malaysia
[33] Ulmer JT, W. M. (2003). The potential contributions of quantitative research to symbolic interactionism. Symbolic Interact. 52.
[34] Valeria, C. (2016). Can tourism confidence index improve tourism demand forecasts? Journal of Tourism Futures, 2(1), 6-21.