CONSTRUCTING A SALES FORECASTING MODEL BY INTEGRATING GRA AND ELM:A CASE STUDY FOR RETAIL INDUSTRY


Fei-Long Chen and Tsung-Yin Ou*

Department of Industrial Engineering and Engineering Management

National Tsing Hua University

Hsinchu (300), Taiwan

ABSTRACT

Because of strong competition and economic hardship, sales forecasting is a challenging problem: demand fluctuates under the influence of many factors. A good forecasting model improves customer satisfaction, reduces the spoilage of fresh food, increases sales revenue and makes production planning more efficient. In this study, the GELM forecasting model integrates Grey Relation Analysis (GRA) and the extreme learning machine (ELM) to support purchasing decisions in the retail industry. GRA sieves out the more influential factors from the raw data and passes them as input data to ELM, a novel neural network that avoids slow gradient-based learning and iterative parameter tuning. The proposed system was evaluated on real sales data of fresh food in the retail industry. The experimental results indicate that the GELM model outperforms other time series forecasting models, such as GARCH, GBPN and GMFLN, in prediction accuracy and training speed. Moreover, the different activation functions of the GELM model showed significant differences in training time and performance in our experiments.

Keywords: Sales Forecasting, Grey Relation Analysis, Extreme Learning Machine, Retail Industry, Activation Functions

1. INTRODUCTION


In actual retail operations, sales forecasting plays an increasingly prominent role in the commercial enterprise. However, sales forecasting is usually a highly complex problem because of the influence of internal and external factors. If decision-makers could estimate their sales quantities properly, customer demand would be satisfied and the cost of spoiled fresh food would be substantially reduced. In practice, variations in consumer demand are caused by many factors such as price, promotion, changing consumer preferences and weather, especially for fresh food [34]. Both shortages and surpluses of fresh items, which can only be sold for a limited period, lead to lost revenue for the retail company. An effective and timely forecasting model is therefore an urgent and indispensable tool for managing inventory levels in the retail business. Conversely, poor forecasting methods result in excess or insufficient stock, which directly affects income and competitive advantage. It is therefore critical to identify the influential factors and then obtain accurate forecasts for fresh food in a modern retail industry.

*Corresponding author: [email protected]

Since retail managers usually lack an accurate forecasting tool, they have to rely on their own experience or consult the point of sales (POS) system to predict future demand and place purchase orders. Few decision-makers adopt statistical methods, such as the moving average method or exponential smoothing, to deal with daily time series problems. LeVee [27] indicated that accurate sales forecasting is obtainable and can help decision-makers calculate production and material costs and determine the sales price. In fact, most conventional sales forecasting methods use either factors or time series data to determine the sales prediction. The relationship between the past time series data (independent variables) and the sales prediction (dependent variable) is often too complicated for unsuitable statistical approaches to yield useful ordering suggestions. In practice, the POS system does provide some forecasting suggestions for managers placing orders. However, most decision-makers still prefer to order the same quantity as usual or to depend on their own intuition instead of model-based approaches. In this paper, we present a relatively novel neural network methodology, Grey relation analysis integrated with the extreme learning machine (GELM), to construct a forecasting model for the fresh food sector of the retail industry.

Sales in the retail sector exhibit strong seasonal variations. Historically, modeling and forecasting seasonal data has been one of the major research efforts, and many theoretical and heuristic methods have been developed over the last several decades. The available traditional quantitative approaches include heuristic methods, such as time series decomposition and exponential smoothing, as well as time series regression and autoregressive integrated moving average (ARIMA) models, which have formal statistical foundations [7]. Nevertheless, their forecasting ability is limited by their assumption of linear behavior, and thus it is not always satisfactory [37]. Recently, artificial neural networks (ANNs) have been applied widely to sales forecasting [17,31], pattern recognition [26], aggregate retail sales [7] and the PCB industry [11]. Most studies indicate that ANNs perform better than conventional methods [23,24]. This flexible, data-driven modeling property has made the ANN an attractive tool for many forecasting tasks. However, most ANNs and their variants use gradient-based learning algorithms, such as the back-propagation network (BPN), and face many difficulties with stopping criteria, learning rate, learning epochs, over-tuning, local minima and long computing time. A new learning algorithm for single-hidden-layer feed-forward neural networks (SLFNs), called the extreme learning machine (ELM), has been proposed recently and overcomes these disadvantages [18,19,30,32,34].

The rest of this paper presents the GELM model for improving the accuracy of fresh food forecasting in the retail industry. Section 2 reviews the related sales forecasting literature, including traditional statistical models and ANN models. Section 3 presents the methodology used to solve the real forecasting problem. Section 4 describes the development of the various forecasting models and discusses the comparison results. Section 5 provides the conclusions.

2. LITERATURE REVIEW

The available traditional time series forecasting approaches are divided into two groups: univariate time series models and multivariate time series models. One of the major limitations of traditional statistical methods is that they are essentially linear. The sales of fresh food are often influenced by uncertain factors such as weather, promotions and market competition. Therefore, traditional methodologies require some improvement to provide better forecasting suggestions.

Next, we briefly introduce traditional statistical forecasting models and ANN models in sales forecasting applications.

2.1 Traditional Statistical Model for Time Series Data Forecasting

Over the past several decades, researchers have used many kinds of forecasting methods to study time series events. Univariate time series models include the moving average model, the exponential smoothing model and the autoregressive integrated moving average (ARIMA) model. Box and Jenkins [9] developed ARIMA, whose basic principle is the assumption of linearity among the variables. However, many time series events do not satisfy the linearity assumption. ARIMA models therefore cannot effectively capture and explain non-linear relationships, especially in actual sales forecasting problems. When they are applied to non-linear processes, forecasting errors often increase greatly as the forecasting horizon becomes longer. To improve the forecasting of non-linear time series events, many researchers have developed alternative modeling approaches. These include non-linear regression models, the bilinear model, the threshold autoregressive model, the autoregressive conditional heteroscedasticity (ARCH) model [16] and the generalized autoregressive conditional heteroscedasticity (GARCH) model [4].

Although the traditional methods have proved somewhat effective, they still have certain shortcomings. Zhang [36] indicated that although these methods showed some improvement over linear models in specific cases, they tended to be tailored to particular events, lacked generality and were not easy to implement.

2.2 ANN Model in Time Series Data Forecasting

The ANN model is a model-free approach that has recently been applied to forecasting because of its strong performance in forecasting and pattern recognition. In general, it consists of a collection of simple non-linear computing elements whose inputs and outputs are tied together to form a network. Many studies have applied ANN models to time series forecasting. Weigend et al. [35] introduced the "weight-elimination" back-propagation learning procedure and applied it to sunspot and exchange-rate time series. Tang and Han [33] compared the ANN model with the ARIMA model using international airline passenger traffic, domestic car sales and foreign car sales in the USA. Chakraborty et al. [10] presented an ANN approach based on multivariate time series analysis that accurately predicts flour prices in three cities in the USA. Lachtermacher et al. [20] developed a calibrated ANN model in which the Box-Jenkins methods are used to determine the lag components of the input data and a heuristic method is used to choose the number of hidden units.

Ansuj et al. [5] compared a time series model with interventions against the ANN model for analyzing the sales behavior of a medium-sized enterprise; the results showed that the ANN model was more accurate. Bigus [7] used promotion, time of year, end of month age and weekly sales as inputs for an ANN model to forecast weekly demand, with promising results. Kuo and Chen [22] argued that traditional statistical approaches perform well on data with seasonality and trends but are inappropriate for unexpected situations.

In the ELM method, the input weights (linking the input layer to the hidden layer) and the hidden biases are chosen randomly, and the output weights (linking the hidden layer to the output layer) are determined analytically using the Moore-Penrose (MP) generalized inverse. This learning algorithm is easy to implement, tends to reach the smallest training error and the smallest norm of weights, generalizes well and runs extremely fast.

2.3 Demand Forecasting of the Retail Industry

Chu and Zhang [13] and Alon et al. [4] developed artificial neural networks for forecasting aggregate retail sales. Alon et al. [21] compared them with traditional methods, including Winters exponential smoothing, the Box-Jenkins ARIMA model and multivariate regression. Their derivative analysis shows that the non-linear neural network model is able to capture the dynamic non-linear trend and seasonal patterns, as well as the interactions between them. Chu et al. [7] found that non-linear models outperform their linear counterparts in out-of-sample forecasting, and that prior seasonal adjustment of the data can significantly improve the performance of the neural network model; the overall best model is the neural network built on deseasonalized time series data. Doganis et al. [15] presented an evolutionary sales forecasting model that combines two artificial intelligence technologies, namely the radial basis function network and the genetic algorithm. The methodology was applied successfully to sales data of fresh milk provided by a major manufacturer of dairy products. Aburto and Weber [1] presented a hybrid intelligent system combining the ARIMA model and MLP neural networks for demand forecasting. It improves forecasting accuracy, and the resulting replenishment system for a Chilean supermarket leads simultaneously to fewer sales failures and lower inventory levels.

Au et al. [6] and Sun et al. [37] developed different sales forecasting models for fashion retailing. Au et al. [6] proposed an evolutionary neural network for sales forecasting and showed that, when guided by the BIC and the pre-search approach, the non-fully connected neural network converges faster and forecasts time series more accurately than the fully connected neural network and the traditional SARIMA model. Because forecasting is often time-critical, this faster convergence makes the approach widely applicable to decision-making problems. Sun et al. [37] applied an ELM neural network model to investigate the relationship between sales amounts and significant factors that affect demand; the experimental results demonstrate that the proposed method outperforms the back-propagation neural network model. Ali et al. [3] explored the tradeoff between forecasting accuracy and data and model complexity in grocery retail sales forecasting by considering a wide spectrum of data and technique complexity. The experimental results indicated that simple time series techniques perform very well for periods without promotions. However, for periods with promotions, regression trees with explicit features improve accuracy substantially, and more sophisticated input is only beneficial when advanced techniques are used. Chen et al. [12] developed the GMFLN forecasting model by integrating GRA and MFLN neural networks: GRA sieves out the more influential factors from the raw data, which are then used as the input data of the MFLN model. The experimental results indicated that this forecasting model outperforms the MA, ARIMA and GARCH forecasting models for retail goods.

According to the above literature review, retail forecasting is usually both a time-critical and an accuracy-critical issue. This paper aims to construct a more efficient sales forecasting model that is more accurate and faster than univariate and multivariate time series models for retail goods. Since sales are affected by many dynamic factors, GRA together with expert knowledge is used to sieve out the more influential factors as the input variables of the ELM model. The focus of this research is to provide an improved forecasting method that helps managers decide on appropriate order quantities.

3. METHODOLOGY

This section presents the proposed sales forecasting model integrating GRA and ELM. GRA computes the Grey Relation Grades (GRG), which measure the degree of influence of a compared series on the reference series in terms of relative distance. The input and output pairs built from the selected series are then divided into training, testing and predicting data. All data sets are normalized into the range [-1, 1]. The ELM produces the predictions, which are then unnormalized to convert them back to the original scale.

3.1 Grey Relation Analysis (GRA)

Deng [14] proposed Grey Relation Analysis (GRA), which has been successfully applied in many fields such as management, economics and engineering. The Grey Relation Grade (GRG) is the degree of influence of a compared series on the reference series, represented by their relative distance: the smaller the distance, the greater the influence. The degree of influence describes the relative variation between two factors, indicating its magnitude and gradient in a given system. The relation between the compared series and the reference series at time point $k$ is expressed by the relational coefficient $r(x_0(k), x_i(k))$. Before the Grey relational coefficients are calculated, each data series must be normalized by dividing each value of the original series by the series average.

After Grey data processing, the transformed reference sequence is $x_0 = \{x_0(1), x_0(2), \ldots, x_0(n)\}$ and the compared sequences are $x_i = \{x_i(1), x_i(2), \ldots, x_i(n)\}$, $i = 1, \ldots, m$. The relational coefficient $r(x_0(k), x_i(k))$ between the reference series $x_0$ and the compared series $x_i$ at time $k$ can be calculated with the following equation [20]:

$$r(x_0(k), x_i(k)) = \frac{\min_i \min_k |x_0(k) - x_i(k)| + \zeta \max_i \max_k |x_0(k) - x_i(k)|}{|x_0(k) - x_i(k)| + \zeta \max_i \max_k |x_0(k) - x_i(k)|}, \quad k = 1, 2, \ldots, n;\ i = 1, 2, \ldots, m \quad (1)$$

Here $\zeta$ is a distinguishing coefficient ($0 < \zeta \le 1$) used to adjust the range of the comparison environment and to control the level of difference among the relational coefficients. When $\zeta = 1$, the comparison environment is unaltered; when $\zeta = 0$, the comparison environment disappears. When the data variation is large, $\zeta$ usually ranges from 0.1 to 0.5 to reduce the influence of an extremely large $\max_i \max_k |x_0(k) - x_i(k)|$.

The term $|x_0(k) - x_i(k)|$ is the absolute difference between the two sequences, which represents the distance $\Delta_{0i}(k)$ after data transformation. The quantity $\min_i \min_k |x_0(k) - x_i(k)|$ ($\max_i \max_k |x_0(k) - x_i(k)|$) is the minimum (maximum) distance over all times and all compared sequences, and these two quantities form the comparison environment. The minimum $\min_i \min_k |x_0(k) - x_i(k)|$ usually equals zero because the transformed series cross at some point. The maximum is obtained as

$$\max_i \max_k |x_0(k) - x_i(k)| = \max_i \left[ \max_k |x_0(k) - x_i(k)| \right] \quad (2)$$
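The GRA computation in Equations (1) and (2) can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes the mean-based Grey data processing described above and a default distinguishing coefficient $\zeta = 0.5$, and it takes each grade as the average of the relational coefficients over $k$ (the usual convention in Deng's GRA, assumed here because the averaging step is not written out above). The function names and sample series are hypothetical.

```python
from typing import List, Sequence


def mean_normalize(series: Sequence[float]) -> List[float]:
    """Grey data processing: divide each value by the series average."""
    avg = sum(series) / len(series)
    return [v / avg for v in series]


def grey_relational_grades(reference: Sequence[float],
                           compared: List[Sequence[float]],
                           zeta: float = 0.5) -> List[float]:
    """Return one grade per compared series (assumed: mean coefficient over k)."""
    x0 = mean_normalize(reference)
    xs = [mean_normalize(s) for s in compared]
    # deltas[i][k] = |x0(k) - xi(k)|
    deltas = [[abs(a - b) for a, b in zip(x0, xi)] for xi in xs]
    d_min = min(min(row) for row in deltas)  # min_i min_k
    d_max = max(max(row) for row in deltas)  # max_i max_k, Eq. (2)
    if d_max == 0:
        # Every compared series coincides with the reference after processing
        return [1.0] * len(xs)
    grades = []
    for row in deltas:
        # Relational coefficients r(x0(k), xi(k)) from Eq. (1)
        coeffs = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


sales = [10.0, 12.0, 11.0, 13.0]          # hypothetical reference series x0
candidates = [[20.0, 24.0, 22.0, 26.0],   # same shape, twice the level
              [13.0, 10.0, 12.0, 9.0]]    # moves against the reference
grades = grey_relational_grades(sales, candidates)
```

Because of the mean normalization, the rescaled first candidate gets a grade of 1, while the second, which rises when the reference falls, gets a clearly lower grade.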

3.2 Normalization and Unnormalization

The input and output data sets are normalized as follows:

$$X_{ij}^{normalize} = \frac{(X_{ij} - \mathrm{Min}\{X_{ij}\}) - (\mathrm{Max}\{X_{ij}\} - X_{ij})}{\mathrm{Max}\{X_{ij}\} - \mathrm{Min}\{X_{ij}\}}, \quad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, N \quad (3)$$

The predicted results are unnormalized as follows:

$$P_{ij}^{unnormalize} = \frac{P_{ij}(\mathrm{Max}\{X_{ij}\} - \mathrm{Min}\{X_{ij}\}) + (\mathrm{Max}\{X_{ij}\} + \mathrm{Min}\{X_{ij}\})}{2}, \quad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, N \quad (4)$$
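Equations (3) and (4) form an exact inverse pair. The sketch below illustrates this under two assumptions: Min and Max are taken over the series being scaled, and the series is not constant (otherwise Equation (3) divides by zero). The helper names and sample values are hypothetical.

```python
from typing import List, Sequence


def normalize(xs: Sequence[float]) -> List[float]:
    """Eq. (3): map values linearly into [-1, 1] using the series min and max."""
    lo, hi = min(xs), max(xs)
    return [((x - lo) - (hi - x)) / (hi - lo) for x in xs]


def unnormalize(ps: Sequence[float], lo: float, hi: float) -> List[float]:
    """Eq. (4): invert Eq. (3) to return predictions to the original scale."""
    return [(p * (hi - lo) + (hi + lo)) / 2 for p in ps]


sales = [120.0, 80.0, 100.0, 140.0]        # hypothetical daily sales
scaled = normalize(sales)                  # about [0.333, -1.0, -0.333, 1.0]
restored = unnormalize(scaled, min(sales), max(sales))
```

The minimum and maximum of the training data must be stored so that model outputs can be mapped back with Equation (4).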

3.3 Extreme Learning Machine (ELM)

ELM is a single-hidden-layer feed-forward neural network (SLFN). It randomly chooses the input weight matrix $W$ and analytically determines the output weight matrix of the SLFN. Suppose that we are training an SLFN with $\tilde{N}$ hidden neurons and activation functions $g(x) = [g_1(x), g_2(x), \ldots, g_{\tilde{N}}(x)]$ to learn $N$ distinct samples $(x_i, t_i)$, where $x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T \in \mathbb{R}^n$ and $t_i = [t_{i1}, t_{i2}, \ldots, t_{im}]^T \in \mathbb{R}^m$.

If the SLFN can approximate the $N$ samples with zero error, then

$$\sum_{j=1}^{N} \| y_j - t_j \| = 0 \quad (5)$$

where $y_j$ is the actual output of the SLFN. That is, there exist parameters $\beta_i$, $w_i$ and $b_i$ such that

$$\sum_{i=1}^{\tilde{N}} \beta_i\, g(w_i \cdot x_j + b_i) = t_j, \quad j = 1, 2, \ldots, N \quad (6)$$

where $w_i = [w_{i1}, w_{i2}, \ldots, w_{in}]^T$ is the weight vector connecting the $i$th hidden node and the input nodes, $\beta_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{im}]^T$ is the weight vector connecting the $i$th hidden node and the output nodes, and $b_i$ is the threshold of the $i$th hidden node. The operation $w_i \cdot x_j$ in Equation (6) denotes the inner product of $w_i$ and $x_j$. The above $N$ equations can be written compactly as $H\beta = T$, where

$$H(w_1, \ldots, w_{\tilde{N}}, b_1, \ldots, b_{\tilde{N}}, x_1, \ldots, x_N) = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_N + b_{\tilde{N}}) \end{bmatrix}_{N \times \tilde{N}},$$

$$\beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_{\tilde{N}}^T \end{bmatrix}_{\tilde{N} \times m} \quad \text{and} \quad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times m} \quad (7)$$

In ELM, the input weights and hidden biases are generated randomly instead of being tuned. Thus, determining the output weights is as simple as finding the least-squares (LS) solution of the linear system $H\beta = T$. Its minimum-norm LS solution is

$$\hat{\beta} = H^{\dagger} T \quad (8)$$

where $H^{\dagger}$ is the MP generalized inverse of the matrix $H$. The minimum-norm LS solution is unique and has the smallest norm among all LS solutions.

3.4 Steps of Constructing the GELM Forecasting Model

This section describes how to construct the Grey relation analysis and Extreme Learning Machine (GELM) forecasting model systematically. The basic elements of the present study are presented in Figure 1 and are briefly described as follows:

Step 1: Data collection

Collect the daily sales and price data of the target store and other related series provided by neighboring stores or government agencies as forecasting references. One series is the forecasting target ($x_0 \in X$) and the others are the $m$ comparison series ($x_i \in X$, $i = 1, 2, \ldots, m$), where $X = \{x_i \mid i = 0, 1, 2, \ldots, m\}$.

Step 2: Normalize the initial data

All initial data are arranged as a moving window of fixed length along each series, and the input data are normalized by Equation (3). After normalization, every data set falls into the interval [-1, 1].
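One possible way to build the fixed-length moving window of Step 2 is sketched below. It rests on stated assumptions: each input row holds the previous `lags` target values (Section 4 uses the last 7 days) followed by the same-day value of each selected factor, and the output is the current day's sales. The function name `make_windows` and the sample series are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_windows(target: Sequence[float],
                 factors: List[Sequence[float]],
                 lags: int = 7) -> Tuple[List[List[float]], List[float]]:
    """Build (input, output) pairs with a fixed-length moving window."""
    inputs, outputs = [], []
    for t in range(lags, len(target)):
        row = list(target[t - lags:t])        # lagged sales y_{t-lags} .. y_{t-1}
        row += [f[t] for f in factors]        # same-day factor values, e.g. W_t
        inputs.append(row)
        outputs.append(target[t])             # current sales y_t
    return inputs, outputs


sales = [float(v) for v in range(10)]         # hypothetical target series
weather = [10.0 * v for v in range(10)]       # hypothetical factor series
X, y = make_windows(sales, [weather], lags=3)
# X[0] == [0.0, 1.0, 2.0, 30.0] and y[0] == 3.0
```

Each window shortens the usable sample by `lags` rows, since the first rows have no complete history.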

Step 3: Calculate the grey relation grades (GRG)

The grey relation between the two series at time point $k$ is represented by the grey relational coefficient $r(x_0(k), x_i(k))$, defined in Equation (1). The GRG lies in the closed interval [0, 1]; the greater the GRG between two data sets, the closer their relationship.

Step 4: Select the more influential factors

According to the GRG ranking, an expert with domain knowledge selects the important factors that affect sales amounts most significantly. Steps 2 and 3 not only provide a rational analysis but also avoid the preconceived opinions of experts.

Step 5: Divide the input and output data into training data, testing data and predicting data

The ELM is an SLFN with three layers: an input layer, a hidden layer and an output layer. Unlike traditional learning algorithms, it tends to reach the smallest training error and obtains the smallest norm of weights. The ELM can be summarized as follows:

Algorithm ELM: Given a training set $\aleph = \{(x_i, t_i) \mid x_i \in \mathbb{R}^n, t_i \in \mathbb{R}^m, i = 1, \ldots, N\}$, an activation function $g(x)$ and a hidden node number $\tilde{N}$:

Step 5.1 Randomly assign the input weights $w_i$ and biases $b_i$, $i = 1, 2, \ldots, \tilde{N}$.

Step 5.2 Calculate the hidden layer output matrix $H$.

Step 5.3 Calculate the output weight $\beta = H^{\dagger} T$, where $T = [t_1, \ldots, t_N]^T$ and $H^{\dagger}$ is the MP generalized inverse of matrix $H$ (see Appendix).
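Steps 5.1 to 5.3 map onto a few lines of NumPy. The sketch below is illustrative rather than the authors' MATLAB implementation: it assumes input weights and biases drawn uniformly from [-1, 1], a sigmoidal activation, and `numpy.linalg.pinv` as the MP generalized inverse. `elm_fit`, `elm_predict` and the toy target are hypothetical.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def elm_fit(X: np.ndarray, T: np.ndarray, n_hidden: int,
            activation=sigmoid, seed: int = 0):
    rng = np.random.default_rng(seed)
    # Step 5.1: random input weights (one column per hidden node) and biases
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Step 5.2: hidden layer output matrix H, shape N x n_hidden
    H = activation(X @ W + b)
    # Step 5.3: output weights beta = H^dagger T, Eq. (8)
    beta = np.linalg.pinv(H) @ T
    return W, b, beta


def elm_predict(X: np.ndarray, W: np.ndarray, b: np.ndarray,
                beta: np.ndarray, activation=sigmoid) -> np.ndarray:
    return activation(X @ W + b) @ beta


# Hypothetical smooth target on inputs already scaled to [-1, 1]
X = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
T = np.sin(3.0 * X)
W, b, beta = elm_fit(X, T, n_hidden=50)
train_mse = float(np.mean((elm_predict(X, W, b, beta) - T) ** 2))
```

No iteration is involved: training is a single pseudo-inverse, which is the source of ELM's speed advantage over gradient-based training.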

Step 6: Select the activation function and the number of hidden nodes

The ELM randomly chooses hidden nodes and analytically determines the output weights. Three activation functions (sigmoidal, sine and hardlim) and four hidden node numbers (20, 50, 100 and 200) can be selected.
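The three activation functions and four hidden node numbers of Step 6 define a 3 × 4 grid of candidate configurations. The definitions below are a sketch: the hardlim convention (1 when the input is at least 0, otherwise 0) follows MATLAB's `hardlim`, the sigmoid is written in a numerically stable two-branch form, and the names `ACTIVATIONS`, `HIDDEN_NODES` and `GRID` are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    # Two-branch form avoids overflow in math.exp for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sine(z: float) -> float:
    return math.sin(z)


def hardlim(z: float) -> float:
    # Hard limit: 1 for z >= 0, otherwise 0
    return 1.0 if z >= 0 else 0.0


ACTIVATIONS = {"sigmoidal": sigmoid, "sine": sine, "hardlim": hardlim}
HIDDEN_NODES = [20, 50, 100, 200]

# Every (activation, hidden node number) pair to be trained and compared
GRID = [(name, n) for name in ACTIVATIONS for n in HIDDEN_NODES]
```

Each of the 12 configurations would be trained and scored with the same data, which is what Step 9 later repeats for the statistical comparison.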

Step 7: Input the training and testing data and predict future sales amounts

Obtain the predicted results for the training and testing data, then unnormalize the outcomes by Equation (4).

As discussed in the Appendix, we have the following important properties:

1. Minimum training error

The special solution $\hat{\beta} = H^{\dagger}T$ is one of the least-squares solutions of the general linear system $H\beta = T$, meaning that the smallest training error is reached by this special solution:

$$\|H\hat{\beta} - T\| = \|HH^{\dagger}T - T\| = \min_{\beta} \|H\beta - T\| \quad (9)$$

Although almost all learning algorithms aim to reach the minimum training error, most of them cannot, because they become trapped in local minima or because infinite training iterations are usually not allowed in applications.

2. Smallest norm of weights

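Further, the special solution $\hat{\beta} = H^{\dagger}T$ has the smallest norm among all the least-squares solutions of $H\beta = T$:

$$\|\hat{\beta}\| = \|H^{\dagger}T\| \le \|\beta\|, \quad \forall \beta \in \left\{\beta : \|H\beta - T\| \le \|Hz - T\|,\ \forall z \in \mathbb{R}^{\tilde{N} \times m}\right\} \quad (10)$$

The minimum-norm least-squares solution of $H\beta = T$ is unique and equals $\hat{\beta} = H^{\dagger}T$.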
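Properties (9) and (10) can be checked numerically. In the sketch below (random, hypothetical data), $H$ has more columns (hidden nodes) than rows (samples), so $H\beta = T$ has infinitely many least-squares solutions. Adding a null-space direction of $H$ to $\hat{\beta} = H^{\dagger}T$ leaves the training error unchanged but increases the norm; `numpy.linalg.lstsq`, which also returns the minimum-norm solution, serves as a cross-check.

```python
import numpy as np

rng = np.random.default_rng(1)
# 3 samples, 5 hidden nodes: an underdetermined system H beta = T
H = rng.normal(size=(3, 5))
T = rng.normal(size=(3, 1))

beta_hat = np.linalg.pinv(H) @ T                     # Eq. (8)
beta_lstsq = np.linalg.lstsq(H, T, rcond=None)[0]    # minimum-norm LS solution

# The last right-singular vector lies in the null space of H
_, _, Vt = np.linalg.svd(H)
null_dir = Vt[-1].reshape(-1, 1)
beta_alt = beta_hat + null_dir                       # another LS solution

err_hat = np.linalg.norm(H @ beta_hat - T)           # training error, Eq. (9)
err_alt = np.linalg.norm(H @ beta_alt - T)
norm_hat = np.linalg.norm(beta_hat)
norm_alt = np.linalg.norm(beta_alt)
```

Both training errors are at machine-precision level, while `norm_alt` exceeds `norm_hat`, as Equation (10) predicts.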

Step 8: Measure the accuracy of the forecasting results

Measure the accuracy of the forecasting results by the MAD and MSE criteria.

1. MSE (Mean Square Error)

$$MSE = \frac{1}{N} \sum_{t=1}^{N} (A_t - F_t)^2 \quad (11)$$

2. MAD (Mean Absolute Deviation)

$$MAD = \frac{1}{N} \sum_{t=1}^{N} |A_t - F_t| \quad (12)$$

where $A_t$ is the actual amount and $F_t$ is the forecast amount.
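Equations (11) and (12) translate directly into code. The sketch below uses hypothetical sales and forecast values to show how the two criteria weight errors differently.

```python
from typing import Sequence


def mse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Eq. (11): mean of the squared errors (A_t - F_t)^2."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mad(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Eq. (12): mean of the absolute errors |A_t - F_t|."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


actual = [30, 25, 40, 35]      # hypothetical daily sales A_t
forecast = [28, 27, 37, 35]    # hypothetical forecasts F_t
print(mse(actual, forecast))   # 4.25
print(mad(actual, forecast))   # 1.75
```

Because MSE squares the errors, the single 3-unit miss contributes more than twice as much as either 2-unit miss, whereas MAD weights all units of error equally.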

Step 9: Repeat Steps 6 to 8 for the same data

The GELM model offers its best predicted results, and the accuracy of those results is then measured. Paired t-tests are performed on the results obtained with the sigmoidal, sine and hardlim activation functions.
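The paired t-test of Step 9 compares two activation functions on matched runs. The statistic can be computed with the standard library as sketched below; the run values are hypothetical. In practice, `scipy.stats.ttest_rel` returns the same statistic together with a p-value.

```python
import math
from typing import Sequence


def paired_t(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic of a paired t-test on matched results (n - 1 degrees of freedom)."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_d = sum(d) / n
    sd = math.sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))
    if sd == 0:
        raise ValueError("differences are constant; t is undefined")
    return mean_d / (sd / math.sqrt(n))


sigmoid_mse = [2.0, 4.0, 6.0]   # hypothetical paired results
sine_mse = [1.0, 2.0, 3.0]
t_stat = paired_t(sigmoid_mse, sine_mse)   # 2 * sqrt(3), about 3.464
```

The statistic is then compared with the critical value of the t distribution with n - 1 degrees of freedom (here 2).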

Figure 1: Outline of the present study

Figure 2 shows the framework and non-linear transformation of the GELM network, which incorporates input, hidden and output layers. In practical applications, the GELM model divides the raw data into a training set and a testing set. The training set is used to construct the neural network, while the testing set is used to measure the model's predictive ability. During training, the network function is determined through the weights of the arcs linking the nodes. The structural size of the GELM model depends on the number of hidden nodes. After GRA and selection based on expert knowledge, the input data should be strongly representative.

Figure 2: The framework and non-linear transformation of the GELM network

3.5 The Properties of the GELM Forecasting Model of the Retail Industry

The GELM forecasting model combines Grey relation analysis (GRA) and extreme learning machine (ELM) methodologies. GRA is an important problem-solving method in grey system theory for measuring the similarity of complex relations. The main purpose of GRA in the proposed hybrid forecasting model is to capture the relationship between two sets of time series data in relational space [25]. The Grey relational grade (GRG) is the global measure adopted in GRA to describe and explain the relation between two sets. If the data of the two sets were identical at every time point, all the relational coefficients would equal one. The greater the GRG between two sets, the closer their relationship. The candidate data sets with higher GRG are selected as the input data sets of the GELM model to enhance its predictive ability.

The learning speed of feed-forward neural networks is far slower than required, and this has been a major bottleneck in practical applications for decades. This study applies ELM, which trains single-hidden-layer feed-forward neural networks by randomly choosing the hidden nodes and analytically determining the output weights. The major property of ELM is that it abandons the slow, iteratively tuned gradient-based learning algorithms extensively used to train neural networks, while providing good generalization performance at extremely fast learning speed.

A limitation of the proposed GELM forecasting model is that it does not consider the influence of financial crises, free trade agreements, consumer behavior and advertising. In addition, it is more suitable for forecasting mature products than newly launched products.

4. EXPERIMENTAL RESULTS AND DISCUSSIONS

In retail operations management, it is indispensable to forecast future demand and place orders at various times of the day. A system that offers more accurate predictions can help managers meet customer demand and reduce the quantity of scrapped fresh food. Using the GELM model to predict sales amounts can increase the accuracy of the proposed system. The procedures and results of the experiments are described in the following subsections.

This study compared the GELM forecasting model with the GARCH multivariate statistical forecasting model, the back-propagation network (BPN), and the GBPN and GMFLN models by forecasting 120 days of sales. The GBPN model integrates GRA and back-propagation networks, and the GMFLN model integrates GRA and multilayer functional link networks. The GARCH model was built in EViews, and the simulations of the BPN-related models were conducted in MATLAB on an ordinary notebook with a 1.4 GHz CPU and 760 MB RAM.

4.1 Data Collection and Analysis

Well-known retailers and a government organization in Taiwan provided the initial data, which fall into three groups. First, the target store collected the daily sales and prices of 960 ml containers of milk; there were 334 observations in total, as shown in Figure 3. We also collected the sales amounts and prices of two other brands. The sales price is ordinarily not fixed; it is adjusted for reasons such as promotions, hot or cold seasons and specific activities. Second, sales data were obtained from two neighboring stores in the same distribution area; the stores were close to each other and served the same customers. We collected the sales amounts and price data from these stores as well. Third, the Central Weather Bureau provided the local weather records.

Figure 3: The sales quantities of the target brand

Many factors affect consumer behavior in the actual retail industry. The following describes how the most influential indices were selected, using the analytical methodology, as the input data of the ELM model. After the raw data were normalized and the GRG of each index was calculated, the expert selected the three factors with the highest GRG as the input data of the multivariate time series model and the ELM model. The GRG of each factor is shown in Table 1. The selected factors represent the strongest influences on the sales amounts of fresh food. The three selected factors are W, TAs and TBS.
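The ranking step that precedes the expert's choice can be reproduced from Table 1. The sketch below only sorts the transcribed GRG values; the expert judgment described in the text is not modeled. Competitive Brand 2 in Neighboring A is keyed here as C2AS/C2Ap (an assumption) so that its labels do not collide with those of Neighboring B.

```python
# GRG values transcribed from Table 1; the target series TS is the reference.
grg = {
    "TP": 0.6232, "C1S": 0.6786, "C1P": 0.6935, "C2S": 0.6716, "C2P": 0.6896,
    "TAS": 0.7560, "TAp": 0.6843, "C1AS": 0.7109, "C1Ap": 0.6233,
    "C2AS": 0.7322, "C2Ap": 0.6571,   # Competitive Brand 2, Neighboring A
    "TBS": 0.7567, "TBp": 0.6056, "C1BS": 0.6985, "C1Bp": 0.7012,
    "C2BS": 0.7345, "C2Bp": 0.7021,
    "W": 0.7737,
}

ranking = sorted(grg, key=grg.get, reverse=True)
top3 = ranking[:3]   # ['W', 'TBS', 'TAS']
```

The top three grades coincide with the three factors the expert selected (W, TAs and TBS).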

4.2 Experiment Results

The GARCH, GBPN, GMFLN and GELM experiments use the same data sets, comprising the three indices (W, TAs and TBS) selected by GRA and the lagged data of the last 7 days.

4.2.1 GARCH Forecasting Model

Bollerslev [8] proposed the GARCH (Generalized ARCH) conditional variance specification, which allows a parsimonious parameterization of the lag structure. In time series analysis, several suitable models may explain the input data, so we adopt two statistics as criteria for choosing the best statistical forecasting model.

1. AIC (Akaike’s Information Criterion)

Akaike [2] provided the following criterion to evaluate the fit of the proposed statistical models, where a data set of $n$ observations is fitted by a model with $P$ parameters and $\hat{\sigma}_a^2$ is the estimated residual variance:

$$AIC(P) = n \ln(\hat{\sigma}_a^2) + 2P \quad (13)$$

2. SBC (Schwarz's Bayesian Criterion)

Schwarz [28] provided a similar criterion to evaluate the fit of the statistical models:

$$SBC(P) = n \ln(\hat{\sigma}_a^2) + P \ln(n) \quad (14)$$

The best GARCH forecasting model uses the same time series data and the three indices (W, TAs and TBS) to predict the 120-day demand. After examining AIC (-0.75645) and SBC (-0.60712), the best adapted model is:

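$$y_t = 0.83341y_{t-1} - 0.84852y_{t-2} + 0.79288y_{t-3} + 0.37308W_t + 0.06524TA_s + 0.06403TB_S - 0.81732\varepsilon_{t-1} + 0.93919\varepsilon_{t-2} - 0.69544\varepsilon_{t-3} + 0.14682\varepsilon_{t-5} + \varepsilon_t,$$

$$\varepsilon_t \sim N(0, \sigma_t^2), \qquad \sigma_t^2 = 0.02877 - 0.09753\,\sigma_{t-2}^2 \quad (15)$$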

Table 1: The GRG of collected data

I. Data from target store
   Target brand: sales amount TS (reference series, no GRG); price TP 0.6232
   Competitive brand 1: sales amount C1S 0.6786; price C1P 0.6935
   Competitive brand 2: sales amount C2S 0.6716; price C2P 0.6896

II. Data from two neighboring stores
   Target brand in neighboring A: sales amount TAS 0.7560; price TAp 0.6843
   Competitive brand 1 in neighboring A: sales amount C1AS 0.7109; price C1Ap 0.6233
   Competitive brand 2 in neighboring A: sales amount C2AS 0.7322; price C2Ap 0.6571
   Target brand in neighboring B: sales amount TBS 0.7567; price TBp 0.6056
   Competitive brand 1 in neighboring B: sales amount C1BS 0.6985; price C1Bp 0.7012
   Competitive brand 2 in neighboring B: sales amount C2BS 0.7345; price C2Bp 0.7021

III. Weather data
   Weather records W 0.7737
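The AIC and SBC criteria of Equations (13) and (14), used above to choose among candidate GARCH specifications, can be sketched as follows. The two candidate models, their residual variances and the sample size are hypothetical; the model with the smaller criterion value is preferred.

```python
import math


def aic(sigma2_hat: float, n: int, p: int) -> float:
    """Eq. (13): n ln(sigma_a^2) + 2P."""
    return n * math.log(sigma2_hat) + 2 * p


def sbc(sigma2_hat: float, n: int, p: int) -> float:
    """Eq. (14): n ln(sigma_a^2) + P ln(n)."""
    return n * math.log(sigma2_hat) + p * math.log(n)


# Hypothetical candidates: name -> (residual variance, number of parameters)
candidates = {"A": (0.50, 3), "B": (0.48, 6)}
n_obs = 100

best_aic = min(candidates, key=lambda m: aic(candidates[m][0], n_obs, candidates[m][1]))
best_sbc = min(candidates, key=lambda m: sbc(candidates[m][0], n_obs, candidates[m][1]))
```

Model B fits slightly better but uses three more parameters, so both criteria prefer A. Since ln(n) exceeds 2 whenever n > e² ≈ 7.4, SBC penalizes extra parameters more heavily than AIC for all but tiny samples.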

4.2.2 GBPN Forecasting Model

The BPN is a typical artificial neural network model: a generalized non-linear, nonparametric model inspired by studies of the brain and nervous system. A BPN is composed of several layers of input, hidden and output nodes. Developing a BPN model of appropriate size for the available training and testing data is a challenge, because the structural size of the model depends on the numbers of input and hidden nodes and there is no systematic guidance for choosing them. Different numbers of input and hidden nodes have a significant impact on the learning and prediction ability of the network.

As mentioned before, the purpose of GRA is to determine the relationship between two sets of time series data in a relational space. In the GBPN model, the input nodes of the neural network are the past (lagged) observations and the more influential factors that affect the sales amounts. The output node is the actual sales amount. We aim to obtain a GBPN forecasting model that generalizes well and forecasts accurately.
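The sketch below shows how such training patterns could be assembled. Each input vector holds the last $p$ sales observations plus the GRA-selected indices at time $t$, and the target is the actual sales at time $t$. The lag order $p$, the alphabetical ordering of the indices, and the function name are assumptions for illustration. The text does not state how many lags the networks use.

```python
def build_patterns(sales, indices, p=3):
    """Return (inputs, targets) for one-step-ahead sales forecasting.

    sales:   sales amounts y_0 .. y_{N-1}
    indices: dict name -> series of the same length (e.g. W, TAS, TBS)
    p:       number of lagged sales values used as inputs (assumed)
    """
    n = len(sales)
    if any(len(v) != n for v in indices.values()):
        raise ValueError("all series must have the same length")
    names = sorted(indices)  # fixed, alphabetical order of the index inputs
    inputs, targets = [], []
    for t in range(p, n):
        lags = [sales[t - k] for k in range(1, p + 1)]  # y_{t-1} .. y_{t-p}
        exog = [indices[name][t] for name in names]     # indices observed at t
        inputs.append(lags + exog)
        targets.append(sales[t])
    return inputs, targets

X, y = build_patterns([5, 6, 7, 8, 9], {"W": [0, 1, 0, 1, 0]}, p=2)
```

The same patterns could feed the GMFLN and GELM models, since all three use the same time series and indices.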

4.2.3 GMFLN Forecasting Model

The MFLN incorporates basic, logarithmic and exponential input nodes in the input layer to improve forecasting ability and shorten the learning cycle of the neural network [12]. It has one or two hidden layers, which in theory suffice to approximate continuous functions of a time series. As in similar models, the hidden nodes capture the nonlinear structure. Deciding how many hidden nodes to use is another difficult issue when constructing a neural network forecasting model. In practice, the number of hidden nodes is chosen by experiment or by trial and error, without any theoretical basis to guide the decision.
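A functional-link expansion can illustrate the MFLN input layer. It feeds each basic input to the network together with its logarithmic and exponential transforms. The exact transforms in [12] may differ, for example in scaling or in how non-positive values are handled. This sketch assumes strictly positive, normalised inputs and uses $\ln(x)$ and $e^{x}$ directly.

```python
import math

def functional_link_expand(x):
    """Expand an input vector into basic, logarithmic and exponential nodes.

    Assumes every x_i > 0 (e.g. data scaled into (0, 1]).
    """
    if any(v <= 0 for v in x):
        raise ValueError("logarithmic nodes need strictly positive inputs")
    basic = list(x)
    logarithmic = [math.log(v) for v in x]
    exponential = [math.exp(v) for v in x]
    return basic + logarithmic + exponential  # 3 * len(x) input nodes
```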

Some theories suggest that more hidden nodes can increase accuracy in approximating a functional relationship, but they also cause over-fitting. This problem is more likely to occur in the GMFLN model than in statistical models. One solution is to find a parsimonious model that fits the data well. Another is to divide the time series into three sets: training, testing and validation [21]. The first two sets are used for model building, and the last is used for model validation or evaluation. The best GMFLN model is the one that gives the best results on the prediction set.
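The three-way division above can be sketched as a chronological split. Observations stay in time order, so later data never leak into the sets used for model building. The 60/20/20 proportions and the function name are illustrative assumptions. The text does not give the sizes used in this study.

```python
def chronological_split(series, train_frac=0.6, test_frac=0.2):
    """Split a time series, in order, into training, testing and validation parts.

    The fractions are illustrative assumptions; the remainder is validation.
    """
    if not (train_frac > 0 and test_frac >= 0 and train_frac + test_frac < 1):
        raise ValueError("fractions must leave a non-empty validation set")
    n = len(series)
    n_train = int(n * train_frac)
    n_test = int(n * test_frac)
    train = series[:n_train]
    test = series[n_train:n_train + n_test]
    validation = series[n_train + n_test:]
    return train, test, validation
```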

4.2.4 GELM Forecasting Model

ELM learns faster than traditional gradient-based learning algorithms, an advantage already recognized in many subsequent studies. To obtain higher prediction accuracy, we designed experiments with different activation functions and different numbers of hidden nodes. In the GELM forecasting model, we compare the accuracy obtained with the sigmoidal, sine and hardlim activation functions, with the number of hidden nodes selected from 20, 50, 100 and 200. The GELM forecasting model uses the same time series data and the three indices (W, TAS and TBS) to predict the 120-day demand.
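The ELM mechanism used here can be sketched in a few steps. Input weights and hidden biases are drawn at random, and the hidden-layer output matrix $H$ is computed. The output weights are then obtained in a single step as $\beta = H^{\dagger}T$ with NumPy's Moore-Penrose pseudo-inverse, with no iterative tuning. Several details are assumptions rather than the authors' code: the uniform $[-1, 1]$ initialisation, the activation keys `sig`, `sin` and `hardlim`, the function names, and the synthetic data in the usage lines.

```python
import numpy as np

ACTIVATIONS = {
    "sig": lambda z: 1.0 / (1.0 + np.exp(-z)),      # sigmoidal
    "sin": np.sin,                                  # sine
    "hardlim": lambda z: (z >= 0).astype(float),    # 1 if z >= 0 else 0
}

def elm_fit(X, T, n_hidden, activation="sig", seed=0):
    """Train a single-hidden-layer ELM; returns (W, b, beta, activation)."""
    rng = np.random.default_rng(seed)
    g = ACTIVATIONS[activation]
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                 # random hidden biases
    H = g(X @ W + b)                                          # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                              # analytic output weights
    return W, b, beta, activation

def elm_predict(model, X):
    W, b, beta, activation = model
    return ACTIVATIONS[activation](X @ W + b) @ beta

# Synthetic data (not the paper's): sweep the hidden-node counts used in the study.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 3))
T = X.sum(axis=1)
for n_hidden in (20, 50, 100, 200):
    model = elm_fit(X, T, n_hidden, "sig")
    print(n_hidden, float(np.mean((elm_predict(model, X) - T) ** 2)))
```

Training amounts to one matrix product and one pseudo-inverse, which is why the times in Table 2 are small. The loop prints training error only. Comparing configurations as in Table 3 would require held-out data.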

Table 2 shows the training times of GELM, GBPN and GMFLN. With the sigmoidal activation function and 200 hidden nodes, the GELM learning algorithm took 0.3705 s of CPU time. Traditional gradient-based learning algorithms such as GBPN and GMFLN required far more training time than GELM.

Table 2: Training time (CPU seconds) of different algorithms

Hidden nodes | GELM Sigmoidal (Sig.) | GELM Sine (Sin.) | GELM Hardlim (Har.)
20 | 0.0100 | 0.0100 | 0.0100
50 | 0.0200 | 0.0200 | 0.0200
100 | 0.0801 | 0.0701 | 0.0801
200 | 0.3705 | 0.3805 | 0.3805

GBPN (50 hidden nodes): 11.573
GMFLN (50 hidden nodes): 4.216

Table 3 shows the performance of the GELM forecasting model with different activation functions and numbers of hidden nodes. For every activation function, more hidden nodes yield better predictions of the sales amounts. The best forecasting results, a MAD of 0.07039 and an MSE of 0.00907, are obtained with the sigmoidal activation function and 200 hidden nodes.
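For reference, the two criteria in their usual forms are $\mathrm{MAD} = \frac{1}{n}\sum_t |y_t - \hat{y}_t|$ and $\mathrm{MSE} = \frac{1}{n}\sum_t (y_t - \hat{y}_t)^2$. The paper defines them elsewhere, so an exact match with these forms is an assumption of this sketch.

```python
def mad(actual, predicted):
    """Mean absolute deviation between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean squared error between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
```

Because MSE squares the errors, it penalises a few large misses more than MAD does. The two criteria can therefore rank models differently.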

In the GELM model, the input weights and hidden biases are chosen randomly, and the output weights are determined analytically using the Moore-Penrose generalized inverse. To compare training time and performance across activation functions, we ran each configuration 30 times and applied paired t-tests to the results to examine whether the differences are statistically significant. The paired t-test is a widely used method for examining whether the mean difference in performance between two methods over various data sets differs significantly from zero. If the p-value of a paired t-test is lower than the significance level (0.05), the difference between the two methods is significant.
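The paired t statistic is $t = \bar{d} / (s_d/\sqrt{n})$. The 95% interval is $\bar{d} \pm t_{0.975,\,n-1}\, s_d/\sqrt{n}$, where $d$ is the per-run difference between two configurations. The sketch below uses only the standard library. It assumes $n = 30$ runs, hence the critical value $t_{0.975,29} \approx 2.045$. It returns the statistic and interval but not the p-value, which needs a t-distribution CDF (e.g. from SciPy). The function names are hypothetical.

```python
import math
import statistics

T_CRIT_29 = 2.045  # two-sided 95% critical value of Student's t with 29 df (n = 30)

def summary_t(mean, sd, n, t_crit=T_CRIT_29):
    """t statistic and CI from the Mean/StDev columns of Tables 4-6."""
    se = sd / math.sqrt(n)
    if se == 0:
        raise ValueError("zero standard error: all differences are identical")
    t = mean / se
    return mean, sd, (mean - t_crit * se, mean + t_crit * se), t

def paired_t(a, b, t_crit=T_CRIT_29):
    """Paired t statistic and CI for the mean of a - b.

    t_crit must match n - 1 degrees of freedom; the default assumes 30 runs.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [x - y for x, y in zip(a, b)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return summary_t(mean, sd, len(diffs), t_crit)

# Re-derive the 200-hidden-node Sig.-Har. row of Table 4 from its Mean and StDev.
_, _, (lo, hi), t = summary_t(0.01702, 0.01647, n=30)
print(round(lo, 5), round(hi, 5), round(t, 2))  # 0.01087 0.02317 5.66
```

Recomputing t and the interval from the reported mean and standard deviation is a quick consistency check on the tabulated rows.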

Table 3: Performance of different activation functions and hidden nodes

Activation function | Criterion | 20 hidden nodes | 50 hidden nodes | 100 hidden nodes | 200 hidden nodes
Sigmoidal (Sig.) | MAD | 0.13192 | 0.13032 | 0.11908 | 0.07039
Sigmoidal (Sig.) | MSE | 0.03027 | 0.02789 | 0.02221 | 0.00907
Sine (Sin.) | MAD | 0.15315 | 0.14177 | 0.12049 | 0.07691
Sine (Sin.) | MSE | 0.04030 | 0.03113 | 0.02442 | 0.01000
Hardlim (Har.) | MAD | 0.14310 | 0.12870 | 0.11234 | 0.07264
Hardlim (Har.) | MSE | 0.03548 | 0.02581 | 0.02126 | 0.00998

Table 4 compares the GELM training time across activation functions and numbers of hidden nodes. With 20 and 50 hidden nodes, there is no significant difference. With 100 hidden nodes, the training time of the hardlim activation function differs significantly from those of the sigmoidal and sine activation functions, while the sigmoidal and sine activation functions do not differ. With 200 hidden nodes, all three activation functions differ significantly: hardlim trains faster than sigmoidal, and sigmoidal trains faster than sine.

Table 5 compares the GELM MAD across activation functions and numbers of hidden nodes. The p-value for sigmoidal versus sine is always lower than 0.05. The two activation functions therefore differ significantly, and the sigmoidal activation function always performs better than the sine activation function. With 20, 50 and 100 hidden nodes, the p-value for sigmoidal versus hardlim is lower than 0.05, so the sigmoidal activation function is significantly better than the hardlim activation function. With 20, 100 and 200 hidden nodes, the p-value for sine versus hardlim is lower than 0.05, so the hardlim activation function is significantly better than the sine activation function.


Table 4: Paired t-test of training time between different activation functions

Hidden nodes | Paired methods | Mean | StDev | 95% CI lower | 95% CI upper | t | p-value
20 | Sig.-Sin. | 0.00020 | 0.00142 | -0.00033 | 0.00073 | 0.77 | 0.448
20 | Sig.-Har. | 0.00000 | 0.00158 | -0.00059 | 0.00059 | 0.00 | 1.000
20 | Sin.-Har. | -0.00020 | 0.00110 | -0.00061 | 0.00021 | -1.00 | 0.326
50 | Sig.-Sin. | -0.00000 | 0.00455 | -0.00170 | 0.00170 | 0.00 | 1.000
50 | Sig.-Har. | 0.00000 | 0.00643 | -0.00240 | 0.00240 | 0.00 | 1.000
50 | Sin.-Har. | 0.00000 | 0.00643 | -0.00240 | 0.00240 | 0.00 | 1.000
100 | Sig.-Sin. | -0.00200 | 0.00610 | -0.00428 | 0.00028 | -1.80 | 0.083
100 | Sig.-Har. | 0.00400 | 0.00498 | 0.00214 | 0.00586 | 4.40 | 0.000*
100 | Sin.-Har. | 0.00600 | 0.00498 | 0.00414 | 0.00786 | 6.60 | 0.000*
200 | Sig.-Sin. | -0.00900 | 0.01062 | -0.01297 | -0.00503 | -4.64 | 0.000*
200 | Sig.-Har. | 0.01702 | 0.01647 | 0.01087 | 0.02317 | 5.66 | 0.000*
200 | Sin.-Har. | 0.02602 | 0.01306 | 0.02115 | 0.03089 | 10.92 | 0.000*

* Significant at the 0.05 level.

Table 5: Paired t-test results of prediction (MAD) between different activation functions

Hidden nodes | Paired methods | Mean | StDev | 95% CI lower | 95% CI upper | t | p-value
20 | Sig.-Sin. | -0.00735 | 0.00756 | -0.01017 | -0.00452 | -5.32 | 0.000*
20 | Sig.-Har. | -0.00272 | 0.00435 | -0.00434 | -0.00110 | -3.43 | 0.002*
20 | Sin.-Har. | 0.00463 | 0.00832 | 0.00152 | 0.00773 | 3.05 | 0.005*
50 | Sig.-Sin. | -0.00538 | 0.00971 | -0.00901 | -0.00176 | -3.04 | 0.005*
50 | Sig.-Har. | -0.00413 | 0.00925 | -0.00758 | -0.00068 | -2.45 | 0.021*
50 | Sin.-Har. | 0.00125 | 0.01081 | -0.00278 | 0.00529 | 0.64 | 0.530
100 | Sig.-Sin. | -0.00837 | 0.00620 | -0.01068 | -0.00605 | -7.40 | 0.000*
100 | Sig.-Har. | -0.00378 | 0.00466 | -0.00552 | -0.00203 | -4.43 | 0.000*
100 | Sin.-Har. | 0.00459 | 0.00737 | 0.00184 | 0.00733 | 3.42 | 0.002*
200 | Sig.-Sin. | -0.00485 | 0.00532 | -0.00684 | -0.00286 | -4.99 | 0.000*
200 | Sig.-Har. | 0.01204 | 0.04376 | -0.00430 | 0.02838 | 1.51 | 0.143
200 | Sin.-Har. | 0.01689 | 0.04478 | 0.00017 | 0.03361 | 2.07 | 0.048*

* Significant at the 0.05 level.

Table 6 compares the GELM MSE across activation functions and numbers of hidden nodes. The p-value for sigmoidal versus sine is always lower than 0.05. The two activation functions therefore differ significantly, and the sigmoidal activation function always performs better than the sine activation function. With 50 and 100 hidden nodes, the p-value for sigmoidal versus hardlim is lower than 0.05, so the sigmoidal activation function is significantly better than the hardlim activation function. With 20 and 200 hidden nodes, the p-value for sine versus hardlim is lower than 0.05, so the hardlim activation function is significantly better than the sine activation function. In summary, the sigmoidal activation function differs significantly from the sine activation function at every number of hidden nodes. By contrast, the sigmoidal-hardlim and sine-hardlim differences are significant only for some numbers of hidden nodes.

4.3 Discussion

Table 7 presents the results of the different forecasting models. The best GARCH model has a MAD of 0.13876 and an MSE of 0.03191. The best GBPN model has a MAD of 0.09837 and an MSE of 0.01979, and the best GMFLN model has a MAD of 0.08911 and an MSE of 0.01883. The best GELM model has a MAD of 0.07039 and an MSE of 0.00907. The proposed GELM forecasting model has the smallest prediction errors, and it learns far faster than the other models.


Table 6: Paired t-test results of prediction (MSE) between different activation functions

Hidden nodes | Paired methods | Mean | StDev | 95% CI lower | 95% CI upper | t | p-value
20 | Sig.-Sin. | -0.00321 | 0.00316 | -0.00439 | -0.00203 | -5.56 | 0.000*
20 | Sig.-Har. | -0.00051 | 0.00174 | -0.00116 | 0.00014 | -1.62 | 0.116
20 | Sin.-Har. | 0.00269 | 0.00072 | 0.00122 | 0.00417 | 3.74 | 0.001*
50 | Sig.-Sin. | -0.00184 | 0.00329 | -0.00307 | -0.00061 | -3.07 | 0.005*
50 | Sig.-Har. | -0.00176 | 0.00280 | -0.00280 | -0.00071 | -3.43 | 0.002*
50 | Sin.-Har. | 0.00009 | 0.00375 | -0.00131 | 0.00149 | 0.13 | 0.900
100 | Sig.-Sin. | -0.00213 | 0.00250 | -0.00306 | -0.00120 | -4.67 | 0.000*
100 | Sig.-Har. | -0.00138 | 0.00167 | -0.00200 | -0.00075 | -4.52 | 0.000*
100 | Sin.-Har. | 0.00075 | 0.00246 | -0.00016 | 0.00167 | 1.68 | 0.104
200 | Sig.-Sin. | -0.00083 | 0.00106 | -0.00123 | -0.00044 | -4.29 | 0.000*
200 | Sig.-Har. | 0.00168 | 0.00578 | -0.00047 | 0.00384 | 1.60 | 0.121
200 | Sin.-Har. | 0.00252 | 0.00597 | 0.00029 | 0.00475 | 2.31 | 0.028*

* Significant at the 0.05 level.

Table 7: Comparison of different forecasting models

Model type | Model | MAD | MSE | Training time (s)
Statistical time series model | GARCH | 0.13876 | 0.03191 | n/a
Artificial neural network model | GBPN | 0.09837 | 0.01979 | 11.573
Artificial neural network model | GMFLN | 0.08911 | 0.01883 | 4.216
Artificial neural network model | GELM | 0.07039 | 0.00907 | 0.3705
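The relative improvements implied by Table 7 can be computed directly. The snippet below reuses only the table's numbers. GARCH has no reported training time, so it is left out of the speed ratio.

```python
# MAD, MSE and training time (s) as reported in Table 7; None = not reported.
results = {
    "GARCH": (0.13876, 0.03191, None),
    "GBPN": (0.09837, 0.01979, 11.573),
    "GMFLN": (0.08911, 0.01883, 4.216),
    "GELM": (0.07039, 0.00907, 0.3705),
}

def reduction(baseline, proposed):
    """Fractional error reduction of proposed relative to baseline."""
    return (baseline - proposed) / baseline

gelm_mad, gelm_mse, gelm_time = results["GELM"]
for name, (mad, mse, time_s) in results.items():
    if name == "GELM":
        continue
    line = (f"{name}: MAD {reduction(mad, gelm_mad):.1%} lower, "
            f"MSE {reduction(mse, gelm_mse):.1%} lower")
    if time_s is not None:
        line += f", {time_s / gelm_time:.1f}x faster training"
    print(line)
```

For example, GELM's MAD is about 28% lower than GBPN's, and its training time is about 31 times shorter.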

5. CONCLUSIONS

Recently, many researchers and industrial managers have become interested in applying data mining and artificial intelligence algorithms to routine problems. Sales forecasting plays an increasingly important role in the operational management of commercial enterprises, especially in the retail industry. In this paper, we present a relatively novel neural network methodology, grey relational analysis integrated with the extreme learning machine (GELM), to construct a forecasting model for fresh food. The proposed GELM model has the following major characteristics:

(1) This study applies GRA, a problem-solving method for measuring similarity in complex relations. The main purpose of GRA in this model is to determine the relationship between two sets of time series data in the relational space and to sieve out the more influential factors as input data for the ELM.

(2) GELM learns much faster than GBPN and GMFLN. Its learning phase completes in less than a second for every activation function and number of hidden nodes tested.

(3) The proposed GELM generalizes better than gradient-based algorithms such as GBPN and GMFLN.

(4) The GELM method avoids many problems of traditional gradient-based algorithms, such as stopping criteria, local minima, improper learning rates and over-fitting.

(5) GELM reaches its solution directly, without these trivial issues, and is much simpler than most feed-forward neural network algorithms.

The experimental results demonstrate that GELM is more effective than the other forecasting models. In summary, this research makes the following contributions to practical forecasting problems in the retail industry.

(1) Influential factor selection

Grey relational analysis (GRA) can identify the appropriate factors for forecasting future values, and these influential factors can then be incorporated into the input data.

(2) Forecasting efficiency

GELM is more efficient than GBPN or GMFLN. Because the demand for fresh food fluctuates frequently, faster learning can provide timely and frequent forecasts for managers' reference. With larger numbers of hidden nodes, the hardlim activation function learns faster than the sigmoidal activation function, and the sigmoidal activation function learns faster than the sine activation function.

(3) Forecasting performance

This research uses several forecasting models as comparison benchmarks. According to the results, the GELM model has a smaller MAD and MSE than the GARCH, GBPN and GMFLN models. GELM is therefore a valid and effective forecasting tool that could be applied in similar fields.

Paired t-tests on the performance of different activation functions show that the sigmoidal activation function differs significantly from the sine activation function in both the MAD and MSE criteria.

Our experiments demonstrate that GELM can be employed effectively in sales forecasting for the retail industry. It not only yields smaller prediction errors but also trains faster than the other forecasting models. Future research will focus on fresh food stored at different temperature levels in the retail industry and on improving the stability and learning speed of the GELM model.

REFERENCES

1. Aburto, L. and Weber, R., 2007, “Improved supply chain management based on hybrid demand forecasts,” Applied Soft Computing, Vol. 7, No. 1, pp. 126-144.

2. Akaike, H., 1974, “A new look at the statistical model identification,” IEEE Transactions on Automatic Control, Vol. 19, No. 6, pp. 716-723.

3. Ali, Ö. G., Sayin, S., Woensel, T. V. and Fransoo, J., 2009, “SKU demand forecasting in the presence of promotions,” Expert Systems with Applications, Vol. 36, No. 10, pp. 12340-12348.

4. Alon, I., Qi, M. and Sadowski, R. J., 2001, “Forecasting aggregate retail sales: A comparison of artificial neural networks and traditional methods,” Journal of Retailing and Consumer Services, Vol. 8, No. 3, pp. 147-156.

5. Ansuj, A. P., Camargo, M. E., Radharamanan, R. and Petry, D. G., 1996, “Sales forecasting using time series and neural networks,” Computers and Industrial Engineering, Vol. 31, No. 1-2, pp. 421-424.

6. Au, K. F., Choi, T. M. and Yu, Y., 2008, “Fashion retail forecasting by evolutionary neural networks,” International Journal of Production Economics, Vol. 114, No. 2, pp. 615-630.

7. Bigus, J. P., 1996, Data Mining with Neural Networks: Solving Business Problems - From Application Development to Decision Support, McGraw-Hill, New York.

8. Bollerslev, T., 1986, “Generalized autoregressive conditional heteroskedasticity,” Journal of Econometrics, Vol. 31, No. 3, pp. 307-327.

9. Box, G. E. P. and Jenkins, G. M., 1976, “Time series analysis forecasting and control,” Management Science, Vol. 17, No. 4, pp. 141-164.

10. Chakraborty, K., Mehrotra, K. and Mohan, C. K., 1992, “Forecasting the behavior of multivariate time series using neural networks,” Neural Networks, Vol. 5, No. 6, pp. 961-970.

11. Chang, P. C. and Wang, Y. W., 2006, “Fuzzy Delphi and back-propagation model for sales forecasting in PCB industry,” Expert Systems with Applications, Vol. 30, No. 4, pp. 715-726.

12. Chen, F. L. and Ou, T. Y., 2009, “Grey relation analysis and multilayer function link network sales forecasting model for perishable food in convenience store,” Expert Systems with Applications, Vol. 36, No. 3, pp. 7054-7063.

13. Chu, C. W. and Zhang, G. P., 2003, “A comparative study of linear and nonlinear models for aggregate retail sales forecasting,” International Journal of Production Economics, Vol. 86, No. 3, pp. 217-231.

14. Deng, J. L., 1982, “Control problems of grey systems,” Systems and Control Letters, Vol. 1, No. 4, pp. 288-294.

15. Doganis, P., Alexandridis, A., Patrinos, P. and Sarimveis, H., 2006, “Time series sales forecasting for short shelf-life food products based on artificial neural networks and evolutionary computing,” Journal of Food Engineering, Vol. 75, No. 2, pp. 196-204.

16. Engle, R. F., 1982, “Autoregressive conditional heteroskedasticity with estimates of the variance of U.K. inflation,” Econometrica, Vol. 50, No. 4, pp. 987-1008.

17. Frank, C., Garg, A., Sztandera, L. and Raheja, A., 2003, “Forecasting women’s apparel sales using mathematical modeling,” International Journal of Clothing Science and Technology, Vol. 15, No. 2, pp. 107-125.

18. Huang, G. B., 2003, “Learning capability and storage capacity of two-hidden-layer feedforward networks,” IEEE Transactions on Neural Networks, Vol. 14, No. 2, pp. 274-281.

19. Huang, G. B., Zhu, Q. Y. and Siew, C. K., 2006, “Extreme learning machine: Theory and applications,” Neurocomputing, Vol. 70, No. 1-3, pp. 489-501.

20. Huang, S. T., Chiu, N. H. and Chen, L. W., 2008, “Integration of grey relational analysis with genetic algorithm for software effort estimation,” European Journal of Operational Research, Vol. 188, No. 3, pp. 898-909.

21. Kaastra, I. and Boyd, M., 1996, “Designing a neural network for forecasting financial and economic time series,” Neurocomputing, Vol. 10, No. 3, pp. 215-236.

22. Kuo, R. J. and Chen, J. A., 2004, “A decision support system for order selection in electronic commerce based on fuzzy neural network supported by real-coded genetic algorithm,” Expert Systems with Applications, Vol. 26, No. 2, pp. 141-154.


based on fuzzy neural network with initial weights generated by genetic algorithm,” European Journal of Operational Research, Vol. 129, No. 3, pp. 496-517.

24. Lachtermacher, G. and Fuller, J. D., 1995, “Back-propagation in time series forecasting,” Journal of Forecasting, Vol. 14, No. 4, pp. 381-393.

25. Lai, H. H., Lin, Y. C. and Yeh, C. H., 2005, “Form design of product image using grey relational analysis and neural network models,” Computers & Operations Research, Vol. 32, No. 10, pp. 2689-2711.

26. Leigh, W., Purvis, R. and Ragusa, J. M., 2002, “Forecasting the NYSE composite index with technical analysis, pattern recognizer, neural network, and genetic algorithm: A case study in romantic decision support,” Decision Support Systems, Vol. 32, No. 4, pp. 361-377.

27. LeVee, G. S., 1993, “The key to understanding the forecasting process,” Journal of Business Forecasting, Vol. 11, No. 4, pp. 12-16.

28. Schwarz, G., 1978, “Estimating the dimension of a model,” Annals of Statistics, Vol. 6, No. 2, pp. 461-464.

29. Serre, D., 2002, Matrices: Theory and Applications, Springer, New York.

30. Sun, Z. L., Choi, T. M., Au, K. F. and Yu, Y., 2008, “Sales forecasting using extreme learning machine with applications in fashion retailing,” Decision Support Systems, Vol. 46, No. 1, pp. 411-419.

31. Sztandera, L. M., Frank, C. and Vemulapalli, B., 2004, “Predicting women’s apparel sales by soft computing,” Lecture Notes in Artificial Intelligence, Vol. 3070, pp. 1193-1198.

32. Tang, X. and Han, M., 2009, “Partial Lanczos extreme learning machine for single output regression problems,” Neurocomputing, Vol. 72, No. 13-15, pp. 3066-3076.

33. Tang, Z., Almeida, C. and Fishwick, P. A., 1991, “Time series forecasting using neural networks vs. Box-Jenkins methodology,” Simulation, Vol. 57, No. 5, pp. 303-310.

34. Van der Vorst, J. G. A. J., Beulens, A. J. M., De Wit, W. and Van Beek, P., 1998, “Supply chain management in food chains: Improving performance by reducing uncertainty,” International Transactions in Operational Research, Vol. 5, No. 6, pp. 487-499.

35. Weigend, A. S., Rumelhart, D. E. and Huberman, B. A., 1991, “Generalization by weight-elimination with application to forecasting,” Advances in Neural Information Processing Systems, Vol. 3, pp. 875-882.

36. Zhang, G. P., 2001, “An investigation of neural networks for linear time-series forecasting,” Computers and Operations Research, Vol. 28, No. 12, pp. 1183-1202.

37. Zhang, G. P., 2003, “Time series forecasting using a hybrid ARIMA and neural network model,” Neurocomputing, Vol. 50, pp. 159-175.

ABOUT THE AUTHORS

Fei-Long Chen is a Professor of Industrial Engineering and Engineering Management at National Tsing Hua University (NTHU), Hsinchu, Taiwan. He received the B.S. degree in Industrial Engineering from National Tsing Hua University, Taiwan, in 1982, and the M.S. and Ph.D. degrees in Industrial Engineering from Auburn University, USA, in 1988 and 1991, respectively. His current research interests include statistical process control, total quality management, Six Sigma, engineering data analysis, enterprise integration, enterprise resource planning, and global logistics management. He is currently on temporary transfer to Liteon Corp., where he serves as the Dean of the IE Academy.

Tsung-Yin Ou is currently a Ph.D. candidate in the Department of Industrial Engineering and Engineering Management at National Tsing Hua University, Taiwan. He received his B.S. degree from National Chiao Tung University and his M.S. degree from Tunghai University in Taichung. He is also an engineer in the IE Department of China Steel Corporation, Taiwan. His research interests include data mining, operations management and ERP.

(Received September 2009, revised December 2009, accepted December 2009)


APPENDIX

Appendix 1

Moore-Penrose Generalized Inverse

A general linear system $Ax = y$, where $A$ may be singular and may not even be square, can be solved very simply by using the Moore-Penrose generalized inverse [29].

Definition 1: A matrix $G$ of order $n \times m$ is the Moore-Penrose generalized inverse of a matrix $A$ of order $m \times n$ if

$$AGA = A, \qquad GAG = G, \qquad (AG)^{T} = AG, \qquad (GA)^{T} = GA \qquad (14)$$

For convenience, the Moore-Penrose generalized inverse of $A$ will be denoted by $A^{\dagger}$.
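The four conditions of Definition 1 can be checked numerically. The sketch below tests them for the matrix returned by NumPy's `numpy.linalg.pinv` on an arbitrary random matrix. The helper name and tolerance are assumptions for illustration.

```python
import numpy as np

def is_moore_penrose_inverse(A, G, tol=1e-10):
    """Check the four Penrose conditions of Definition 1 for G against A."""
    return bool(np.allclose(A @ G @ A, A, atol=tol)
                and np.allclose(G @ A @ G, G, atol=tol)
                and np.allclose((A @ G).T, A @ G, atol=tol)
                and np.allclose((G @ A).T, G @ A, atol=tol))

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))   # A is m x n with m = 5, n = 3
G = np.linalg.pinv(A)         # G is n x m
print(G.shape, is_moore_penrose_inverse(A, G))
```

A matrix that violates any one condition, such as the zero matrix, fails the check. This is because the four conditions determine $A^{\dagger}$ uniquely.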

Appendix 2

Minimum Norm Least-Squares Solutions of a General Linear System

For general linear system Axy, we say that xˆ is a least-square solutions if

y Ax y

x

Aˆ minx  where ||‧|| is a norm in

Euclidean space (15)
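A standard result behind ELM's output-weight step is that $\hat{x} = A^{\dagger}y$ is the least-squares solution of smallest norm [29]. The NumPy sketch below checks this on an arbitrary underdetermined system, which has infinitely many exact solutions. It compares $A^{\dagger}y$ with `numpy.linalg.lstsq`, which returns the minimum-norm minimizer. It also compares it with another exact solution built by adding a null-space vector.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 4))  # 2 equations, 4 unknowns: infinitely many exact solutions
y = rng.normal(size=2)

x_pinv = np.linalg.pinv(A) @ y                   # x = A† y
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)  # minimum-norm least-squares solution

# Another exact solution: add a vector from the null space of A.
null_vec = np.linalg.svd(A)[2][-1]  # last right-singular vector, A @ null_vec ≈ 0
x_other = x_pinv + 0.5 * null_vec

print(np.allclose(x_pinv, x_lstsq))                      # same solution
print(np.allclose(A @ x_other, y))                       # x_other also solves Ax = y
print(np.linalg.norm(x_pinv) < np.linalg.norm(x_other))  # but has a larger norm
```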


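Constructing a Sales Forecasting Model by Integrating Grey Relational Analysis and Extreme Learning Machine: An Empirical Study of the Retail Industry

Fei-Long Chen and Tsung-Yin Ou*

Department of Industrial Engineering and Engineering Management, National Tsing Hua University

No. 101, Section 2, Kuang-Fu Road, Hsinchu

Abstract

Business competition is fierce and the economic environment is poor, so forecasting sales under sharply changing demand is a major challenge for the retail industry. Good sales forecasts can improve customer satisfaction, reduce the scrapping of fresh food products, increase turnover, and support production planning. The GELM sales forecasting model proposed in this study integrates grey relational analysis with the extreme learning machine and is validated mainly on fresh food products in the retail industry. Its aim is to give the retail industry a fast and accurate forecasting model that can serve as a decision support system for managers. The two techniques are integrated mainly because grey relational analysis can sieve out the important factors affecting sales volume even when information is insufficient. These factors then serve as input data for neural networks such as the back-propagation network, the multilayer functional link network and the extreme learning machine. The extreme learning machine was verified to learn more efficiently than traditional neural network algorithms, so a good forecasting model can be constructed. This study validates the model with actual sales data from the retail industry. Both the forecasting results and the learning speed of the GELM model are better than those of forecasting models such as GARCH, GBPN and GMFLN. Furthermore, using different activation functions in the GELM forecasting model is shown to produce significant differences in forecasting results and training speed.

Keywords: sales forecasting, grey relational analysis, extreme learning machine, retail industry, activation function

(*Corresponding author)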