A Retail Demand Forecasting Model Based on Data Mining Techniques

İrem İşlek

Idea Teknoloji Çözümleri

Istanbul, Turkey

Şule Gündüz Öğüdücü

Istanbul Technical University Department of Computer Engineering

Istanbul, Turkey

Abstract—This paper addresses the problem of forecasting the demand for various products at main distribution warehouses. Demand forecasting is the activity of building forecasting models to estimate the quantity of a product that customers will purchase. Demand is affected by numerous factors such as warehouse region size, customer count and product type. As the number of distribution warehouses and products increases, it becomes considerably harder to estimate customer demand. In this study, we propose a demand forecasting methodology that copes with large numbers of warehouses and products while providing high estimation accuracy. The proposed methodology first clusters similar warehouses according to their sales behavior using bipartite graph clustering. A hybrid forecasting phase, which combines a moving average model with a Bayesian Network machine learning algorithm, is then applied. Our experimental results on a real data set show that this approach considerably improves forecasting performance.

Index Terms—Bayesian networks, Bipartite graph, Bipartite graph clustering, Demand forecasting, Moving Average, Multilayer perceptron algorithm (MLP), Supply chain.

I. INTRODUCTION

Demand forecasting, the process of estimating the quantity of a product that customers will purchase, is a research topic in machine learning and an important part of supply chain management. A supply chain can be thought of as the network of organizations that are involved, through upstream and downstream linkages, in the different processes that produce value in the form of products or services delivered to the end consumer [1]. There are various types of supply chain for different environments. Supply chain models can be categorized into three main types [2]: Direct Supply Chain, Extended Supply Chain and Ultimate Supply Chain. Graphical representations of these models can be seen in Fig. 1. In this study, we focus on demand forecasting models for the Direct Supply Chain.

A detailed representation of the Direct Supply Chain can be seen in Fig. 2. In direct supply chain management, the products of a manufacturer are sent to main distribution warehouses. Every main distribution warehouse may have a different number of sub-distribution warehouses that distribute products to end sale points such as supermarkets, grocery stores and canteens.

Since both the number of warehouses and the variety of products increase in today’s competitive and dynamic business environment, accurate demand forecasting becomes more important. Improving the accuracy of demand forecasting yields significant savings for manufacturers. They can produce each product in sufficient quantity while avoiding unnecessary inventory costs. In addition, manufacturers can buy an adequate amount of supply materials and avoid redundant supply material costs.

[Figure: three panels showing the Direct Supply Chain, the Extended Supply Chain and the Ultimate Supply Chain]

Fig. 1. Types of supply chain.

In this study, we focus on building a demand forecasting model for the main distribution warehouses of a company. The data for the study was taken from a national dried nuts and fruits company in Turkey. This company has nearly one hundred main distribution warehouses. The main distribution warehouses contain sub-distribution warehouses which serve end sale points. The company produces and distributes nearly seventy different products.

As the variety of products and the size of the warehouses increase, it becomes more difficult to estimate demand accurately with traditional methods. For this reason, most previously proposed methods consider only a limited number of warehouses and products. In addition, the estimation accuracy of these methods is not sufficient. In this paper, a new demand forecasting methodology which can handle numerous main distribution warehouses and products is proposed. The sale amount of an item can be estimated for every main distribution warehouse. The proposed model, based on data mining techniques, is able to estimate product demand accurately by considering various warehouse, product, shopper demographic and time attributes.

[Figure: direct supply chain levels, from Supplier and Manufacturer through Main distribution warehouses and Sub distribution warehouses to End Sale Points and Customers]

Fig. 2. Main and sub distribution warehouses in direct supply chain.

Our overall model can be summarized as follows. We constructed a dataset from the sales invoices of the company. We then prepared the data for data mining algorithms and calculated moving average values of product sale amounts. Next, we clustered the main distribution warehouses and sub distribution warehouses. Lastly, we built a Bayesian Network model for demand forecasting.

The rest of the paper is organized as follows. A brief review of the literature related to this paper is given in Section II. Section III gives background information for the methodology. Section IV describes the details of the proposed methodology. Section V gives the results of the experiments and a discussion of these results. Finally, in Section VI, we conclude the paper and discuss future work.

II. RELATED WORKS

Demand forecasting has been identified as an important and challenging problem for supply chain management [1]. For this reason, several studies have applied data mining and machine learning techniques to solve this problem.

In some prior studies on demand forecasting, traditional statistical methodologies such as moving average and Box-Jenkins were used. Liu et al. applied data mining methodologies to time series and improved Box-Jenkins time series forecasting results [3].

Since statistical models could not give satisfactory results, artificial intelligence algorithms were tried in numerous studies. For instance, Neural Network (NN) algorithms were commonly employed in the literature [4-9], and these studies reported impressive results. Hasin et al. showed that an Artificial Neural Network (ANN) provides better results than traditional statistical methods such as the Box-Jenkins model and the Holt-Winters model [10].

Some subsequent studies combined an ANN with another algorithm in order to obtain more successful methodologies. Doganis et al. used a genetic algorithm with an RBF neural network [11]. Aburto and Weber proposed another hybrid model which combined the Autoregressive Integrated Moving Average (ARIMA) model with a neural network [12].

Because ANN was a popular algorithm for demand forecasting, Efendigil et al. compared the Adaptive Neural Fuzzy Inference System (ANFIS) with ANN. In their study, ANFIS achieved better results than ANN [13].

Sun et al. used several Extreme Learning Machines (ELM) in parallel to forecast sales amounts [5].

Data mining has been used in some recent studies to provide more efficient demand forecasting methodologies. Parikh proposed a data mining application for better demand forecasting and product allocation clustering [14]. Altıntaş and Trick used data mining methods for categorizing customer order distributions into data clusters [15].

Existing studies on demand forecasting focus on different points of the supply chain. For instance, most studies try to forecast the demand of a single end sale point. In addition, some studies forecast demand for only a limited number of products. In contrast, our problem involves a larger number of main distribution warehouses and products than the aforementioned studies. In this study, we used data mining techniques to overcome this problem.

III. BACKGROUND

This section provides the necessary background on the problem we want to solve. First, we briefly describe bipartite graph clustering. Then, we explain the classification method using Bayesian networks.

A. Bipartite Graph Clustering

In some applications, data can be represented as a bipartite graph G(X, Y, E). A bipartite graph is a special type of graph in which the node sets X and Y represent two different types of objects and the edge set E represents the relations between these objects. In a bipartite graph, nodes of the same type cannot be connected; an edge can exist only between nodes of different types. An example of a bipartite graph can be seen in Fig. 3.


In this study, bipartite graphs are constructed to represent warehouse-item relations. A bipartite graph G(X, Y, W) is obtained, where the node sets X and Y represent the warehouses and the items, respectively, and each edge weight in W is the sale amount of an item at a warehouse.

The bipartite graph partitioning method applied in this study recursively separates a bipartite graph into two bipartite graphs. A vertex partition of G(X, Y, W), denoted by $\Pi(A, B)$, is defined by a partition of the vertex sets X and Y, respectively: $X = A \cup A^c$ and $Y = B \cup B^c$, as can be seen in Fig. 3. In this partition, $A$ pairs with $B$ and $A^c$ pairs with $B^c$.

Fig. 3. Bipartite graph clustering.

To split the graph into clusters, we search for the partition that minimizes the normalized cut (Ncut), so that the similarity between unmatched vertices is as small as possible (Eq. 1).

$$\min_{\Pi(A,B)} \mathrm{Ncut}(A, B) \qquad (1)$$

Ncut(A, B) is calculated using Eq. 2.

$$\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{W(A, Y) + W(X, B)} + \frac{\mathrm{cut}(A^c, B^c)}{W(A^c, Y) + W(X, B^c)} \qquad (2)$$

cut(A, B) is calculated using Eq. 3, where W(A, Y) is the sum of the weights of the edges with one endpoint in A and the other endpoint in Y.

$$\mathrm{cut}(A, B) = W(A, B^c) + W(A^c, B) \qquad (3)$$

The reason for choosing this algorithm is that it clusters bipartite graphs more efficiently than regular clustering algorithms such as k nearest neighbor [16].
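As a minimal illustration of Eqs. 2 and 3, the sketch below evaluates Ncut for a given partition of a small weighted bipartite graph. The function name `ncut`, the boolean-mask representation of A and B, and the weight matrix are assumptions made for this example and are not taken from the data of this study; NumPy is assumed to be available.

```python
import numpy as np


def ncut(W, A, B):
    """Normalized cut of Eq. 2 for row subset A and column subset B.

    W is the warehouse-by-item weight matrix; A and B are boolean masks.
    Both sides of the partition are assumed to have positive total weight.
    """
    W = np.asarray(W, dtype=float)
    A = np.asarray(A, dtype=bool)
    B = np.asarray(B, dtype=bool)
    Ac, Bc = ~A, ~B
    # Eq. 3: weight of edges that cross between the two matched pairs.
    cut = W[np.ix_(A, Bc)].sum() + W[np.ix_(Ac, B)].sum()
    # cut(A^c, B^c) = W(A^c, B) + W(A, B^c), which equals cut(A, B).
    left = cut / (W[A, :].sum() + W[:, B].sum())
    right = cut / (W[Ac, :].sum() + W[:, Bc].sum())
    return left + right


# Hypothetical sale amounts: 3 warehouses (rows) by 4 items (columns).
W = [[10, 9, 0, 1],
     [8, 7, 1, 0],
     [0, 1, 9, 8]]

good = ncut(W, A=[True, True, False], B=[True, True, False, False])
bad = ncut(W, A=[True, False, False], B=[True, True, False, False])
print(good, bad)  # the partition that follows the block structure scores lower
```

A lower value means that less sale volume crosses between the two groups, which is why the search in Eq. 1 minimizes this quantity.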

B. Bayesian Network Algorithm

A Bayesian Network is a simple graphical representation of conditional independence assertions. In this representation, every node of the graph symbolizes a random variable, that is, a variable whose value is determined by the outcome of a random experiment. Every edge between these nodes represents a probabilistic dependency between the corresponding random variables. Two random variables are said to be independent if the outcome of one does not affect the probability distribution of the other.

Bayesian Networks can be used for numerous applications such as classification, regression and segmentation [17]. In our study, the Bayesian network represents the probabilistic relationships between demand forecasting results (predicted sale amounts of items) and various attributes such as the moving average value, the number of transportation vehicles of the warehouse and the location of the warehouse. Given the attribute values, the network can be used to compute the probabilities of the possible demand forecast results.
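To make the last point concrete, the following sketch computes the probability of each demand level given observed attribute values in a very small discrete network. It assumes the simplest possible structure, in which every attribute node depends only on the demand node; the network structure used in this study is not specified here, and the demand levels, attribute values and probability tables below are invented for illustration.

```python
def posterior(prior, cpts, evidence):
    """P(demand | evidence) when each attribute depends only on demand.

    prior:    {demand_level: P(demand_level)}
    cpts:     {attribute: {demand_level: {value: P(value | demand_level)}}}
    evidence: {attribute: observed value}
    """
    scores = {}
    for demand, p in prior.items():
        for attr, value in evidence.items():
            p *= cpts[attr][demand][value]
        scores[demand] = p
    total = sum(scores.values())
    return {demand: score / total for demand, score in scores.items()}


# Hypothetical probability tables with discretized attributes.
prior = {"low": 0.5, "high": 0.5}
cpts = {
    "moving_avg": {
        "low": {"small": 0.8, "large": 0.2},
        "high": {"small": 0.25, "large": 0.75},
    },
    "location": {
        "low": {"urban": 0.4, "rural": 0.6},
        "high": {"urban": 0.8, "rural": 0.2},
    },
}

post = posterior(prior, cpts, {"moving_avg": "large", "location": "urban"})
print(post)
```

In this toy example a large moving average and an urban location both make high demand more likely, so the posterior shifts strongly toward the high level.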

IV. DETAILS OF THE METHODOLOGY

The main purpose of the methodology is to forecast the sale amount of a specific item for a specific week and a specific main distribution warehouse. The basic steps of the proposed methodology can be seen in Fig. 4. The proposed method consists of four main stages: (1) we prepare the data set obtained from a retailer in order to apply data mining algorithms; (2) for each product, we calculate its moving average value; (3) we apply a bipartite clustering algorithm in order to group warehouses and their sub-distributors that have similar sales behavior; (4) we apply a Bayesian Network to obtain the forecasting results. The details of these steps are explained in this section.

[Fig. 4: flowchart of the methodology. Steps shown: Constructing dataset; Calculating moving average values; Constructing bipartite graph with main warehouses; Clustering main warehouses using bipartite graph; Constructing bipartite graph with sub warehouses; Clustering sub warehouses using bipartite graph; Using machine learning algorithm; Forecast Results.]

A. Constructing Dataset

The first step of the methodology is preparing the dataset, which includes the information necessary for generating forecast results. The data was taken from a national dried nuts and fruits company in Turkey. We used the sale invoices of 2011, 2012 and 2013 to construct a specialized dataset for the experiments. The total numbers of warehouses and distinct products are ninety-eight and seventy, respectively.

The dataset contains the following information about the warehouses and products:

Warehouse related attributes: location, and size related attributes such as the number of sub-warehouses, number of transportation vehicles, total amount of products sold weekly, selling area in square meters, number of employees and number of customers.

Product related attributes: product category (in this study a product ontology is constructed), selling amount, selling time.

B. Data Preparation

An important step of a model based on data mining techniques is data preparation. In this step, the data is cleaned and prepared in order to apply data mining algorithms. In this study, we designed a product ontology not only to provide an effective way of interoperating with other systems but also to avoid the cold start problem. The cold start problem, also called the new user or new item problem, is here the problem of estimating the demand for a new product. We used the Protégé [18] tool for constructing the product ontology. All descriptive features of the products were defined in detail. The resulting ontology had four main and twenty-eight sub product categories. Nearly seventy different products were grouped according to the defined product categories.

In this step, we also calculate the moving average of each product's sale amounts over the past three weeks. The calculation of the moving average value for a specific week t is given in Eq. 4.

$$\text{Moving avg}(t) = \frac{\sum_{i=1}^{3} \text{sale amount}(t - i)}{3} \qquad (4)$$
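Eq. 4 can be sketched in a few lines of Python. The function name `moving_average`, the list-based representation of weekly sale amounts and the sample numbers are assumptions made for this illustration. Returning `None` when fewer than three weeks of history exist is also an assumption, since the handling of the first weeks is not described here.

```python
from typing import Optional, Sequence


def moving_average(sales: Sequence[float], t: int, window: int = 3) -> Optional[float]:
    """Mean sale amount of the `window` weeks before week t (Eq. 4).

    `sales[k]` is the sale amount of week k. Returns None when the
    history before week t is shorter than the window.
    """
    if t < window:
        return None
    return sum(sales[t - i] for i in range(1, window + 1)) / window


# Hypothetical weekly sale amounts of one product at one warehouse.
weekly_sales = [120.0, 90.0, 150.0, 180.0, 60.0]

print(moving_average(weekly_sales, 3))  # uses weeks 0, 1 and 2
print(moving_average(weekly_sales, 4))  # uses weeks 1, 2 and 3
```

Note that the value for week t uses only weeks t-1, t-2 and t-3, so it never looks at the week being forecast.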

C. Constructing bipartite graph and clustering warehouses

It was noticed that some of the main distribution warehouses show quite different sale behavior. For instance, warehouses in Istanbul have more sub distribution warehouses and serve a wider area than regular warehouses. Warehouse features such as location and size also have an effect on the types of products a warehouse sells. For example, one warehouse may sell products with higher profit margins, whereas the customers of another warehouse in a different location may prefer less expensive products. For this reason, we decided to group the main distribution warehouses based on their product sale amounts by using a bipartite clustering method. A bipartite graph with two different types of nodes was constructed using all main warehouses and all products. The essential idea in this step is that if a main warehouse sells a product, the two corresponding nodes are connected by an edge. The weight of the edge is the total sale amount of the product for the main warehouse. A representative main distribution warehouse-product bipartite graph can be seen in Fig. 5.
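The construction of the edge weights can be sketched as follows. The tuple layout of an invoice line, the identifiers such as `MW1` and `P1`, and the function name `build_bipartite_weights` are hypothetical; the actual invoice schema of the company is not described here.

```python
from collections import defaultdict


def build_bipartite_weights(invoice_lines):
    """Sum sale amounts per (warehouse, product) pair.

    Each pair with at least one sale becomes an edge of the bipartite
    graph, and the summed amount is its weight. Pairs without sales
    simply have no edge.
    """
    weights = defaultdict(float)
    for warehouse, product, amount in invoice_lines:
        weights[(warehouse, product)] += amount
    return dict(weights)


# Hypothetical invoice lines: (main warehouse, product, sale amount).
invoice_lines = [
    ("MW1", "P1", 600.0),
    ("MW1", "P1", 400.0),
    ("MW1", "P2", 679.0),
    ("MW2", "P2", 921.0),
]

edges = build_bipartite_weights(invoice_lines)
print(edges)
```

Storing only the pairs that occur keeps the representation sparse, which matters little at this scale but mirrors the rule that an edge exists only where a sale took place.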

The bipartite graph clustering algorithm [16] is applied in order to group warehouses that have similar product sale behavior. This algorithm was chosen because it performs better on bipartite graph clustering than regular clustering algorithms such as K Nearest Neighbor. It separates a bipartite graph into two graphs recursively. In our study, twenty-nine different main warehouse clusters were generated using bipartite graph clustering.
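One way to obtain a single two-way split is sketched below, following the singular value decomposition approach described in [16]. It is a simplified sketch and not the implementation used in this study: it thresholds the scaled second singular vectors at zero instead of searching for the cut points that minimize Ncut, it assumes that every warehouse and every product has a positive total sale amount, and it leaves out the stopping rule of the recursion, which is not specified here. The function name `spectral_bipartition` and the example matrix are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def spectral_bipartition(W):
    """Split the rows and columns of a weight matrix into two groups.

    Returns two boolean masks (rows, columns). Rows and columns that
    share the same mask value belong to the same side of the split.
    Requires at least two rows and two columns with positive totals.
    """
    W = np.asarray(W, dtype=float)
    d_rows = W.sum(axis=1)  # total sale amount per warehouse
    d_cols = W.sum(axis=0)  # total sale amount per product
    # Scale each weight by the square roots of its row and column totals.
    W_hat = W / np.sqrt(np.outer(d_rows, d_cols))
    U, _, Vt = np.linalg.svd(W_hat, full_matrices=False)
    # The second singular vector pair carries the two-way split.
    x = U[:, 1] / np.sqrt(d_rows)
    y = Vt[1, :] / np.sqrt(d_cols)
    return x >= 0, y >= 0


# Hypothetical sale amounts with two visible groups of warehouses.
W = [[10, 9, 0, 1],
     [8, 7, 1, 0],
     [0, 1, 9, 8],
     [1, 0, 7, 9]]

rows, cols = spectral_bipartition(W)
print(rows, cols)
```

Applying the same function again to the sub-matrix of each side would give the recursive behavior described above, provided a stopping criterion is chosen.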

After the main warehouse clustering phase was completed, the sub distribution warehouses were clustered. The motivation for this step is that some sub warehouses of a main warehouse serve disparate regions with different purchasing power. The same bipartite graph clustering algorithm used for the main distribution warehouses was used to cluster the sub warehouses of each main warehouse. This produced 97 sub distribution warehouse clusters.

[Figure: bipartite graph with main warehouse nodes on one side and nodes Product 1 to Product 6 on the other; edge weights are sale amounts such as 1000, 679 and 921]

Fig. 5. Example of a bipartite graph of main warehouses and products.

D. Using machine learning algorithm

The last step of the proposed methodology is applying a machine learning algorithm, as can be seen in Fig. 4. Moving average values, warehouse related attributes and product related attributes were used to construct a Bayesian Network model. These attributes correspond to random variables in the Bayesian Network. Forecast results were calculated from the conditional probabilities among the random variables of the network. The data of 2011 and 2012 were used for training the Bayesian Network model, while the data of 2013 were used for testing.

In the first trial, we built a single Bayesian Network model which handles all main distribution warehouses together. In the second trial, we built an individual Bayesian Network model for each main distribution warehouse cluster. In the third trial, separate models were constructed for every sub warehouse cluster. Detailed results of these trials can be found in Section V.


When building the Bayesian Network model, we use the moving average values of products. As mentioned before, the demand for a new product which has no past sales data can be estimated thanks to the product ontology. In this case, the nearest neighbors of the new product in the ontology are determined, and the moving average values of these neighbor products are used to estimate the quantity of the new product that customers will purchase.
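The cold start handling can be sketched as below. Several details are assumptions made only for this illustration: the ontology is reduced to a (main category, sub category) pair per product, neighbors are taken to be products in the same sub category with a fallback to the same main category, and the neighbors' moving averages are combined by a plain mean. The category names, product identifiers and the function name `cold_start_estimate` are hypothetical.

```python
from statistics import mean


def cold_start_estimate(new_product, ontology, moving_avgs):
    """Estimate the moving average of a product that has no sales history.

    ontology:    {product: (main_category, sub_category)}
    moving_avgs: {product: moving average} for products with history
    Returns None when no product shares a category with the new one.
    """
    main_cat, sub_cat = ontology[new_product]
    known = [p for p in moving_avgs if p != new_product and p in ontology]
    same_sub = [p for p in known if ontology[p] == (main_cat, sub_cat)]
    same_main = [p for p in known if ontology[p][0] == main_cat]
    neighbors = same_sub or same_main  # prefer the closest category level
    if not neighbors:
        return None
    return mean(moving_avgs[p] for p in neighbors)


# Hypothetical ontology and moving averages.
ontology = {
    "P_new": ("nuts", "roasted"),
    "P_mix": ("nuts", "mixed"),
    "P1": ("nuts", "roasted"),
    "P2": ("nuts", "roasted"),
    "P3": ("nuts", "raw"),
    "P4": ("dried fruit", "apricot"),
}
moving_avgs = {"P1": 100.0, "P2": 140.0, "P3": 60.0, "P4": 30.0}

print(cold_start_estimate("P_new", ontology, moving_avgs))  # same sub category
print(cold_start_estimate("P_mix", ontology, moving_avgs))  # falls back to main category
```

The estimate then takes the place of the missing moving average attribute when the forecasting model is queried for the new product.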

V. EXPERIMENTAL RESULTS

The evaluation metric used to measure the error rate is the mean absolute percentage error (MAPE). Equation 5 shows how MAPE is calculated, where $A_t$ is the actual value, $F_t$ is the forecast value and $n$ is the number of forecasts.

$$MAPE = \frac{100}{n} \sum_{t=1}^{n} \frac{|A_t - F_t|}{|A_t|} \qquad (5)$$
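A direct translation of Eq. 5 into Python is sketched below. The function name `mape` and the sample values are illustrative assumptions. Raising an error for zero actual values is also an assumption, since Eq. 5 is undefined in that case and the handling of such weeks is not described here.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error of Eq. 5, in percent."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    n = len(actual)
    return 100.0 / n * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))


# Hypothetical actual and forecast sale amounts for three weeks.
print(mape([100.0, 200.0, 50.0], [110.0, 180.0, 50.0]))

# A forecast far above the actual value gives an error above 100%.
print(mape([10.0], [30.0]))
```

Because each error is divided by the actual value, MAPE has no upper bound, which is how an error rate above 100% can arise.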

Used on its own, the moving average is one of the most basic forecasting methods. We calculated the error rate of this method on our dataset and obtained a MAPE of 129%. This result shows that basic demand forecasting models give insufficient results in complex structures.

The first trial, which handled all main warehouses together, had a MAPE of 49% using the hybrid model (moving average and Bayesian Network together).

In the second trial, main warehouses were clustered according to their sale behavior. This trial had a 24% error rate. The error rate decreased considerably because a separate model was applied to each main warehouse cluster.

TABLE I. RESULT TABLE

Model                                                         Error Rate (MAPE)
One model for all main warehouses                             49%
Respective models for main distribution warehouse clusters    24%
Respective models for sub distribution warehouse clusters     17%

In the third trial, sub warehouses were also clustered. The error rate in this step was 17%. The improvement was especially pronounced for main distribution warehouses which have numerous sub distribution warehouses. For instance, a specific main warehouse cluster had a 37% error rate in the second trial. When the sub warehouses of this main warehouse cluster were clustered in the third trial, the error rate dropped to 16%.

VI. CONCLUSION

This work presents an approach for demand forecasting which can handle numerous main distribution warehouses and products.

This study showed that using one model for all main distribution warehouses gives unsatisfactory results when the number of main distribution warehouses is large. Furthermore, clustering main distribution warehouses according to their per-product sale amounts and building separate forecasting models for each main warehouse cluster improved the results.

If a main distribution warehouse had a larger area to serve and more sub warehouses, clustering the sub warehouses of this main warehouse provided better results. In other words, when separate models were applied to sub warehouse clusters, the error rate decreased. This is because some sub warehouses of the same main warehouse serve completely different regions with distinct purchasing power.

ACKNOWLEDGMENT

This research was supported by the Ministry of Science, Industry and Technology of Turkey, SANTEZ project 0484.STZ.2013-2.

REFERENCES

[1] M. L. Christopher, “Logistics and Supply Chain Management”, London: Pitman Publishing, 1992.

[2] J. T. Mentzer, W. DeWitt, J.S. Keebler, S. Min, N. W. Nix, C. D. Smith, Z. G. Zacharia, “Defining Supply Chain Management”, Journal of Business Logistics, vol. 22, no. 2, 2001.

[3] L.M. Liu, S. Bhattacharyya, S.L. Sclove, R. Chen, W.J. Lattyak, “Data Mining on Time Series: An Illustration Using Fast-Food Restaurant Franchise Data”, Computational Statistics & Data Analysis, vol. 37, pp. 455-476, 2001.

[4] P.C. Chang, Y.W. Wang, C.H. Liu, “The Development of a Weighted Evolving Fuzzy Neural Network for PCB Sales Forecasting”, Expert Systems with Applications, vol. 32, pp. 86-96, 2007.

[5] Z.L. Sun, T.M. Choi, K.F. Au, Y. Yu, “Sales Forecasting Using Extreme Learning Machine With Applications In Fashion Retailing”, Decision Support Systems, vol. 46, pp. 411-419, December 2008.

[6] Y. Yu, T. Choi, C. Hui, “An Intelligent Fast Sales Forecasting Model for Fashion Products”, Expert Systems with Applications, vol. 38, pp. 7373-7379, 2011.

[7] S.H. Ling, “Genetic Algorithm and Variable Neural Networks: Theory and Application”, Lambert Academic Publishing, 2010.

[8] K.F. Au, T.M. Choi, Y. Yu, “Fashion Retail Forecasting by Evolutionary Neural Networks”, International Journal of Production Economics, vol. 114, pp. 615-630, 2008.

[9] R.S. Gutierrez, A. Solis, S. Mukhopadhyay, “Lumpy Demand Forecasting Using Neural Networks”, International Journal of Production Economics, vol. 111, pp. 409-420, 2008.

[10] M.A.A. Hasin, S. Ghosh, M.A. Shareef, “An ANN Approach to Demand Forecasting in Retail Trade in Bangladesh”, International Journal of Trade, Economics and Finance, vol. 2, no. 2, April 2011.

[11] P. Doganis, A. Alexandridis, P. Patrinos, H. Sarimveis, “Time Series Sales Forecasting For Short Shelf-Life Food Products Based On Artificial Neural Networks And Evolutionary Computing”, Journal of Food Engineering, vol. 75, pp. 196-204, 2006.

[12] L. Aburto, R. Weber, “Improved supply chain management based on hybrid demand forecasts”, Applied Soft Computing, 2007.

[13] T. Efendigil, S. Önüt, C. Kahraman, “A decision support system for demand forecasting with artificial neural networks and neuro-fuzzy models: A comparative analysis”, Expert Systems with Applications, vol. 36, no. 3, pp. 6697-6707, 2009.

[14] B. Parikh, “Applying Data Mining to Demand Forecasting and Product Allocations”, The Pennsylvania State University, 2003.

[15] N. Altintas, M. Trick, “A data mining approach to forecast behavior”, Annals of Operations Research, vol. 216, no. 1, pp. 3-22, 2014.

[16] H. Zha, X. He, C. Ding, H. Simon, M. Gu, “Bipartite Graph Partitioning and Data Clustering”, CIKM’01, Atlanta, Georgia, USA, November 5-10, pp. 25-32, 2001.

[17] I. Ben-Gal, “Bayesian Networks”, in: F. Ruggeri, F. Faltin & R. Kenett (Eds.), Encyclopedia of Statistics in Quality and Reliability, John Wiley & Sons, 2007.

[18] Protégé, http://protege.stanford.edu.