
A COMPARISON OF REGRESSION MODELS FOR FORECASTING A CUMULATIVE VARIABLE


Academic year: 2021


A COMPARISON OF REGRESSION MODELS FOR FORECASTING A CUMULATIVE VARIABLE

Joanne S. Utley, School of Business and Economics, North Carolina A&T State University, Greensboro, NC 27411, (336)-334-7656 (ext. 4026), [email protected]

J. Gaylord May, Department of Mathematics, Wake Forest University, Winston-Salem, NC 27109, (336)-758-5338, [email protected]

ABSTRACT

This paper examines the relative performance of four regression models for forecasting total demand when historical time series data for past sales and partial demand data for future orders are available. Two of the models were based on ordinary least squares (OLS) regression while the other two models used least absolute value (LAV) regression. Data from an actual manufacturing firm were used to test the models. The OLS model that utilized a demand ratio approach produced the most accurate forecasts for the planning horizon.

INTRODUCTION

As managers try to accurately predict demand in today’s complex business environment, they should consider using partial demand data in the forecast process. While many managers currently use historical time series data to mathematically forecast future demand, they often fail to exploit partial order data that are available to them. Past research has shown that quantitative models can be used to combine historical time series data for a product with data from advance customer orders to produce more accurate demand forecasts for a planning horizon. However, many of these quantitative models are complex, may require a level of expertise most managers do not possess and thus may be quite difficult to implement in practice. [2] [8] Needed are straightforward forecast models that allow a manager to use both historical demand data and advance order data to forecast total demand for future time periods. In response to this need, this paper will examine four such models which are based on simple linear regression.

This paper is organized as follows. The next section will provide an overview of the literature on the use of partial demand data for forecasting total demand. Particular emphasis will be given to the relative ease with which regression models can incorporate both historical demand data and advance order data in a single forecast model. Section three includes a case study in which four regression based models are applied to actual order data from a manufacturing shop. The paper concludes with a discussion of results of the study and suggestions for future research.

OVERVIEW OF THE LITERATURE

Although regression models for forecasting total demand with advance order data have been discussed in the literature for nearly twenty years, they constitute a relatively recent approach to forecasting a cumulative variable. The earliest models for forecasting demand with partially accumulated order data were devised over forty years ago and utilized Bayesian solutions (see, for example, [7] and [11]). Guerrero and Elizondo (1997) observed that these early studies relied on a Bayesian approach because the amount of advance order data available to the forecasters was quite limited. They also noted that Bayesian methods can prove difficult to implement in practice because they require a rather high level of mathematical expertise from the manager. Other researchers have voiced similar criticisms of complex forecasting techniques for cumulative variables and have proposed the use of simpler forecasting methods. [2] [9]

Two of the most straightforward and widely used techniques are: 1) the multiplicative model, which was first discussed in the forecasting literature by Bestwick (1975), and 2) the additive model, which was compared to the multiplicative model by Kekre, Morton and Smunt (1990). The following notation will be useful in reviewing these two basic models and in formulating the regression models that represent a combination of these two basic approaches.

Regression Model Notation

Let:

t = a particular time period

L = the maximum customer designated lead time

h = a specific customer designated lead time where h < L.

D(t,h) = the partially accumulated demand for period t occurring h or more periods in advance of t (that is, the sum of orders for period t for which the customer supplied lead time ≥ h)

D(t) = the total demand for period t = D(t,0) = the accumulated demand for period t occurring 0 or more periods in advance.

F(t) = forecast for total demand in period t

C(h) = the ratio of partially accumulated demand known h periods prior to period t to total demand for period t.

R(t) = D(t)/D(t-1) = the ratio of total demand for period t to total demand for period t-1.

FR(t) = the forecast for the ratio of total demand in period t to total demand in period t-1.

A(t,h) = D(t,h)/D(t-1,h) = the ratio of partially accumulated demand for period t known h or more periods in advance of t to partially accumulated demand for period t-1 known h or more periods in advance of period t-1.

The basic multiplicative model states that the partially accumulated demand for period t occurring h or more periods in advance is the product of total demand for period t and the cumulative proportion C(h). Given this relationship, a forecast F(t) for total demand in period t can be found by dividing D(t,h) by C(h):

F(t) = D(t,h)/C(h) (1)

For example, if a manager knows that 25% of total demand for period t will already be known h periods in advance of period t, and if the sum of advance orders for period t is 100 units, then the forecast for period t will be 100/.25 = 400 units. [9]

The simplest form of the multiplicative model assumes that the cumulative proportions remain constant. However, in practice, the C(h) values may drift over time. In this case, exponential smoothing can be used to update the cumulative proportions and improve forecast accuracy. [3] Working with simulated booking and shipment data, Bodily and Freeland (1988) showed that the multiplicative model with smoothed C(h) values outperformed alternative models based on Bayesian analysis and also models based on smoothed shipments.
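The multiplicative forecast and a smoothed update of C(h) can be sketched in a few lines of Python. This is only an illustration: the function names, the smoothing constant alpha = 0.2 and the closing demand of 380 units are assumptions made for the example, not values from the studies cited above.

```python
def multiplicative_forecast(partial_demand, cum_proportion):
    """Equation (1): F(t) = D(t,h) / C(h)."""
    if cum_proportion <= 0:
        raise ValueError("C(h) must be positive")
    return partial_demand / cum_proportion


def smooth_proportion(previous_c, observed_c, alpha=0.2):
    """Simple exponential smoothing of C(h); alpha is an assumed constant."""
    return alpha * observed_c + (1 - alpha) * previous_c


# Example from the text: 100 units booked, 25% of demand known h periods ahead.
forecast = multiplicative_forecast(100, 0.25)

# Hypothetical update: suppose the period closes with total demand of 380,
# so the observed proportion was 100/380; blend it with the previous C(h).
updated_c = smooth_proportion(0.25, 100 / 380, alpha=0.2)
```

With these inputs the forecast is 400 units, as in the example above, and the smoothed proportion moves slightly upward from .25.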

In contrast with the multiplicative model, the simple additive model assumes that total demand for period t is the sum of the known component of total demand and the unknown component of total demand. [9] If S(t,h) represents the smoothed value of this unknown component of total demand, then:

F(t) = S(t,h) + D(t,h) (2)

It should be noted that the additive model does not assume that the known portion of total demand provides information about the unknown component, whereas the multiplicative model assumes proportional change in the known and unknown components. Kekre et al. (1990) studied the relative performance of these two models and found that the multiplicative model outperformed the additive model for one period ahead forecasts while the additive model was more accurate for a forecast horizon of 2-5 periods. Kekre et al. (1990) also noted that the multiplicative model outperforms the additive model when there is a correlation between the known and unknown portions of total demand.

While both of these models are fairly easy to understand and implement in practice, they both fail to make direct use of the historical time series for total demand. [3] To improve the performance of these two models, Kekre et al. (1990) suggested combining them so that

F(t) = a + b • D(t,h) (3)

The values of a and b are estimated from historical data. Kekre et al. [9] noted that the multiplicative model assumes that a = 0 while the additive model assumes that b = 1. Using Kekre et al.'s suggestion as a starting point, Guerrero and Elizondo (1997) defined the following set of L simple linear regressions to model total demand as a function of partially accumulated demand for h = 1,2,…,L:

D(t) = b0 + b1 D(t,h) + et (4)
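As a rough sketch of how one regression of the form in equation (4) might be estimated for a single lead time h, the closed-form OLS estimates can be computed directly. The data below are hypothetical and deliberately lie on an exact line so the result is easy to check by hand; the function name `ols_fit` is an assumption of this example.

```python
def ols_fit(x, y):
    """Closed-form OLS estimates (b0, b1) for the line y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical history: partially accumulated demand D(t,h) and total demand D(t).
partial = [100, 120, 90, 150, 110]
total = [230, 270, 210, 330, 250]  # constructed as 30 + 2 * partial

b0, b1 = ols_fit(partial, total)

# Forecast total demand for a future period whose advance orders sum to 130.
forecast = b0 + b1 * 130
```

One such line would be fitted for each lead time h = 1,2,…,L, each using only the orders known h or more periods in advance.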

While Guerrero et al. (1997) used OLS regression to forecast total demand, May and Sulek (2007) suggested using OLS regression to forecast total demand ratios. Since it is common in many business applications that demand for a particular time period increases or decreases by a percent of the previous period’s demand rather than by a fixed amount [1], the increases or decreases in the total demand ratio (or R(t) value) should be related to the increases or decreases in the partially accumulated demand ratio (or A(t,h) value), where R(t) = D(t)/D(t-1) and A(t,h) = D(t,h)/D(t-1,h). For a given h, a linear relationship between the partially accumulated demand ratio A(t,h) and the total demand ratio R(t) can be modeled via ordinary least squares (OLS) regression:

R(t) = b0 + b1 A(t,h) + et (5)

This model can be used to generate a forecast FR(t) for R(t) for a given h as long as partial demand data (i.e., D(t,h) and D(t-1,h)) are available for periods t and t-1. Once the ratio forecast FR(t) is computed, F(t), the forecast for total demand in period t, can be found with the formula:

F(t) = [D(t-1)][FR(t)] (6)

if h = 1, or

F(t) = [F(t-1)][FR(t)] (7)

if h = k, for any designated lead time k > 1.

If the lead time is greater than 1, the actual total demand for period t-1 is not yet known, so F(t-1) replaces D(t-1) in the forecast formula.
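The way equations (6) and (7) chain together over a planning horizon can be sketched as follows. The starting demand of 200 and the three ratio forecasts are hypothetical values chosen only to show the mechanics; `chain_ratio_forecasts` is a name invented for this example.

```python
def chain_ratio_forecasts(last_actual, ratio_forecasts):
    """Convert ratio forecasts FR(t) into level forecasts F(t).

    The first period multiplies the last known total demand D(t-1)
    (equation 6); each later period multiplies the previous forecast
    F(t-1) (equation 7).
    """
    forecasts = []
    base = last_actual
    for fr in ratio_forecasts:
        base = base * fr
        forecasts.append(base)
    return forecasts


# Hypothetical inputs: last actual demand of 200 and three ratio forecasts.
levels = chain_ratio_forecasts(200.0, [1.10, 0.90, 1.05])
```

Because each forecast feeds the next, an error in an early ratio forecast carries forward through the rest of the horizon.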

In contrast to OLS regression, which minimizes the sum of the squares of the forecast errors, LAV regression minimizes the sum of the absolute values of the error terms. Past research comparing the relative accuracy of the OLS and LAV methods for small sample sizes has shown that the LAV technique is often the more robust technique in practice, particularly when normality assumptions are not met. [5] Given these findings, one could replace the OLS based approach in the regression models of Guerrero and Elizondo (1997) and May and Sulek (2007) with an LAV approach. The next section of this paper presents a case study in which all 4 types of regression models (OLS, LAV, OLS Ratio and LAV Ratio) were applied to actual manufacturing data.
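For readers unfamiliar with LAV regression, the sketch below shows one simple way to fit an LAV line on a small sample. It relies on the known property that, for a line with two parameters, some LAV optimum passes through at least two data points, so it is enough to try every pair. This brute-force search is an illustrative choice, not necessarily the procedure used in the studies cited here (larger problems are usually solved by linear programming), and the data are hypothetical.

```python
from itertools import combinations


def lav_fit(x, y):
    """LAV fit of y = b0 + b1*x by enumerating lines through pairs of points.

    Returns (b0, b1, sum_of_absolute_errors).
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line has no finite slope
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        sad = sum(abs(yk - (b0 + b1 * xk)) for xk, yk in zip(x, y))
        if best is None or sad < best[0]:
            best = (sad, b0, b1)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2], best[0]


# Hypothetical data: four points on y = 1 + 2x plus one outlier at x = 5.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 30]

b0, b1, sad = lav_fit(x, y)
```

Here the LAV line stays on the four well-behaved points and absorbs the outlier as a single large absolute error, which illustrates the robustness property noted above.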

CASE STUDY

The research setting for this study was an electronics component company located in the southeast United States. Although this company manufactured a variety of components, data for only one product will be used to compare the regression based forecasts. Nine months of historical data were available. The data included a requested delivery date (or customer designated lead time) for each customer order as well as the order quantity per customer. Customer designated lead times varied from 1 month to 4 months, although customers occasionally requested a lead time of 5 or 6 months. The customer’s order quantity and designated lead time varied with each order rather than remaining stable over time.

The manufacturer needed to forecast demand for this component for a 4 month planning horizon (months 10-13). The company wanted to use the partial order data it already possessed at the end of month 9 to generate these forecasts. The authors utilized 4 regression models to make use of both the partial data available for months 1-13 and the total demand data for months 1-9 in preparing the forecasts. The forecast process consisted of the following stages:


Total demand data from months 1-9 were used to compute R(t) = D(t)/D(t-1), the ratio of total demand in month t to total demand in month t-1. Eight such ratios were calculated. In addition, for each h (h =1,2,3,4), partial demand data from months 1-9 were used to calculate a time series of 8 partial demand ratios A(t,h) = D(t,h)/D(t-1,h). The 5 time series are shown in Table 1.

TABLE 1

Total Demand Ratios and Partial Demand Ratios, where R(t) = D(t)/D(t-1) and A(t,h) = D(t,h)/D(t-1,h). A dash marks a ratio that could not be computed at the end of month 9.

Month (t)   R(t)    A(t,1)   A(t,2)   A(t,3)   A(t,4)
2           .83     .831     .717     .857     .702
3           .98     .985     .969     1.09     1.72
4           .74     .754     .756     .697     .703
5           1.31    1.24     1.27     1.41     1.55
6           .76     .78      .775     .781     .89
7           .94     .9       .898     .843     .759
8           1.15    .965     .934     .71      .596
9           1.25    1.45     1.32     1.53     .589
10          -       .8       .816     .679     1.34
11          -       -        1.74     1.72     .901
12          -       -        -        .321     .281
13          -       -        -        -        .5

At the end of month 9 it was also possible to calculate some of the partial ratios corresponding to the planning horizon. Since partial data were available for months 10-13, it was possible to compute A(10, h) for all 4 values of h. Similarly, it was possible to calculate A(11, h) for h = 2,3,4 and A(12,h) for h = 3,4. Finally, A(13,h) could be calculated only for h = 4. These partial ratios are also shown in Table 1.
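The ratio series in Table 1 are simple period-over-period quotients. A minimal sketch of the calculation follows; the demand figures are hypothetical because the underlying order quantities are not reported here, and `period_ratios` is a name chosen for this example.

```python
def period_ratios(series):
    """Given {month: demand}, return {t: demand[t] / demand[t-1]}.

    A ratio is produced only when both months are present and the
    earlier month's demand is nonzero.
    """
    ratios = {}
    for t in sorted(series):
        if t - 1 in series and series[t - 1] != 0:
            ratios[t] = series[t] / series[t - 1]
    return ratios


# Hypothetical data: total demand is known through month 3 only, while
# advance orders with lead time of 1 or more are known through month 4.
total_demand = {1: 200, 2: 166, 3: 163}
partial_h1 = {1: 180, 2: 150, 3: 147, 4: 120}

R = period_ratios(total_demand)   # R(t) for t = 2, 3
A1 = period_ratios(partial_h1)    # A(t,1) for t = 2, 3, 4
```

As in Table 1, the partial ratio series extends one period further than the total ratio series, which is what makes it usable as a predictor for the first month of the horizon.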


The authors considered four separate regression models for this forecasting problem. The first model was the standard OLS approach [8], which describes a linear relationship between the total demand D(t) and the partially accumulated demand D(t,h) for each h (h = 1,2,3,4). This model is given by equation (4). Partial demand data and total demand data for months 1-9 were used to establish this model. Table 2 lists the estimated regression lines generated by this approach as well as the corresponding error measures. As Table 2 shows, forecast accuracy (as measured by the MAD) decreases as h increases. This outcome should be expected since the partial demand data for the D(t,h) values become more limited as the length of the customer designated lead time increases.

TABLE 2

OLS Regression Results

Dependent Variable   Independent Variable   Intercept   Beta     MAD
D(t)                 D(t,1)                 37.0814     .8937    10.008
D(t)                 D(t,2)                 51.7565     .8551    14.7902
D(t)                 D(t,3)                 102.0647    .7004    26.794
D(t)                 D(t,4)                 243.122     -.0389   38.497

The second model used Least Absolute Value (LAV) regression to estimate the relationship between total demand D(t) and partially accumulated demand D(t,h). Table 3 lists the estimated regression lines and associated error measures for this approach. Table 3 also shows that the accuracy of the individual regression models decreases as h increases.

TABLE 3

LAV Regression Results

Dependent Variable   Independent Variable   Intercept   Beta    MAD
D(t)                 D(t,1)                 16.755      .972    8.73
D(t)                 D(t,2)                 19.575      .973    12.21
D(t)                 D(t,3)                 38.84       .918    25.312


The third and fourth models used OLS and LAV regression, respectively to model the relationship between R(t), the ratio of total demand for period t to total demand for period t-1 and A(t,h), the ratio of partially accumulated demand for period t known h or more periods in advance of t to partially accumulated demand for period t-1 known h or more periods in advance of period t-1. Table 4 and Table 5 list the estimated regression models and error measures for the OLS approach and LAV approach, respectively. As before, the accuracy of the individual regression models tends to diminish as the value of h increases.

TABLE 4

OLS Ratio Regression Results

Dependent Variable   Independent Variable   Intercept   Beta    MAD
R(t)                 A(t,1)                 .1884       .8163   .0751
R(t)                 A(t,2)                 .1466       .8885   .0637
R(t)                 A(t,3)                 .4778       .5225   .1046
R(t)                 A(t,4)                 .886        .1161   .1836

TABLE 5

LAV Ratio Regression Results

Dependent Variable   Independent Variable   Intercept   Beta    MAD
R(t)                 A(t,1)                 .021        .974    .067
R(t)                 A(t,2)                 .280        .735    .0706
R(t)                 A(t,3)                 .313        .612    .086

Stage 3: Calculation of the Non-Ratio Based Forecasts

For each h (h = 1,2,3,4), the non-ratio OLS and LAV models described in Table 2 and Table 3, respectively, were used to model the relationship between D(t) and D(t,h). The resulting forecasts for months 10-13 are presented in Table 6 and Table 7.

TABLE 6

Error Measures: OLS Ratio Model versus OLS Model

Month   Actual Total Demand   OLS Ratio Forecast   Error Terms OLS Ratio Model   OLS Forecast   Error Terms OLS Model
10      235                   220                  15                            207            28
11      405                   372                  33                            302            103
12      264                   231                  33                            146            118
13      206                   218                  -12                           242            -36
MAD                                                15.75                                        71.25
MSE                                                636.75                                       6653
MAPE                                               8.22%                                        24.87%

TABLE 7

Error Measures: LAV Ratio Model versus LAV Model

Month   Actual Total Demand   LAV Ratio Forecast   Error Terms LAV Ratio Model   LAV Forecast   Error Terms LAV Model
10      235                   210                  25                            201            34
11      405                   326                  79                            305            100
12      264                   166                  98                            97             167
13      206                   154                  52                            253            -47
MAD                                                63.5                                         87
MSE                                                4793.5                                       10313
MAPE                                               18.18%                                       31.3%


Stage 4: Calculation of the Ratio Based Regression Forecasts

Both OLS regression and LAV regression were used to generate ratio based forecasts for the 4 month forecast horizon. This section will illustrate the OLS ratio based approach [10] by applying the OLS regression models found in Table 4 to the forecast problem. A similar procedure was used to implement the LAV ratio approach.

For each month in the planning horizon, F(t), the forecast of total demand D(t), was found by first estimating the ratio R(t) with FR(t) and then multiplying either the actual D(t-1) value or its forecasted value, F(t-1), by FR(t). For example, Table 1 shows that A(10,1) = .8 and Table 4 shows that the regression model corresponding to h = 1 is FR(t) = .1884 + .8163[A(t,1)]. According to the historical data for months 1-9, the actual total demand for month 9 was 262. Thus,

FR(10) = .1884 + .8163(.8) = .8414, and

F(10) = D(9)[FR(10)] = 262(.8414) = 220.44 ≈ 220.

Similarly, Table 1 shows that A(11,2) = 1.74 and Table 4 shows that the regression model corresponding to h = 2 is FR(t) = .1466 + .8885[A(t,2)]. There is no actual value for D(10), but F(10) = 220. Thus,

FR(11) = .1466 + .8885(1.74) = 1.693, and

F(11) = F(10)[FR(11)] = 220(1.693) = 372.46 ≈ 372.

The forecasts for months 12 and 13 can be found in a similar manner and are listed in Table 6. As noted earlier, the LAV ratio based procedure is very similar to the OLS approach. The LAV approach used the LAV ratio regression models listed in Table 5 to predict the R(t) values. The LAV ratio based forecasts for months 10-13 are shown in Table 7.
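The arithmetic for months 10 and 11 can be checked with a short script. It uses only the Table 4 coefficients, the Table 1 ratios and the month 9 demand quoted above; rounding each forecast to the nearest unit before it is carried forward is an assumption made to match the worked example, and the names `TABLE4` and `ratio_forecast` are invented for this sketch.

```python
# (intercept, beta) of the OLS ratio models from Table 4, keyed by lead time h.
TABLE4 = {1: (0.1884, 0.8163), 2: (0.1466, 0.8885)}

# Partial demand ratios A(t,h) from Table 1 that the worked example uses.
A = {(10, 1): 0.8, (11, 2): 1.74}


def ratio_forecast(t, h):
    """FR(t) = b0 + b1 * A(t,h) for the model fitted at lead time h."""
    b0, b1 = TABLE4[h]
    return b0 + b1 * A[(t, h)]


d9 = 262                       # actual total demand for month 9

fr10 = ratio_forecast(10, 1)   # month 10 is one period ahead, so h = 1
f10 = round(d9 * fr10)         # equation (6): uses the actual D(9)

fr11 = ratio_forecast(11, 2)   # month 11 is two periods ahead, so h = 2
f11 = round(f10 * fr11)        # equation (7): uses the forecast F(10)
```

The script gives forecasts of 220 and 372 units for months 10 and 11, in agreement with the values derived above.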

Stage 5: Model Comparison

The mean absolute deviation (MAD), mean square error (MSE) and mean absolute percentage error (MAPE) for each model were computed for the 4 month planning horizon. These error measures are presented in Table 6 and Table 7. Table 6 shows that the OLS Ratio model outperformed the standard OLS approach on all three error measures. Table 7 reveals that the LAV ratio method outperformed the non-ratio LAV model on all error measures. In addition, the OLS ratio method had the greatest forecast accuracy for months 10-13 on all three error measures.
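For reference, the three error measures can be computed as in the sketch below, shown here for the non-ratio OLS forecasts listed in Table 6. The function name `error_measures` is an assumption of this example; errors are taken as actual minus forecast, as in the tables.

```python
def error_measures(actual, forecast):
    """Return (MAD, MSE, MAPE in percent) for paired actuals and forecasts."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mape = 100 * sum(abs(e) / a for e, a in zip(errors, actual)) / n
    return mad, mse, mape


# Months 10-13: actual total demand and the OLS (non-ratio) forecasts, Table 6.
actual = [235, 405, 264, 206]
ols_forecast = [207, 302, 146, 242]

mad, mse, mape = error_measures(actual, ols_forecast)
```

For this column the script returns a MAD of 71.25, an MSE of 6653.25 and a MAPE of about 24.9%, which agree with the Table 6 entries up to rounding.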

DISCUSSION

The results from this study suggest that a ratio-based regression approach to forecasting with partial demand data may prove more effective in practice than the conventional regression models that use actual demand levels. The LAV ratio method outperformed the non-ratio LAV method while the OLS ratio approach was more accurate than the non-ratio OLS method. The use of demand ratios instead of actual demand values resulted in a smoothing effect on the data, which enhanced overall accuracy over the forecast horizon.

The purpose of this study was to compare the relative accuracy of 4 types of regression models; however, there are other models that could be used for comparison. In particular, the smoothing models described by Bodily and Freeland (1988) could be included in future analysis. These models are not as complex as the Bayesian models mentioned earlier, so it would be interesting to compare their accuracy with that of the regression models. It would also be interesting to test the regression models examined in this paper in other research contexts, especially if larger data sets were available.

REFERENCES

[1] Benton, W.K., Forecasting for management. Wesley, MA: Addison-Wesley, 1972.

[2] Bestwick, P. A forecast monitoring and revision system for top management. Operational Research Quarterly. 1975, 26, 419-429.

[3] Bodily, S. and Freeland, J. A simulation of techniques for forecasting shipments using firm orders-to-date. Journal of the Operational Research Society. 1988, 39, 833-846.

[4] Chang, S. and Fyffe, D. Estimation of forecast errors for seasonal style goods. Management Science. 1971, 18, l89-96.

[5] Dielman, T. A comparison of forecasts of least absolute value and least squares regression. Journal of Forecasting. 1986, 5, 189-195.

[6] Fildes, R. and Stevens, C. Look- no data: Bayesian forecasting and the effects of prior knowledge. In Forecasting and Planning. (R.Fildes and D. Woods, Eds.) New York: Prager, 1978.

[7] Green, M. and Harrison, P. Fashion forecasting for a mail order company using a Bayesian approach. Operational Research Quarterly. 1973, 24, 193-205.

[8] Guerrero, V. and Elizondo, J. Forecasting a cumulative variable using its partially accumulated data. Management Science. 1997, 43(6), 879-889.

[9] Kekre, S., Morton, T. and Smunt, T. Forecasting using partially known demands. International Journal of Forecasting. 1990, 6, 115-125.

[10] May, G. and Sulek, J. The use of advance order data in demand forecasting. Proceedings of the DSI National Conference. 2007.

[11] Murray, G. and Silver, E. A Bayesian analysis of the style goods problem. Management Science. 1966, 12(11), 785-797.
