
Demand Model Estimation and Calibration

4.5 Calibrating the Models from Real Data

4.5.2 Demand Model Estimation and Calibration

We now discuss some aspects related to the estimation of the models using our specific data-set. We begin by focusing on the linear demand model from Section 4.2.1. Recall that the functional dependency introduced there was given by (4.1a), (4.1b), which we reproduce below for convenience:

$$D_t(p_t, \varepsilon_t) = b_t + A_t\,p_t + \varepsilon_t.$$

While the model is certainly a simplification of reality, since it ignores several salient features (such as the effect of inventory on sales Smith and Achabal [135], the effect of promotions and coupons Woo et al. [147], Boyd et al. [44], strategic customer behavior Talluri and van Ryzin [138], etc.), it remains very popular in the academic literature, and also in practice. One of its main attractive features is the ease of estimation from data: with unconstrained demand as the dependent variable and price as the independent variable, one can use regression techniques to estimate the sensitivity matrices $A_t$ and the market-size factors $b_t$.
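As a minimal sketch of this regression idea, the following Python snippet fits the linear model by ordinary least squares with NumPy. It assumes, purely for illustration, that the parameters are time-invariant ($b_t = b$, $A_t = A$), so that a single price history is enough to identify them. The function name `estimate_linear_demand`, the array layout, and the synthetic numbers are hypothetical and are not taken from the data-set discussed here.

```python
import numpy as np

def estimate_linear_demand(prices, demand):
    """OLS fit of D_t = b + A p_t + eps_t with time-invariant b and A.

    prices, demand: arrays of shape (T, n), one row per period.
    Returns (b_hat, A_hat) with shapes (n,) and (n, n).
    """
    T = prices.shape[0]
    X = np.hstack([np.ones((T, 1)), prices])            # intercept + prices
    coef, *_ = np.linalg.lstsq(X, demand, rcond=None)   # shape (n + 1, n)
    return coef[0], coef[1:].T                          # A[i, j] = coef[1 + j, i]

# Synthetic illustration (all numbers are made up).
rng = np.random.default_rng(0)
T, n = 200, 3
A_true = np.array([[-2.0, 0.3, 0.1],
                   [0.2, -1.5, 0.4],
                   [0.1, 0.2, -1.0]])
b_true = np.array([50.0, 40.0, 30.0])
prices = rng.uniform(5.0, 15.0, size=(T, n))
demand = b_true + prices @ A_true.T + rng.normal(0.0, 0.5, size=(T, n))

b_hat, A_hat = estimate_linear_demand(prices, demand)
```

Each column of `demand` is regressed on the same design matrix, so one call to `np.linalg.lstsq` recovers all rows of $A$ at once.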

In practice, however, several issues can arise. Firstly, it is easy to see that the number of parameters to be estimated can quickly become very large, since it grows with both the number of items and the length of the horizon. In particular, when only a few selling seasons are available (in our data-set, we only have one!), estimating an independent $b_{it}$ for each item and period is practically infeasible. Therefore, what is often done in practice is to aggregate data from multiple items together, and/or to ignore some of the time dependencies. For instance, a popular choice (Talluri and van Ryzin [138], Ramakrishnan [124]) is to assume that the items in different organizational units are independent, that the price sensitivity matrix is time-invariant, i.e., $A_t = A$, $\forall\, t \in T$, and that the $b_t$ component can be separated into a base demand $b \in \mathbb{R}^n$, which is time-invariant, and a seasonal factor $s_t \in \mathbb{R}^n$, often assumed to be the same for all items in a particular organizational group. For instance, if all the items $i \in S$ were taken to have the same seasonality, and to be independent of the items in $I \setminus S$, then the functional equation for the demand of items in $S$ would become

$$D_t(p_t, \varepsilon_t) = b + A\,p_t + \mathbf{1}\,s_t + \varepsilon_t, \quad \forall\, t \in T, \tag{4.14}$$

where $A \in \mathbb{R}^{|S| \times |S|}$, $b \in \mathbb{R}^{|S|}$, and $s_t \in \mathbb{R}$ would represent an additive seasonal factor corresponding to period $t$. The aggregation of the items can be performed either by using sensible business rules Ramakrishnan [124], Talluri and van Ryzin [138], or by using other statistical techniques, such as clustering, classification and regression trees or time-series analysis (see, e.g., Kumar and Patel [91], Ghysels et al. [72] or the books Greene [77] and Box et al. [43]).
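To make the reduction in dimensionality concrete, the following worked count compares the raw number of parameters in the general linear model (one $b_t \in \mathbb{R}^n$ and one $A_t \in \mathbb{R}^{n \times n}$ per period) with that of the aggregated form, for a single group $S$. The values $n = |S| = 10$ and $T = 20$ are hypothetical and chosen only for illustration; the last line counts the item-period observations that one selling season would provide.

```latex
\begin{align*}
\text{general linear model:}\quad & T\,(n + n^2) = 20\,(10 + 100) = 2200, \\
\text{aggregated model (4.14):}\quad & |S|^2 + |S| + T = 100 + 10 + 20 = 130, \\
\text{observations in one season:}\quad & T \cdot |S| = 20 \cdot 10 = 200.
\end{align*}
```

Under these illustrative numbers, the general model has far more parameters than observations, whereas the aggregated model does not.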

Due to these considerations, we decided to also make the following simplifications in our model:

1. We assume that SKUs in different subclasses are independent.

2. We assume that all SKUs in the same subclass share the same seasonal factor $s_t$, but have different market sizes, $b_i$.

3. We assume that the demand-sensitivity matrices are time-invariant, i.e., $A_t = A$, $\forall\, t \in T$.

4. We assume that each item's demand only depends on its own price and the average price of the other items inside the same subclass. Furthermore, we assume that the effects are the same across all the SKUs in a particular subclass.

More precisely, we take:

$$D_{it} = b_i + a\,p_{it} + a^{-} \sum_{j \in S \setminus \{i\}} p_{jt} + s_t + \varepsilon_{it}, \tag{4.15}$$

where $a$ represents the effect of SKU $i$'s own price, while $a^{-}$ denotes the effect from the prices of all the other items $j$ inside the same subclass $S$.
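The simplified model (4.15) can be estimated by stacking all item-period records of a subclass into one regression. The sketch below is one possible way to do this in Python and is not the procedure used to produce the results in this chapter. It makes one extra assumption that is not in the text: the seasonal factor is shared by coarse, hypothetical buckets of periods (the `season` array) rather than being free in every period. With one free factor per period, the regressor $\sum_{j \neq i} p_{jt}$ equals the period total minus $p_{it}$, and the period total is collinear with the period dummies, so $a^{-}$ could not be separated from $s_t$. The names `fit_subclass_model`, `season`, and `a_minus` are illustrative.

```python
import numpy as np

def fit_subclass_model(prices, demand, season):
    """OLS fit of D_it = b_i + a*p_it + a_minus*sum_{j != i} p_jt + s_k(t) + eps_it.

    prices, demand: (T, m) arrays for the m SKUs of one subclass.
    season: length-T integer array with values in {0, ..., K-1}
            (hypothetical seasonal buckets; bucket 0 is normalized to zero).
    """
    T, m = prices.shape
    K = int(season.max()) + 1
    other_sum = prices.sum(axis=1, keepdims=True) - prices   # sum of other SKUs' prices
    t_idx = np.repeat(np.arange(T), m)                       # period of each stacked row
    i_idx = np.tile(np.arange(m), T)                         # SKU of each stacked row
    rows = np.arange(T * m)

    X = np.zeros((T * m, m + 2 + (K - 1)))
    X[rows, i_idx] = 1.0                                     # SKU dummies -> b_i
    X[:, m] = prices.ravel()                                 # own price -> a
    X[:, m + 1] = other_sum.ravel()                          # other prices -> a_minus
    k_idx = season[t_idx]
    keep = k_idx > 0                                         # bucket 0 is the baseline
    X[rows[keep], m + 1 + k_idx[keep]] = 1.0                 # bucket dummies -> s_k

    theta, *_ = np.linalg.lstsq(X, demand.ravel(), rcond=None)
    s = np.concatenate([[0.0], theta[m + 2:]])
    return {"b": theta[:m], "a": theta[m], "a_minus": theta[m + 1], "s": s}

# Synthetic illustration (all numbers are made up).
rng = np.random.default_rng(1)
T, m = 40, 4
season = np.arange(T) // 10                                  # 4 buckets of 10 periods
b_true = np.array([60.0, 55.0, 50.0, 45.0])
s_true = np.array([0.0, 3.0, 6.0, 2.0])
a_true, a_minus_true = -2.0, 0.3
prices = rng.uniform(5.0, 15.0, size=(T, m))
other = prices.sum(axis=1, keepdims=True) - prices
demand = (b_true + a_true * prices + a_minus_true * other
          + s_true[season][:, None] + rng.normal(0.0, 0.5, size=(T, m)))

fit = fit_subclass_model(prices, demand, season)
```

Normalizing the first seasonal bucket to zero is a design choice of the sketch: without it, a constant could be shifted freely between the $b_i$ and the seasonal factors.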

These assumptions are made more out of necessity (i.e., to enable an adequate estimation) than out of solid economic or business considerations. In reality, even items inside the same subclass can be quite "different" in terms of seasonality patterns, and one can expect both substitutability and complementarity effects to exist across subclasses$^5$. Such effects could be captured with a significantly larger data-set, consisting of several selling seasons involving the same items, but were beyond the scope of our data.

$^5$ For an example of the former, imagine that an item in fashion outerwear is discounted, so that a customer prefers to buy it rather than a functional outerwear item; for the latter, suppose a shirt is discounted, inducing the purchase of a matching pair of pants from a different subclass.

The second remark we would like to make is that some of the requirements in our model description (most importantly, Assumptions 4 and 5) might not hold if the parameters are estimated by running an OLS regression. One immediate correction would be to run a constrained regression, in which the parameters are forced, via inequality constraints, to obey the properties mentioned in our discussion in Section 4.4.2. This approach does not present any computational difficulties (one would have to solve a constrained quadratic program), but has the main pitfall of invalidating most of the standard statistical analysis in linear regression (e.g., inferences based on t- or F-statistics are no longer possible under inequality-constrained linear regression Geweke [71], so one must resort to other techniques, such as bootstrapping, for testing statistical significance). Our regression results, presented in Section 4.5.4, frequently encountered this problem, thus requiring a pragmatic choice that traded off (a) the convenient theoretical properties of OLS regression against (b) the consistency of the model parameters with standard microeconomic theory.
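The constrained-regression alternative can be sketched as a bound-constrained least-squares problem, followed by a bootstrap to replace the t- and F-based inferences. The snippet below uses SciPy's `lsq_linear` and a pairs (case-resampling) bootstrap with percentile intervals, which is only one of several bootstrap variants. The sign constraints shown (own-price effect at most zero, cross-price effect at least zero) are assumptions made for illustration; the actual requirements are those of Section 4.4.2, which are not reproduced here. The function names and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.optimize import lsq_linear

def constrained_fit(X, y, lower, upper):
    """Least squares subject to element-wise bounds lower <= theta <= upper."""
    return lsq_linear(X, y, bounds=(lower, upper)).x

def bootstrap_ci(X, y, lower, upper, n_boot=200, level=0.95, seed=0):
    """Percentile intervals from a pairs bootstrap of the constrained fit."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, X.shape[1]))
    for k in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample records with replacement
        draws[k] = constrained_fit(X[idx], y[idx], lower, upper)
    alpha = (1.0 - level) / 2.0
    return np.quantile(draws, [alpha, 1.0 - alpha], axis=0)

# Synthetic illustration: columns are [intercept, own price, sum of other prices].
rng = np.random.default_rng(2)
n_obs = 100
own = rng.uniform(5.0, 15.0, size=n_obs)
others = rng.uniform(15.0, 45.0, size=n_obs)
X = np.column_stack([np.ones(n_obs), own, others])
y = 50.0 - 2.0 * own + 0.05 * others + rng.normal(0.0, 3.0, size=n_obs)

lower = np.array([-np.inf, -np.inf, 0.0])         # assumed: cross-price effect >= 0
upper = np.array([np.inf, 0.0, np.inf])           # assumed: own-price effect <= 0
theta = constrained_fit(X, y, lower, upper)
ci = bootstrap_ci(X, y, lower, upper)
```

An interval whose endpoint sits exactly on a bound indicates that the constraint was active in many resamples, which is the situation in which the usual OLS inference would be least reliable.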

Our third (and final) remark is related to the fact that our data-set contained sales, rather than direct demand information. The distinction becomes relevant when one is dealing with a censoring effect, whereby, once on-hand inventory reaches zero, one observes a truncated demand function. There are standard tools in regression modelling for dealing with such situations (e.g., tobit regression Greene [77], the expectation-maximization algorithm, Gibbs sampling or the Kaplan-Meier estimator Talluri and van Ryzin [138]). However, in our data-set, the vast majority of SKUs still had remaining inventory after the end of the sales period, so the number of records that could have suffered from censoring effects was very small. Therefore, we decided to ignore this issue in our regression estimation procedures.
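A simple diagnostic of the kind described above is to measure the share of item-period records that could have been affected by a stock-out. The sketch below is illustrative only: it assumes that each SKU starts with a known inventory and receives no replenishment during the season, and the function name, array layout, and numbers are hypothetical.

```python
import numpy as np

def censored_share(sales, start_inventory):
    """Fraction of (period, SKU) records with no inventory left at period end.

    sales: (T, m) array of units sold; start_inventory: length-m array.
    Assumes no replenishment, so on-hand stock is initial stock minus
    cumulative sales.
    """
    on_hand_end = start_inventory - np.cumsum(sales, axis=0)
    return float(np.mean(on_hand_end <= 0))

# Two SKUs over three periods: the first sells out in period 2.
sales = np.array([[2, 1],
                  [3, 1],
                  [0, 1]])
start_inventory = np.array([5, 10])
share = censored_share(sales, start_inventory)
```

A small share supports ignoring censoring, as done here; a large one would call for one of the tools listed above.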