Marketing Science Diffusion Model Issues - Forecasting the decline of superseded technologies :

5.3.1. The diffusion model forecasting process

The process of running a classic diffusion model forecast is straightforward and is well described in the literature (Armstrong, 1985, 2001d; Mahajan & Peterson, 1985). First, a mathematical functional form is selected to suit the expected shape of the curve. Second, the parameters of that functional form are chosen to best estimate the expected curve. The result is the parameterised model, which is then run across future time periods to produce a series of point estimates of the variable of interest.
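The three steps described above can be sketched in a few lines. This is a minimal illustration only: it assumes the three-parameter logistic as the functional form, and the parameter values and forecast horizon are hypothetical, chosen purely to show the mechanics rather than taken from any data set in this thesis.

```python
import math

def logistic(t, L, a, b):
    """Step 1: the chosen functional form, Y(t) = L / (1 + exp(-(a + b*t)))."""
    return L / (1.0 + math.exp(-(a + b * t)))

def forecast(model, params, periods):
    """Step 3: run the parameterised model over future periods (point estimates)."""
    return [model(t, *params) for t in periods]

# Step 2: parameter values. These are hypothetical, for illustration only.
params = (1.0, -4.0, 0.5)          # L (ceiling), a (shape), b (rate)

future_periods = range(10, 16)     # hypothetical forecast horizon
point_estimates = forecast(logistic, params, future_periods)

for t, y in zip(future_periods, point_estimates):
    print(f"t={t}: {y:.3f}")
```

In practice the second step would be an estimation exercise rather than a hand-picked tuple; the sketch separates the steps only to make the structure of the process visible.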

Intuition might suggest that forecasters try a series of predictions based on a range of values for a model’s parameters. Unfortunately, spreading parameter values across a range frequently gives vastly different curve shapes for only slightly different parameter values (Steffens & Murthy, 1992), and fails to provide realistic estimates of the variable of interest, even for the Bass model, where banks of parameters for many different situations exist (Massiani & Gohs, 2015). Forecasters might also compare the forecast with those from other methods, combine forecasts, or use expert intervention to bootstrap the model in search of better forecasts (Armstrong, 2001d). This type of process is suitable for any S-curve prediction task.
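The sensitivity to parameter values can be illustrated numerically. The sketch below is not drawn from Steffens and Murthy (1992); it assumes a logistic form and two hypothetical parameter sets that differ only in the rate parameter (0.30 versus 0.35), and compares the resulting forecasts.

```python
import math

def logistic(t, L, a, b):
    """Three-parameter logistic: Y(t) = L / (1 + exp(-(a + b*t)))."""
    return L / (1.0 + math.exp(-(a + b * t)))

def time_to_fraction(p, a, b):
    """Period at which the logistic reaches fraction p of its ceiling."""
    return (math.log(p / (1.0 - p)) - a) / b

# Hypothetical parameter sets: identical except for a small change in b.
L, a = 1.0, -6.0
b_base, b_nudged = 0.30, 0.35

y_base = logistic(15, L, a, b_base)
y_nudged = logistic(15, L, a, b_nudged)
print(f"Forecast at t=15: {y_base:.3f} vs {y_nudged:.3f}")

t90_base = time_to_fraction(0.9, a, b_base)
t90_nudged = time_to_fraction(0.9, a, b_nudged)
print(f"Periods to 90% of ceiling: {t90_base:.1f} vs {t90_nudged:.1f}")
```

Under these assumed values, a change of roughly one sixth in the rate parameter moves the forecast at period 15 from about 0.18 to about 0.32 of the ceiling, and brings the 90 percent point forward by several periods.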

5.3.2. Is the model naïve or causal?

It is important to note that one can develop two types of diffusion models. Naïve models are those whose structures are not supported by theory and which rely on just fitting the data for the situation of interest (Armstrong, 1968). Alternatively, one can develop causal models, where the parameters are considered representative of real data on specific causal factors from the environment and can, in principle, be determined from environmental data (Armstrong, 1968).

There are also two methods for calibrating diffusion models. One method relies on fitting a candidate model to historical data representing the variable of interest; this method is called naïve modelling. The second estimates a model based on parameters derived externally to the variable of interest (Armstrong, 1968); this method of fitting is causal modelling.

The implication is that models may be causal at conception but not causal in use, a distinction not often made in the literature. Models loaded with causal factor data are considered causal in this thesis, and all others are described as naïve (or sometimes fitted) models for the reasons outlined, no matter what the theoretical foundation.

5.3.3. The availability of data and naïve fitting

Factors such as marketing spend, competitive pressure, and pricing are often recorded differently from firm to firm and across industries. Frequently such information is not available at all. Thus, calibrating models with these determinant data is difficult, and models are typically curve fitted: estimates of the parameters come from the observed data on the variable of interest rather than from market data on the parameters of the model. In other words, whether the model was designed to be one of internal or external influence, was conceived as causal, or was chosen as a purely mathematical form with no basis in theory, its parameter values are derived by fitting the model to the currently available data on the variable of interest.

This process of curve fitting, to which forecasters are frequently driven, is an iterative (naïve) fitting activity. It is so easy with modern software that it can, ironically, lead forecasters to include more parameters in an attempt to further improve fit. More parameters absorb more degrees of freedom in the data, and a model can become over-fitted if it has too many parameters relative to the number of data points. This risk is most acute where data series are short, sample a phenomenon sparsely, or both.

There are limits to the gains in fit from adding parameters: Parker (1993) demonstrated that there is little value in going beyond four parameters (including the ceiling parameter), notwithstanding any difficulties in explaining the model’s theoretical base. In the interests of parsimony, this provides an upper limit that a forecaster should not exceed.

5.3.4. Principles that apply in diffusion model forecasting

There are few universal rules in applying diffusion models; however, there are three critical principles. The primary principle is the identification and/or acceptance of an S-curve pattern of diffusion, which allows the deployment of marketing science diffusion models. Meade (1984) argues that this aspect of context validity is a critical principle in diffusion forecasting. The second principle is the single purchase proposition, where the technology is purchased only once, at adoption. This is the requirement for most diffusion models to have face validity: Meade’s face validity test is that the curve should have an obvious ceiling (Meade, 1984). The third principle is not to use the model to estimate the ceiling “L” (market potential) as well as “a” and “b” (the shape and rate parameters), as notated in a typical three-parameter model such as Equation (10), repeated below.

$$Y(t) = \frac{L}{1 + e^{-(a + bt)}}$$

Debecker and Modis (1994) also support the view that the ceiling must be estimated separately from the model, as does M. R. Young (1993), who demonstrated that the two most popular S-curve functional forms, the Pearl logistic and the Gompertz, give poor forecast accuracy when used to forecast without knowing the upper limit. Additionally, Modis (2007) notes that fitting programmes generally yield fits that are biased towards low ceilings, and that this bias is heightened by permitting larger margins for the determination of the model parameters. Finally, some error measures can influence the predicted ceiling negatively (Miyazaki, 1994, cited in Tofallis, 2014), as discussed further in this chapter’s section on error measures (section 6.3).

In decline forecasting as proposed in this thesis, the peak is set to unity and fitting to the data produces a model that predicts a floor. That floor is likely to be unknown to managers, as it is a residual that remains unclear until it has been observed. This differs from the diffusion case of rising to saturation, where total market size might well be understood ahead of time.
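The third principle, supplying the ceiling from outside the model and estimating only the remaining two parameters, can be sketched as follows. This is one simple way of doing so, assumed here for illustration and not the procedure used in this thesis: with L fixed, the logistic linearises to ln(Y / (L − Y)) = a + bt, so “a” and “b” can be found by ordinary least squares. The function name and the synthetic, noise-free series are hypothetical.

```python
import math

def fit_logistic_fixed_ceiling(times, values, ceiling):
    """Estimate a and b in Y(t) = L / (1 + exp(-(a + b*t))) with L supplied externally.

    Applies ordinary least squares to the logit transform ln(Y / (L - Y)) = a + b*t.
    Every observation must lie strictly between 0 and the ceiling.
    """
    if any(not (0.0 < y < ceiling) for y in values):
        raise ValueError("all observations must lie strictly between 0 and the ceiling")
    z = [math.log(y / (ceiling - y)) for y in values]
    n = len(times)
    t_mean = sum(times) / n
    z_mean = sum(z) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxz = sum((t - t_mean) * (zi - z_mean) for t, zi in zip(times, z))
    b = sxz / sxx
    a = z_mean - b * t_mean
    return a, b

# Synthetic series generated from known (hypothetical) parameters.
true_a, true_b, ceiling = -3.0, 0.4, 100.0
times = list(range(8))
values = [ceiling / (1.0 + math.exp(-(true_a + true_b * t))) for t in times]

a_hat, b_hat = fit_logistic_fixed_ceiling(times, values, ceiling)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```

Least squares on the logit scale weights errors differently from least squares on the original scale, so with noisy data the two approaches would generally give somewhat different estimates.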

5.3.5. Selecting between diffusion models

Beyond the provision of some generic guidelines (see Armstrong & Green, 2011; Meade & Islam, 1998), guidance for the selection of diffusion model functional forms is extremely limited, a well-recognised but ongoing gap in the literature (Riikonen, Smura, Kivi, & Töyli, 2013). The rarely cited U. Kumar and Kumar (1992) provides some guidance. They observe that to select effectively between the many diffusion of innovation models one needs to understand four facets of the models:

• the motivation and assumptions of the model’s designers;
• the prime analytical characteristics of each model;
• the relationships that exist between models;
• the behaviour of the model when tested (empirically on a data set).

However, these rules are hard to operationalise without extensive knowledge. U. Kumar and Kumar (1992), recognising this point, propose a framework for model selection based on what they see as important model characteristics:

• the number of parameters and their ranges;
• the location of the point of inflection;
• the observation of symmetry or non-symmetry around the point of inflection.

The criteria have their own problems. The first criterion requires great skill and experience. The second requires that between 40 and 60 percent of the diffusion has passed, so that the inflection point has been recorded, before an empirically sound guess as to the likely pattern can be made. The third requires that diffusion is close to completion, or that an empirically sound guess can be made as to the symmetry. Despite these limitations, the framework is useful in directing the forecaster’s focus onto the issues of suitability. Suitability can be assessed against the expected shape of the data series, determined from analogous data. Both the location of the inflection point and the symmetry or asymmetry about that point are important in this regard. As a principle, functional forms with a similar shape to the data should be chosen as the foundation of a model.
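The role of the inflection point in distinguishing functional forms can be checked numerically. The sketch below compares the logistic with the Gompertz curve, written here in one common parameterisation, Y(t) = L·exp(−c·exp(−bt)); the parameter values are hypothetical and the grid search is a deliberately crude way of locating the steepest point.

```python
import math

def logistic(t, L=1.0, a=-5.0, b=0.5):
    """Logistic curve: Y(t) = L / (1 + exp(-(a + b*t)))."""
    return L / (1.0 + math.exp(-(a + b * t)))

def gompertz(t, L=1.0, c=5.0, b=0.3):
    """Gompertz curve (one common form): Y(t) = L * exp(-c * exp(-b*t))."""
    return L * math.exp(-c * math.exp(-b * t))

def inflection_fraction(curve, t_min=0.0, t_max=40.0, steps=40000):
    """Find the steepest grid interval and return Y at its midpoint.

    With the ceiling set to 1, the result is the fraction of the ceiling
    reached at the point of inflection.
    """
    h = (t_max - t_min) / steps
    best_t, best_slope = t_min, -1.0
    for i in range(steps):
        t = t_min + i * h
        slope = (curve(t + h) - curve(t)) / h
        if slope > best_slope:
            best_t, best_slope = t + h / 2.0, slope
    return curve(best_t)

logistic_fraction = inflection_fraction(logistic)
gompertz_fraction = inflection_fraction(gompertz)
print(f"Logistic inflects at {logistic_fraction:.3f} of the ceiling")
print(f"Gompertz inflects at {gompertz_fraction:.3f} of the ceiling")
```

The logistic inflects at half of its ceiling and is symmetric about that point, whereas the Gompertz inflects at 1/e (about 37 percent) of its ceiling and is asymmetric, which is why the observed position of the inflection point carries information about which form suits the data.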

Sharif and Kabir (1976) observed that unless the diffusion and substitution process had nearly been completed, some models, for example the Fisher and Pry (1971) and Blackman (1971a) models, overestimate the level of diffusion, while others, such as the Floyd (1968) flexible model, underestimate growth. Implicit in these biases is the observation that most simple models need at least 20 percent of the S-curve to be recorded before an acceptable forecast of the growth rate can be achieved. For example, Heeler and Hustad (1980) and Srinivasan and Mason (1986a) suggest 10 years’ worth of data for cases where there are no discontinuities in the curve. With very little literature to guide them, and given the challenges in selecting models, forecasters generally use the most popular model (Armstrong, 2001e).

5.3.6. Model parameterisation and over-fitting in models

It is always necessary to have more observations than parameters. In theory, only one more point than the number of parameters is required. Statistical theory indicates that with any fewer than this, the parameter estimates will have infinite standard errors and the resulting prediction interval will be infinitely wide (Hyndman & Kostenko, 2007). When data contain substantial random variation around the trend, more data are needed to estimate stable parameters for a model. Conversely, when the variation is small, it is possible to estimate with fewer data points. If the data from Emmanouilides (2006) and from Parker (1993) are typical, that is, if diffusion series are typically short but follow a smooth, highly correlated progression (indicating low random variation relative to the trend magnitude), then it should be possible to estimate parameters for models, provided degrees of freedom requirements are met.

From a statistical validity perspective, there seems to be no firm rule on model parsimony and data sufficiency (the degrees of freedom problem), although regression texts quote a rule of thumb of ten observations per parameter in a model (Harrell, 2015). Given the Emmanouilides (2006) data, however, it seems diffusion forecasters are often working with fewer observations than this minimum and thus could be accused of violating many of the basic rules of model validity. Helpfully, Vittinghoff and McCulloch (2006) have demonstrated many situations where this rule can be justifiably relaxed.
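The two data sufficiency conditions discussed above (at least one more observation than parameters, and the ten observations per parameter rule of thumb) can be expressed as a simple check. The function name and the example series length are hypothetical, and the threshold of ten is a default that can be relaxed.

```python
def data_sufficiency(n_obs, n_params, obs_per_param_rule=10):
    """Summarise whether a series is long enough for a model of a given size.

    residual_df          observations left over after estimating the parameters
    estimable            True when there is at least one more observation than parameters
    obs_per_param        observations available per estimated parameter
    meets_rule_of_thumb  True when the observations-per-parameter rule is satisfied
    """
    residual_df = n_obs - n_params
    return {
        "residual_df": residual_df,
        "estimable": residual_df >= 1,
        "obs_per_param": n_obs / n_params,
        "meets_rule_of_thumb": n_obs >= obs_per_param_rule * n_params,
    }

# Hypothetical example: eight annual observations, three-parameter logistic.
check = data_sufficiency(n_obs=8, n_params=3)
for key, value in check.items():
    print(f"{key}: {value}")
```

A short series such as this one satisfies the minimum condition for estimation while falling well short of the regression rule of thumb, which is the position in which diffusion forecasters often find themselves.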

As discussed in Chapter Two, the diffusion of technologies is driven by external factors, such as the degree of word of mouth communication or advertising about the technology, the social structure of the market, and the availability, utility, and cost of the technology. The literature demonstrates two divergent philosophical approaches to specifying models. One approach is to develop parsimonious models with a minimum of parameters to avoid over-fitting to the data. The other approach comes out of the desire of model developers to include all possible determining factors in a comprehensive model that describes all the available historic data (Neal, 2012, pp. 103-104). There are enticing arguments for both parsimonious and comprehensive model approaches, some of which are discussed in the following sections.

A comprehensively specified model does well at describing the ins and outs of data points in the estimation data, because it becomes a very good fit to the random (stochastic) components of the historical data. The concern is that this fitting to the stochastic component comes at the expense of fitting to any important and useful underlying trend in the data generation process. This state of fitting to the noise in the data is commonly called over-fitting (Babyak, 2004). As mentioned earlier, however, over-fitting risks becoming the norm if data points are sparse in the estimation period. The low density of diffusion data limits the practical permitted complexity of models. So, the theoretical ideal (to use a comprehensive model to fully describe the phenomena) and the best practice (to ensure the methods suit the data available) need to be traded off (Babyak, 2004).
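The trade-off can be made concrete with a deliberately extreme sketch. It is an illustration under stated assumptions rather than a result from this thesis: the series is synthetic, the noise is a fixed hypothetical pattern of plus and minus 0.01, and an interpolating polynomial with one parameter per observation stands in for an over-parameterised model. It is compared with a two-parameter logistic whose ceiling is fixed at unity.

```python
import math

def logistic(t, a, b, L=1.0):
    """Logistic curve: Y(t) = L / (1 + exp(-(a + b*t)))."""
    return L / (1.0 + math.exp(-(a + b * t)))

def fit_logit_ols(times, values, L=1.0):
    """Two-parameter fit (a, b) with the ceiling fixed, via OLS on ln(Y / (L - Y))."""
    z = [math.log(y / (L - y)) for y in values]
    n = len(times)
    t_bar, z_bar = sum(times) / n, sum(z) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxz = sum((t - t_bar) * (zi - z_bar) for t, zi in zip(times, z))
    b = sxz / sxx
    return z_bar - b * t_bar, b

def interpolating_polynomial(times, values):
    """Lagrange polynomial through every point: as many parameters as observations."""
    def predict(t):
        total = 0.0
        for i, (ti, yi) in enumerate(zip(times, values)):
            weight = 1.0
            for j, tj in enumerate(times):
                if j != i:
                    weight *= (t - tj) / (ti - tj)
            total += weight * yi
        return total
    return predict

def mean_abs_error(predict, times, values):
    return sum(abs(predict(t) - y) for t, y in zip(times, values)) / len(times)

# Synthetic estimation series: logistic trend plus a fixed, hypothetical noise pattern.
train_t = list(range(8))
noise = [0.01 if t % 2 == 0 else -0.01 for t in train_t]
train_y = [logistic(t, -2.0, 0.5) + e for t, e in zip(train_t, noise)]

# Holdout periods, evaluated against the noise-free trend.
test_t = [8, 9, 10]
test_y = [logistic(t, -2.0, 0.5) for t in test_t]

a_hat, b_hat = fit_logit_ols(train_t, train_y)
parsimonious = lambda t: logistic(t, a_hat, b_hat)
comprehensive = interpolating_polynomial(train_t, train_y)

in_sample_parsimonious = mean_abs_error(parsimonious, train_t, train_y)
in_sample_comprehensive = mean_abs_error(comprehensive, train_t, train_y)
holdout_parsimonious = mean_abs_error(parsimonious, test_t, test_y)
holdout_comprehensive = mean_abs_error(comprehensive, test_t, test_y)

print(f"In-sample MAE: parsimonious {in_sample_parsimonious:.4f}, "
      f"comprehensive {in_sample_comprehensive:.4f}")
print(f"Holdout MAE:   parsimonious {holdout_parsimonious:.4f}, "
      f"comprehensive {holdout_comprehensive:.4f}")
```

The interpolating model reproduces the estimation data exactly, noise included, yet its forecasts leave the plausible range within a few periods, while the two-parameter model fits the estimation data less closely but stays bounded by its ceiling.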

In document Forecasting the decline of superseded technologies : a comparison of alternative methods to forecast the decline phase of technologies : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Marketing at Ma (Page 88-93)