Analysis of algorithms of time series analysis for forecasting sales


Academic year: 2022


SAINT-PETERSBURG STATE UNIVERSITY Mathematics & Mechanics Faculty

Chair of Analytical Information Systems

Garipov Emil

Analysis of algorithms of time series analysis for forecasting sales

Course Work

Scientific supervisor:

docent Natalia Grafeeva


Contents

1. Introduction

1.1. Motivation

1.2. Problem statement

1.3. About the Dataset

1.4. Software tools

2. Description

2.1. Time Series

2.2. Time Series Decomposition

2.2.1. Time Series patterns

2.2.2. Time Series Decomposition Models

2.2.3. Forecasting with decomposition

2.3. Linear regression

2.4. ARIMA models

2.4.1. Stationarity

2.4.2. Autoregressive models

2.4.3. Moving-average models

2.4.4. ARMA models

2.4.5. Non-seasonal ARIMA model

2.4.6. Variations and extensions

3. Summary

4. Future plans


1 Introduction

1.1 Motivation

Forecasting is required in many situations: deciding whether to build another power generation plant in the next five years requires forecasts of future demand; scheduling staff in a call centre next week requires forecasts of call volumes; planning expenses for a company that sells products requires forecasts of future sales. Forecasts can be required several years in advance (for the case of capital investments), or only a few minutes beforehand (for telecommunication routing). Forecasting is an important aid to effective and efficient planning.

1.2 Problem statement

The purpose of this paper is to study the principles of time series analysis and to review some existing approaches and algorithms for time series forecasting that could be helpful in predicting sales.

1.3 About the Dataset

The dataset contains daily sales of a certain product in a Moscow restaurant chain over five years. It includes data for every restaurant in the chain, so forecasts can be obtained for each restaurant individually.

1.4 Software tools

All calculations were performed using the R programming language and the "forecast" library.


2 Description

2.1 Time Series

A time series is a sequence of data points, typically consisting of successive measurements made over a time interval. Time series forecasting is the use of a model to predict future values based on previously observed values.

The task is to find a model that fits our data. But first, a preliminary exploratory analysis is required.

It is necessary to take into consideration some characteristics of the data, such as:

• Are there consistent patterns?

• Is there a significant trend?

• Is seasonality important?

• Is there evidence of the presence of business cycles?

• Are there any outliers in the data that need to be explained?

• How strong are the relationships among the variables available for analysis?

There are several approaches to modeling time series data.

2.2 Time Series Decomposition

2.2.1 Time Series patterns

There are three types of time series patterns:

Trend

A trend exists when there is a long-term increase or decrease in the data. It does not have to be linear. Sometimes we will refer to a trend “changing direction” when it might go from an increasing trend to a decreasing trend.

Seasonal

A seasonal pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or day of the week). Seasonality is always of a fixed and known period.

Cyclic

A cyclic pattern exists when the data exhibit rises and falls that are not of fixed period. The duration of these fluctuations is usually at least 2 years.


2.2.2 Time Series Decomposition Models

Let the time series $y_t$ comprise three components: a seasonal component, a trend-cycle component (containing both trend and cycle), and a remainder component (containing anything else in the time series). We can assume either an additive or a multiplicative model.

The additive model would be:

$$y_t = S_t + T_t + E_t, \tag{1}$$

The multiplicative model would be:

$$y_t = S_t \times T_t \times E_t, \tag{2}$$

where $y_t$ is the data at period $t$, $S_t$ is the seasonal component at period $t$, $T_t$ is the trend-cycle component at period $t$, and $E_t$ is the remainder (or irregular or error) component at period $t$.

Note that a multiplicative model can be transformed into an additive one by taking the logarithm of the data:

$y_t = S_t \times T_t \times E_t$ is equivalent to $\log y_t = \log S_t + \log T_t + \log E_t$.
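The equivalence between the multiplicative and the log-additive form can be checked numerically. The following is a minimal sketch in Python (not the R code used for the calculations in this paper); the component values are made up for illustration, and the check assumes all components are strictly positive, since the logarithm is undefined otherwise.

```python
import math

# Hypothetical component values for a single period t (illustrative only).
seasonal, trend, remainder = 1.2, 150.0, 0.97

# Multiplicative model: y_t = S_t * T_t * E_t
y = seasonal * trend * remainder

# After taking logs the same observation is a sum of log-components.
log_y = math.log(y)
log_sum = math.log(seasonal) + math.log(trend) + math.log(remainder)

print(abs(log_y - log_sum) < 1e-12)
```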

There are several methods for obtaining the components $S_t$, $T_t$ and $E_t$. For example, STL decomposition (a Seasonal-Trend decomposition procedure based on Loess) can be used for this purpose.
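To make the additive model concrete, here is a rough sketch of a classical moving-average decomposition in Python. It is deliberately simpler than STL and is not the procedure used in this paper. The function names `moving_average_trend` and `decompose_additive` and the toy series are assumptions made for illustration.

```python
def moving_average_trend(y, period):
    """Centered moving average over one season; None where it is undefined."""
    n = len(y)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        total = sum(window)
        if period % 2 == 0:
            # Even period: give the two end points half weight (2 x m average).
            total -= 0.5 * (window[0] + window[-1])
        trend[t] = total / period
    return trend


def decompose_additive(y, period):
    """Split y into seasonal, trend and remainder so that y = S + T + E.

    Assumes the series covers at least two full seasons.
    """
    trend = moving_average_trend(y, period)
    # Average the detrended values for each position within the season.
    sums = [0.0] * period
    counts = [0] * period
    for t, (obs, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            sums[t % period] += obs - tr
            counts[t % period] += 1
    raw = [s / c for s, c in zip(sums, counts)]
    centre = sum(raw) / period
    pattern = [v - centre for v in raw]  # seasonal effects sum to zero
    seasonal = [pattern[t % period] for t in range(len(y))]
    remainder = [
        None if tr is None else obs - s - tr
        for obs, s, tr in zip(y, seasonal, trend)
    ]
    return seasonal, trend, remainder


# Toy daily series: linear trend plus a weekly pattern (not the real data).
pattern = [-3, -2, -1, 0, 1, 4, 1]
y = [100 + 2 * t + pattern[t % 7] for t in range(28)]
seasonal, trend, remainder = decompose_additive(y, 7)
```

On this noise-free toy series the recovered seasonal component matches the weekly pattern and the remainder is zero wherever the trend is defined; real sales data would leave a non-trivial remainder.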

2.2.3 Forecasting with decomposition

After decomposing a time series, we can separately forecast the seasonal component $S_t$ and the seasonally adjusted component $A_t = T_t + E_t$. To forecast the seasonally adjusted component $A_t$, any non-seasonal forecasting method may be used, for example a random walk with drift model, Holt's method, or a non-seasonal ARIMA model. [2]

To estimate the seasonal component, a seasonal naïve method can be used: the last year of the estimated component is simply taken as the forecast of the seasonal component.

Now it is possible to forecast the time series. Assuming an additive decomposition, the decomposed time series can be written as

$$y_t = \hat{S}_t + \hat{A}_t,$$

where $\hat{S}_t$ and $\hat{A}_t$ are the forecasted components.
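The forecasting recipe of this section can be sketched in Python as follows, under two assumptions: the seasonal component is forecast with the seasonal naïve method, and the seasonally adjusted component with a random walk with drift (one of the options listed above). All function names and the example numbers are illustrative, not taken from the paper's R code.

```python
def seasonal_naive(seasonal, period, horizon):
    """Repeat the last full season of the estimated seasonal component."""
    last_season = seasonal[-period:]
    return [last_season[h % period] for h in range(horizon)]


def random_walk_with_drift(adjusted, horizon):
    """Extend the last value by the average historical change per step."""
    drift = (adjusted[-1] - adjusted[0]) / (len(adjusted) - 1)
    return [adjusted[-1] + (h + 1) * drift for h in range(horizon)]


def forecast_additive(seasonal, adjusted, period, horizon):
    """Additive recombination: forecast = seasonal part + adjusted part."""
    s_hat = seasonal_naive(seasonal, period, horizon)
    a_hat = random_walk_with_drift(adjusted, horizon)
    return [s + a for s, a in zip(s_hat, a_hat)]


# Hypothetical components of a four-week daily series.
weekly_pattern = [-3, -2, -1, 0, 1, 4, 1]
seasonal_component = weekly_pattern * 4
adjusted_component = [100 + 2 * t for t in range(28)]

forecast = forecast_additive(seasonal_component, adjusted_component, 7, 3)
```

Here the seasonally adjusted component grows by 2 per day, so the three-step forecast continues that line and adds back the Monday, Tuesday and Wednesday effects of the last observed week.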


2.3 Linear regression

Linear regression is an approach for modeling the relationship between a scalar dependent variable $y$ and one or more explanatory variables denoted $X$. The case of one explanatory variable is called simple linear regression. For more than one explanatory variable, the process is called multiple linear regression. [5]

The general form of a simple regression is

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i.$$

The general form of a multiple regression is

$$y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \dots + \beta_k x_{k,i} + \varepsilon_i,$$

where $y_i$ is the variable to be forecast and $x_{1,i}, \dots, x_{k,i}$ are the $k$ predictor variables. Each of the predictor variables must be numerical. The coefficient $\beta_0$ is the intercept, and the coefficients $\beta_1, \dots, \beta_k$ measure the effect of each predictor after taking account of the effect of all other predictors in the model.

In the context of time series analysis it can be especially useful to include dummy variables (also known as indicator variables, design variables, Boolean indicators, categorical variables, binary variables, or qualitative variables).

A dummy variable takes the value 0 or 1 to indicate the absence or presence of some categorical effect that may be expected to shift the outcome. If there are more than two categories, the variable can be coded using several dummy variables (one fewer than the total number of categories).

For example, suppose we are forecasting daily sales and we want to account for the day of the week as a predictor. Then we could use six dummy variables to code seven categories (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday).

Dummy variables may also be used to account for holidays and to remove the effect of outliers.
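The day-of-week example can be sketched in Python as follows. This is an illustrative least-squares fit through the normal equations, not the R workflow used for this paper. The sales figures are invented, Monday is arbitrarily chosen as the baseline category, and the helper names `day_dummies`, `solve` and `fit_ols` are assumptions.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def day_dummies(day):
    """Six 0/1 indicators for Tue..Sun; Monday is the baseline (all zeros)."""
    return [1.0 if day == d else 0.0 for d in DAYS[1:]]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    x = [[1.0] + row for row in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# Two weeks of hypothetical daily sales, Monday to Sunday.
days = DAYS * 2
sales = [20, 22, 21, 23, 30, 41, 35, 22, 24, 23, 25, 32, 43, 37]
beta = fit_ols([day_dummies(d) for d in days], sales)
# beta[0] is the Monday level; beta[1..6] are shifts for Tue..Sun.
```

Using six rather than seven dummies matters here: with an intercept and seven indicators the columns would be linearly dependent and the normal equations would have no unique solution.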

2.4 ARIMA models

The ARIMA (autoregressive integrated moving average) model provides another approach to time series forecasting. It is a generalization of the autoregressive moving average (ARMA) model [3], which in turn is a combination of the autoregressive and the moving average models.

2.4.1 Stationarity

A common assumption in many time series techniques is that the data are stationary. A stationary process has the property that the mean, variance and autocorrelation structure do not change over time. In other words, a stationary series looks flat: it has no trend, constant variance over time, a constant autocorrelation structure over time and no periodic fluctuations (seasonality). A stationarized series is relatively easy to predict: you simply predict that its statistical properties will be the same in the future as they have been in the past. If a time series is not stationary, it can often be made stationary by differencing. [1]
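Differencing replaces each value by its change from the previous value. A minimal Python sketch (the function name `difference` and the toy series are illustrative) shows how one round of differencing removes a linear trend and two rounds remove a quadratic one:

```python
def difference(series, lag=1):
    """Return the series of changes y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# A series with a linear trend becomes constant after one differencing.
linear = [5 + 3 * t for t in range(10)]
first_diff = difference(linear)

# A quadratic trend needs to be differenced twice.
quadratic = [t * t for t in range(10)]
second_diff = difference(difference(quadratic))
```

Each round of differencing shortens the series by one observation (by `lag` observations in general).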

2.4.2 Autoregressive models

In statistics and signal processing, an autoregressive (AR) model is a representation of a type of random process; as such, it describes certain time-varying processes in nature, economics, etc. In an autoregression model, we forecast the variable of interest using a linear combination of past values of the variable. The term autoregression indicates that it is a regression of the variable against itself.[4]

The notation AR(p) indicates an autoregressive model of order p. The AR(p) model is defined as

$$X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t,$$

where $\varphi_1, \dots, \varphi_p$ are parameters, $c$ is a constant, and the random variable $\varepsilon_t$ is white noise.
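The AR(p) recursion can be sketched in Python as follows. The functions `simulate_ar` and `forecast_ar`, the Gaussian noise and the parameter values are assumptions made for illustration; starting the simulation at $c / (1 - \sum_i \varphi_i)$ presumes a stationary process, for which this is the mean.

```python
import random


def simulate_ar(phi, c, n, sigma=1.0, seed=0):
    """Simulate n values of X_t = c + sum_i phi_i * X_{t-i} + eps_t."""
    rng = random.Random(seed)
    p = len(phi)
    # Start at the process mean (valid for a stationary AR process).
    x = [c / (1 - sum(phi))] * p
    for _ in range(n):
        noise = rng.gauss(0.0, sigma)
        x.append(c + sum(phi[i] * x[-1 - i] for i in range(p)) + noise)
    return x[p:]


def forecast_ar(history, phi, c):
    """One-step forecast: the unknown future noise is replaced by zero."""
    return c + sum(phi[i] * history[-1 - i] for i in range(len(phi)))


# AR(2) with hypothetical parameters phi_1 = 0.5, phi_2 = 0.2, c = 1.
next_value = forecast_ar([1.0, 2.0, 3.0], [0.5, 0.2], 1.0)
path = simulate_ar([0.5, 0.2], 1.0, 100)
```

In `forecast_ar`, `phi[0]` multiplies the most recent observation, matching $\varphi_1 X_{t-1}$ in the formula.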

2.4.3 Moving-average models

In time series analysis, the moving-average (MA) model is a common approach for modeling univariate time series. The notation MA(q) refers to the moving average model of order q [6]:

$$X_t = \mu + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i},$$

where $\mu$ is the mean of the series, $\theta_1, \dots, \theta_q$ are the parameters of the model and $\varepsilon_t, \varepsilon_{t-1}, \dots, \varepsilon_{t-q}$ are white noise error terms. The value of $q$ is called the order of the MA model.
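A short Python sketch of the MA(q) definition, assuming Gaussian white noise; the function name `simulate_ma` and the parameter values are illustrative. It also lets one check empirically that the simulated series fluctuates around $\mu$.

```python
import random


def simulate_ma(theta, mu, n, sigma=1.0, seed=0):
    """Simulate n values of X_t = mu + eps_t + sum_i theta_i * eps_{t-i}."""
    rng = random.Random(seed)
    q = len(theta)
    # Draw q extra noise terms so that every X_t has q past errors available.
    eps = [rng.gauss(0.0, sigma) for _ in range(n + q)]
    return [
        mu + eps[t] + sum(theta[i] * eps[t - 1 - i] for i in range(q))
        for t in range(q, n + q)
    ]


# MA(1) with hypothetical parameters theta_1 = 0.6 and mu = 10.
sample = simulate_ma([0.6], mu=10.0, n=20000)
sample_mean = sum(sample) / len(sample)
```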

2.4.4 ARMA models

The notation ARMA(p, q) refers to the model with p autoregressive terms and q moving-average terms. This model contains the AR(p) and MA(q) models:

$$X_t = c + \varepsilon_t + \sum_{i=1}^{p} \varphi_i X_{t-i} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}$$
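As a rough illustration of how the two parts combine, the following Python sketch computes a one-step ARMA forecast from past observations and past errors, with the unknown current error replaced by its mean of zero. The function name `arma_forecast` and all numbers are hypothetical; in practice the past errors are not observed and have to be estimated when the model is fitted.

```python
def arma_forecast(x_hist, eps_hist, phi, theta, c):
    """One-step forecast c + sum_i phi_i X_{t-i} + sum_i theta_i eps_{t-i}."""
    ar_part = sum(phi[i] * x_hist[-1 - i] for i in range(len(phi)))
    ma_part = sum(theta[i] * eps_hist[-1 - i] for i in range(len(theta)))
    return c + ar_part + ma_part


# ARMA(1, 1) with hypothetical parameters phi_1 = 0.5, theta_1 = 0.4, c = 1.
prediction = arma_forecast([1.0, 2.0], [0.5, -0.5], [0.5], [0.4], 1.0)

# With no MA terms the forecast reduces to the pure AR(1) forecast.
ar_only = arma_forecast([1.0, 2.0], [0.5, -0.5], [0.5], [], 1.0)
```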


2.4.5 Non-seasonal ARIMA model

If we combine differencing with autoregression and a moving average model, we obtain a non-seasonal ARIMA model. It is denoted ARIMA(p, d, q), where

p = order of the autoregressive part;

d = degree of first differencing involved;

q = order of the moving average part.

The full model can be written as

$$y'_t = c + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p} + \theta_1 e_{t-1} + \dots + \theta_q e_{t-q} + e_t,$$

where $y'_t$ is the differenced series (it may have been differenced more than once) and $e_t$ is white noise.
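Because the model is fitted to the differenced series, its forecasts are forecasts of changes and have to be summed back onto the last observed level. A minimal Python sketch for the case d = 1 follows; the function names and the forecast values of the differenced series are made up for illustration and would in practice come from the fitted ARMA part.

```python
def difference(series):
    """First differences: the change between consecutive observations."""
    return [b - a for a, b in zip(series, series[1:])]


def undifference(last_value, diff_forecasts):
    """Cumulatively add forecast changes to the last observed level."""
    forecasts = []
    level = last_value
    for change in diff_forecasts:
        level += change
        forecasts.append(level)
    return forecasts


observed = [10, 12, 15, 19]
changes = difference(observed)

# Hypothetical forecasts of the next two changes.
change_forecasts = [5, 6]
level_forecasts = undifference(observed[-1], change_forecasts)
```

For d = 2 the same step would be applied twice: first to recover forecasts of the first differences, then to recover forecasts of the original series.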

2.4.6 Variations and extensions

A number of variations on the ARIMA model are commonly employed. If multiple time series are used then the Xt can be thought of as vectors and a VARIMA model may be appropriate. Sometimes a seasonal effect is suspected in the model; in that case, it is generally better to use a SARIMA (seasonal ARIMA) model than to increase the order of the AR or MA parts of the model. A seasonal ARIMA model is formed by including additional seasonal terms in the ARIMA models.


3 Summary

Several common approaches to forecasting time series were studied: time series decomposition, linear regression, and ARIMA models.


4 Future plans

Gain a deeper understanding of the concepts of time series analysis. Study more sophisticated forecasting methods, for example artificial neural networks. Apply different methods to the dataset to obtain practical results.


References

[1] NIST/SEMATECH. NIST/SEMATECH e-Handbook of Statistical Methods // http://www.itl.nist.gov/. –– 2012. –– http://www.itl.nist.gov/div898/handbook/.

[2] Rob J. Hyndman, George Athanasopoulos. Forecasting: principles and practice // https://www.otexts.org. –– 2014. –– https://www.otexts.org/fpp.

[3] Wikipedia. Autoregressive integrated moving average // Wikipedia, The free encyclopedia.

[4] Wikipedia. Autoregressive model // Wikipedia, The free encyclopedia.

[5] Wikipedia. Linear regression // Wikipedia, The free encyclopedia.

[6] Wikipedia. Moving-average model // Wikipedia, The free encyclopedia.
