• No results found

Parallel Processing for Faster ARIMA Model Selection

N/A
N/A
Protected

Academic year: 2021

Share "Parallel Processing for Faster ARIMA Model Selection"

Copied!
20
0
0

Loading.... (view fulltext now)

Full text

(1)

Parallel Processing for Faster ARIMA Model Selection

Parallel Processing for Faster ARIMA Model

Selection

Drew Schmidt

April 10, 2012

(2)

Parallel Processing for Faster ARIMA Model Selection

Contents

1

What is Parallelization?

2

Why Should We Care About It?

3

Parallel ARIMA Model Selection

(3)

Parallel Processing for Faster ARIMA Model Selection What is Parallelization?

What is Parallelization?

(4)

Parallel Processing for Faster ARIMA Model Selection What is Parallelization?

What is Parallelization?

Parallel Processing

Parallel processing (as opposed to serial processing) is the use of

multiple processors and/or multiple processing cores (and/or

hyperthreading) in computation.

(5)

Parallel Processing for Faster ARIMA Model Selection What is Parallelization?

What is Parallelization?

Serial Processing

(6)

Parallel Processing for Faster ARIMA Model Selection What is Parallelization?

What is Parallelization?

Parallel Processing

(7)

Parallel Processing for Faster ARIMA Model Selection What is Parallelization?

An Example

Serial Processing Example

n = big number

for i = 1, 2, 3, 4, . . . , n

do something

done

(8)

Parallel Processing for Faster ARIMA Model Selection What is Parallelization?

An Example

Parallel Processing Example - 2 Cores

n = big even number

for i = 1, 3, 5, 7, . . . , n − 1

for i = 2, 4, 6, 8, . . . , n

do something

do something

done

done

combine

(9)

Parallel Processing for Faster ARIMA Model Selection Why Should We Care About It?

Why Should We Care About It?

(10)

Parallel Processing for Faster ARIMA Model Selection Why Should We Care About It?

Why Should We Care About It?

Why Should We Care About Parallelization?

Processors haven’t really gotten “faster” in the last 10 years

Parallel is faster than serial

Other people care about parallelization (i.e., looks good on a

esum´

e)

Usually easier than other ways of improving performance

(11)

Parallel Processing for Faster ARIMA Model Selection Why Should We Care About It?

Why Should We Care About It?

Speedup

Parallel processing is generally faster. Ideally, we would want

(multi core run time) = (number cores) ∗ (single core run time)

In practice, this is rarely the case because of overhead inherent to

(most) implementations.

(12)

Parallel Processing for Faster ARIMA Model Selection Why Should We Care About It?

Why Should We Care About It?

A Speedup Analogy

one core → many cores

one checkout lane → many checkout lanes

(13)

Parallel Processing for Faster ARIMA Model Selection Parallel ARIMA Model Selection

Parallel ARIMA Model Selection

(14)

Parallel Processing for Faster ARIMA Model Selection Parallel ARIMA Model Selection

A Counting Problem

ARIMA Model Parameter Combinations

Suppose you want to fit a seasonal ARIMA model to monthly data

ARIMA(p, d , q)

1

(P, D, Q)

12

Choices for parameters:

Generally any of p, q, P, Q will range from 0 to 10 (rarely a

sensible need to go above 5).

Both of d and D should generally be 0 or 1; large degrees of

differencing can inject structure which isn’t actually there.

(15)

Parallel Processing for Faster ARIMA Model Selection Parallel ARIMA Model Selection

A Counting Problem

ARIMA Model Parameter Combinations

This gives us a general guideline of checking a number of models

on the order of 5

4

2

2

= 5184 (or debatably 10

4

2

2

= 58564).

This can not be done by hand.

And this is with just one form of seasonality! Consider data with

monthly, weekly, and daily seasonality:

ARIMA(p, d , q)

1

(P

1

, D

1

, Q

1

)

12

(P

2

, D

2

, Q

2

)

52

(P

3

, D

3

, Q

3

)

365

Our loose guidelines put the order of models to check at

26,873,856 for just one time series!

(16)

Parallel Processing for Faster ARIMA Model Selection Parallel ARIMA Model Selection

A Counting Problem

Checking Several Models At Once

With so many models to evaluate and since any two such

evaluations are independent, we can spread the workload across

several cores.

(17)

Parallel Processing for Faster ARIMA Model Selection Parallel ARIMA Model Selection

An Example

PRIVATE SELL OF SPECIAL TRANSPORT CARS IN RFA

1

Jan

Feb

Mar

Apr

May

Jun

Jul

Aug

Sep

Oct

Nov

Dec

1975

5237 3612 4385 6479 13286 9928 6755 5943

1976

5824 5433 6868 7853 5112 4082 5483 8502 11436 10313 6437 6751

1977

6976 5592 6975 6659 4688 4929 6019 7140 11166 9140 7288 7373

1978

5962 5861 7378 8632 6212 5906 5290 8238 12111 7127 6643 5998

1979

5214 5698 6675 4775 4721 4404 5635 9111 6150 4930 4994 3727

1980

4421 6296 7215 5844 8021 4090 3253 5431 4905 4671 4843 4980

1981

5201 6826 8696 5730 4949 5109 5386 9204 7656 5906 5851 5469

1982

5230 7409 7869 4989 6708 4389 4926 7518 7549 4405 5923 5593

1983

5435 6821 8376 6154 5412 4984 5999 8686 6184 4753 5038 4258

1984

4265 6484 6316 4344 5302 3566 4549 7848 4346 4553 4610 3890

1985

4675 5655 6484 4405 4022 4719 5203 8333 5740 5877 4762 4964

1986

4794 4473 5212 3518

1

Series 394 from the Mcomp package for R

(18)

Parallel Processing for Faster ARIMA Model Selection Parallel ARIMA Model Selection

An Example

PRIVATE SELL OF SPECIAL TRANSPORT CARS IN RFA

Year 1976 1978 1980 1982 1984 1986 4000 6000 8000 10000 12000

(19)

Parallel Processing for Faster ARIMA Model Selection Parallel ARIMA Model Selection

An Example

Model Selection

We fit all seasonal ARIMA models with d = D = 0 and each of

p, q, P, Q ranging from 0 to 6 for a total of 2401 models. The

winning model is the one with smallest Bayesian Information

Criterion (BIC) score, computed as

−2loglik + k ln(n)

On a core i5 with 4 cores (no hyperthreading)

Cores

Processes

Run Time

Avg # Models/Second

1

1

44.181 seconds

217.3785

4

4

17.126 seconds

560.7848

4

8

14.566 seconds

659.3437

(20)

Parallel Processing for Faster ARIMA Model Selection Parallel ARIMA Model Selection

An Example

18 Month Forecast from ARIMA(1, 1, 1)

1

(1, 0, 1)

12

— R

Test

2

≈ .800

1976 1978 1980 1982 1984 1986 1988 2000 4000 6000 8000 10000 12000

References

Related documents

Oppgaven er delt inn i seks kapitler med overskrifter fra treffende swam-titler. Jeg prøver også å forklare hvorfor tekstene hadde sitt innhold og sin form. Låtene deles inn i

Voltinism may change (Altermatt 2010a). Underestimates severity of mismatch in case of mid-season or late-season deficits in floral resources or pollinator

blockbuster October Advocacy Seminar Masters of Advocacy Superb advocacy skills taught by some of Virginia’s best trial lawyers.. Morning program - 3

Darshan records independent statistics for each file accessed by the application, including the number of bytes moved, cumulative time spent in I/O operations such as read()

Gambar 3 menunjukkan atribut yang paling penting untuk segera diperbaiki adalah atribut jumlah komponen yang memiliki tingkat kepentingan dengan nilai 5, derajat

Secondary analyses included similar GEE modeling with repeated measures of total DAI-10 score over time and quality of life over time (total EQ-5D score derived from.. five domains

• It provides the highest perceived level of confidentiality, since no one in the client organization touches the survey (assuming it is mailed directly to an outside vendor for

(14) Our investigation also showed that patients with sympto- matic subacromial impingement syndrome achieved significant improvement in shoulder pain and range of motion