3/5 Census Bureau methods - Makridakis, Wheelwright & Hyndman

The Census II method was developed by the U.S. Bureau of the Census. Julius Shiskin is considered the main contributor to the early stages of its development. Census II has been used widely by the Bureau, by other government agencies in the United States and elsewhere, and by an ever-increasing number of businesses.

Census II has gone through several variations and refinements since 1955, when the first version was developed. The most widely used variants have been X-11 (Shiskin, Young, and Musgrave, 1967) and X-11-ARIMA, developed by Statistics Canada (Dagum, 1988). The most recent variant is X-12-ARIMA (Findley et al., 1997), which is an extension of the X-11-ARIMA methodology. The underlying time series decomposition methodology has remained the same throughout this development, although the refinements allow a larger range of economic time series to be adequately seasonally adjusted. In this section, we describe the X-12-ARIMA variant of Census II.

Many of the steps in Census II decomposition involve the application of weighted moving averages to the data. The averaging therefore causes an inevitable loss of data at the beginning and end of the series. Usually, the X-12-ARIMA method uses shorter weighted moving averages (called end-filters) to provide estimates for the observations at the beginning and end of the series. But X-12-ARIMA can also extend the original series with forecasts, so that more of the observations are adjusted using the full weighted moving averages. (The initial values can also be forecast backward in time.) These forecasts are obtained using an ARIMA time series model (Section 7/8/4) or a regression model with ARIMA errors (Section 8/1).
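The effect of extending the series can be illustrated with a minimal sketch. It assumes a 13-term centered 12-month moving average and a synthetic monthly series, and it uses a seasonal naive forecast (repeating the last observed year) purely as a stand-in for the ARIMA forecasts that X-12-ARIMA would produce. The function names `symmetric_filter` and `extend_with_forecasts` are hypothetical.

```python
import numpy as np

def symmetric_filter(y, weights):
    """Apply a symmetric moving average; positions without a full window are NaN."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    half = len(w) // 2
    out = np.full(len(y), np.nan)
    # mode="valid" keeps only positions where the whole window fits
    out[half:len(y) - half] = np.convolve(y, w, mode="valid")
    return out

def extend_with_forecasts(y, horizon, period=12):
    """Append `horizon` stand-in forecasts (seasonal naive).
    Assumption: X-12-ARIMA would use an ARIMA model here instead."""
    y = np.asarray(y, dtype=float)
    future = [y[len(y) - period + (h % period)] for h in range(horizon)]
    return np.concatenate([y, future])

# Synthetic monthly series (not the airline data): trend plus a seasonal wave
t = np.arange(48)
y = 100 + t + 10 * np.sin(2 * np.pi * t / 12)

weights = np.r_[0.5, np.ones(11), 0.5] / 12.0          # centered 12-month MA
plain = symmetric_filter(y, weights)
extended = symmetric_filter(extend_with_forecasts(y, 6), weights)[:len(y)]

print(int(np.isnan(plain).sum()), int(np.isnan(extended).sum()))
```

Without the extension, six values are lost at each end. With six forecasts appended, only the six values at the beginning remain missing, and these could be recovered in the same way by forecasting backward in time.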

Census II decomposition is usually multiplicative because most economic time series have seasonal variation which increases with the level of the series.

3/5/1 First iteration

As with all decomposition methods, Census II aims to separate the seasonality from the trend-cycle and then to isolate the randomness. The algorithm begins in a similar way to classical decomposition, and then proceeds through several iterations in which the estimated components are refined. The steps in each iteration are outlined below. We use the monthly airline passenger data of Table 3-5 to show the results of the first iteration.

Step 1 A 12-month centered moving average is applied to the original data, giving a rough estimate of the trend-cycle. This is exactly the same as Step 1 of classical decomposition shown in Table 3-6. There are six values missing at the beginning because of the averaging procedure used. However, there are not six values missing at the end because the unobserved data for these months were forecast using an ARIMA model.

Step 2 The ratios of the original data to these MA values are calculated as in Step 2 of classical multiplicative decomposition.
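Steps 1 and 2 can be sketched as follows. The sketch assumes the series is a plain array of monthly values and expresses the ratios as percentages, to match the convention of the tables in this section. The function names are hypothetical, and no forecast extension is applied, so six values are missing at both ends.

```python
import numpy as np

def centered_ma_12(y):
    """Step 1: 12-month centered MA (13 terms, half weight on the two end months)."""
    y = np.asarray(y, dtype=float)
    w = np.r_[0.5, np.ones(11), 0.5] / 12.0
    out = np.full(len(y), np.nan)
    out[6:len(y) - 6] = np.convolve(y, w, mode="valid")
    return out

def centered_ratios(y):
    """Step 2: original data as a percentage of the centered MA."""
    return 100.0 * np.asarray(y, dtype=float) / centered_ma_12(y)
```

For a series with no seasonality at all, such as a straight line, every available ratio equals 100. Departures from 100 therefore reflect the seasonal and irregular components together.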

Step 3 The ratios in the lower part of Table 3-6 include such random or unusual events as strikes and wars. The next task in Census II is to exclude such extreme values before finding estimates of the seasonal component.

A separate 3 × 3 MA is applied to each month of the centered ratios of Table 3-6. The resulting values form a new series which is a rough estimate of the seasonal component. Now the centered ratios of Table 3-6 contain both the seasonal and irregular component. So dividing these by the estimated seasonal component, we obtain an estimate of the irregular component. Mathematically,

$$\frac{S_t E_t}{S_t} = E_t.$$

Large values of E_t indicate an extreme value in the original data.

These extreme values are identified and the centered ratios of Table 3-6 are adjusted accordingly. This effectively eliminates any extreme values that do not fit the pattern of the rest of the data. The missing values at the beginning of the series are also replaced by estimates at this stage.
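The smoothing in Step 3 can be sketched as below, assuming the centered ratios are arranged as an array with one row per year and one column per month, with no missing entries. A 3 × 3 MA is a 3-term average of a 3-term average, which gives the weights (1, 2, 3, 2, 1)/9. The treatment of the first and last two years (renormalising whatever weights are available) is an assumption made for this sketch and is not the end-weighting used by X-12-ARIMA.

```python
import numpy as np

W_3X3 = np.array([1.0, 2.0, 3.0, 2.0, 1.0]) / 9.0   # 3-term MA of a 3-term MA

def smooth_each_month(ratios, weights=W_3X3):
    """Smooth each month (column) across years (rows).
    Assumption: at the ends, the available weights are renormalised."""
    ratios = np.asarray(ratios, dtype=float)
    n_years = ratios.shape[0]
    half = len(weights) // 2
    out = np.empty_like(ratios)
    for i in range(n_years):
        lo, hi = max(0, i - half), min(n_years, i + half + 1)
        w = weights[lo - i + half: hi - i + half]     # weights that fall inside the data
        out[i] = (w[:, None] * ratios[lo:hi]).sum(axis=0) / w.sum()
    return out

def irregular(ratios, seasonal):
    """E_t = (S_t E_t) / S_t: centered ratios divided by the rough seasonal estimate."""
    return np.asarray(ratios, dtype=float) / np.asarray(seasonal, dtype=float)
```

If one January ratio is 130 while all the others are 100, the smoothed January value for that year is 110, so the irregular estimate is 130/110, or about 1.18. A value this far from 1 would be a candidate extreme value.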

Step 4 The next step is to eliminate randomness by taking a 3 × 3 MA of each month of the year individually. This moving average is analogous to the one in Step 3 except that the modified data (with replaced extreme values and estimates for missing values) are used. Then the results are further adjusted to ensure they add up to approximately 1200 over any 12-month period. This calculation gives the values shown in Table 3-7.

Table 3-6 gives values equivalent to equation (3.18) and these include seasonality and randomness. Since randomness has been eliminated by replacing extreme values and smoothing through a 3 × 3 moving average, what remains in Table 3-7 is an estimate of the seasonal component.
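One way to make the seasonal factors add up to approximately 1200 over any 12-month period is to divide them by their own centered 12-month moving average. The sketch below does this, assuming the factors are given in time order as percentages. Reusing the nearest available average at the two ends is an assumption of the sketch, and the function name is hypothetical.

```python
import numpy as np

def normalise_seasonal(s, period=12):
    """Rescale seasonal factors (percent, in time order) so that any `period`
    consecutive values sum to roughly 100 * period."""
    s = np.asarray(s, dtype=float)
    half = period // 2
    w = np.r_[0.5, np.ones(period - 1), 0.5] / period
    level = np.empty(len(s))
    level[half:len(s) - half] = np.convolve(s, w, mode="valid")
    # Assumption: reuse the nearest available average at both ends.
    level[:half] = level[half]
    level[len(s) - half:] = level[len(s) - half - 1]
    return 100.0 * s / level
```

As a check on the result of this step, the 1949 row of Table 3-7 sums to 1200.3.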

Step 5 The original data are then divided by this preliminary seasonal component to obtain the preliminary seasonally adjusted series. These values contain only the trend-cycle and the irregular component. They can be written mathematically as:

$$\frac{Y_t}{S_t} = \frac{S_t T_t E_t}{S_t} = T_t E_t. \tag{3.19}$$

Step 6 The trend-cycle is then estimated by applying a weighted moving average to the preliminary seasonally adjusted values.

In X-12-ARIMA, a Henderson weighted moving average is used, with the number of terms determined by the randomness in the series. (The greater the randomness, the longer the moving average used.) For monthly series, a 9-, 13-, or 23-term Henderson moving average is selected, depending on the extent of the randomness in the series. For quarterly series, a 5- or 7-term Henderson moving average is selected. (In this example, a 13-term Henderson moving average was selected.)
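The weights of a Henderson moving average are not given in this section. The sketch below computes them from the standard closed-form expression for symmetric Henderson weights, which is brought in here as an outside fact rather than taken from the text. The function name is hypothetical, and the rule X-12-ARIMA uses to choose the number of terms is not reproduced.

```python
import numpy as np

def henderson_weights(terms):
    """Symmetric Henderson weights for an odd number of terms (e.g. 5, 7, 9, 13, 23)."""
    if terms % 2 == 0 or terms < 3:
        raise ValueError("terms must be odd and at least 3")
    m = (terms - 1) // 2            # number of terms on each side of the centre
    n = m + 2
    j = np.arange(-m, m + 1, dtype=float)
    num = (315.0
           * ((n - 1) ** 2 - j ** 2)
           * (n ** 2 - j ** 2)
           * ((n + 1) ** 2 - j ** 2)
           * (3 * n ** 2 - 16 - 11 * j ** 2))
    den = (8.0 * n * (n ** 2 - 1) * (4 * n ** 2 - 1)
           * (4 * n ** 2 - 9) * (4 * n ** 2 - 25))
    return num / den

print(np.round(henderson_weights(5), 3))
```

For the 5-term case the weights are approximately (−0.073, 0.294, 0.559, 0.294, −0.073). The weights always sum to one, and the negative weights at the ends are what allow the filter to follow a curved trend-cycle more closely than a simple moving average of the same length.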

The rationale for applying this average is that the data given by equation (3.19) include trend-cycle and randomness. This moving average eliminates the randomness, providing a smooth curve that highlights the existence of a trend-cycle in the data. The resulting preliminary trend-cycle is given in Table 3-8.

Year    Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
1949   91.9   94.3  103.7   99.1   98.1  107.1  118.4  117.1  107.0   91.3   81.2   91.1
1950   92.0   94.0  103.5   99.0   98.1  107.4  118.1  117.5  106.6   91.4   81.5   91.2
1951   92.2   93.3  103.3   99.0   98.2  107.9  117.6  118.2  105.9   92.1   81.7   91.3
1952   92.1   92.1  102.7   98.5   98.4  109.0  118.2  119.6  105.3   92.6   81.6   90.9
1953   91.7   90.6  101.9   98.1   98.5  109.8  120.1  120.3  105.2   93.1   80.9   90.4
1954   91.3   89.3  100.8   97.6   98.4  111.0  123.1  120.7  105.5   92.5   80.2   90.0
1955   91.4   88.5   99.8   97.5   98.0  111.8  125.2  120.6  105.8   91.9   79.6   90.0
1956   91.5   88.1   99.2   97.3   97.8  112.7  126.1  120.7  105.9   91.4   79.4   90.0

Table 3-7: Preliminary seasonal component.

Year    Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
1949  124.8  125.5  126.0  126.3  126.2  126.1  126.2  126.6  127.1  127.7  128.5  129.4
1950  130.3  131.3  132.4  134.0  136.2  138.6  140.9  142.9  144.5  146.2  148.4  151.9
1951  156.7  161.8  166.0  168.4  169.2  169.3  169.6  170.5  172.5  175.8  179.6  183.2
1952  185.7  187.3  188.4  189.6  191.0  193.1  196.1  199.5  202.8  205.4  208.4  212.6
1953  218.0  223.2  227.3  229.3  229.4  228.0  226.0  224.5  223.8  223.4  222.6  221.7
1954  221.7  223.4  226.8  231.3  235.8  239.5  242.2  244.1  246.3  249.1  252.5  256.5
1955  260.6  264.9  269.2  273.5  277.9  282.3  286.4  290.2  293.9  297.7  301.8  306.0
1956  310.5  314.8  318.8  322.5  325.8  328.6  331.1  333.3  335.2  337.0  339.0  341.2

Table 3-8: Preliminary trend-cycle.

Year    Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
1949   90.4   93.7  105.5   99.5   96.8  106.7  118.1  118.0  106.9   92.2   81.2   91.3
1950   90.5   93.2  105.3   99.4   96.9  106.7  118.2  118.4  106.7   92.5   81.3   91.3
1951   90.7   92.3  104.9   99.0   97.3  107.0  118.8  119.0  106.3   92.6   81.4   91.1
1952   91.1   91.1  104.2   98.5   97.7  107.5  119.7  119.4  106.0   92.8   81.3   90.9
1953   91.5   89.9  103.2   98.1   97.9  108.6  120.8  120.1  105.7   92.7   81.2   90.7
1954   91.7   88.9  101.9   98.0   98.1  109.8  122.0  120.4  105.6   92.6   80.9   90.4
1955   91.7   88.4  100.9   97.9   98.1  111.1  123.1  120.5  105.6   92.1   80.6   90.1
1956   91.7   88.2  100.2   97.8   98.2  112.0  123.8  120.3  105.7   91.7   80.4   89.9

Table 3-9: Seasonal component.

Step 7 Now we have a new estimate of the trend-cycle, and we can repeat Step 2. New ratios are obtained by dividing the original data by the estimated trend-cycle, leaving only the seasonal and irregular components. These are called the final seasonal-irregular ratios and are given mathematically by

$$\frac{Y_t}{T_t} = \frac{T_t S_t E_t}{T_t} = S_t E_t \tag{3.20}$$

where T_t is the preliminary trend-cycle estimated in Step 6.

Applying a weighted moving average would normally cause the loss of several values at the beginning of the series and several at the end. To avoid this loss, each of the missing values is replaced by an estimated value.

Step 8 This is a repeat of Step 3 but using the new ratios computed in Step 7 and applying a 3 × 5 MA instead of a 3 × 3 MA.

Step 9 This is a repeat of Step 4 but using a 3 × 5 MA instead of a 3 × 3 MA. The resulting seasonal component is shown in Table 3-9.

Step 10 The same as Step 5 but using the seasonal component obtained in Step 9.

Step 11 The irregular component is obtained by dividing the seasonally adjusted data from Step 10 by the trend-cycle obtained in Step 6. Mathematically, the seasonally adjusted data are given by T_tE_t. So dividing by the trend-cycle T_t gives E_t, the irregular component.

Step 12 Extreme values of the irregular component are replaced as in Step 3. Then a series of modified data is obtained by multiplying the trend-cycle, seasonal component, and adjusted irregular component together. These modified data are exactly the same as the original data, but without the extreme values.

For the airline data, 10 of the 96 values were adjusted.
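Steps 11 and 12 can be sketched as follows. The text does not state the rule used to decide which irregular values are extreme or how they are replaced, so the threshold `k` and the replacement of extreme irregular values by 1 are hypothetical choices made only for illustration. The seasonal and irregular components are assumed to be ratios around 1 rather than percentages.

```python
import numpy as np

def replace_extremes(trend, seasonal, irregular, k=2.5):
    """Return (modified_data, adjusted_irregular, extreme_mask).
    Assumption: a value is extreme if it lies more than k standard deviations
    from 1, and extreme irregular values are replaced by 1."""
    t = np.asarray(trend, dtype=float)
    s = np.asarray(seasonal, dtype=float)
    e = np.asarray(irregular, dtype=float)
    extreme = np.abs(e - 1.0) > k * np.std(e)
    e_adj = np.where(extreme, 1.0, e)
    # Modified data: product of trend-cycle, seasonal and adjusted irregular
    return t * s * e_adj, e_adj, extreme
```

Wherever the mask is false, the modified data equal the product of the three components, that is, the original data. Only the flagged observations change.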


Year    Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec

Seasonal component
1949   90.7   93.9  105.9   99.3   97.0  106.6  118.1  117.7  106.4   91.8   81.4   91.3
1950   90.8   93.4  105.8   99.1   97.2  106.7  118.1  118.2  106.2   92.0   81.4   91.3
1951   91.0   92.6  105.5   98.6   97.7  107.0  118.5  118.8  106.0   92.3   81.4   91.1
1952   91.3   91.3  104.8   98.2   98.2  107.7  119.2  119.4  105.8   92.5   81.3   90.8
1953   91.6   90.1  103.6   97.8   98.4  108.8  120.3  120.1  105.6   92.6   81.1   90.5
1954   91.7   89.1  102.3   97.7   98.5  110.0  121.4  120.4  105.6   92.6   80.9   90.3
1955   91.6   88.5  101.1   97.7   98.3  111.2  122.6  120.6  105.7   92.2   80.7   90.0
1956   91.6   88.2  100.2   97.7   98.2  112.0  123.4  120.5  105.9   91.9   80.5   89.9

Trend component
1949  124.9  125.4  125.8  126.1  126.2  126.2  126.4  126.7  127.1  127.6  128.3  129.1
1950  130.3  131.8  133.7  136.0  138.3  140.5  142.4  144.1  145.9  148.0  150.9  154.5
1951  158.4  162.2  164.9  166.4  167.0  167.4  168.2  169.9  172.6  176.0  179.3  182.0
1952  183.7  184.5  185.1  186.1  187.9  190.6  194.2  198.2  202.1  205.7  209.2  212.9
1953  217.0  221.0  224.3  226.3  227.2  227.1  226.3  225.3  224.4  223.8  223.4  223.4
1954  224.0  226.0  229.2  233.1  237.0  240.2  242.6  244.4  246.3  248.6  251.6  255.3
1955  259.5  263.9  268.2  272.5  276.8  281.1  285.3  289.5  293.6  297.9  302.2  306.1
1956  309.9  313.6  317.6  322.0  326.3  330.2  333.0  334.6  335.5  336.2  337.5  339.4

Irregular component
1949   98.9  100.2   99.1  103.1   98.9  100.3   99.2   99.2  100.5  101.6   99.6  100.1
1950   97.2  102.3   99.7  100.2   92.9   99.4  101.1   99.8  102.0   97.6   92.8   99.3
1951  100.6   99.9  102.4   99.3  105.4   99.3   99.8   98.6  100.6   99.7  100.0  100.2
1952  102.0  106.8   99.5   99.0   99.2  106.3   99.4  102.3   97.8  100.3  101.1  100.3
1953   98.6   98.4  101.5  106.1  102.4   98.4   97.0  100.6  100.0  101.8   99.3   99.4
1954   99.3   93.4  100.2   99.7  100.3   99.9  102.5   99.5   99.6   99.5   99.8   99.3
1955  101.8   99.8   98.5  101.1   99.2  100.8  104.1   99.4  100.5   99.7   97.2  100.8
1956  100.1  100.1   99.6   99.5   99.3  101.1  100.5  100.4   99.9   99.0   99.7  100.3

Table 3-10: Final components for the airline series.

3/5/2 Later iterations

These 12 steps are repeated two more times, but beginning with the modified data from Step 12 rather than the original data. On the final iteration, the 3 × 5 MA of Steps 8 and 9 is replaced by either a 3 × 3, 3 × 5, or 3 × 9 moving average, depending on the variability in the data. For the airline data, a 3 × 5 MA was chosen for the final iteration also.
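The seasonal moving averages mentioned here differ only in how many years they span. A minimal sketch, assuming an m × n MA means an m-term simple average applied to an n-term simple average, obtains the combined weights by convolution. The function name is hypothetical.

```python
import numpy as np

def composite_ma_weights(m, n):
    """Weights of an m x n MA: an m-term simple MA of an n-term simple MA."""
    return np.convolve(np.ones(m) / m, np.ones(n) / n)

for m, n in [(3, 3), (3, 5), (3, 9)]:
    w = composite_ma_weights(m, n)
    print(f"{m}x{n} MA: {len(w)} terms, weights x {m * n} =", np.round(w * m * n))
```

A 3 × 3 MA spans 5 years with weights (1, 2, 3, 2, 1)/9, a 3 × 5 MA spans 7 years with weights (1, 2, 3, 3, 3, 2, 1)/15, and a 3 × 9 MA spans 11 years. The longer the average, the more slowly the estimated seasonal pattern is allowed to change from year to year.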

The components obtained after the final iteration are given in Table 3-10. Note that the seasonal component and irregular component are usually given as percentages. Multiplying the three components together gives the original series. A decomposition plot showing each of these components is given in Figure 3-11 while Figure 3-12 shows the seasonal sub-series.

The final seasonally adjusted series is found by dividing the final seasonal component of Table 3-10 into the original data. This is equivalent to the product of the trend-cycle and irregular components.
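These two relationships can be checked with the first three months of 1949 in Table 3-10. The short sketch below converts the percentages to ratios, multiplies the three components together, and forms the seasonally adjusted values. The function name `recompose` is hypothetical.

```python
def recompose(trend, seasonal_pct, irregular_pct):
    """Return (original, seasonally_adjusted) from the three components.
    Seasonal and irregular components are given as percentages."""
    original = [t * (s / 100.0) * (e / 100.0)
                for t, s, e in zip(trend, seasonal_pct, irregular_pct)]
    adjusted = [t * (e / 100.0) for t, e in zip(trend, irregular_pct)]
    return original, adjusted

# January, February and March 1949 from Table 3-10
original, adjusted = recompose([124.9, 125.4, 125.8],
                               [90.7, 93.9, 105.9],
                               [98.9, 100.2, 99.1])
print([round(v, 1) for v in original])   # about 112, 118 and 132
print([round(v, 1) for v in adjusted])   # about 123.5, 125.7 and 124.7
```

The products should agree with the corresponding entries of Table 3-5 up to rounding of the published components. Dividing each product by its seasonal ratio returns the same seasonally adjusted values as multiplying the trend-cycle by the irregular component.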

After the basic components of the time series have been estimated, a series of diagnostic tests is used to determine whether or not the decomposition has been successful. These tests are not statistical in the rigorous mathematical sense, but are based on intuitive considerations. See Shiskin, Young, and Musgrave (1967), Lothian and Morry (1978), and Findley et al. (1990) for details.

An important characteristic of Census II is that the task of isolating randomness and seasonal factors is not done simultaneously as it is in most decomposition methods. The division of this task enlarges the computational requirements, but it also generally improves the accuracy.

It may well seem that the Census II method is very complicated because of the number of steps involved up to this point. However, the basic idea is really quite straightforward—to isolate the seasonal, trend-cycle, and irregular components one by one. The various steps and iterations are designed to refine and improve the estimate of each component.

3/5/3 Extensions to X-12-ARIMA

The X-12-ARIMA method has many additional features that are not described above. Two important additional features of X-12-ARIMA are (i) the ability to remove the effect of explanatory variables prior to decomposition and (ii) the large range of diagnostic tests available after decomposition.

Explanatory variables are particularly important since many sources of variation in the series can be removed in this manner.

Figure 3-11: The X-12-ARIMA multiplicative decomposition of the airline passenger data. (Panels: data, trend-cycle, seasonal, remainder; horizontal axis: year.)

Figure 3-12: A seasonal sub-series plot for the decomposition shown in Figure 3-11. The seasonal component for June, July, and August became larger over the period of the data, with corresponding falls in February and March.

Some examples are listed below.

• Trading day adjustments can be made where there is a different effect for each day of the week. In the airline data, trading days are not an important factor because their effects on airline schedules are largely random, owing to the fact that holidays vary from country to country.

• Outliers arise because of unusual circumstances such as major strikes. These effects can also be removed prior to decomposition.

• Other changes in the level of the series such as level shifts and temporary ramp effects can also be modeled.

Some examples of how these explanatory variables can be included in the decomposition are given in Findley and Monsell (1989).
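To make these explanatory variables concrete, the sketch below builds the kind of regressors that could be used: counts of each weekday in a month for trading day effects, a dummy for a single outlier, a step for a level shift, and a ramp between two dates. The coding conventions (0 before the event, 1 after) and the function names are illustrative assumptions and need not match the parameterisation X-12-ARIMA uses internally.

```python
import calendar
import numpy as np

def weekday_counts(year, month):
    """Number of Mondays, ..., Sundays in a month (index 0 = Monday)."""
    counts = [0] * 7
    first_weekday, n_days = calendar.monthrange(year, month)
    for day in range(n_days):
        counts[(first_weekday + day) % 7] += 1
    return counts

def additive_outlier(n, t0):
    """1 at time t0 and 0 elsewhere (e.g. a month affected by a major strike)."""
    x = np.zeros(n)
    x[t0] = 1.0
    return x

def level_shift(n, t0):
    """0 before t0 and 1 from t0 onward."""
    x = np.zeros(n)
    x[t0:] = 1.0
    return x

def temporary_ramp(n, t0, t1):
    """0 up to t0, rising linearly to 1 at t1, and 1 afterwards."""
    x = np.zeros(n)
    x[t0:t1 + 1] = np.linspace(0.0, 1.0, t1 - t0 + 1)
    x[t1 + 1:] = 1.0
    return x

print(weekday_counts(2024, 1))   # a 31-day month that starts on a Monday
```

Each regressor would enter a regression model with ARIMA errors (Section 8/1), and the estimated effect would be removed from the series before the decomposition steps are applied.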

Some other additional features are:

• missing values in the series can be estimated and replaced;

• the seasonal component can be forced to be constant over time (i.e., the same seasonal component for each year);

• holiday factors (such as Easter, Labor Day, and Thanksgiving) can be estimated;

• automatic ARIMA model selection is available.

A more extensive discussion of X-12-ARIMA can be found in Findley et al. (1997).

In document Makridakis, Wheelwright & Hyndman - Forecasting, Methods and Applications. 3rd Ed (Page 114-122)