Explicit representation of finite predictor coefficients and its applications

(1)

EXPLICIT REPRESENTATION OF FINITE PREDICTOR COEFFICIENTS AND ITS APPLICATIONS

BY AKIHIKO INOUE AND YUKIO KASAHARA

Abstract. _{We consider the finite-past predictor coefficients of stationary time} series, and establish an explicit representation for them, in terms of the MA and AR coefficients. The proof involves the alternate iteration of projection operators associated with the infinite past and the infinite future. We provide several applications, which include rates of convergence of the finite predictor coefficients, an equality of Baxter-type for long memory processes, and a simple representation of the partial autocorrelation functionα(·). We use the last result to obtain the precise asymptotic behavior of α(·) with remainder, for the fractional ARIMA processes.

1. Introduction

Let {X_k} = {X_k : k ∈ Z} be a real, zero-mean, weakly stationary process, deﬁned on a probability space (Ω,F, P ), which we shall simply call a stationary process. We denote by H the real Hilbert space spanned by {X_k : k ∈ Z} in L2(Ω,F, P ). The norm of H is given by Y := E[Y2]1/2. For n∈ N, we denote by H_[−n,−1] and H_{(−∞,−1]} the subspaces of H spanned by {X_−n, . . . , X₋₁} and {Xk : k≤ −1}, respectively. We write P[−n,−1] and P(−∞,−1] for the orthogonal

projection operators of H onto H_[−n,−1]and H_{(−∞,−1]}, respectively. The projection P_[−n,−1]X₀ (respectively, P_{(−∞,−1]}X₀) stands for the best linear predictor of the future value X₀ based on the ﬁnite past{X_−n, . . . , X₋₁} (respectively, the inﬁnite past {X_k : k ≤ −1}), and its mean square prediction error is given by σ_n2 := X0− P[−n,−1]X02 (respectively, σ2:=X0− P(−∞,−1]X02).

For nondeterministic{X_k} (see Section 2.1), the ﬁnite predictor coeﬃcients φ_n,j are the uniquely determined ones in

P_[−n,−1]X₀= n j=1 φ_n,jX_−j. (1.1) Date: May 4, 2004.

1991 Mathematics Subject Classification. Primary 60G25; secondary 62M20, 62M10.

Key words and phrases. Predictior coefficients, AR coefficients, MA coefficients, prediction,

fractional ARIMA processes, long memory, Baxter’s inequality, partial autocorrelation functions.

(2)

As is well known, we can calculate the numerical values of φ_n,1, . . . , φ_n,nas well as the mean square prediction error σ_n2from the values γ(0), . . . , γ(n) of the autoco-variance function of{X_k}, using recursive algorithms such as the Durbin–Levinson algorithm (see Brockwell and Davis [5], Sections 3.4 and 5.2). The recursive meth-ods are of great practical importance in time series analysis. However, they are not necessarily eﬀective in problems of theoretical character, in particular, those related to the asymptotic behavior as n→ ∞.

A classical problem of this type is the rate of convergence of σ_n2− σ2 ↓ 0 as n → ∞. See for instance Ginovian [7], where references to earlier work — by Grenander and Rosenblatt, Grenander and Szeg¨o, Baxter, Ibragimov and many others — are given. The arguments in these references are closely related to the theory of orthogonal polynomials as described in [9].

A new approach to a related problem was introduced by the first author [14]. For the partial autocorrelation function α(·) of a stationary process {X_k} with short or long memory, the asymptotic behavior of|α(n)| as n → ∞ was obtained using a representation of the mean square prediction error σ_n2in terms of the MA (moving-average) coefficients c_k and the AR (autoregressive) coefficients a_k (see Section 2.2 for the definitions of c_k and a_k). More precisely, after the precise behavior of σ_n2−σ2was obtained by arguments involving c_kand a_k, that of|α(n)| was derived from the result by a Tauberian argument, and in so doing the necessary Tauberian condition was verified using the representation of σ_n2. The partial autocorrelation α(n), which is equal to φ_n,n, is the correlation between X₀ and X_n, adjusted for the intervening observations X₁, . . . , X_n−1 (see (3.22) for definition). The partial autocorrelation function (PACF) is one of the core concepts in time series analysis. By the same approach but with extra complication, similar results on |α(n)| were obtained in [15] and [17] for the fractional ARIMA (autoregressive integrated moving-average) processes. The fractional ARIMA model is an important para-metric model including a class of long memory processes. It was independently introduced by Granger and Joyeux [8] and Hosking [12] (see Example 2.4).

The advantage of such approach, i.e., that via c_kand a_k, has become more appar-ent in [16] where a represappar-entation of the partial autocorrelation function α(·) itself, in terms of c_k and a_k, was derived for the ﬁrst time. The representation enabled us to study the behavior of α(·) more directly without using Tauberian arguments.

(3)

As a result, proofs were simpliﬁed considerably, and results were improved in sev-eral ways. In particular, the asymptotic behavior of α(n) as n→ ∞, rather than that of |α(n)|, was obtained. For example, it was shown that, for the fractional ARIMA(p, d, q) processes with d∈ (−1/2, 1/2) \ {0}, α(n) ∼ d/n as n → ∞.

In this paper, our main interest is in the finite predictor coefficients φ_n,j which are among the most basic quantities in the prediction theory for {X_k}. After we establish an explicit representation of the type above for φ_n,j, i.e., that in terms of the MA coefficients c_k and the AR coefficients a_k, we give several applications of the representation.

For n ∈ N, we write H_[−n,∞) for the subspace of H spanned by {X_k : k ≥ −n}, and P[−n,∞) for the orthogonal projection operator of H onto H[−n,∞). Our method for the proof of the representation of φ_n,j involves the alternate iteration of P_{(−∞,−1]} and P_[−n,∞) as well as Theorem 3.1 in [14] which concerns the key equality H_{(−∞,−1]}∩H_[−n,∞)= H_[−n,−1]. By the method, we first establish a general theorem for the finite predictor coefficients (Theorem 2.2). The result yields the representation of φ_n,j in terms of c_k and a_k when Wiener’s prediction formula is available (Theorem 2.3). In applications, it is essential that φ_n,j is expressed in terms of absolutely convergent series made up of c_k and a_k. We derive such an expression (Theorem 2.7) under some conditions on c_k and a_k, that is, (A1) or (A2) in Section 2.3. The condition (A1) corresponds to short memory processes, while (A2) to long memory processes.

The first application of the representation of φ_n,jconcerns the rate of convergence of φ_n,j to its limit as n→ ∞. If we let n → ∞, then, under suitable conditions, φ_n,j converges to the infinite prediction coefficient φ_j in

P_{(−∞,−1]}X₀= ∞ j=1 φ_jX_−j. (1.2)

The rate at which φ_n,j converges to φ_j is an interesting question. A textbook treatment of this problem can be found in Pourahmadi [20], Section 7.6; this book, as well as [5], provides excellent background of time series analysis and prediction theory for the present paper. Using the representation of φ_n,j, we give precise results on the rate of convergence for a class of intermediate memory processes (Theorem 3.7) as well as for a class of long memory processes which includes the fractional ARIMA(p, d, q) processes with d∈ (0, 1/2) (Theorem 3.3).

(4)

The second application of the representation of φ_n,j is related to the additional error P_[−n,−1]X₀−n_j=1φ_jX_−j that arises when we use the infinite predictor coefficients φ_j instead of the finite ones φ_n,j (see Baxter [1], page 138, and Cheng and Pourahmadi [6], page 118). There exists a known inequality that deals with this problem, and is commonly referred to as Baxter’s inequality (see [1]; see also Berg [3], [6] and Section 6.6.2 in [20]). It takes the form

n j=1 |φn,j− φj| ≤ M ∞ k=n+1 |φk| (1.3)

with some positive constant M . The original inequality (1.3) of Baxter was an assertion for short memory processes, and his proof in [1] used the theory of or-thogonal polynomials. By simple arguments based on the representation of φ_n,j, we prove (1.3) not only for short memory processes but also for long memory pro-cesses including the fractional ARIMA propro-cesses (Theorems 4.3 and 4.5). In the short memory case, the proof gives an M explicitly in terms of a_k and c_k.

The third application is a representation of the PACF α(·) in terms of a_k and c_k. The proof involves the representation of φ_n,j and the equality

φ_n,j− φ_n+1,j = φ_n,n+1−jα(n + 1), (1.4)

which is a part of the Durbin-Levinson algorithm (see, e.g., [5], (5.2.4), page 171). As stated above, we already have such a representation of α(·) in [16]. The repre-sentation of the finite predictor coefficients also gives this type of result since φ_n,n is nothing but α(n). Among these results, the new representation of α(·) (Theorem 5.4) is of special interest in view of its simplicity. We illustrate the usefulness by applying the result to two classes of processes, that is, a class of short memory pro-cesses and the fractional ARIMA model. For the latter, we derive the asymptotic behavior of α(n) with remainder (Theorem 5.6), which is a refinement of the results in [15, 16].

In Section 2, we prove the representation of the ﬁnite predictor coeﬃcients φ_n,j. In Section 3, we apply it to the rate of convergence of φ_n,j. In Section 4, we prove inequalities of Baxter-type using the representation of φ_n,j. Finally in Section 5, we prove a new representation of the PACF α(·) using that of φ_n,j, and then apply it to derive the asymptotic behavior of α(·) for a class of short memory processes and the fractional ARIMA model.

(5)

2. Finite predictor coefficients

Let {X_n} = {X_n : n ∈ Z} be a stationary process; as stated in Section 1, this means that {X_n} is a real, zero-mean, weakly stationary process, deﬁned on a probability space (Ω,F, P ). The autocovariance function γ(·) of {X_n} is deﬁned by

γ(n) := E[X_nX₀], n∈ Z.

If there exists an even, nonnegative, and integrable function ∆(·) on [−π, π] such that

γ(n) = π

−π

einλ∆(λ)dλ, n∈ Z,

then ∆(·) is called a spectral density of {X_n}. As is well known, {X_n} is purely nondeterministic (PND) if and only if it has a positive spectral density such that _π

−π| log ∆(λ)|dλ < ∞ (see Section 5.7 in [5] and Chapter II in [21]). In this paper,

we say that a stationary process{X_n} is long memory (respectively, short memory) if∞_k=−∞|γ(k)| = ∞ (respectively, < ∞). See Beran [2], page 6, and [5], Section 13.2.

As we also stated in Section 1, we denote by H the closed real linear hull of {Xk : k ∈ Z} with respect to the norm Y := E[Y2]1/2. Then H is a real

Hilbert space with inner product (Y, Z) := E[Y Z]. For n, m ∈ Z with n ≤ m, we write H_(−∞,n], H_[n,∞), H_[n,m] and H_{n} for the closed subspaces of H spanned by {X_k : −∞ < k ≤ n}, {X_k : n ≤ k < ∞}, {X_k : n ≤ k ≤ m}, and X_n, respectively. Notice that H_{n} = H_[n,n]. For an interval I, we write P_I for the orthogonal projection operator of H onto H_I.

2.1. General result. Let Y ∈ H. For n ∈ N, P_[−n,−1]Y is the best linear predictor of Y based on the observations X_−n, . . . , X₋₁. If {X_k} is nondeterministic, that is, X₀ ∈ H_{(−∞,−1]}, then X_−n, . . . , X₋₁ are linearly independent, whence we can express the predictor P_[−n,−1]Y uniquely in the form

P_[−n,−1]Y = n j=1 φ_n,j(Y )X_−j. (2.1)

In this section, we are concerned with the real coeﬃcients φ_n,j(Y ). For n, k∈ N, we deﬁne the orthogonal projection operator Pk

n by P_nk:= P_{(−∞,−1]}, k = 1, 3, 5, . . . , P_[−n,∞), k = 2, 4, 6, . . . . (2.2) 5

(6)

Lemma 2.1. Assume that {X_n} is nondeterministic. Let Y be an arbitrary

ele-ment of H. Then, for n, k ∈ N, there exist unique real coeﬃcients φk_n,1(Y ), . . . , φk

n,n(Y ) as well as Znk ∈ H(−∞,−n−1] for k odd and Znk ∈ H[0,∞) for k even, such

that P_nkP_nk−1· · · P_n1Y = n j=1 φk_n,j(Y )X_−j+ Z_nk.

Proof. We assume that k is odd. From Lemma 6.1 in [20] (Regression Lemma), it follows that

H_{(−∞,−1]}= H_{(−∞,−n−1]}+ H_[−n,−1] (direct sum) (2.3)

(see the proof of Theorem 6.3 in [20]). Since X_−n, . . . , X₋₁are linearly independent and Pk

nPnk−1· · · Pn1Y ∈ H(−∞,−1], the lemma for k odd follows. The case in which

k is even is proved in a similar fashion. It is natural to ask if φk

n,j(Y ) converges to φn,j(Y ) as k→ ∞. The next theorem

answers this question in the aﬃrmative for PND {X_n} with spectral density ∆(·) such that

_π

−π

∆(λ)−1dλ <∞. (2.4)

Theorem 2.2. Let n∈ N and j = 1, . . . , n. We assume that

{Xn} is purely nondeterministic and satisﬁes (2.4).

(2.5)

Then, for every Y ∈ H, we have φ_n,j(Y ) = lim_k→∞φk n,j(Y ).

Proof. By Theorem 3.1 in [14], the assumption (2.5) implies H_{(−∞,−1]}∩ H_[−n,∞)= H_[−n,−1] (n = 1, 2, . . . ). (2.6)

This and von Neumann’s theorem (see, e.g., Theorem 9.20 in [20]) yield

s-lim m→∞P k nPnk−1· · · Pn1= P[−n,−1] (n = 1, 2, . . . ). (2.7) We put _k:= X_k− P_{(−∞,k−1]}X_k (k∈ Z). (2.8)

Then, from Lemma 2.1, we see that

(P_n2k+1P_n2k· · · P_n1Y, ₋₁) = φ2k+1_n,1 (Y )· ₋₁2.

By (2.7), the left-hand side tends to (P_[−n,−1]Y, ₋₁) as k → ∞. Thus a_n,1 := lim_k→∞φ2k+1_n,1 (Y ) exists. In the same way, letting k→ ∞ in

(7)

we ﬁnd the existence of a_n,2 := lim_k→∞φ2k+1_n,2 (Y ). Repeating this argument, we see that a_n,j := lim_k→∞φ2k+1_n,j (Y ) exists for all j = 1, . . . , n. Hence Z_n := lim_k→∞Z_n2k+1also exists in H, and we have

Z_n = P_[−n,−1]Y −n

j=1an,jX−j.

Since the right-hand side is in H_[−n,−1], so is Z_n. Moreover, Z_n ∈ H_{(−∞,−n−1]} since, for every k ≥ 1, Z_n2k+1 belongs to the closed subspace H_{(−∞,−n−1]}. Com-bining, Z_n ∈ H_[−n,−1] ∩ H_{(−∞,−n−1]}. However, by (2.3), this implies Z_n = 0. Thus P_[−n,−1]Y = n_j=1a_n,jX_−j. By uniqueness, we obtain φ_n,j(Y ) = a_n,j = lim_k→∞φ2k+1_n,j (Y ). Similarly, we have φ_n,j(Y ) = lim_k→∞φ2k_n,j(Y ). Thus the theo-rem follows.

Remark 1. A stationary process{X_n} is said to be purely minimal if X₀ does not belong to the closed linear span of{X_k : k ∈ Z, k = 0} in H (see Section 8.5 in [20]). The assumption (2.5) of Theorem 2.2 is equivalent to saying that{X_n} is purely minimal (see Makagon and Weron [19] and Salehi [23]; see also Theorem 8.10 in [20]).

Remark 2. The proofs above show that the conclusion of Theorem 2.2 holds under the following weaker assumption than (2.5):

{Xn} is nondeterministic and satisﬁes (2.6).

(2.9)

In fact, we have assumed (2.5) to ensure (2.9). The condition (2.5) holds in most interesting examples, and, unlike (2.9), we can easily check it.

Remark 3. From (2.6), we obtain

H_{(−∞,−1]}∩ H_[0,∞)⊂ H_{(−∞,−1]}∩ H_[−1,∞)= H_{−1}, H_{(−∞,−1]}∩ H_[0,∞)⊂ H_(−∞,0]∩ H_[0,∞)= H_{0},

while X₋₁and X₀are linearly independent if{X_n} is nondeterministic. Thus, (2.9) implies that H_{(−∞,−1]}∩H_[0,∞) ={0}. Combining this and the arguments in Helson and Sarason [10], page 6, we see that, under (2.9),{X_n} must be PND, whence has a spectral density ∆(·). It is an open problem to characterize the condition (2.9) in terms of ∆(·).

2.2. Representation in terms of MA and AR coeﬃcients. In this section, we assume that the stationary process{X_n} is purely nondeterministic. For n ∈ N

(8)

and m∈ N ∪ {0}, we can express the (m + 1)-step predictor P_[−n,−1]X_muniquely in the form P_[−n,−1]X_m= n j=1 φm_n,jX_−j. (2.10)

We are concerned with representation of the real coefficients φm_n,j, which we call the (m + 1)-step finite predictor coefficients. In the 1-step case m = 0, we have φ0_n,j= φ_n,j by (1.1).

We consider the following outer function:

h(z) :=√2πexp 1 4π π −π eiλ+ z eiλ− zlog ∆(λ)dλ , z∈ C, |z| < 1. The function h(z) is holomorphic and has no zeros in |z| < 1. We deﬁne the MA coeﬃcients c_n by

h(z) =∞

n=0cnz

n_, _{|z| < 1,}

and the AR coeﬃcients a_n by

−1/h(z) =∞

n=0anz

n_, _{|z| < 1}

(cf. Section 2 in [14]). Both{c_n} and {a_n} are real sequences, and we have c₀> 0 and ∞₀ (c_n)2 < ∞. The coeﬃcients c_n and a_n are actually those that appear in the following MA(∞) and AR(∞) representations, respectively, of {X_n} (under suitable condition such as (2.14) below for the latter):

X_n = n j=−∞ c_n−jξ_j, n∈ Z, (2.11) n j=−∞ a_n−jX_j+ ξ_n= 0, n∈ Z, (2.12)

where {ξ_k} is the innovation process given by ξ_k = _k/_k with _k in (2.8); see, e.g., Chapter II in [21] for (2.11) and (4.9) in [14] for (2.12). By the assumption that{X_k} is PND, {ξ_k} forms a complete orthonormal system of H such that, for every n∈ Z, the closed linear span of {ξ_k:−∞ < k ≤ n} in H is equal to H_(−∞,n]. Notice that the sums in (2.11) and (2.12) may not converge absolutely in H.

We put bm_j := m k=0 c_ka_j+m−k, m, j∈ N ∪ {0}.

(9)

In particular, b0_j = c₀a_j. For n ∈ N and m, j ∈ N ∪ {0}, we deﬁne bm k (n, j) recursively by    bm₁ (n, j) = bm_j , bm_k+1(n, j) =∞ m1=0 bm_n+1+m 1b m1 k (n, j), k = 1, 2, . . . . (2.13)

From the proof of Theorem 2.3 below, we see that, under the condition

∞

n=0

|an| < ∞

(2.14)

which ensures the absolute convergence of the sums in (2.12), the sums in (2.13) also converge absolutely. We put, for m∈ N ∪ {0}, n ∈ N, and j = 1, 2, . . . , n,

g_km(n, j) :=

bm

k(n, j), k = 1, 3, . . . ,

bm_k(n, n + 1− j), k = 2, 4, . . . .

We write ∞− for the improper sum: ∞− = lim_M→∞M. The following theorem gives an explicit representation of the (m + 1)-step finite predictor coeffi-cients φm_n,j in (2.10), in terms of the MA and AR coefficients, under the absolute convergence of the sums in (2.12).

Theorem 2.3. We assume that the AR coeﬃcients a_nof a purely nondeterministic stationary process {X_n} satisfy (2.14). Then we have φm_n,j = ∞−_k=1gm_k (n, j) for n∈ N, m ∈ N ∪ {0} and j = 1, . . . , n, that is,

P_[−n,−1]X_m= n j=1 ∞− k=1g m k (n, j) X_−j.

Proof. For m ∈ N ∪ {0} and n ∈ N, we have the following Wiener prediction formulas (cf. [14, Theorem 4.4]): P_{(−∞,−1]}X_m=∞ j=1b m j X−j, (2.15) P_[−n,∞)X_−n−1−m=∞ j=1b m j X−n−1+j, (2.16)

the sums converging absolutely in H. Recall P_nk from (2.2). From (2.15), we have

P_n1X_m=n j=1g m 1 (n, j)X−j+∞ m1=0b m n+1+m1X−n−1−m1.

From this and (2.16), it follows that

P_n2P_n1X_m= n j=1 gm₁(n, j)X_−j+ ∞ m1=0 bm_n+1+m₁ ∞ j=1 bm1 j X−n−1+j = n j=1 {gm 1 (n, j) + gm2(n, j)}X−j+ ∞ m1=0 bm_n+1+m 1 ∞ m2=0 bm1 n+1+m2Xm2. 9

(10)

Similarly, P_n3P_n2P_n1X_m= n j=1 {gm 1(n, j) + g2m(n, j)}X−j + ∞ m1=0 bm_n+1+m 1 ∞ m2=0 bm1 n+1+m2 ∞ j=1 bm2 j X−j = n j=1 {gm 1(n, j) + g2m(n, j) + gm3 (n, j)}X−j + ∞ m1=0 bm_n+1+m₁ ∞ m2=0 bm1 n+1+m2 ∞ m3=0 bm2 n+1+m3X−n−1−m3.

Repeating this argument, we see that φk

n,j(Xm) in Lemma 2.1 with Y = Xm are

given by φk_n,j(X_m) =k_l=1g_lm(n, j). The condition (2.14) implies ∞₀ (a_n)2<∞, whence (2.4) (cf. [14], Proposition 4.2). Thus the theorem follows from Theorem 2.2.

2.3. Representation by absolutely convergent series. In later applications, it is essential to express the finite predictor coefficients φ_n,j in (1.1) by an absolutely convergent series made up of a_k and c_k. In this section, we first give such an expression for bm

k (n, j). In the 1-step case m = 0, the result gives the desired

representation for φ_n,j.

We write R₀ for the class of slowly varying functions at infinity: the class of positive, measurable (·), defined on some neighborhood [A, ∞) of infinity, such that lim_x→∞ (λx)/ (x) = 1 for all λ > 0 (see Chapter 1 in [4] for background).

Throughout this section, we assume that the stationary process {X_n} satisﬁes one of the following conditions (A1) and (A2):

(A1) {X_n} is purely nondeterministic, and {a_n} and {c_n} satisfy, respectively, (2.14) and ∞ n=0 |cn| < ∞. (2.17)

(A2) {X_n} is purely nondeterministic, and, for d ∈ (0, 1/2) and (·) ∈ R₀,{c_n} and{a_n} satisfy, respectively,

c_n∼ n−(1−d) (n), n→ ∞, (2.18) a_n ∼ n−(1+d) 1 (n)· d sin(πd) π , n→ ∞. (2.19)

(11)

It should be noticed that (2.19) implies (2.14). By (2.11), the autocovariance function γ(·) has the expression

γ(n) = ∞ k=0 c_|n|+kc_k, n∈ Z. (2.20)

Hence, (2.17) implies that ∞ n=0|γ(n)| ≤ ∞ k=0|ck| 2 <∞.

Thus{X_n} is short memory under (A1). On the other hand, by (2.20) and Propo-sition 4.3 in [13], (2.18) implies that

γ(n)∼ n−(1−2d) (n)2B(d, 1− 2d), n→ ∞. (2.21)

Since 0 < 1− 2d < 1, we see that {X_n} is long memory under (A2). We remark that, under suitable conditions, (2.18), (2.19) and (2.21) are equivalent (cf. [14], Theorem 5.1).

Example 2.4. For d∈ (−1/2, 1/2) and p, q ∈ N ∪ {0}, a stationary process {X_n}

is said to be a fractional ARIMA(p, d, q) process if it has a spectral density ∆(·) of the form ∆(λ) = 1 2π |θ(eiλ₎_|2 |φ(eiλ₎|2|1 − e iλ_|−2d_, _{−π ≤ λ ≤ π,}

where φ(z) and θ(z) are polynomials with real coeﬃcients of degrees p, q, respec-tively. We assume that φ(z) and θ(z) have no common zeros, and that neither φ(z) nor θ(z) has zeros in the closed unit disk {z ∈ C : |z| ≤ 1}. We also assume without loss of generality that θ(0)/φ(0) > 0. Then the outer function h(·) is given by h(z) = (1− z)−dθ(z)/φ(z) (see, e.g., Section 2 in [15]). If 0 < d < 1/2, then {Xn} satisﬁes (A2) for some constant function (·) (see Corollary 3.1 in [18]). If

d = 0, then {X_n} is also called an ARMA(p, q) process (see [5], Chapter 3), and both{c_n} and {a_n} decay exponentially, whence (A1) is satisﬁed.

We put B_n:= ∞ v=0 |cvan+v|, n∈ N ∪ {0}.

For n, k, u, v∈ N ∪ {0}, we deﬁne D_k(n, u, v) recursively by    D₀(n, u, v) := δ_uv, D_k+1(n, u, v) :=∞ w=0Bn+v+wDk(n, u, w). 11

(12)

We have, for example, D₃(n, u, v) = ∞ v1=0 ∞ v2=0 B_n+v+v₁B_n+v₁_+v₂B_n+v₂_+u.

By the Fubini–Tonelli theorem, we have D_k(n, u, v) = D_k(n, v, u).

Lemma 2.5. The conditions (A1) and (A2) imply, for k, n, v∈ N ∪ {0}, ∞ u=0 D_k(n, u, v) <∞ and ∞ u=0 D_k(n, u, v)2<∞,

respectively. In particular, we have D_k(n, u, v) <∞ for k, n, u, v ∈ N ∪ {0}. Proof. First we assume (A1). Then

∞ m=0Bm≤ ∞ u=0|cu| ∞ u=0|au| <∞. This and the nonnegativity of B_mimply, for example,

∞ u=0 D₃(n, u, v) = ∞ u=0 ∞ v1=0 ∞ v2=0 B_n+v+v₁B_n+v₁_+v₂B_n+v₂_+u ≤∞ m=0Bm ₃ <∞. The general case can be proved in the same way.

Next we assume (A2). The proof in this case is the same as that of Lemma 2.1 in [16]. By (A2) and Proposition 4.3 in [13], we have B_n = O(n−1) as n → ∞. Therefore, for n ∈ N, f_u → ∞_v=0B_n+u+vf_v deﬁnes a bounded linear operator on l2 (see Chapter IX in [11]). Since D_k+1(n, u, v) =_wB_n+u+wD_k(n, w, v), we obtain the desired result by induction on k.

We put β_n:= ∞ v=0 c_va_v+n, n = 0, 1, . . . . (2.22)

In view of Lemma 2.5, we may deﬁne δ_k(n, u, v) recursively by, for k, n, u, v ∈

N∪ {0},    δ₀(n, u, v) = δ_uv, δ_k+1(n, u, v) =∞ w=0βn+v+wδk(n, u, w). (2.23)

By Lemma 2.5 and the Fubini theorem, we have δ_k(n, u, v) = δ_k(n, v, u). The following theorem expresses bm_k(n, j) by an absolutely convergent series.

(13)

Theorem 2.6. We assume either (A1) or (A2). Then, for n, k ∈ N and m, j ∈ N∪ {0}, bm_k(n, j) = m v=0 c_m−v ∞ u=0 a_j+uδ_k−1(n + 1, u, v), (2.24)

the sum converging absolutely.

Proof. By Lemma 2.5 and (3.4), we have ∞

u=0|aj+u|Dk−1(n + 1, u, v)

≤ {sup_uD_k−1(n + 1, u, v)}∞

u=0|aj+u| < ∞.

(2.25)

Thus, the right-hand side of (2.24), which we denote by Bm

k (n, j), converges

abso-lutely. To prove the proposition, it is enough show that Bm

k (n, j) satisﬁes the same

recursion as (2.13). First we have B₁m(n, j) =m v=0cm−v ∞ u=0aj+uδuv = m v=0cm−vaj+v= b m j ,

as desired. Next, the Fubini–Tonelli theorem and (2.25) yield, for k≥ 1,

∞ u=0 a_j+uδ_k(n + 1, u, v) = ∞ u=0 a_j+u ∞ w=0 ∞ m1=w c_m₁_−wa_n+1+v+m₁ δ_k−1(n + 1, u, w) = ∞ m1=0 a_n+1+v+m₁ m1 w=0 c_m₁_−w ∞ u=0 a_j+uδ_k−1(n + 1, u, w) = ∞ m1=0 a_n+1+v+m₁Bm1 k (n, j), so that B_k+1m (n, j) = m v=0 c_m−v ∞ m1=0 a_n+1+v+m₁Bm1 k (n, j) = ∞ m1=0 m v=0cm−van+1+m1+v Bm1 k (n, j) = ∞ m1=0 bm_n+1+m₁Bm1 k (n, j). Thus Bm k (n, j) satisﬁes (2.13).

For later applications, we consider the case m = 0 separately. We put

d_k(n, j) := δ_k(n, 0, j), n, k, j∈ N ∪ {0}. 13

(14)

Then, by (2.23), d_k(n, j) satisﬁes the following recursion: for k, n, j∈ N ∪ {0},    d₀(n, j) = δ_j0, d_k+1(n, j) =∞ v=0βn+j+vdk(n, v). (2.26)

More explicitly, d_k(n, j) are given by, for n, j∈ N ∪ {0}, d₁(n, j) = β_n+j, d₂(n, j) = ∞ v1=0 β_n+j+v₁β_n+v₁, and, for k = 3, 4, . . . , d_k(n, j) = ∞ v1=0 · · · ∞ vk−1=0 β_n+j+v_k−1β_n+v_k−1+vk−2· · · β_n+v₂_+v₁β_n+v₁,

the sums converging absolutely. We put

b_k(n, j) := b0_k(n, j), g_k(n, j) := g0_k(n, j)

for (k, n, j) for which the right-hand sides are deﬁned. Then, for n ∈ N and j = 1, 2, . . . , n, we have g_k(n, j) = b_k(n, j), k = 1, 3, . . . , b_k(n, n + 1− j), k = 2, 4, . . . . (2.27)

By Theorem 2.3 and Proposition 2.6, we immediately obtain the following final form of the representation of the 1-step finite predictor coefficients φ_n,j.

Theorem 2.7. We assume either (A1) or (A2). Then, for n∈ N and j = 1, . . . , n,

we have φ_n,j =∞−_k=1g_k(n, j) with (2.27) and      b₁(n, v) = c₀a_v, v≥ 0, b_k(n, v) = c₀ ∞ u=0 a_v+ud_k−1(n + 1, u), k≥ 2, v ≥ 0, the sum on the right-hand side converging absolutely.

Remark 4. Under either (A1) of (A2), the sum∞_k=1g_k(n, j) converges absolutely for n large enough and j = 1, . . . , n. See Propositions 3.4 and 4.4 below.

3. Rates of convergence of finite predictor coefficients

If the stationary process {X_n} is PND and satisﬁes (2.14), then we have the Wiener prediction formula (2.15) with m = 0 or (1.2) with

φ_j= c₀a_j, j∈ N. (3.1)

(15)

We call φ_j the inﬁnite predictor coeﬃcients. It holds that

lim

n→∞φn,j = φj, j ∈ N

(cf. Theorem 7.14 in [20]). In this section, we investigate the rate at which φ_n,j converges to φ_j as n→ ∞. Notice that, by (2.13), (2.27) and (3.1), we have

φ_j= b₁(n, j) = g₁(n, j), n∈ N, j = 1, . . . , n. (3.2)

Thus φ_jis the ﬁrst term of the series∞−_k=1g_k(n, j) in Theorem 2.7 expressing φ_n,j. This suggests the usefulness of the expression for our purpose.

3.1. Long memory processes. In this section, we assume that the stationary process{X_n} satisﬁes (A2) in Section 2.3. Thus {X_n} is long memory.

For u≥ 0, we put f₁(u) := 1 π(1 + u), f2(u) := 1 π2 _∞ 0 ds₁ (s₁+ 1)(s₁+ 1 + u), and, for k = 3, 4, . . . , f_k(u) := 1 πk _∞ 0 ds_k−1· · · _∞ 0 ds₁ 1 (s_k−1+ 1)× k−2 m=1 1 (s_m+1+ s_m+ 1) ×_(s 1 1+ 1 + u) (see [15], Section 3; see also [14], Section 6).

Lemma 3.1. (i) ∞_k=1f_2k(0)x2k = (π−1arcsin x)2 for|x| < 1; (ii) ∞_k=1f_2k−1(0)x2k−1= π−1arcsin x for|x| < 1.

Proof. Let j ≥ 1. We easily see that f_1+j(u) = ₀∞f₁(s + u)f_j(s)ds for u ≥ 0. Hence, we have, for u≥ 0,

f_2+j(0) = _∞ 0 f₁(s)f_j+1(s)ds = _∞ 0 dsf₁(s) _∞ 0 f₁(u + s)f_j(u)du = _∞ 0 _∞ 0 f₁(s + u)f₁(s)ds f_j(u)du = _∞ 0 f₂(u)f_j(u)du. Repeating this argument, we obtain

_∞ 0

f_i(u)f_j(u)du = f_i+j(0), i, j∈ N. (3.3)

Thus, the assertion (i) follows from Lemma 6.5 in [14], while (ii) from Lemma 3.4 in [16].

Recall d_k(n, u) from Section 2.3.

Proposition 3.2. (i) For r∈ (1, ∞), there exists N ∈ N such that 0 < d_k(n, u)≤ fk(0){r sin(πd)}

k

n , u∈ N ∪ {0}, k ∈ N, n ≥ N. (3.4)

(16)

(ii) For k∈ N and u ∈ N ∪ {0}, d_k(n, u)∼ n−1f_k(0) sink(πd) as n→ ∞. Proof. Let r > 1. Recall β_n from (2.22). The condition (A2) implies

β_n∼sin(πd)

π n

−1_, _n_{→ ∞}

(3.5)

(see Proposition 4.3 in [13]). Thus, for n large enough,

0 < β_[ns]+n+u≤ r

1/2_sin(πd)

π([ns] + n + u), s≥ 0, u ∈ N ∪ {0}. Since we have, for n large enough,

1

[ns] + n + u ≤ r1/2

n(s + 1), s≥ 0, u ∈ N ∪ {0}, there exists N₁∈ N such that

0 < β_[ns]+n+u≤ r sin(πd) π(s + 1)n

−1_, _s_{≥ 0, u ∈ N ∪ {0}, n ≥ N}

1. (3.6)

In the same way, we can choose N₂ so that 0 < β_[ns₂_]+[ns₁_]+n≤ r sin(πd)

π(s₂+ s₁+ 1)n

−1_, _s

1, s2≥ 0, n ≥ N2. (3.7)

Therefore, we have, for n≥ N := max(N₁, N₂), 0 < d₃(n, u) = _∞ 0 ds₂ _∞ 0 ds₁· β_[s₂_]+n· β_[s₂_]+[s₁_]+n· β_[s₁_]+n+u = n2 ∞ 0 ds₂ ∞ 0 ds₁· β_[ns₂_]+n· β_[ns₂_]+[ns₁_]+n· β_[ns₁_]+n+u ≤ {r sin(πd)}3 n 1 π3 _∞ 0 ds₂ _∞ 0 ds₁ 1 (s₂+ 1)(s₂+ s₁+ 1)(s₁+ 1) = {r sin(πd)} 3 n f3(0),

which implies (3.4) with k = 3. Notice that N is independent of the choice k = 3. We can prove (3.4) for general k and the same N in a similar fashion.

We also prove (ii) only for k = 3; the general case can be treated in the same way. By (3.5), we have lim n→∞nβ[ns]+n+u= sin(πd) π(s + 1), s≥ 0, u ∈ N ∪ {0}, (3.8) lim n→∞nβ[ns2]+[ns1]+n= sin(πd) π(s₂+ s₁+ 1), s1, s2≥ 0. (3.9)

By (3.6)–(3.9) and the dominated convergence theorem, we obtain

lim n→∞ _∞ 0 ds₂ _∞ 0 ds₁· nβ_[ns₂_]+n· nβ_[ns₂_]+[ns₁_]+n· nβ_[ns₁_]+n+u = sin 3_(πd) π3 _∞ 0 ds₂ _∞ 0 ds₁ 1 (s₂+ 1)(s₂+ s₁+ 1)(s₁+ 1). This implies lim_nnd₃(n, u) = sin3(πd)f₃(0) or (ii) with k = 3, as desired.

(17)

The following theorem gives the rate for long memory processes, at which φ_n,j converges to φ_j. It applies, in particular, to the fractional ARIMA(p, d, q) processes with 0 < d < 1/2.

Theorem 3.3. We assume (A2). Then we have, for j ∈ N,

lim

n→∞n{φn,j− φj} = d

2∞

u=j

φ_u.

Proof. Let r > 1 be chosen so that 0 < r sin(πd) < 1. By Lemma 3.1, ∞

k=1fk(0){r sin(πd)} k_<_∞.

(3.10)

Let N be as in Proposition 3.2 (i). Then, for n≥ N and j = 1, . . . , n, n∞ u=0an−j+u ∞ k=1d2k−1(n, u)  ≤ ∞ u=0 |an−j+u| ∞ k=1 nd_2k−1(n, u) ≤∞ k=1f2k−1(0){r sin(πd)} 2k−1 ∞ u=n−j|au|, so that lim n→∞n ∞ u=0an−j+u ∞ k=1d2k−1(n, u) = 0.

Proposition 3.2, Lemma 3.1, and the dominated convergence theorem yield

lim n→∞n ∞ u=0aj+u ∞ k=1d2k(n, u) =∞ k=1f2k(0) sin 2k_(πd) ∞ u=0aj+u= d 2∞ u=jau.

Therefore, by Theorem 2.7 and (3.2), we have

lim n→∞n{φn−1,j− φj} = lim n→∞ n∞ k=1b2k+1(n− 1, j) + n ∞ k=1b2k(n− 1, n − j) = lim n→∞c0n ∞ u=0 a_j+u ∞ k=1 d_2k(n, u) + lim n→∞c0n ∞ u=0 a_n−j+u ∞ k=1 d_2k−1(n, u) = c₀d2∞ u=jau= d 2∞ u=jφu, as desired.

From the proof of Theorem 3.3, we also obtain the following proposition.

Proposition 3.4. We assume (A2). Let N be as in the proof of Theorem 3.3.

Then we have ∞_k=1|g_k(n, j)| < ∞ for n ≥ N and j = 1, . . . , n. 17

(18)

3.2. Intermediate memory processes. There are several possible setups for short-memory processes, to which we can apply Theorem 2.7 to obtain the rate of convergence of φ_n,j. Among them, we consider that of so called “intermediate memory” processes (see [5], page 520). Our class is deﬁned in the following way:

(A3) the stationary process{X_n} is purely nondeterministic, and {a_n}, {c_n} and the autocovariance function γ(·) satisfy the following conditions:

(i) c_n≥ 0 for all n ≥ 0;

(ii) {c_n} is eventually decreasing to zero; (iii) {a_n} is eventually decreasing to zero;

(iv) for p∈ (1, ∞) and ∈ R₀, γ(n)∼ n−p (n) as n→ ∞.

This class of stationary processes was considered in [16]. It is suited for asymp-totic analysis. The condition (iv) implies that the stationary processes satisfying (A3) are short memory.

Example 3.5. Let p > 1. If the autocovariance function γ(·) of a stationary process{X_n} is of the form γ(n) = 1/(1 +|n|)p_{, then it satisﬁes (A3). See Example}

in Section 7 of [14].

Throughout this section, we assume that the stationary process {X_n} satisﬁes (A3). We deﬁne a constant D by

D :=∞

k=−∞γ(k),

which is positive by (A3)(i) and (2.20). By Theorem 5.3 in [14] (see also the proof of Theorem 6.7 in [14]), (A3) implies the following asymptotics:

c_n∼ n−p (n)D−1/2, n→ ∞, (3.11) a_n ∼ n−p (n)D−3/2, n→ ∞, (3.12) β_n∼ n−p (n)D−1, n→ ∞. (3.13)

Thus, in particular,{X_n} satisﬁes (A1).

We need the following estimates to obtain the rate of convergence for φ_n,j.

Proposition 3.6. (i) For r∈ (1, ∞), there exists N ∈ N such that 0 < ndk(n, u)

(nβ_n)k ≤ (rπ) k_f

k(0), u∈ N ∪ {0}, k ∈ N, n ≥ N.

(3.14)

(19)

Proof. Let r > 1. By (3.13) and the Uniform Convergence Theorem ([4], Theorem 1.5.2), we see that, for n large enough,

0 < β[ns]+n+u β_n ≤ r 1/2 n [ns] + n + u p , s≥ 0, u ∈ N ∪ {0}. Since we have, for n large enough,

n [ns] + n + u ≤ r1/2 (s + 1)p ≤ r1/2 s + 1, s≥ 0, u ∈ N ∪ {0}, there exists N₁∈ N such that

0 < β[ns]+n+u

β_n ≤

r

1 + s, s≥ 0, u ∈ N ∪ {0}, n ≥ N1. (3.15)

In the same way, we may take N₂∈ N so large that 0 < β[ns2]+[ns1]+n

β_n ≤

r

1 + s₂+ s₁, s1, s2≥ 0, n ≥ N2. (3.16)

Therefore, we have, for n≥ N := max(N₁, N₂), 0 < d3(n, u) n2β_n3 = 1 n2β3_n _∞ 0 ds₂ _∞ 0 ds₁· β_[s₂_]+n· β_[s₂_]+[s₁_]+n· β_[s₁_]+n+u = _∞ 0 ds₂ _∞ 0 ds₁·β[ns2]+n β_n · β_[ns₂_]+[ns₁_]+n β_n · β_[ns₁_]+n+u β_n ≤ (rπ)3_f 3(0),

which implies (3.14) with k = 3. Notice that N is independent of the choice k = 3. In the same way, we can prove (3.14) for general k and the same N .

By (3.13), lim n→∞ β_[ns]+n+u β_n = 1 (1 + s)p, s≥ 0, u ∈ N ∪ {0}. (3.17)

By (3.15), (3.17) and the dominated convergence theorem, we have d₂(n, u) nβ2_n = _∞ 0 ds·β[ns]+n β_n · β_[ns]+n+u β_n → _∞ 0 1 (1 + s)2pds = 1 2p− 1 as n→ ∞. Thus (ii) follows.

The following theorem gives the rate for the present class of{X_n}, at which φ_n,j converges to φ_j.

Theorem 3.7. We assume (A3). Then we have, for j ∈ N,

φ_n,j− φ_j ∼ n−(2p−1)_(n)2 1 2p− 1 D−5/2c₀+ D−2∞ u=jφu , n→ ∞.

Proof. Let r > 1 and N be as in Proposition 3.6 (i). Let t∈ (0, 1). Since (3.13) implies lim_n→∞nβ_n = 0, there exists an integer N such that

0 < nβ_nrπ < t, n≥ N. 19

(20)

Then we have, for l≥ 3, n ≥ max(N, N) and u≥ 0, 0 < dl(n, u) nβ_n2 < (nβn) l−2_(rπ)l_f l(0) < (nβn)(rπ/t)3tlfl(0), so that, for j = 1, . . . , n, 1 nβ_n2 ∞ u=0an−j+u ∞ k=2d2k−1(n, u) ≤∞ u=0|an−j+u| ∞ k=2 d_2k−1(n, u) nβ_n2 ≤ (nβn)× (rπ/t)3 ∞ k=2f2k−1(0)t 2k−1 ∞ u=0|au|.

Since (3.13) implies nβ_n2 ∼ n−(2p−1) (n)2D−2, we obtain lim n→∞ 1 n−(2p−1) (n)2 ∞ u=0 a_n−j+u ∞ k=2 d_2k−1(n, u) = 0 (j∈ N). (3.18)

In the same way, we have, for n large enough,

1 nβ_n2 ∞ u=0aj+u ∞ k=2d2k(n, u)  ≤∞ u=0|aj+u| ∞ k=2 d_2k(n, u) nβ2_n ≤ (nβn)× (rπ/t)3 ∞ k=2f2k(0)t 2k ∞ u=j|au|, so that lim n→∞ 1 n−(2p−1) (n)2 ∞ u=0 a_j+u ∞ k=2 d_2k(n, u) = 0 (j∈ N). (3.19)

Proposition 3.6 and the dominated convergence theorem yield

lim n→∞ 1 nβ_n2 ∞ u=0 a_j+ud₂(n, u) = lim n→∞ 1 n−(2p−1) (n)2 ∞ u=0 a_j+ud₂(n, u) = D −2 2p− 1 ∞ u=j a_u. (3.20)

As in the proof of Proposition 3.6, we have

1 na_nβ_n ∞ u=0 a_n−j+ud₁(n, u) = 1 na_nβ_n _∞ 0 a_n−j+[s]· β_n+[s]ds = _∞ 0 a_n−j+[ns] a_n · β_[ns]+n β_n ds→ _∞ 0 1 (1 + s)2pds, n→ ∞ or, by (3.12) and (3.13), lim n→∞ 1 n−(2p−1) (n)2 ∞ u=0 a_n−j+ud₁(n, u) = D −5/2 2p− 1. (3.21)

(21)

Combining (3.18)–(3.21) and Theorem 2.7, we obtain 1 n−(2p−1) (n)2{φn−1,j− φj} = 1 n−(2p−1) (n)2 ∞ k=1b2k+1(n− 1, j) + ∞ k=1b2k(n− 1, n − j) = c0 n−(2p−1) (n)2 _∞ u=0 a_j+u ∞ k=1 d_2k(n, u) + ∞ u=0 a_n−j+u ∞ k=1 d_2k−1(n, u) → c0 2p− 1 D−5/2+ D−2∞ u=jau = 1 2p− 1 D−5/2c₀+ D−2∞ u=jφu , as required.

For nondeterministic stationary process{X_n}, the partial autocorrelation func-tion (PACF) α(·) is deﬁned by α(1) := γ(1)/γ(0) and

α(n) := (Xn− P[1,n−1]Xn, X0− P[1,n−1]X0) Xn− P[1,n−1]Xn · X0− P[1,n−1]X0

, n = 2, 3, . . . (3.22)

(see Sections 3.4 and 5.2 in [5] for background). By Theorem 1.8 in [16], (A3) implies α(n)∼ n−p (n)D−1 as n→ ∞. Since p > 1, this yields

∞

n=1

|α(n)| < ∞. (3.23)

The implication (A3) ⇒ (3.23) is also veriﬁed by Theorem 5.5 below since (3.12) implies a_n = O(n−(p+1)/2) as n → ∞. In view of (3.23) and Theorem 3.7, we have provided a counterexample to Theorem 7.23 in [20] which seems to mistakenly assert that, for each j, φ_n,jconverges to φ_j exponentially if and only if (3.23) holds. In fact, by Theorem 2.3 in [1] with m = 0 (see also Proposition 4.1 below), (3.23) holds under (A1) which is weaker than (A3).

4. Baxter’s inequality

We are concerned with inequalities of Baxter-type which are related to the norm convergence of φ_n,j to φ_j.

4.1. Baxter’s inequality for short memory processes. In this section, we give a short and easy proof to the original inequality (1.3) of Baxter, i.e., the case λ = 0 of Theorem 2.2 in [1]. The proof is based on the representation of φ_n,j (Theorem 2.7), which is available since the assumption in [1] is the same as (A1) in Section 2.3 by the following proposition.

Proposition 4.1. For a stationary process {X_n}, the following conditions are

equivalent:

(22)

(i) {X_n} satisﬁes (A1);

(ii) {X_n} has a positive continuous spectral density and satisﬁes (2.17); (iii) {X_n} has a positive continuous spectral density and satisﬁes (2.14);

(iv) {X_n} is short memory, i.e., ∞_k=−∞|γ(k)| < ∞, and has a positive continu-ous spectral density.

Proof. Notice that if {X_n} has a positive continuous spectral density ∆(·) on [−π, π], then it is PND since_−ππ | log ∆(λ)|dλ < ∞ holds.

Suppose (i). We write h(eiλ) for the nontangential limit of h(z), i.e.,

h(eiλ) = lim

r→1−0h(re

iλ_{) =}∞ n=0cne

inλ_, _{−π ≤ λ ≤ π.}

Since (2.17) implies the continuity of h(eiλ), the spectral density ∆(λ) is also con-tinuous by the equality ∆(λ) = 2π|h(eiλ₎_|2_{. Letting r}_{→ 1 − 0 in}

∞ n=0cnr n_einλ ∞ n=0anr n_einλ₌_−1, _{−π ≤ λ ≤ π,} we obtain ∞ n=0cne inλ ∞ n=0ane inλ₌_−1, _{−π ≤ λ ≤ π.}

This implies that h(eiλ_{), whence ∆(λ), has no zeros on [}_{−π, π]. Thus ∆(·) is}

positive, whence (ii) and (iii) follow.

Suppose (iii). In the same way as above, we have 1 ∆(λ) = 1 2π ∞ n=0ane inλ2_, _{−π ≤ λ ≤ π.}

This implies ∞_n=0a_neinλ _{= 0 for every λ ∈ [−π, π]. By Wiener’s theorem for}

absolutely convergent Fourier series (cf. Lemma 11.6 in [22]), we obtain (2.17) (cf. [3], page 493). Thus (i) follows. The proof of the implication (ii) ⇒ (i) is similar.

By (2.20), (ii) implies (iv). Conversely, we assume (iv). Then we have (2.14), whence (iii), by (3.1) and the arguments in [1], pp. 139–140, which involve the Wiener–L´evy theorem. This completes the proof.

We deﬁne A(j) :=∞ u=j|au|, F (j) := ∞ v=0|cv| A(j), j = 0, 1, . . . . (4.1)

Then F (j) decreases to zero as j→ ∞ under (A1). Recall d_k(n, j) from (2.26).

(23)

Proof. Let n∈ N. We use induction on k. Since d₁(n, u) = β_n+u, we have ∞ u=0|d1(n, u)| = ∞ u=0|βn+u| ≤ ∞ v=0|cv| ∞ u=0|an+u+v| ≤ F (n).

We assume∞_u=0|d_k(n, u)| ≤ F (n)k for k∈ N. Then

∞ u=0 |dk+1(n, u)| ≤ ∞ v=0 |dk(n, v)| ∞ u=0 |βn+v+u| ≤ F (n) ∞ v=0 |dk(n, v)| ≤ F (n)k+1_.

Thus the inequality also holds for k + 1.

Here is Baxter’s inequality with M given explicitly.

Theorem 4.3. We assume one, whence all, of (i)–(iv) in Proposition 4.1. Then,

for every L∈ (1, ∞), there exists N ∈ N such that (1.3) holds for n ≥ N with M =∞ v=0|cv| ∞ u=1|au| L.

Proof. By Theorem 2.7, Lemma 4.2 and (3.2), we have, for n = 1, 2, . . . ,

n j=1 |φn,j− φj| ≤ c0 n j=1 ∞ u=0 |au+j| ∞ k=1 |d2k(n + 1, u)| + c₀ n j=1 ∞ u=0 |au+n+1−j| ∞ k=1 |d2k−1(n + 1, u)| = c₀∞ k=1 ∞ u=0|dk(n + 1, u)| n j=1|au+j| ≤ c0A(1)∞ k=1 ∞ u=0|dk(n + 1, u)| ≤ c0A(1) ∞ k=1F (n + 1) k_.

Choose N ∈ N so that 0 < 1/{1 − F (N + 1)} < L. If n ≥ N, then F (n + 1) ≤ F (N + 1) < 1. Therefore, ∞ k=1 F (n + 1)k= F (n + 1) 1− F (n + 1) ≤ F (n + 1) 1− F (N + 1) ≤ LF (n + 1). However, by (3.1) and (4.1), we have

c₀F (n + 1) =∞

v=0|cv|

∞

j=n+1|φj|.

Thus the theorem follows.

From the proof of Theorem 4.3, we also obtain the next proposition.

Proposition 4.4. We assume (A1). Choose N ∈ N so that F (N + 1) < 1. Then

we have ∞_k=1|g_k(n, j)| < ∞ for n ≥ N and j = 1, . . . , n. 23

(24)

4.2. Baxter’s inequality for long memory processes. In this section, we prove Baxter’s inequality for long memory stationary processes. The fractional ARIMA(p, d, q) processes with 0 < d < 1/2 are among them.

Theorem 4.5. We assume (A2). Then there exists a positive constant M such

that (1.3) holds for all n∈ N.

Proof. Let r > 1 be chosen so that 0 < r sin(πd) < 1. Then we have (3.10). By Proposition 3.2 and (2.19), we may take a positive integer N such that both (3.4) and a_n > 0 hold for n≥ N. Pick δ ∈ (0, d). By (2.19) and Theorem 1.5.6 (iii) in [4] (Potter-type bounds), we may assume that

a_m/a_n ≤ 2 max{(n/m)1+d−δ, (m/n)1+d+δ}, m, n≥ N. (4.2)

As in the proof of Theorem 4.3, we have, for n≥ N + 3,

n−1 j=1 |φn−1,j− φj| ≤ c0 ∞ k=1 ∞ u=0 d_k(n, u) n−1 j=1 |au+j| = c0{G1(n) + G2(n)}, where G₁(n) := ∞ k=1 ∞ u=0 d_k(n, u) N +1 j=1 |au+j|, G₂(n) := ∞ k=1 ∞ u=0 d_k(n, u) n−1 j=N +2 a_u+j. For n≥ N + 3, we have G₁(n)≤ n−1∞ k=1fk(0){r sin(πd)} k N +1 j=1 ∞ u=0 |au+j|, and G₂(n)≤ ∞ k=1 _∞ 0 du· d_k(n, [u]) _n N a_[u]+[s]+2ds = na_n ∞ k=1 _∞ 0 du· nd_k(n, [nu]) ₁ N/n a_[nu]+[ns]+2 a_n ds. By (4.2), we have, for u > 0, n≥ N + 3, and N/n ≤ s ≤ 1,

a_[nu]+[ns]+2 a_n ≤ 2 max n [nu] + [ns] + 2 1+δ+d , n [nu] + [ns] + 2 1−δ+d ≤ 2 max{(u + s)−(1+d+δ)_{, (u + s)}−(1+d−δ)_}.

(25)

Hence, by (3.4), G₂(n)≤ 2 ₁ 0 ds _∞ 0 (u + s)−(1+d+δ)+ (u + s)−(1+d−δ) du × nan ∞ k=1fk(0){r sin(πd)} k ≤ 2 1 (δ + d)(1− d − δ)+ 1 (d− δ)(1 − d + δ) × nan ∞ k=1fk(0){r sin(πd)} k_.

Combining these estimates, we obtain

lim sup n→∞ nd (n)n−1 j=1|φn−1,j− φj| <∞.

Since ∞_k=nφ_k = c₀_k=n∞ a_k ∼ c₀sin(πd)/{πnd_(n)_{} as n → ∞, the theorem}

follows.

5. Partial autocorrelation functions

5.1. Representation in terms of MA and AR coefficients. In this section, we assume that the stationary process{X_n} satisfies either (A1) or (A2) in Section 2.3. The aim of this section is to prove an explicit representation of the PACF α(·) in terms of the MA and AR coefficients. The formula is much simpler than the earlier one in [16]. The proof is based on Theorem 2.7.

Recall β_n from (2.22). For n∈ N ∪ {0}, we deﬁne α₁(n) := β_n and, for k = 3, 5, . . . , α_k(n) := ∞ v1=0 · · · ∞ vk−1=0 β_n+v₁β_n+1+v₁_+v₂· · · β_n+1+v_k−2+vk−1β_n+1+v_k−1. As in the case of d_k(n, j) in Section 2.3, the sums converge absolutely. We have

α_2k+1(n) = ∞ v=0 β_n+vd_2k(n + 1, v), n, k∈ N ∪ {0}. (5.1)

The following proposition is the key to the proof of the main result in this section (i.e., Theorem 5.4 below).

Proposition 5.1. For n, j∈ N ∪ {0} and k ∈ N, we have

d_2k(n, j)− d_2k(n + 1, j) = k l=1 α_2k−2l+1(n)d_2l−1(n, j). (5.2)

Proof. Let n, j∈ N ∪ {0}. We use mathematical induction on k. Since α₁(n) = β_n and d₁(n, j) = β_n+j, we have

d₂(n, j) =∞

v=nβvβv+j =

∞

v=nα1(v)d1(v, j),

(26)

which implies (5.2) with k = 1.

We assume that (5.2) holds for k∈ N. By (2.26) and the Fubini–Tonelli theorem, we have d_2k+2(n, j) =∞ v2=0 ∞ v1=n β_v₂_+v₁β_j+v₁ d_2k(n, v₂), whence d_2k+2(n, j)− d_2k+2(n + 1, j) = I + II, where

I =∞ v2=0 ∞ v1=0βn+v2+v1βn+j+v1 [d_2k(n, v₂)− d_2k(n + 1, v₂)] , II =∞ v2=0βn+v2βn+jd2k(n + 1, v2).

Then, by (2.26), (5.2), and the Fubini–Tonelli theorem,

I =∞ v2=0 ∞ v1=0βn+v2+v1βn+j+v1 k l=1α2k−2l+1(n)d2l−1(n, v2) =k l=1α2k−2l+1(n) ∞ v1=0 β_n+j+v₁∞ v2=0 β_n+v₁_+v₂d_2l−1(n, v₂) =k l=1α2k−2l+1(n)d2l+1(n, j) = k+1 l=2 α2(k+1)−2l+1(n)d2l−1(n, j), while, by (5.1), we have II =∞ v2=0βn+v2d2k(n + 1, v2) β_n+j= α_2k+1(n)d₁(n, j). Thus we obtain (5.2) with k replaced by k + 1, as desired.

We derive two kinds of diﬀerence equations for b_k(n, j) from Proposition 5.1.

Proposition 5.2. For n, k∈ N and j ∈ N ∪ {0}, we have

b_2k+1(n, j)− b_2k+1(n + 1, j) = k l=1 α_2k−2l+1(n + 1)b_2l(n, j). (5.3)

Proof. From Theorem 2.7 and Proposition 5.1, we see that b_2k+1(n, j)− b_2k+1(n + 1, j) = c₀∞ u=0aj+u{d2k(n + 1, u)− d2k(n + 2, u)} = c₀∞ u=0aj+u k l=1α2k−2l+1(n + 1)d2l−1(n + 1, j) =k l=1α2k−2l+1(n + 1)c0 ∞ u=0aj+ud2l−1(n + 1, j) =k l=1α2k−2l+1(n + 1)b2l(n, j).

Thus the proposition follows.

Proposition 5.3. For n, k∈ N, and j = 1, . . . , n, we have

b_2k(n, n + 1− j) − b_2k(n + 1, n + 2− j) = k l=1 α_2k−2l+1(n + 1)b_2l−1(n, n + 1− j). (5.4)

(27)

Proof. Since d₁(n + 1, u) = α₁(n + 1 + u) and b₁(n, j) = c₀a_j, we see from Theorem 2.7 that b₂(n, n + 1− j) = c₀ ∞ u=0 a_n+1−j+ud₁(n + 1, u) = ∞ u=n+1 α₁(u)b₁(n, u− j). Similarly, we have b₂(n + 1, n + 2− j) = c₀ ∞ u=0 a_n+2−j+ud₁(n + 2, u) = ∞ u=n+2 α₁(u)b₁(n, u− j). Thus (5.4) holds for k = 1.

Suppose that k≥ 2. Then, from (2.26) and Theorem 2.7, b_2k(n, n + 1− j) = c₀ ∞ v=0 a_n+1−j+v ∞ u=0 β_n+1+v+ud_2(k−1)(n + 1, u), b_2k(n + 1, n + 2− j) = c₀ ∞ v=1 a_n+1−j+v ∞ u=0 β_n+1+v+ud_2(k−1)(n + 2, u). Hence b_2k(n, n + 1− j) − b_2k(n + 1, n + 2− j) = I + II with I = c₀ ∞ v=0 a_n+1+v−j ∞ u=0 β_n+1+v+ud_2(k−1)(n + 1, u)− d_2(k−1)(n + 2, u), II = c₀a_n+1−j ∞ u=0 β_n+1+ud_2(k−1)(n + 2, u). By (2.26) and Theorem 2.7, I is equal to

c₀ ∞ v=0 a_n+1−j+v ∞ u=0 β_n+1+v+u k−1 l=1 α_{2(k−1)−2l+1}(n + 1)d_2l−1(n + 1, v) = k−1 l=1 α_{2(k−1)−2l+1}(n + 1)c₀ ∞ v=0 a_n+1−j+v ∞ u=0 β_n+1+v+ud_2l−1(n + 1, u) =k−1 l=1 α2(k−1)−2l+1(n + 1)b2l+1(n, n + 1− j) =k l=2α2k−2l+1(n)b2l−1(n, n + 1− j),

while, by (5.1) and the equality b₁(n, n + 1− j) = ca_n+1−j, we have II = α_2k−1(n + 1)b₁(n, n + 1− j).

Thus we obtain (5.4), as desired.

Here is the representation of the PACF α(·) in terms of the MA and AR coeﬃ-cients.

Theorem 5.4. We assume either (A1) or (A2). Then α(n) =∞−_k=1α_2k−1(n) for n = 2, 3, . . . .

(28)

Proof. We have b₁(n, j)− b₁(n + 1, j) = c₀a_j− c₀a_j= 0 for n∈ N and j = 1, . . . , n. This, together with Theorem 2.7 and Propositions 5.2 and 5.3, yields

φ_n,j− φ_n+1,j =∞− k=1{gk(n, j)− gk(n + 1, j)} =∞− k=1{b2k+1(n, j)− b2k+1(n + 1, j) +b_2k(n, n + 1− j) − b_2k(n + 1, n + 2− j)} =∞− k=1 k l=1 α_2k−2l+1(n + 1){b_2l(n, j) + b_2l−1(n, n + 1− j)} =∞− l=1 {b2l(n, j) + b2l−1(n, n + 1− j)} ∞− k=lα2k−2l+1(n + 1) =∞− k=1α2k−1(n + 1) φ_n,n+1−j. Combining this and (1.4), we obtain the theorem.

5.2. Asymptotics for short memory processes. In this section, we apply The-orem 5.4 to the partial autocorrelation functions of a class of short memory pro-cesses.

Theorem 5.5. Let p > 1 and let {X_n} be a purely nondeterministic stationary

process. We assume (2.17) and

a_n= O(n−p) (n→ ∞). (5.5)

Then the partial autocorrelation function α(·) satisﬁes α(n) = O(n−p) (n→ ∞). (5.6)

Clearly, (5.5) implies (2.14), so that{X_n} in Theorem 5.5 satisﬁes (A1). Proof. By (5.5), there exists L₁∈ (0, ∞) such that

|an| ≤

L₁

np (n = 1, 2, . . . ).

Recall β_n from (2.22) and α_2k+1(·) from the previous section. It holds that |α1(n)| = |βn| ≤ ∞ v=0 |cv| L₁ (n + v)p ≤ L₂ np (n = 1, 2, . . . )

with L₂ := L₁∞_k=0|c_k|. Let A(·) and F (·) be as in (4.1). By (5.1) and Lemma 4.2, we have, for n, k∈ N, |α2k+1(n)| ≤ ∞ v=0 |βn+vd2k(n + 1, v)| ≤ L₂ npF (n + 1) 2k_.