
5.2.1 Multiresolution Analysis review

Let $\{\omega_{j,k}(t) = 2^{j}\,\omega(2^{j}t - k)\}$, for all $j, k$, denote an orthogonal basis for $L^2$, where the integer translates of $\omega(t)$ span $V_0$ (the reference subspace) and $L^2$ is the space of functions with finite energy. Let us also consider the nested subspaces $\{0\} \subset \dots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \dots \subset L^2$, and let $W_j$ denote the orthogonal complement of $V_j$ in $V_{j+1}$, namely $V_{j+1} = W_j \oplus V_j$. In MRA, a given signal $x(t) \in V_0$ is decomposed into the sum of an approximation signal $A_j(t) \in V_{-j}$ and a set of details $D_j(t) \in W_{-j}$ as follows [Mal89]:

$$
\begin{aligned}
x(t) &= A_1(t) + D_1(t) \\
     &= A_2(t) + D_2(t) + D_1(t) \\
     &= A_3(t) + D_3(t) + D_2(t) + D_1(t) \\
     &= \dots
\end{aligned}
\tag{5.1}
$$

where $A_i(t)$ and $D_j(t)$ are the projections of $x(t)$ onto the subspaces $V_{-i}$ and $W_{-j}$, respectively. Note that the $A_i(t)$ signals are approximations of $x(t)$ at a larger timescale, i.e., at $2^i$ times the timescale of the original subspace $V_0$. The aim of MRA is to obtain an adequate approximation of the signal, namely to find the subspace $V_{-i}$ onto which the original signal can be projected with minimum information loss. If the original timescale for $V_0$ is 5 minutes, then the timescale for $V_{-i}$ is $2^i \cdot 5$ minutes. The approximations and details are obtained with the analysis filter banks. Basically, the signal goes through a linear analysis filter in order to obtain each successive approximation, as shown in Figure 5.2.

5.2. “Queueing equivalent” thresholding method 97

Figure 5.2: Wavelet filter banks. In each step, the incoming signal goes through a linear analysis filter, and the approximation signal ($A_i$) and detail signal ($D_i$) are obtained. On the right, the reverse process is shown: $\tilde{x}(t)$ is calculated by applying the reconstruction filters to $A_i$ (using null signals as $D_i$).

In summary, MRA provides a computationally efficient method for approximating a time series at larger timescales. In what follows, we assume that the original time series belongs to $V_0$, namely $V_0$ is the subspace that groups all the signals at the same timescale as the original time series. This is only for notational simplicity. Note that the choice of the $V_0$ timescale is arbitrary.
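The decomposition in (5.1) can be checked numerically. The following is a minimal sketch that assumes the Haar wavelet (the text does not say which wavelet is used) and a signal whose length is a multiple of $2^j$; the function names are illustrative only. It computes $A_2(t)$, $D_2(t)$ and $D_1(t)$ on the $V_0$ sampling grid and verifies that they add up to $x(t)$.

```python
import math

S = 1.0 / math.sqrt(2.0)  # orthonormal Haar filter gain

def haar_step(signal):
    """One analysis step: return (approximation, detail) coefficients."""
    pairs = [(signal[2 * k], signal[2 * k + 1]) for k in range(len(signal) // 2)]
    return [S * (a + b) for a, b in pairs], [S * (a - b) for a, b in pairs]

def haar_inverse_step(approx, detail):
    """One synthesis step: rebuild the finer-scale signal."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([S * (a + d), S * (a - d)])
    return out

def project_to_v0(coeffs, level, is_detail):
    """Bring level-`level` coefficients back to the V0 grid, zeroing everything else."""
    zeros = [0.0] * len(coeffs)
    if is_detail:
        signal = haar_inverse_step(zeros, coeffs)
    else:
        signal = haar_inverse_step(coeffs, zeros)
    for _ in range(level - 1):
        signal = haar_inverse_step(signal, [0.0] * len(signal))
    return signal

def mra(x, levels):
    """Return A_levels(t) and [D_1(t), ..., D_levels(t)], all sampled on the V0 grid."""
    approx, details = list(x), []
    for j in range(1, levels + 1):
        approx, d = haar_step(approx)
        details.append(project_to_v0(d, j, True))
    return project_to_v0(approx, levels, False), details

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
a2, (d1, d2) = mra(x, 2)
rebuilt = [a + u + v for a, u, v in zip(a2, d1, d2)]  # A_2 + D_2 + D_1
```

With the Haar wavelet, $A_2(t)$ is simply the mean of $x(t)$ over blocks of $2^2 = 4$ samples, which makes the "larger timescale" interpretation explicit.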

5.2.2 “Queueing equivalent” analysis

The goal is to approximate x(t) by the largest timescale approximation Ai(t) such that the information loss, in terms of network behavior, is still acceptable.

For example, an approximation of $x(t)$ in the subspace $V_{-1}$ consists of $\tilde{x}(t) = A_1(t) + 0_1(t)$, where $0_1(t)$ is the zero element of $W_{-1}$. In $V_{-2}$, this is $\tilde{x}(t) = A_2(t) + 0_2(t) + 0_1(t)$, etc.

As we approximate $x(t)$ by its projection onto the subspaces $V_{-1}, V_{-2}, \dots$, some information about the signal is lost, since the timescale is larger. In fact, wavelet shrinkage acts as a smoothing operator, since it yields a signal approximation with fewer points. More specifically, if $x(t)$ is a finite-length signal with $N$ points, then $A_1(t)$ has approximately $N/2$ points, and $A_j(t)$ is an approximation of $x(t)$ with approximately $N/2^j$ points. Our objective is to find the largest timescale approximation which is accurate enough for a given analysis.

Clearly, the appropriate timescale for a given approximation depends on the application of the MRA. For example, if we simply wish to detect an “average” value of the signal, then we may choose to approximate at a very large timescale. The timescale is usually selected by thresholding the energy of the details. However, this is a squared-error criterion which is not specifically tailored to any network-related application of the MRA. Moreover, the energy threshold is a heuristic value.
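For reference, the conventional energy criterion can be sketched as follows. This is only an illustration under stated assumptions: the Haar wavelet, a signal length divisible by $2^{\text{max\_level}}$, and a hypothetical rule that picks the largest level whose discarded detail energy stays below a fraction `threshold` of the total signal energy. The text does not specify the exact rule or the threshold value.

```python
import math

def haar_details(x, levels):
    """Return the Haar detail coefficients for levels 1..levels."""
    s = 1.0 / math.sqrt(2.0)
    approx, out = list(x), []
    for _ in range(levels):
        pairs = [(approx[2 * k], approx[2 * k + 1]) for k in range(len(approx) // 2)]
        out.append([s * (a - b) for a, b in pairs])
        approx = [s * (a + b) for a, b in pairs]
    return out

def largest_level_by_energy(x, max_level, threshold):
    """Largest j such that the energy of D_1..D_j is at most threshold * energy(x).

    Hypothetical selection rule; returns 0 if even D_1 exceeds the threshold.
    """
    total = sum(v * v for v in x)
    discarded, best = 0.0, 0
    for j, detail in enumerate(haar_details(x, max_level), start=1):
        # The Haar transform is orthonormal, so coefficient energy equals signal energy.
        discarded += sum(v * v for v in detail)
        if discarded / total > threshold:
            break
        best = j
    return best

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
level = largest_level_by_energy(x, max_level=3, threshold=0.05)
```

Note that the outcome depends entirely on `threshold`, which is the heuristic value criticized above.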

In this thesis we propose and validate an approximation method which relates to queueing performance. Intuitively, a given signal x(t) and approximation ˜x(t) are said to be “queueing equivalent” if an infinite-buffer queue fed with both processes produces the “same” (or very similar) queueing occupancy distribution.

If this is the case, then we may take the approximation ˜x(t) instead of x(t) for whatever queueing-related analysis we wish to perform.

Concerning other applications, as mentioned in Section 5.1, clustering and embedding applications may benefit from the fact that the time-series length is reduced after applying MRA. If, for instance, we wish to perform an r-dimensional clustering of $x(t)$ with other traffic time series, then we can take the approximation signal $A_j(t)$ instead of $x(t)$, since $A_j(t)$ has fewer points. This makes the clustering algorithm converge faster.

More formally, let us consider an infinite-buffer single-server system governed by Lindley’s equation [Lin52]:

$$
Q(t + 1) = \max\{0,\; Q(t) + A(t) - C\}, \qquad t = 0, 1, 2, \dots \tag{5.2}
$$

where $Q(t)$ is the system occupancy at time epoch $t$, $A(t)$ is the number of bytes arriving during that time interval, and $C$ is the router capacity. Let $F_A$ denote the system occupancy distribution under traffic input $A(t)$. The following provides the definition of a “queueing equivalent” approximation:

Definition: The signal $x(t)$ and the approximation $A_j(t)$ are equivalent (in the queueing performance sense) for a utilization factor $\rho$ and significance level $\alpha$ if and only if the null hypothesis of goodness-of-fit between $F_{A_j}$ and $F_x$ can be accepted at significance level $\alpha$. Notation-wise, we say that $x(t)\, R_{\rho,\alpha}\, A_j(t)$.
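A minimal sketch of how this definition could be evaluated is given below. Several choices are assumptions and not taken from the text: the capacity is derived from the utilization as $C = \bar{A}/\rho$, the goodness-of-fit test is the two-sample Kolmogorov-Smirnov test with its large-sample critical value, and the approximation is assumed to be already reconstructed on the same time grid as $x(t)$. Queue occupancy samples are serially correlated, so the nominal significance level should be read with care.

```python
import math

def lindley_occupancy(arrivals, capacity):
    """Iterate Q(t+1) = max(0, Q(t) + A(t) - C) from an empty queue."""
    q, out = 0.0, []
    for a in arrivals:
        q = max(0.0, q + a - capacity)
        out.append(q)
    return out

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        v = min(a[i], b[j])
        while i < len(a) and a[i] == v:
            i += 1
        while j < len(b) and b[j] == v:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap

def ks_critical_value(n, m, alpha):
    """Large-sample critical value of the two-sample KS test."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) * math.sqrt((n + m) / (n * m))

def queueing_equivalent(x, x_approx, rho, alpha):
    """Return (decision, KS statistic) for x and its V0-grid approximation."""
    capacity = (sum(x) / len(x)) / rho  # assumption: rho = mean arrivals / C
    occ_x = lindley_occupancy(x, capacity)
    occ_approx = lindley_occupancy(x_approx, capacity)
    d = ks_statistic(occ_x, occ_approx)
    return d <= ks_critical_value(len(occ_x), len(occ_approx), alpha), d

x = [2, 2, 30, 2, 2, 2, 20, 4]          # toy bursty arrivals (bytes per slot)
x_approx = [9, 9, 9, 9, 7, 7, 7, 7]     # block means over 4 slots
decision, d = queueing_equivalent(x, x_approx, rho=0.5, alpha=0.05)
```

With only eight samples the test has almost no power, so the toy series are declared equivalent even though smoothing removes all queueing; a realistic check needs a long trace.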

Remark: Note that $R_{\rho,\alpha}$ is a binary relation but not an equivalence relation in $V_0 \times V_0$. Clearly, $x(t)\, R_{\rho,\alpha}\, x(t)$ (reflexivity), and if $x(t)\, R_{\rho,\alpha}\, y(t)$ then $y(t)\, R_{\rho,\alpha}\, x(t)$ (symmetry).


However, the transitive property does not hold. For example, let us consider the Kolmogorov-Smirnov statistic [DS86, Chapter 4] $z(x(t), \tilde{x}(t)) = \max_{\tau \ge 0} |F_x(\tau) - F_{\tilde{x}}(\tau)|$.

Then, for any $y(t)$ and $z(t)$ such that $x(t) \ne y(t)$, $y(t) \ne z(t)$ and $x(t) \ne z(t)$,

$$
\begin{aligned}
z(x(t), z(t)) &= \max_{\tau \ge 0} |F_x(\tau) - F_y(\tau) + F_y(\tau) - F_z(\tau)| \\
              &\le \max_{\tau \ge 0} |F_x(\tau) - F_y(\tau)| + \max_{\tau \ge 0} |F_y(\tau) - F_z(\tau)| \\
              &= z(x(t), y(t)) + z(y(t), z(t))
\end{aligned}
\tag{5.3}
$$

and it cannot be assured that if $z(x(t), y(t)) \in S_\alpha$ and $z(y(t), z(t)) \in S_\alpha$ then $z(x(t), z(t)) \in S_\alpha$ for a given significance level $\alpha$, where $S_\alpha$ is the acceptance region. As a consequence, $x(t)\, R_{\rho,\alpha}\, y(t)$ and $y(t)\, R_{\rho,\alpha}\, z(t)$ do not imply $x(t)\, R_{\rho,\alpha}\, z(t)$.

The same result applies to other goodness-of-fit tests such as the $\chi^2$ test ([DS86, Chapter 3] and Section 4.2.5). □
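The failure of transitivity can be illustrated with a small numerical example. The three occupancy samples and the acceptance region $S_\alpha = [0, 0.35]$ below are made up purely for illustration; they are not taken from any measurement or from a particular $\alpha$.

```python
def ecdf(sample, tau):
    """Empirical CDF of `sample` evaluated at tau."""
    return sum(1 for v in sample if v <= tau) / len(sample)

def ks_distance(a, b):
    """Kolmogorov-Smirnov statistic: max over tau of |F_a(tau) - F_b(tau)|."""
    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, t) - ecdf(b, t)) for t in points)

# Hypothetical occupancy samples (0 = empty queue, 1 = one unit queued).
occ_x = [0] * 10
occ_y = [0] * 7 + [1] * 3
occ_z = [0] * 4 + [1] * 6

ACCEPT_UP_TO = 0.35  # hypothetical acceptance region S_alpha = [0, 0.35]

d_xy = ks_distance(occ_x, occ_y)  # about 0.3, inside S_alpha
d_yz = ks_distance(occ_y, occ_z)  # about 0.3, inside S_alpha
d_xz = ks_distance(occ_x, occ_z)  # about 0.6, outside S_alpha

x_rel_y = d_xy <= ACCEPT_UP_TO
y_rel_z = d_yz <= ACCEPT_UP_TO
x_rel_z = d_xz <= ACCEPT_UP_TO
```

Here the triangle inequality (5.3) holds with equality, yet the pairs $(x, y)$ and $(y, z)$ are accepted while $(x, z)$ is rejected.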

In conclusion, the “queueing equivalent” thresholding method provides a technique to decide whether a finite-length traffic time series in $V_0$, say $x(t)$, and its approximation $A_j(t)$ at the $2^j$ timescale (namely, $A_j(t) \in V_{-j}$) are equivalent in terms of queueing performance. This is the case if and only if $\tilde{x}(t) = A_j(t) + \sum_{i=1}^{j} 0_i(t)$ and $x(t)$ yield queueing occupancy distributions for which the null hypothesis of a goodness-of-fit test is accepted at a given significance level and utilization factor.

Note that, in order to apply the method and obtain the queueing occupancy at the same timescale, one needs to reconstruct the original time series in $V_0$ from $A_j(t)$ by applying the reconstruction filter iteratively $j$ times (upsampling with null details), as depicted in Figure 5.2. However, this is only required to check whether the approximation and the original time series are equivalent in the queueing performance sense. Once the original time series and the approximation are deemed equivalent by the “queueing equivalent” method, they can be used interchangeably. The approximation, however, is smaller and therefore easier to store and process.
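The reconstruction step with null details can be sketched as follows, again assuming the Haar wavelet (an assumption, since the text does not fix the wavelet) and illustrative function names. Each pass of the Haar reconstruction filter with zero details duplicates every coefficient and scales it by $1/\sqrt{2}$.

```python
import math

S = 1.0 / math.sqrt(2.0)  # orthonormal Haar filter gain

def haar_approximation(x, level):
    """Approximation coefficients after `level` Haar analysis steps."""
    approx = list(x)
    for _ in range(level):
        approx = [S * (approx[2 * k] + approx[2 * k + 1]) for k in range(len(approx) // 2)]
    return approx

def reconstruct_with_null_details(approx, level):
    """Apply the Haar reconstruction filter `level` times with all details set to zero."""
    signal = list(approx)
    for _ in range(level):
        # With a zero detail, both reconstructed samples equal S * coefficient.
        signal = [S * a for a in signal for _copy in range(2)]
    return signal

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coeffs = haar_approximation(x, 2)                  # N / 2^2 = 2 stored values
x_tilde = reconstruct_with_null_details(coeffs, 2)  # back on the V0 grid, same length as x
```

Only `coeffs` needs to be stored; `x_tilde` is rebuilt on demand when the queueing occupancy has to be compared at the original timescale.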

Concerning the computational complexity of this technique, we found that most of the computational cost is incurred by the MRA decomposition function. Therefore, the “queueing equivalent” method does not involve any substantial increase in overhead with respect to other techniques such as the squared-error computation.