Multiresolution Analysis review


5.2.1 Multiresolution Analysis review

Let {ω_{j,k}(t) = √(2^j) ω(2^j t − k)}, for all j, k, denote an orthogonal basis for L², where ω(t) spans V₀ (the reference subspace) and L² is the space of functions with finite energy. Let us also consider the nested subspaces {0} ⊂ . . . ⊂ V₋₂ ⊂ V₋₁ ⊂ V₀ ⊂ V₁ ⊂ V₂ ⊂ . . . ⊂ L², and let W_j denote the orthogonal complement of V_j in V_{j+1}, namely V_{j+1} = W_j ⊕ V_j. In MRA, a given signal x(t) ∈ V₀ is decomposed into the sum of an approximation signal A_j(t) ∈ V_{−j} and a set of details D_i(t) ∈ W_{−i}, i = 1, ..., j, as follows [Mal89]:

x(t) = A₁(t) + D₁(t)
     = A₂(t) + D₂(t) + D₁(t)
     = A₃(t) + D₃(t) + D₂(t) + D₁(t)
     = . . .                                    (5.1)

where A_i(t) and D_j(t) are the projections of x(t) onto the subspaces V_{−i} and W_{−j}, respectively. Note that the A_i(t) signals are approximations of x(t) at a larger timescale, namely 2ⁱ times the timescale of the original subspace V₀. The aim of MRA is to obtain an adequate approximation of the signal, that is, to find the subspace V_{−i} onto which the original signal can be projected with minimum information loss. If the original timescale for V₀ is 5 minutes, then the timescale for V_{−i} is 2ⁱ · 5 minutes. The approximations and details are obtained with the analysis filter banks: the signal goes through an analysis linear filter in order to obtain each successive approximation, as shown in Figure 5.2.

5.2. “Queueing equivalent” thresholding method 97

Figure 5.2: Wavelet filter banks. In each step the incoming signal goes through an analysis linear filter and the approximation signal (A_i) and detail signal (D_i) are obtained. On the right, the reverse process is shown: x̃(t) is calculated by applying the reconstruction filters to A_i (using null signals as D_i).

In summary, MRA provides a computationally efficient method for approximating a time-series at larger timescales. In what follows, we assume that the original time-series belongs to V₀, namely, V₀ is the subspace that groups all the signals at the same timescale as the original time-series. This assumption is made only for notational simplicity; the choice of the V₀ timescale is arbitrary.
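The decomposition in (5.1) can be illustrated with a minimal sketch. The text does not fix a particular wavelet, so the Haar filter pair is assumed here purely for illustration, and the function names (haar_analysis, haar_synthesis, decompose, reconstruct) are hypothetical. The signal length is assumed to be a multiple of 2^levels.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_analysis(signal):
    """One analysis step: split an even-length signal into (approximation, detail)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_synthesis(approx, detail):
    """One synthesis step: rebuild the finer-scale signal from (approximation, detail)."""
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / SQRT2)
        signal.append((a - d) / SQRT2)
    return signal

def decompose(signal, levels):
    """Return (A_levels, [D_1, ..., D_levels]) as coefficient sequences."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_analysis(approx)
        details.append(detail)
    return approx, details

def reconstruct(approx, details):
    """Invert decompose(): apply the synthesis step from the coarsest level down."""
    signal = list(approx)
    for detail in reversed(details):
        signal = haar_synthesis(signal, detail)
    return signal

# Illustrative 8-sample series (made-up values).
x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
a3, ds = decompose(x, 3)
x_back = reconstruct(a3, ds)
print(len(a3), [len(d) for d in ds])  # 1 [4, 2, 1]
```

Keeping all the details gives back the original series up to rounding, which is the statement of (5.1). Each analysis step halves the number of coefficients.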

5.2.2 “Queueing equivalent” analysis

The goal is to approximate x(t) by the largest timescale approximation A_i(t) such that the information loss, in terms of network behavior, is still acceptable.

For example, an approximation of x(t) in the subspace V₋₁ consists of x̃(t) = A₁(t) + 0₁(t), where 0₁(t) is the zero element of W₋₁. In V₋₂, this is x̃(t) = A₂(t) + 0₂(t) + 0₁(t), etc.

As we approximate x(t) by its projection over the subspaces V₋₁, V₋₂, ..., some information about the signal is lost, since the timescale is larger. In fact, wavelet shrinkage acts as a smoothing operator, since it yields a signal approximation with fewer points. More specifically, if x(t) is a signal of finite length with N points, then A₁(t) has approximately N/2 points, and A_j(t) is an approximation of x(t) with approximately N/2^j points. Our objective is to find the largest timescale approximation which is accurate enough for a given analysis.

Clearly, the appropriate timescale for a given approximation depends on the application of the MRA. For example, if we simply wish to detect an “average” value of the signal, then we may choose to approximate in a very large timescale. The timescale is usually selected by thresholding the energy of the details. However, this is a squared error criterion which is not specifically tailored to any network-related application of the MRA. Moreover, the energy threshold is a heuristic value.
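As a point of comparison, the energy-based selection mentioned above could look like the following sketch. It is one possible reading of the criterion, not a procedure taken from the text: the Haar wavelet, the function name select_level_by_energy and the use of a cumulative fraction of discarded detail energy are all assumptions, and the threshold values are arbitrary.

```python
import math

def haar_analysis(signal):
    """One Haar analysis step (assumed wavelet): returns (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def energy(seq):
    """Sum of squared values."""
    return sum(v * v for v in seq)

def select_level_by_energy(signal, max_levels, max_discarded_fraction):
    """Largest j such that the energy of D_1..D_j stays within the given
    fraction of the total signal energy (hypothetical criterion)."""
    total = energy(signal)
    approx, discarded, chosen = list(signal), 0.0, 0
    for level in range(1, max_levels + 1):
        approx, detail = haar_analysis(approx)
        discarded += energy(detail)
        if discarded > max_discarded_fraction * total:
            break
        chosen = level
    return chosen

# Illustrative series; detail energies are 6, 40 and 8 out of a total of 446.
x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
print(select_level_by_energy(x, 3, 0.05))  # 1
print(select_level_by_energy(x, 3, 0.11))  # 2
print(select_level_by_energy(x, 3, 0.50))  # 3
```

The selected level depends entirely on the chosen fraction, which illustrates why such a threshold is heuristic.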

In this thesis we propose and validate an approximation method which relates to queueing performance. Intuitively, a given signal x(t) and its approximation x̃(t) are said to be “queueing equivalent” if an infinite-buffer queue fed with either process produces the “same” (or a very similar) queue occupancy distribution.

If this is the case, then we may take the approximation x̃(t) instead of x(t) for whatever queueing-related analysis we wish to perform.

Concerning other applications, as mentioned in section 5.1, clustering and embedding applications may benefit from the fact that the time-series length is reduced after applying MRA. If, for instance, we wish to perform an r-dimensional clustering of x(t) with other traffic time-series, then we can take the approximation signal A_j(t) instead of x(t), since A_j(t) has fewer points. This makes the clustering algorithm converge faster.

More formally, let us consider an infinite-buffer single-server system governed by Lindley’s equation [Lin52]:

Q(t + 1) = max{0, Q(t) + A(t) − C}, t = 0, 1, 2, . . .      (5.2)

where Q(t) is the system occupancy at time epoch t, A(t) is the number of bytes arriving during the corresponding time interval, and C is the router capacity. Let F_A denote the system occupancy distribution under traffic input A(t). The following provides the definition of “queueing equivalent” approximation:

Definition: The signal x(t) and the approximation A_j(t) are equivalent (in the queueing performance sense) for a utilization factor ρ and significance level α if and only if the null hypothesis of goodness-of-fit between F_{A_j} and F_x can be accepted at significance level α. Notation-wise, we write x(t) R_{ρ,α} A_j(t).
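The recursion in (5.2), from which the occupancy distributions in the definition are obtained, can be sketched as follows. The function names and the sample values are illustrative. The helper that derives C from a utilization factor assumes ρ = mean(A)/C, which is the usual convention but is not stated explicitly in the text.

```python
def lindley_occupancy(arrivals, capacity, initial=0.0):
    """Iterate Q(t+1) = max(0, Q(t) + A(t) - C); returns [Q(0), Q(1), ..., Q(T)]."""
    occupancy = [initial]
    for a in arrivals:
        occupancy.append(max(0.0, occupancy[-1] + a - capacity))
    return occupancy

def capacity_for_utilization(arrivals, utilization):
    """Capacity C giving the requested utilization, assuming rho = mean(A) / C."""
    return (sum(arrivals) / len(arrivals)) / utilization

arrivals = [3, 7, 9, 2, 1, 8]  # made-up byte counts per interval
print(lindley_occupancy(arrivals, capacity=5))
# [0.0, 0.0, 2.0, 6.0, 3.0, 0.0, 3.0]
print(capacity_for_utilization(arrivals, 0.8))  # 6.25
```

The empirical distribution of the returned occupancy values plays the role of F_A.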

Remark: Note that R_{ρ,α} is a binary relation but not an equivalence relation in V₀ × V₀. Clearly, x(t) R_{ρ,α} x(t) (reflexivity), and if x(t) R_{ρ,α} y(t) then y(t) R_{ρ,α} x(t) (symmetry).

However, the transitive property does not hold. For example, let us consider the Kolmogorov-Smirnov statistic [DS86, Chapter 4] z(x(t), x̃(t)) = max_{τ≥0} |F_x(τ) − F_x̃(τ)|.

Then, for any y(t) and z(t) such that x(t) ≠ y(t), y(t) ≠ z(t) and x(t) ≠ z(t),

z(x(t), z(t)) = max_{τ≥0} |F_x(τ) − F_y(τ) + F_y(τ) − F_z(τ)|
              ≤ max_{τ≥0} |F_x(τ) − F_y(τ)| + max_{τ≥0} |F_y(τ) − F_z(τ)|
              = z(x(t), y(t)) + z(y(t), z(t))      (5.3)

and it cannot be assured that if z(x(t), y(t)) ∈ S_α and z(y(t), z(t)) ∈ S_α then z(x(t), z(t)) ∈ S_α for a given significance level α, where S_α is the acceptance region. As a consequence, x(t) R_{ρ,α} y(t) and y(t) R_{ρ,α} z(t) do not imply x(t) R_{ρ,α} z(t).

The same result applies to other goodness-of-fit tests such as the χ² test ([DS86, Chapter 3] and Section 4.2.5).
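The lack of transitivity in the remark can be checked numerically with a small sketch. The three occupancy samples are made up, and the acceptance bound of 3/10 is a hypothetical stand-in for the boundary of S_α rather than a value derived from a significance level. Exact fractions are used so that the comparisons are free of rounding effects.

```python
from fractions import Fraction

def empirical_cdf(sample, point):
    """Fraction of sample values that are <= point."""
    return Fraction(sum(1 for v in sample if v <= point), len(sample))

def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    points = set(sample_a) | set(sample_b)
    return max(abs(empirical_cdf(sample_a, p) - empirical_cdf(sample_b, p))
               for p in points)

# Made-up occupancy samples for three signals x, y and z.
occ_x = [0] * 5 + [1] * 5
occ_y = [0] * 3 + [1] * 7
occ_z = [0] * 1 + [1] * 9
bound = Fraction(3, 10)  # hypothetical acceptance bound, not tied to any alpha

d_xy = ks_statistic(occ_x, occ_y)  # 1/5
d_yz = ks_statistic(occ_y, occ_z)  # 1/5
d_xz = ks_statistic(occ_x, occ_z)  # 2/5

print(d_xz <= d_xy + d_yz)                          # True, as in (5.3)
print(d_xy <= bound, d_yz <= bound, d_xz <= bound)  # True True False
```

Here x and y, and y and z, fall within the bound while x and z do not, so the triangle inequality alone does not carry acceptance from one pair to the next.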

In conclusion, the “queueing equivalent” thresholding method provides a technique to decide whether a finite-length traffic time series in V₀, say x(t), and its approximation A_j(t) in the 2^j timescale (namely, A_j(t) ∈ V_{−j}) are equivalent in terms of queueing performance. This is the case if and only if x̃(t) = A_j(t) + Σ_{i=1}^{j} 0_i(t) and x(t) yield queueing occupancy distributions for which the null hypothesis of a goodness-of-fit test is accepted, for a given significance level and utilization factor.

Note that, in order to apply the method and obtain the queueing occupancy in the same timescale, one needs to reconstruct the original time-series in V₀ from A_j(t), by means of iterative application of the reconstruction filter j times (upsampling with null details), as depicted in Figure 5.2. However, this is only required to check whether the approximation and the original time-series are equivalent in the queueing performance sense. Once the original time-series and the approximation are considered equivalent by the “queueing equivalent” method, they can be used interchangeably. The approximation, however, is smaller in size and is easier to store and process.
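The whole check can be put together in a rough end-to-end sketch. Several choices here are assumptions rather than the procedure used in this thesis: the Haar wavelet, the relation C = mean(x)/ρ, a two-sample Kolmogorov-Smirnov comparison, and the large-sample critical value sqrt(−ln(α/2)/2) · sqrt((n+m)/(nm)), which presumes independent samples even though successive queue occupancies are correlated. All function names are hypothetical and the input traffic is synthetic.

```python
import math
import random
from bisect import bisect_right

def haar_approximation(signal):
    """Analysis step keeping only the approximation coefficients (half the length)."""
    s = math.sqrt(2.0)
    return [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]

def haar_synthesis_null_detail(approx):
    """Synthesis step with a null detail signal (doubles the length)."""
    s = math.sqrt(2.0)
    return [a / s for a in approx for _ in range(2)]

def approximate(signal, level):
    """Return (A_level coefficients, their reconstruction in the original timescale)."""
    coeffs = list(signal)
    for _ in range(level):
        coeffs = haar_approximation(coeffs)
    recon = list(coeffs)
    for _ in range(level):
        recon = haar_synthesis_null_detail(recon)
    return coeffs, recon

def lindley_occupancy(arrivals, capacity):
    """Iterate Q(t+1) = max(0, Q(t) + A(t) - C) starting from an empty queue."""
    occupancy = [0.0]
    for a in arrivals:
        occupancy.append(max(0.0, occupancy[-1] + a - capacity))
    return occupancy

def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    sa, sb = sorted(sample_a), sorted(sample_b)
    return max(abs(bisect_right(sa, p) / len(sa) - bisect_right(sb, p) / len(sb))
               for p in sa + sb)

def queueing_equivalent(signal, level, utilization, alpha=0.05):
    """Return (accepted, statistic, critical value) for x(t) versus A_level(t)."""
    _, recon = approximate(signal, level)
    capacity = (sum(signal) / len(signal)) / utilization  # assumes rho = mean / C
    occ_x = lindley_occupancy(signal, capacity)
    occ_a = lindley_occupancy(recon, capacity)
    stat = ks_statistic(occ_x, occ_a)
    n, m = len(occ_x), len(occ_a)
    # Large-sample two-sample KS critical value (assumes independent samples).
    critical = math.sqrt(-0.5 * math.log(alpha / 2.0)) * math.sqrt((n + m) / (n * m))
    return stat <= critical, stat, critical

random.seed(1)
traffic = [random.expovariate(1.0) for _ in range(1024)]  # synthetic arrivals
for j in range(1, 5):
    accepted, stat, critical = queueing_equivalent(traffic, j, utilization=0.8)
    print(j, accepted, round(stat, 3), round(critical, 3))
```

With the Haar filters, reconstructing with null details replaces each block of 2^j samples by its mean, so the reconstructed series keeps the mean rate of the original. One would scan j upwards and keep the largest level that is still accepted.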

Concerning the computational complexity of this technique, we found that most of the computational cost is incurred by the MRA decomposition function. Therefore, the use of the “queueing equivalent” method does not involve any substantial increase in overhead with respect to other techniques such as the squared error computation.

In document Characterizing the spatial and temporal diversity of the Internet Traffic: a capacity planning application to the RedIRIS network (Page 124-128)