
5.2 Clustering Effort Data

5.2.4 Using Wavelets for Approximation

Both PAM and CLARA require that we store the entire data set in memory on the node running the clustering algorithm. PAM requires $O(n^2)$ space, and CLARA requires $O(n + s^2)$ space. Neither approach is scalable, because $n$ for our problem would be the number of processes in the full parallel system.

(a) Level $L$ transform. (b) Level 3 approximation. (c) Level 2 approximation. (d) Level 1 approximation. (e) Full reconstruction.

Figure 5.3: Structure of coefficients after applications of the inverse wavelet transform

The wavelet representation provides an effective solution to this problem. Recall from §3.3 that we can analyze wavelet-compressed data at hierarchical scales and levels of detail. We can use this property of the wavelet transform to generate a coarse-grained approximation of our per-process effort signatures. Thus, we can cluster on a much smaller representation of the same data instead of on the uncompressed data set.

Figure 5.3a shows the organization of two-dimensional wavelet coefficients for a level-$L$ transform. $s^l_i$ and $d^l_i$ represent low-frequency and high-frequency coefficients, respectively, for the level-$l$ application of the wavelet transform. Coefficients at level 1 represent information from the input data with the highest frequency and finest granularity, and coefficients at successively deeper levels represent coarser-grained information. Level-$L$ coefficients represent the coarsest features. Each level $l$ contains one-fourth as many coefficients as level $l-1$, so the bulk of the space is consumed by high-frequency coefficients.
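The one-fourth relationship between levels can be checked with a small count. The sketch below is illustrative only: the function name `detail_counts` is hypothetical, and it assumes a two-dimensional transform with three high-frequency subbands per level and matrix dimensions divisible by $2^L$.

```python
def detail_counts(rows, cols, levels):
    """Count high-frequency coefficients at each level 1..levels.

    Assumes rows and cols are divisible by 2**levels and that each level
    of the 2D transform produces three high-frequency subbands.
    """
    counts = []
    for level in range(1, levels + 1):
        # One subband at this level has (rows / 2^level) x (cols / 2^level) entries.
        band = (rows >> level) * (cols >> level)
        counts.append(3 * band)
    return counts


def low_frequency_count(rows, cols, levels):
    """Count the low-frequency coefficients left after `levels` transforms."""
    return (rows >> levels) * (cols >> levels)
```

For a 16 × 16 input and three levels, this gives 192, 48, and 12 high-frequency coefficients at levels 1, 2, and 3, plus 4 low-frequency coefficients, for 256 in total. Level 1 alone accounts for three-quarters of the space.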

The low-frequency coefficients $s^l_i$ at deeper levels of the wavelet transform are a smaller, coarser-grained approximation of the input data, and the high-frequency coefficients $d^l_i$ represent the details that were removed to create these approximations. Decompressing an effort file requires that we first EZW-decode the transformed data, then add each level of detail back into the image by applying successive inverse wavelet transforms. Typically, the inverse transform is applied $L$ times to reconstruct the input data $s^0_i$. This process is shown in Figure 5.3.

We can also create a level L − i approximation of the input data by applying only i inverse transforms. Figures 5.3b, 5.3c, 5.3d, and 5.3e show the structure of coefficients after inverse transforms are applied to obtain levels 3, 2, 1, and 0, respectively. The coefficients that comprise the approximations are highlighted in bold.

We can now EZW-decode compressed effort data and construct an approximation at an arbitrary level, but one complication remains. The EZW encoding algorithm, which contributes greatly to the effectiveness of our compression algorithm, operates on the full data for each compressed region. Each EZW pass measures the significance of all wavelet coefficients against a particular threshold. Thus, each pass requires a full traversal of the wavelet coefficients.

The original EZW algorithm (Shapiro, 1993) used a Morton scan (Morton, 1966) for this purpose, while the parallelized version uses a depth-first traversal of the coefficients (Ang et al., 1999). In both cases, data from the same level in the transformed coefficients is non-contiguous. Because the EZW encoding is embedded, we cannot simply jump to the level of interest in an EZW stream. With an unmodified EZW algorithm, we would be forced to create a matrix large enough to hold all compressed wavelet coefficients for each effort region that we explore.

(a) Morton scan. (b) Depth-first traversal.

Figure 5.4: Modified EZW traversals for generating approximation matrices.

To make the memory consumption of our scheme practical, we have modified the EZW algorithm to decode only data that we need for the desired approximation level. We first allocate a matrix large enough to hold all coefficients that we plan to decode. We then run the EZW decoder as before, but the symbols in the input stream describe a larger matrix than we have allocated. We simply ignore symbols in the input stream that would produce coefficients outside the approximation matrix. We accomplish this as follows. On each pass, we do an implicit traversal of the destination matrix as we decode the input stream.

Figure 5.4 shows two possible traversals for a level-2 approximation. In Figure 5.4a, we show the algorithm for the more traditional Morton scan. Our parallel wavelet compression algorithm uses a depth-first traversal because subtrees of a depth-first traversal coincide with processes’ local data in the parallel transform. We show the depth-first traversal of the wavelet coefficients in Figure 5.4b.

In both figures, the frequency bands of the full matrix are outlined, with the portion to be decoded into the approximation matrix highlighted in bold. In both cases, we begin reading the EZW stream, and whenever we need to write a coefficient to the matrix, we perform the update only if it falls within the bounds of the approximation matrix. Because the EZW encoding is embedded, we must still do a full traversal and read all symbols on the input stream. In the figure, parts of the traversal that write coefficients are shown in blue, while parts that only read from the input stream are shown in red.
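The bounds check at the heart of this scheme can be sketched as follows. This is a deliberately simplified stand-in, not an EZW decoder: the stream here carries one plain value per coefficient rather than significance symbols, there are no refinement passes or zerotrees, and a recursive Z-order traversal is assumed only for illustration. It also assumes the coefficients needed for the approximation occupy the top-left corner of the full matrix. All names are hypothetical.

```python
def z_order(row, col, size):
    """Yield the cells of a size-by-size block in Z (quadrant-recursive) order."""
    if size == 1:
        yield (row, col)
        return
    half = size // 2
    for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
        yield from z_order(row + dr, col + dc, half)


def encode(matrix):
    """Toy encoder: emit one value per coefficient in traversal order."""
    return [matrix[r][c] for r, c in z_order(0, 0, len(matrix))]


def decode_approximation(stream, full_size, approx_size):
    """Decode only the top-left approx_size-by-approx_size block.

    The traversal covers the full matrix implicitly, so every symbol is
    still read, but writes outside the approximation matrix are skipped.
    Returns the approximation matrix and the read and write counts.
    """
    approx = [[0] * approx_size for _ in range(approx_size)]
    symbols = iter(stream)
    reads = writes = 0
    for r, c in z_order(0, 0, full_size):
        value = next(symbols)  # the stream must be consumed in order
        reads += 1
        if r < approx_size and c < approx_size:
            approx[r][c] = value  # in bounds: store the coefficient
            writes += 1
        # out of bounds: the symbol is read and ignored
    return approx, reads, writes
```

Only the small approximation matrix is ever allocated; the full matrix exists solely as the sequence of coordinates produced by the traversal.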

Although we show full traversals in the figure, the EZW algorithm encodes a special zerotree symbol that allows us to skip insignificant subtrees on each pass. Thus, while the uncompressed data is $O(n)$ in the number of processes in the monitored system, the decoding process is bounded by the size of the compressed data. We showed in §3.5.2 that we achieve compression ratios from 100:1 to 1000:1; thus, decoding on a single node remains manageable, even on large systems.

In document Scalable performance measurement and analysis (Page 156-160)