Performance Results - Kunath, Peter (2006): Efficient Analysis in Multimedia Databases.

We compared the efficiency of our proposed approach for answering threshold queries, denoted in the following by ‘RPar’, with that of the following techniques.

The first competing approach, denoted by ‘SeqNat’, works on the native time series data. The threshold-crossing time intervals of each time series were computed at query time.
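To make the SeqNat baseline concrete, the following Python sketch computes the threshold-crossing time intervals of a single time series for a threshold tau at query time. It is an illustration, not code from the thesis: it assumes that sample i is taken at time i and that the series is linearly interpolated between samples, and the function name `threshold_crossing_intervals` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[float, float]


def threshold_crossing_intervals(values: Sequence[float], tau: float) -> List[Interval]:
    """Maximal time intervals during which the (interpolated) series lies above tau.

    Assumption for illustration: sample i is taken at time i, and the series is
    linearly interpolated between consecutive samples.
    """
    intervals: List[Interval] = []
    # An interval is already open if the series starts above the threshold.
    start: Optional[float] = 0.0 if values and values[0] > tau else None
    for t in range(1, len(values)):
        a, b = values[t - 1], values[t]
        if (a > tau) != (b > tau):
            # The segment from t-1 to t crosses tau; interpolate the crossing time.
            crossing = (t - 1) + (tau - a) / (b - a)
            if b > tau:
                start = crossing  # upward crossing opens an interval
            else:
                intervals.append((start, crossing))  # downward crossing closes it
                start = None
    if start is not None:
        # The series is still above tau at its last sample.
        intervals.append((start, float(len(values) - 1)))
    return intervals


# Example: a single excursion above tau = 1.0
print(threshold_crossing_intervals([0.0, 2.0, 2.0, 0.0], 1.0))  # -> [(0.5, 2.5)]
```

A SeqNat-style query would have to repeat this computation for every time series in the database on each query, so its cost grows with both the database size and the time series length.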

The second competitor, denoted by ‘SeqPar’, works on the parameter space rather than on the native data. It assumes that all time series objects have already been transformed into the parameter space, but it does not use any index structure. At query time, this method therefore requires a sequential scan over all segments of the parameter space.
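The sketch below illustrates the SeqPar setting under stated assumptions: each threshold-crossing interval (start, end) is treated as a point in a two-dimensional parameter space, the distance between two interval sets is assumed to be a symmetric sum-of-minimum-distances measure, and the k nearest neighbors are found by scanning every precomputed point set without any index. The names `sum_of_min_distances` and `seq_par_knn` are hypothetical, and the exact distance definition used in the thesis may differ.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # a threshold-crossing interval (start, end) seen as a 2-D point


def sum_of_min_distances(xs: Sequence[Point], ys: Sequence[Point]) -> float:
    """Assumed set distance: for each point, the distance to its nearest point in
    the other set, averaged per set and then over both directions."""
    if not xs and not ys:
        return 0.0
    if not xs or not ys:
        return math.inf  # one series never exceeds the threshold, the other does
    x_to_y = sum(min(math.dist(x, y) for y in ys) for x in xs) / len(xs)
    y_to_x = sum(min(math.dist(y, x) for x in xs) for y in ys) / len(ys)
    return 0.5 * (x_to_y + y_to_x)


def seq_par_knn(query: Sequence[Point],
                database: Dict[str, Sequence[Point]],
                k: int) -> List[Tuple[float, str]]:
    """SeqPar-style k-NN: compare the query with every stored point set (no index)."""
    scored = ((sum_of_min_distances(query, points), oid) for oid, points in database.items())
    return heapq.nsmallest(k, scored)


db = {"a": [(1.0, 3.0)], "b": [(4.0, 7.0)], "c": []}
print(seq_par_knn([(1.0, 3.0)], db, k=2))  # -> [(0.0, 'a'), (5.0, 'b')]
```

The scan touches every object regardless of how close it is to the query, which is the cost that an index over the parameter space is meant to avoid.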

We further compare the performance of our approach to traditional similarity search approaches based on the following dimensionality reduction methods: Chebyshev Polynomials (Cheb) [CN04], Discrete Fourier Transformation (DFT) [AFS93] and FastMap (FM) [FL95]. In particular, we implemented the algorithm proposed by Seidl and Kriegel in [SHP98], which adapts the GEMINI framework (cf. Section 2.3) to k-nearest-neighbor search. Since these dimensionality reduction techniques approximate the Euclidean distance, they can only be used to accelerate similarity queries based on the Euclidean distance; they cannot be applied to threshold-based similarity search.
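For the Euclidean competitors, the filter/refine principle behind GEMINI-style multi-step k-nearest-neighbor search can be sketched as follows. This is an illustrative Python version, not the implementation used in the experiments: it uses the first few coefficients of an orthonormal DFT as a lower-bounding filter (by Parseval's theorem, the distance on any subset of these coefficients never exceeds the Euclidean distance), ranks all objects by this lower bound with a plain sort instead of an index, and stops refining as soon as the next lower bound exceeds the current k-th exact distance. The parameter `n_coeffs` and all function names are assumptions.

```python
import cmath
import heapq
import math
from typing import List, Sequence, Tuple


def dft_features(x: Sequence[float], n_coeffs: int) -> List[complex]:
    """First n_coeffs coefficients of the orthonormal DFT (scaled by 1/sqrt(n))."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)) / math.sqrt(n)
            for f in range(n_coeffs)]


def feature_distance(a: Sequence[complex], b: Sequence[complex]) -> float:
    # By Parseval's theorem this never exceeds the Euclidean distance of the raw series.
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))


def multistep_knn(query: Sequence[float], series: Sequence[Sequence[float]],
                  k: int, n_coeffs: int = 3) -> Tuple[List[Tuple[float, int]], int]:
    """Filter/refine k-NN search; returns (k nearest (distance, index) pairs, #refined)."""
    q_feat = dft_features(query, n_coeffs)
    # Filter step: rank all objects by their lower-bounding feature distance.
    # A real implementation would obtain this ranking incrementally from an index.
    ranking = sorted((feature_distance(q_feat, dft_features(s, n_coeffs)), i)
                     for i, s in enumerate(series))
    best: List[Tuple[float, int]] = []  # max-heap via negated exact distances
    refined = 0
    for lower_bound, i in ranking:
        if len(best) == k and lower_bound > -best[0][0]:
            break  # no remaining object can beat the current k-th exact distance
        exact = math.dist(query, series[i])  # refinement step on the raw series
        refined += 1
        if len(best) < k:
            heapq.heappush(best, (-exact, i))
        elif exact < -best[0][0]:
            heapq.heapreplace(best, (-exact, i))
    return sorted((-d, i) for d, i in best), refined


data = [[0, 0, 0, 0], [1, 1, 1, 1], [5, 5, 5, 5], [0, 1, 0, 1]]
neighbors, refined = multistep_knn([0, 0, 0, 0], data, k=2)
print([i for _, i in neighbors], refined)  # -> [0, 3] 2
```

The number of refined objects returned here is analogous to the ‘Refinement’ counts discussed in the pruning experiments: the tighter the lower bound, the fewer exact distance computations are needed.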

To obtain more reliable and significant results, we used 5 randomly chosen query objects in the following experiments. Furthermore, each query object was combined with 5 different thresholds, yielding 25 different threshold-based nearest-neighbor queries. The presented results are averaged over these queries.

First, we performed threshold queries against database instances of different sizes to measure the influence of the database size on the overall query time. The elements of the databases are time series of fixed length l = 50. Figure 13.1 shows the performance results for each database. Figure 13.1(a) shows that the performance of both SeqNat and SeqPar degrades significantly with increasing database size, whereas our approach RPar scales very well, even for large databases. Furthermore, as depicted in Figure 13.1(b), our approach shows scalability behavior similar to that of the three dimensionality reduction approaches Cheb, DFT and FM, and even outperforms them by a factor of 4 to 5.

Figure 13.1: Scalability of the threshold-query algorithm against database size. (a) Elapsed time [sec] of RPar, SeqPar and SeqNat vs. number of objects in the database. (b) Elapsed time [sec] of Cheb, DFT, FM and RPar vs. number of objects in the database.

Second, we explored the impact of the length of the query object and of the time series in the database. The results are shown in Figure 13.2. Again, our technique outperforms the competing approaches SeqNat and SeqPar, whose costs increase very quickly due to the expensive distance computations (cf. Figure 13.2(a)). In contrast, our approach, like DFT and FM, scales well for longer time series objects. For short time series, it even outperforms the three dimensionality reduction approaches by far, as shown in Figure 13.2(b). If the length of the time series objects exceeds 200, however, both DFT and FM scale better than our approach. Cheb, on the other hand, scales relatively poorly for longer time series, because the number of required Chebyshev coefficients has to grow with the time series length to maintain a constant approximation quality. Obviously, the cardinality of our time series representations increases linearly with the time series length.

Figure 13.2: Scalability of the threshold-query algorithm against time series length. (a) Elapsed time [sec] of RPar, SeqPar and SeqNat vs. length of time series in database. (b) Elapsed time [sec] of Cheb, DFT, FM and RPar vs. length of time series in database.

In the next experiment, we demonstrate the speed-up of the query process achieved by our pruning strategy. We measured the number of result candidates considered in the filter step of our query algorithm, denoted by ‘Filter’, and the number of objects that finally have to be refined, denoted by ‘Refinement’. Again, we compare our approach to the three dimensionality reduction methods Cheb, DFT and FM. Figure 13.3(a) and Figure 13.3(b) show the results relative to the database size and to the length of the time series objects, respectively. Generally, only a very small portion of the candidates has to be refined to report the result. Similar to the dimensionality reduction methods, our approach scales well for large databases. For short time series, our approach has a slightly better pruning power than Cheb and FM. We observe that the pruning power of our approach decreases with increasing time series length. Interestingly, the number of candidates accessed in the filter step grows faster for longer time series than the number of finally refined candidates. Yet, for the AUDIO dataset, the DFT method shows the best results w.r.t. pruning power.

Figure 13.3: Pruning power of the threshold-based nearest-neighbor algorithm. (a) Pruning power for varying database size: relative number of objects [%] (log scale) for Cheb, DFT, FM, Filter and Refinement vs. number of objects in the database. (b) Pruning power for varying time series length: relative number of objects [%] (log scale) for Cheb, DFT, FM, Filter and Refinement vs. length of time series in database.
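The quantity plotted in Figure 13.3, the relative number of objects in percent, can be computed per query and then averaged over the query workload (in these experiments, the 25 combinations of query objects and thresholds). A minimal sketch, assuming that the percentages are taken relative to the database size and that per-query counts of filter candidates and refined objects are available as input; the function name `relative_pruning` and the example counts are hypothetical, not measurements from the thesis.

```python
from statistics import mean
from typing import Iterable, Tuple


def relative_pruning(counts: Iterable[Tuple[int, int]], db_size: int) -> Tuple[float, float]:
    """Average 'Filter' and 'Refinement' percentages over a query workload.

    counts: one (filter_candidates, refined_objects) pair per query (hypothetical input).
    Percentages are assumed to be relative to the database size.
    """
    pairs = list(counts)
    filter_pct = mean(100.0 * f / db_size for f, _ in pairs)
    refine_pct = mean(100.0 * r / db_size for _, r in pairs)
    return filter_pct, refine_pct


# Two hypothetical queries against a database of 10,000 objects.
print(relative_pruning([(50, 5), (150, 15)], db_size=10_000))
```

Because both percentages span several orders of magnitude, a logarithmic axis such as the one in Figure 13.3 makes the gap between ‘Filter’ and ‘Refinement’ visible.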

Furthermore, we examined the number of nearest-neighbor search iterations of the query process for varying time series length and varying database size. We observed between 5 and 62 iterations. The number of iterations increases linearly with the length of the time series and remains nearly constant w.r.t. the database size. Nevertheless,

In document Kunath, Peter (2006): Efficient Analysis in Multimedia Databases. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik (Page 162-167)