Performance Results - Threshold-Based Query Algorithms

4.5 Threshold-Based Query Algorithms

4.6.2 Performance Results

We compared the efficiency of our proposed approach to a number of competing techniques. In the following, we denote our approach for answering threshold queries by 'R-Par'.

The first competing approach works on the native time series. At query time, the threshold-crossing time intervals are computed for the query threshold; afterwards, the distance between the query time series and each database object is derived. In the following, this method is denoted by 'Seq-Nat', as it corresponds to a sequential processing of the native data.
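The first step of Seq-Nat, deriving the threshold-crossing time intervals of a series for a given query threshold, can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the function name `threshold_crossing_intervals`, the use of sample indices as time stamps, and the linear interpolation of crossing points between samples are all assumptions of this sketch.

```python
from typing import List, Tuple


def threshold_crossing_intervals(values: List[float], tau: float) -> List[Tuple[float, float]]:
    """Return the time intervals during which the series lies strictly above tau.

    Assumptions of this sketch: time stamps are the sample indices 0..n-1,
    and crossing points between two samples are linearly interpolated.
    """
    intervals: List[Tuple[float, float]] = []
    start = None  # start time of the currently open interval, if any
    for i, v in enumerate(values):
        above = v > tau
        if above and start is None:
            if i == 0:
                start = 0.0  # series already starts above the threshold
            else:
                prev = values[i - 1]
                # upward crossing between sample i-1 and sample i
                start = (i - 1) + (tau - prev) / (v - prev)
        elif not above and start is not None:
            prev = values[i - 1]
            # downward crossing between sample i-1 and sample i
            end = (i - 1) + (prev - tau) / (prev - v)
            intervals.append((start, end))
            start = None
    if start is not None:
        # series ends above the threshold: close the interval at the last sample
        intervals.append((start, float(len(values) - 1)))
    return intervals
```

For the series `[0, 2, 2, 0, 0, 2]` and threshold `1`, the sketch yields the intervals `(0.5, 2.5)` and `(4.5, 5.0)`. Seq-Nat has to repeat this computation for every database object at query time, which is one reason for its cost.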

The second competitor works on the parameter space rather than on the native data. It assumes that all time series objects have already been mapped to the parameter space; however, no index structure is used. As this storage leads to a sequential scan over the elements of the parameter space, we refer to this technique as the 'Seq-Par' method.

Furthermore, we included a number of traditional similarity measures based on the following dimensionality reduction methods: Chebyshev Polynomials (Cheb) [CN04], Discrete Fourier Transformation (DFT) [AFS93], and Fast Map (FM) [FL95]. In particular, we implemented the algorithm

To obtain more reliable and significant results, we used 5 randomly chosen query objects in the following experiments. Furthermore, these query objects were used in conjunction with 5 different thresholds, so that we obtained 25 different threshold queries. The presented results are the averages over these queries.
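The query workload described above (every query object combined with every threshold, results averaged) can be sketched as a small timing harness. This is only an illustration of the averaging scheme: `average_query_time` and the `run_query` callable are hypothetical names, and the sketch says nothing about how the original measurements were taken.

```python
import itertools
import statistics
import time


def average_query_time(run_query, query_objects, thresholds):
    """Run every (query object, threshold) pair once and average the elapsed time.

    `run_query` is a hypothetical callable standing in for any method under
    test; it receives one query object and one threshold.
    Returns (mean elapsed seconds, number of queries executed).
    """
    timings = []
    for query, tau in itertools.product(query_objects, thresholds):
        started = time.perf_counter()
        run_query(query, tau)
        timings.append(time.perf_counter() - started)
    return statistics.mean(timings), len(timings)
```

With 5 query objects and 5 thresholds, the harness executes 25 threshold queries, matching the setup described in the text.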

We used the audio dataset and varied its size as well as the length of the time series. For the first experiment, we varied the database size and set the length of the time series to a fixed value of 50 time slots. Figure 4.16(a) shows the results of our approach compared to Seq-Nat and Seq-Par. The performance of Seq-Nat and Seq-Par decreases with growing database size, while our approach handles large amounts of data very well. The next experiment (cf. Figure 4.16(b)) compares our approach to the dimensionality reduction methods listed above. Although the scalability behavior of our approach is similar to that of the dimensionality reduction techniques, its absolute performance is significantly better.

The next experiment explores the impact of the length of the query object and of the time series in the database. The results are shown in Figure 4.17. Again, our technique outperforms Seq-Nat and Seq-Par (cf. Figure 4.17(a)), whose costs increase very fast due to the expensive distance computations. In contrast, our approach, like DFT and FM, scales well for larger time series objects. For small time series, it even outperforms the three dimensionality reduction approaches by far, as shown in Figure 4.17(b). If the length of the time series objects exceeds 200, DFT and FM scale better than our approach. Cheb, in contrast, scales relatively poorly for larger time series. The reason is that the number of required Chebyshev coefficients has to be increased with the time series length to maintain constant approximation quality.

[Figure 4.16, panel (a): elapsed time [sec] over the number of objects in the database for R-Par, Seq-Par, and Seq-Nat. Panel (b): elapsed time [sec] over the number of objects in the database for Cheb, DFT, FM, and R-Par.]

Figure 4.16: Scalability of the threshold-query algorithm with respect to the database size.

Obviously, the cardinality of our time series representations increases linearly with the time series length.

[Figure 4.17, panel (a): elapsed time [sec] over the length of the time series in the database. Panel (b): elapsed time [sec] over the length of the time series in the database for Cheb, DFT, FM, and R-Par.]

Figure 4.17: Scalability of the threshold-query algorithm with respect to the length of the time series.

In the next experiment, we analyzed the speed-up of the query process caused by our pruning strategy. We measured the number of result candidates considered in the filter step of our query algorithm, denoted by 'Filter', and the number of objects that finally have to be refined, denoted by 'Refinement'. Again, we compare our approach to the three dimensionality reduction methods Cheb, DFT, and FM. Figure 4.18(a) and Figure 4.18(b) show the results relative to the database size and to the length of the time series objects, respectively. Generally, only a very small portion of the candidates has to be refined to calculate the result. Similar to the dimensionality reduction methods, our approach scales well for large databases. For small time series, our approach has a slightly better pruning power than Cheb and FM. We can observe that the pruning power of our approach decreases with increasing time series length. An interesting observation is that, with larger time series, the number of candidates that have to be accessed in the filter step increases faster than the number of finally refined candidates. Yet, for the audio dataset, the DFT method shows the best results in terms of pruning power.
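The role of the refinement count can be illustrated with a generic filter-and-refine nearest-neighbor search. The sketch below is not the R-Par query algorithm. It assumes a hypothetical filter distance `lower_bound` that never exceeds the exact distance `exact`, and it simply counts how many objects have to be refined.

```python
from typing import Callable, Optional, Sequence, Tuple


def filter_refine_nn(
    query,
    db: Sequence,
    lower_bound: Callable,
    exact: Callable,
) -> Tuple[Optional[int], int]:
    """Generic filter-and-refine 1-NN search.

    Assumes lower_bound(q, o) <= exact(q, o) for all objects (an assumption of
    this sketch). Returns (index of the nearest object, number of refinements).
    """
    # Filter step: rank all objects by their cheap lower-bound distance.
    ranked = sorted((lower_bound(query, obj), i) for i, obj in enumerate(db))
    best_index, best_dist, refined = None, float("inf"), 0
    for bound, i in ranked:
        if bound >= best_dist:
            break  # no remaining object can beat the current best
        dist = exact(query, db[i])  # refinement step: expensive exact distance
        refined += 1
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, refined
```

The relative number of refined objects, as plotted on the vertical axis of Figure 4.18, would then be `100 * refined / len(db)`. In this toy setting a tighter lower bound leads to an earlier break and thus to fewer refinements.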

[Figure 4.18, panel (a): relative number of objects [%] over the number of objects in the database for Cheb, DFT, FM, Filter, and Refinement. Panel (b): relative number of objects [%] over the length of the time series in the database for Cheb, DFT, FM, Filter, and Refinement.]

(a) Pruning power for varying database size.

(b) Pruning power for varying time series length.

Figure 4.18: Pruning Power of the threshold-based nearest-neighbor algorithm.

Furthermore, we examined the number of nearest-neighbor search iterations required by the query process for varying lengths of the time series and varying sizes of the database. We observed that the number of iterations was between 5 and 62. The number of iterations increases linearly with the length of the time series and remains nearly constant with respect to the database size. Overall, only a few iterations are required to report the result.

In terms of performance, our approach thus significantly outperforms Seq-Nat and Seq-Par. Furthermore, it is comparable to the above-mentioned dimensionality reduction techniques Cheb, DFT, and FM. For time series of small and medium length, our approach even outperforms these dimensionality reduction techniques.

[Figure 4.19: classification accuracy [%] on the datasets Trace, SynCtrl, GunX, and CBF.]

Figure 4.19: Comparison to Traditional Distance Measures.

In document Aßfalg, Johannes (2008): Advanced Analysis on Temporal Data. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik (Page 91-96)