Discussion on Computation Complexity - Adaptive Learning Algorithms for Non-stationary Data

This section analyzes the computational complexity of the proposed locKMM algorithm and of the methods it is compared against. The experimental running times on the tweet data are then reported.

We first establish the computational complexity of several common modules. Let $n$ be the number of samples and $d$ the number of dimensions. The k-NN algorithm has a computational complexity of $O(nd + kn)$. Computing the kernel matrix has a complexity of $O(n^2 d)$. Computing a matrix inverse has a complexity of $O(n^3)$. Solving a convex quadratic programming (QP) problem has a complexity of $O(n^3)$. We note that some variants offer improved efficiency [6]; the complexities presented here serve as worst-case bounds.
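The $O(n^2 d)$ cost of the kernel matrix can be seen from a direct implementation. The sketch below is illustrative only: it assumes a Gaussian kernel with a single bandwidth `sigma`, and the function name `gaussian_kernel_matrix` and the operation counter are hypothetical helpers, not part of the original work.

```python
import math

def gaussian_kernel_matrix(X, sigma=1.0):
    """Return (K, ops): the n-by-n Gaussian kernel matrix of X and the number
    of coordinate-level operations spent on squared distances."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    ops = 0
    for i in range(n):                      # n rows
        for j in range(n):                  # n columns
            sq = 0.0
            for a, b in zip(X[i], X[j]):    # d coordinates per pair
                sq += (a - b) ** 2
                ops += 1
            K[i][j] = math.exp(-sq / (2.0 * sigma ** 2))
    return K, ops

# Three samples in two dimensions: n = 3, d = 2.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K, ops = gaussian_kernel_matrix(X)
print(ops)  # n^2 * d = 3 * 3 * 2 = 18
```

The three nested loops make the $n \cdot n \cdot d$ count explicit. A practical implementation would exploit symmetry or vectorized routines, which changes the constant but not the order.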

Let $n_{rf}$, $n_{ts}$, and $d$ be the number of samples in the reference collection, the number of samples in the test collection, and the dimension of the data, respectively. For the LOF algorithm, with $k$ denoting the neighborhood size, the complexity of testing a single data point is $O(k n_{rf} d + k^2 n_{rf} d)$. For all test points, the complexity is then $O(n_{ts} n_{rf} d (k^2 + k))$. As can be seen, the LOF method is computationally expensive when $k$ takes a large value. This is due to the recursive search for the $k$ nearest neighbors in the LOF method.
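The $O(nd + kn)$ cost of a single k-NN query, the building block that LOF invokes repeatedly, can be read off a brute-force implementation. The following is a minimal sketch under stated assumptions: squared Euclidean distance, selection by $k$ linear scans, and a hypothetical function name `knn_query`. It does not reproduce the LOF implementation used in the experiments.

```python
def knn_query(query, reference, k):
    """Return the indices of the k reference points nearest to `query`.

    Distances cost O(n d); the k selection passes cost O(k n).
    """
    # O(n d): one squared Euclidean distance per reference point.
    dists = [sum((q - r) ** 2 for q, r in zip(query, ref)) for ref in reference]
    chosen = []
    remaining = set(range(len(reference)))
    # O(k n): each pass scans the remaining points for the current minimum.
    for _ in range(min(k, len(reference))):
        best = min(remaining, key=lambda i: (dists[i], i))
        chosen.append(best)
        remaining.remove(best)
    return chosen

reference = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.0]]
print(knn_query([0.0, 0.1], reference, k=2))  # [0, 3]
```

LOF needs the neighborhoods of a test point's neighbors as well as its own, which is where the additional $k^2$ factor in the per-point cost comes from.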

The OSVM method includes a training phase and a test phase. In the training phase, computing the kernel matrix takes $O(n_{rf}^2 d)$ and solving the QP problem takes $O(n_{rf}^3)$. In the test phase, denoting the number of support vectors by $n_{sv}$, testing all $n_{ts}$ samples takes $O(n_{ts} n_{sv} d)$. In the worst case, $n_{sv} \approx n_{rf}$, so the test complexity is $O(n_{ts} n_{rf} d)$. The total computational complexity of OSVM is therefore $O(n_{rf}^3 + n_{rf}^2 d + n_{ts} n_{rf} d)$.

The uLSIF method also includes a model training phase and a test phase. Referring to Section 3.2.4 in Chapter 3, the computation of $H_{l,l'}$ needs $O(b^2 n_{ts} d)$ and the computation of $h_l$ needs $O(b n_{rf} d)$, where $b$ is the number of Gaussian bases used to model the density-ratio function. The computation of $\hat{\alpha}$ involves a matrix inverse and has a complexity of $O(b^3 + b^2)$. Note that uLSIF also has a model selection process. Assuming there are $s$ candidates for model selection, the total complexity of the training phase is $O(s(b^2 n_{ts} d + b n_{rf} d + b^3 + b^2))$. The complexity of the test phase is $O(n_{rf} b d)$. The total computational complexity of uLSIF is $O(s(b^2 n_{ts} d + b n_{rf} d + b^3 + b^2) + n_{rf} b d)$.

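For the KMM algorithm, computing the kernel matrices $K_{x_{ts},x_{ts}}$ and $K_{x_{ts},x_{rf}}$ takes $O(n_{ts}^2 d + n_{ts} n_{rf} d)$. The QP problem has a complexity of $O(n_{ts}^3)$. The total computational complexity of KMM is then $O(n_{ts}^3 + n_{ts}^2 d + n_{ts} n_{rf} d)$.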

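Table 5.10: The computational complexity of different novelty detection algorithms.

| Method | Train | Test | Total |
| --- | --- | --- | --- |
| LOF | - | - | $O(n_{ts} n_{rf} d (k^2 + k))$ |
| OSVM | $O(n_{rf}^3 + n_{rf}^2 d)$ | $O(n_{ts} n_{rf} d)$ | $O(n_{rf}^3 + n_{rf}^2 d + n_{ts} n_{rf} d)$ |
| uLSIF | $O(s(b^2 n_{ts} d + b n_{rf} d + b^3 + b^2))$ | $O(n_{rf} b d)$ | $O(s(b^2 n_{ts} d + b n_{rf} d + b^3 + b^2) + n_{rf} b d)$ |
| KMM | - | - | $O(n_{ts}^3 + n_{ts}^2 d + n_{ts} n_{rf} d)$ |
| locKMM | - | - | $O(n_{ts}^3 + n_{ts}^2 d + n_{ts} n_{rf} d + (n_{rf} + n_{ts}) d + k(n_{rf} + n_{ts}))$ |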

In the proposed locKMM algorithm, there is an overhead for computing the local bandwidth $\sigma_{ij}$, which is $O((n_{rf} + n_{ts}) d + k(n_{rf} + n_{ts}))$. The other components have the same computational cost as in KMM. Therefore, the total computational complexity of locKMM is $O(n_{ts}^3 + n_{ts}^2 d + n_{ts} n_{rf} d + (n_{rf} + n_{ts}) d + k(n_{rf} + n_{ts}))$.
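To get a feel for how the five total complexity expressions compare, they can be evaluated as raw term counts. The sketch below is only an illustration: the function name `operation_counts` and the chosen values ($n = 10^4$, $d = 10$, $s = 5$, $b = k = \sqrt{n}$) are assumptions made here, and because constant factors are dropped, the numbers indicate growth rather than running time.

```python
import math

def operation_counts(n_rf, n_ts, d, k, b, s):
    """Evaluate the total complexity expressions as term counts (constants dropped)."""
    kmm = n_ts**3 + n_ts**2 * d + n_ts * n_rf * d
    return {
        "LOF": n_ts * n_rf * d * (k**2 + k),
        "OSVM": n_rf**3 + n_rf**2 * d + n_ts * n_rf * d,
        "uLSIF": s * (b**2 * n_ts * d + b * n_rf * d + b**3 + b**2) + n_rf * b * d,
        "KMM": kmm,
        # locKMM adds only the local-bandwidth overhead to the KMM cost.
        "locKMM": kmm + (n_rf + n_ts) * d + k * (n_rf + n_ts),
    }

n = 10_000
root = math.isqrt(n)  # b = k = sqrt(n) = 100
counts = operation_counts(n_rf=n, n_ts=n, d=10, k=root, b=root, s=5)
for name, value in sorted(counts.items(), key=lambda kv: kv[1]):
    print(f"{name:7s} {value:.3e}")
```

With these illustrative values, uLSIF has the smallest count, and the locKMM overhead is negligible next to the cubic QP term it shares with KMM.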

Table 5.10 summarizes the computational complexities of these five methods. OSVM and uLSIF are model-based methods with two phases, model training and testing, while LOF, KMM, and locKMM are non-parametric methods that do not produce explicit models. Assuming $n_{rf} = O(n_{ts}) = n$, $b = O(\sqrt{n})$, $k = O(\sqrt{n})$, $d \ll n$, and $s \ll n$, the uLSIF method has the lowest complexity, $O(n^2)$, while the other four methods have a similar level of complexity, $O(n^3)$. For the four cascaded methods (LOF+KMM, LOF+uLSIF, OSVM+KMM, OSVM+uLSIF), the computational complexity is straightforward: it is the sum of the complexities of the two corresponding algorithms.
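The simplified orders can be checked by substituting the assumptions into each total. As a worked sketch, take $n_{rf} = n_{ts} = n$ and $b = k = \sqrt{n}$ exactly (a simplification of the $O(\cdot)$ assumptions above), and treat $d$ and $s$ as constants in the final step:

```latex
\begin{align*}
\text{LOF:}    \quad & n_{ts} n_{rf} d\,(k^2 + k) = n^2 d\,(n + \sqrt{n}) = O(n^3 d) \\
\text{OSVM:}   \quad & n_{rf}^3 + n_{rf}^2 d + n_{ts} n_{rf} d = n^3 + 2 n^2 d = O(n^3) \\
\text{uLSIF:}  \quad & s\,(b^2 n_{ts} d + b n_{rf} d + b^3 + b^2) + n_{rf} b d \\
                     & = s\,(n^2 d + n^{3/2} d + n^{3/2} + n) + n^{3/2} d = O(s\, n^2 d) \\
\text{KMM:}    \quad & n_{ts}^3 + n_{ts}^2 d + n_{ts} n_{rf} d = n^3 + 2 n^2 d = O(n^3) \\
\text{locKMM:} \quad & n^3 + 2 n^2 d + 2 n d + 2 n^{3/2} = O(n^3)
\end{align*}
```

With $d$ and $s$ held constant, uLSIF reduces to $O(n^2)$ and the remaining four methods to $O(n^3)$.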

Table 5.11 reports the experimental running times for scenarios ‘S1’ and ‘S2’ on the tweet data (Section 5.5.2). The results were obtained on a commercial desktop using one core of its i7 CPU. All methods are implemented in Matlab except OSVM. These results are consistent with the complexity analysis above: uLSIF is the fastest method in both scenarios, and KMM and locKMM have comparable running times. OSVM has the same level of computational complexity as KMM and locKMM. Because OSVM is based on a C++ implementation taken from the LibSVM library [21], the execution times for OSVM and for the cascaded methods involving OSVM are not included in the comparison.

Table 5.11: Running time of the tweet data experiments (average of 10 runs, in seconds).

| Method | Scenario S1 | Scenario S2 |
| --- | --- | --- |
| LOF | 14.8 | 110.6 |
| uLSIF | 3.6 | 11.6 |
| KMM | 27.4 | 172.3 |
| locKMM | 27.3 | 182.8 |
| LOF+uLSIF | 19.1 | 123.3 |
| LOF+KMM | 43.1 | 284.6 |
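The statement that a cascaded method costs the sum of its two stages can be checked against the measured times. The short sketch below uses only the values in Table 5.11; the helper name `cascade_gap` is hypothetical.

```python
# Running times in seconds (S1, S2), copied from Table 5.11.
times = {
    "LOF": (14.8, 110.6),
    "uLSIF": (3.6, 11.6),
    "KMM": (27.4, 172.3),
    "locKMM": (27.3, 182.8),
    "LOF+uLSIF": (19.1, 123.3),
    "LOF+KMM": (43.1, 284.6),
}

def cascade_gap(name):
    """Measured cascade time minus the sum of its two stages, per scenario."""
    first, second = name.split("+")
    return tuple(
        round(times[name][i] - (times[first][i] + times[second][i]), 1)
        for i in range(2)
    )

for name in ("LOF+uLSIF", "LOF+KMM"):
    print(name, cascade_gap(name))
```

The gaps stay between 0.7 and 1.7 seconds in both scenarios, so the measured cascade times are close to the sum of their stages.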

Chapter 6 Parameter Tuning for Kernel Mean Matching

Although work on estimating density ratios in a discriminative manner considerably enhances the performance of density-ratio estimators, the category of kernel mean matching methods depends on the choice of a kernel whose parameters are hard to estimate. To address this problem, we propose an auto-tuning method for kernel-based non-parametric density-ratio estimators that introduces a novel measure for assessing the quality of candidate choices.

The chapter is organized as follows. Section 6.1 introduces the problem of parameter selection in kernel mean matching density-ratio estimators. Section 6.2 presents a parameter tuning method based on the newly defined Normalized Mean Squared Error (NMSE) measure. Section 6.3 presents an empirical evaluation. Lastly, Section 6.4 examines the method by extending it to polynomial kernels.
