4-3 Estimation Performance for P > N ≯ S

The goal of this section is to illustrate the performance of the KF-LASSO and the Weighted-SKF when we shift from P > N > S to P > N ≯ S. To this end, the number of available measurements per time step, N, is varied. This lets us show how the performance of the KF-LASSO (originally proposed for N > 1) differs from that of the Weighted-SKF (proposed for N = 1). Section 4-3-1 describes the dataset and Section 4-3-2 discusses how well the methods estimate S_t and x_t.

4-3-1 Dataset

Again, a dataset is generated with a sparse, time-varying coefficient vector x_t and a constant support, S_t = S. This time, however, the number of measurements per time step, N, is varied. This allows the methods to be compared in situations where P > N > S and in situations where P > N ≯ S. As a result, the generated problems are both underdetermined and overdetermined with respect to the S nonzero coefficients. Models (2-9) and (2-12) are used to generate the data. The entries in A_t are i.i.d., and the features f_p and the vector y_t have zero mean and unit variance. The parameters used to generate the dataset are listed in the box below. The nonzero coefficients in x_t from Figure 4-1 are also used in this section.

Dataset Parameters

N ∈ {1, 2, 3, 5, 8, 10, 12, 15, 20, 25, 30}, P = 25, S = {1, 2, 3} (support)

S = 3 (number of nonzero coefficients), T = 100, σ_sys² = 0.01, σ² = 0.01, C_t = βI, β = 0.99

For each value of N, λ is chosen to minimize the prediction error. For the KF-LASSO this led to λ ∈ [0.001, 0.1]. For the Weighted-SKF, λ = 25 was used in every simulation. The choice of the tuning parameter λ and its influence are discussed in more detail in Section 4-3-2. To put the performance of the Weighted-SKF and the KF-LASSO in perspective, the results of the LASSO, KF and SKF are shown as well. The complete set of parameter settings is listed in the box below.

Tuning Parameters

σ_init² = 3, σ_sys² = 0.01, σ² = 0.01, β = 0.99, α = 8

KF-LASSO: λ ∈ [0.001, 0.1], α_d = 0.1, k = 3, k_0 = 3

Weighted-SKF: λ = 25, µ = 0.1, W = 10, W_d = 10

LASSO: λ ∈ [0.001, 0.1]

SKF: λ ∈ [0.01, 1]

The simulation results obtained with this dataset and these parameter settings are discussed in Section 4-3-2.

4-3-2 Discussion of Simulation Results

The simulation results are split into two parts. The first covers the accuracy of the estimates of x_t and the second covers the accuracy of the support estimates S_t.


Accuracy for estimation of x_t

The results are shown in Figure 4-4a. Each data point is the MSE(x_t) computed over 15 independent runs and averaged over the last 75 time steps. Only the last 75 time steps are used so that errors from the initial convergence of the methods are excluded. Including them would obscure the steady-state behavior.
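The averaging behind each data point in Figure 4-4a can be written as a short sketch. It assumes two things: that MSE(x_t) is the squared estimation error averaged over the P coefficients, and that the estimates of all runs are stored in an array of shape (runs, T, P). The function name `averaged_mse` and the toy data are hypothetical.

```python
import numpy as np


def averaged_mse(x_true, x_est, n_last=75):
    """Average MSE(x_t) over all runs and over the last `n_last` time steps.

    x_true, x_est: arrays of shape (runs, T, P).
    Assumption: MSE(x_t) is the mean squared error over the P coefficients.
    """
    x_true = np.asarray(x_true, dtype=float)
    x_est = np.asarray(x_est, dtype=float)
    sq_err = (x_est - x_true) ** 2            # shape (runs, T, P)
    mse_per_step = sq_err.mean(axis=2)        # MSE(x_t) per run and time step
    steady_state = mse_per_step[:, -n_last:]  # drop the convergence phase
    return float(steady_state.mean())         # average over runs and steps


# Hypothetical example with the dimensions of this section: 15 runs, T = 100, P = 25.
rng = np.random.default_rng(0)
x_true = rng.standard_normal((15, 100, 25))
x_est = x_true.copy()
x_est[:, :25, :] += 10.0  # large errors only in the first 25 (transient) steps
print(averaged_mse(x_true, x_est))  # 0.0: the transient errors are excluded
```

With T = 100, keeping the last 75 steps discards exactly the first 25. In the toy example, those are the only steps that contain errors.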

First, for N ≥ 15 all methods have small errors, because this is the classical situation where P > N > S. Once sparsity is introduced, effectively an N × S problem is solved instead of the N × P problem. This also explains why, for N ≥ 15, the KF has a larger error than the other methods: it is the only method that does not introduce sparsity.
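The difference between the N × P and the N × S problem can be checked numerically. The sketch below uses N = 15, P = 25 and S = 3 with noiseless measurements. The support and the nonzero values are hypothetical. The sketch only illustrates the rank argument and is not part of the simulation setup.

```python
import numpy as np

rng = np.random.default_rng(1)
P, N = 25, 15
support = [4, 11, 20]                  # hypothetical support, S = 3

A = rng.standard_normal((N, P))        # i.i.d. entries
x = np.zeros(P)
x[support] = [1.0, -0.5, 2.0]          # hypothetical nonzero coefficients
y = A @ x                              # noiseless measurements

# N x P problem: rank(A) <= N < P, so infinitely many vectors reproduce y.
# The minimum-norm least-squares solution is therefore not the sparse x.
rank_full = np.linalg.matrix_rank(A)
x_min_norm = np.linalg.lstsq(A, y, rcond=None)[0]

# N x S problem on the known support: A_s has full column rank S < N.
# The least-squares solution is unique and recovers the nonzero coefficients.
A_s = A[:, support]
rank_support = np.linalg.matrix_rank(A_s)
x_s = np.linalg.lstsq(A_s, y, rcond=None)[0]

print(rank_full, rank_support)         # 15 3
print(np.allclose(x_min_norm, x))      # False
print(np.allclose(x_s, x[support]))    # True
```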

Moreover, when N drops below 15, the performance of the LASSO deteriorates drastically. Again, this is because the LASSO ignores dynamic information and solves an independent problem at every time step. As N decreases, less information is available per time step, so the solutions to these independent problems become less accurate. The same effect explains why the KF-LASSO also deteriorates for smaller N, since it relies on a LASSO step to estimate the support. For 8 ≤ N ≤ 15, the KF-LASSO outperforms the LASSO, because the dynamic information used in its KF part improves the accuracy of the estimates. However, for N ≤ 8 the KF-LASSO performs worse than the KF. This indicates that the KF-LASSO finds the wrong support and then runs its KF part over this incorrect support, which leads to large errors.
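Solving an independent LASSO problem at every time step can be sketched as below. No solver is specified here, so the sketch swaps in plain ISTA (proximal gradient) for the objective ½‖y_t − A_t x‖² + λ‖x‖₁. This scaling of λ may differ from the one used in the simulations, and the function names are hypothetical. Note that `lasso_per_time_step` passes nothing from one time step to the next. Its accuracy therefore depends only on how much information a single (A_t, y_t) pair carries.

```python
import numpy as np


def soft_threshold(v, tau):
    """Elementwise soft-thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def lasso_ista(A, y, lam, n_iter=500):
    """Solve min_x 0.5 * ||y - A x||^2 + lam * ||x||_1 with ISTA.

    Only the current (A, y) is used: no information from earlier
    time steps enters the solution.
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    return x


def lasso_per_time_step(A_seq, y_seq, lam):
    """Independent LASSO at every time step; returns an array of shape (T, P)."""
    return np.array([lasso_ista(A_t, y_t, lam) for A_t, y_t in zip(A_seq, y_seq)])


# With A_t = I the LASSO solution is the soft-thresholded measurement vector.
x_hat = lasso_ista(np.eye(3), np.array([2.0, 0.05, -1.0]), lam=0.1)
```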

Furthermore, comparing the KF-LASSO with the SKF shows that the SKF is more accurate for lower N. This indicates that it is beneficial to incorporate sparsity directly into the KF equations instead of running a separate LASSO as the KF-LASSO does.

Finally, Figure 4-4a shows that the Weighted-SKF outperforms all other methods and has the smallest error over the whole range of N. To illustrate the desired behavior of the Weighted-SKF, the Genie Aided Kalman Filter (GA-KF) is shown as well. The GA-KF is the KF run over the true support, as if the support were known. The performance of the Weighted-SKF closely approximates the optimal GA-KF performance.
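The GA-KF reference can be sketched as a standard Kalman filter that only sees the columns of A_t belonging to the true support. The sketch rests on several assumptions. Models (2-9) and (2-12) are taken to reduce to x_t = βx_{t−1} + w_t and y_t = A_t x_t + v_t with white Gaussian noise of variances σ_sys² and σ². The values C_t = βI, β = 0.99 and σ_sys² = σ² = 0.01 come from the parameter boxes, and σ_init² = 3 is treated as the scale of the initial covariance. These mappings and the function names are hypothetical.

```python
import numpy as np


def kf_step(x_prev, cov_prev, A, y, beta=0.99, q=0.01, r=0.01):
    """One Kalman filter step for the assumed state-space model
        x_t = beta * x_{t-1} + w_t,  w_t ~ N(0, q I)   (C_t = beta I)
        y_t = A_t x_t + v_t,         v_t ~ N(0, r I)
    """
    n = x_prev.size
    # Time update (prediction).
    x_pred = beta * x_prev
    cov_pred = beta**2 * cov_prev + q * np.eye(n)
    # Measurement update.
    S_inn = A @ cov_pred @ A.T + r * np.eye(A.shape[0])  # innovation covariance
    K = np.linalg.solve(S_inn, A @ cov_pred).T            # gain cov_pred A^T S_inn^{-1}
    x_new = x_pred + K @ (y - A @ x_pred)
    cov_new = (np.eye(n) - K @ A) @ cov_pred
    return x_new, cov_new


def ga_kf(A_seq, y_seq, support, sigma_init2=3.0, **kf_kwargs):
    """Genie-aided KF: filter only the coefficients in the true support."""
    n_coef = A_seq[0].shape[1]
    s = np.asarray(support)
    x_s = np.zeros(s.size)
    cov_s = sigma_init2 * np.eye(s.size)
    estimates = []
    for A_t, y_t in zip(A_seq, y_seq):
        x_s, cov_s = kf_step(x_s, cov_s, A_t[:, s], y_t, **kf_kwargs)
        x_full = np.zeros(n_coef)
        x_full[s] = x_s  # entries outside the support stay exactly zero
        estimates.append(x_full)
    return np.array(estimates)
```

Under the same assumptions, passing all P column indices as `support` turns `ga_kf` into the sparsity-agnostic KF.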

Accuracy for estimation of S_t

To further evaluate the methods, Figure 4-4b shows how often the support was estimated correctly. A support is estimated at every time step, and the figure shows the percentage of correct support estimates out of 75 in total. First, for N ≥ 15 the LASSO, KF-LASSO and Weighted-SKF estimate the support correctly at (nearly) all time instances. As expected, when N decreases, the LASSO and the KF-LASSO lose their ability to identify the support correctly, and their overall performance drops accordingly.
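The support-accuracy measure of Figure 4-4b can be computed as sketched below. The sketch assumes that an estimate counts as correct only if its set of nonzero entries matches the true support exactly. It also assumes that an entry counts as nonzero when its magnitude exceeds a tolerance `tol`. Both the criterion and the function name are assumptions.

```python
import numpy as np


def percent_correct_support(x_est_seq, true_support, tol=0.0):
    """Percentage of time steps at which the estimated support equals the true one.

    x_est_seq: array of shape (T, P). An entry is counted as nonzero when its
    magnitude exceeds `tol`; correctness requires an exact match of the sets.
    """
    true_set = set(true_support)
    hits = [
        set(np.flatnonzero(np.abs(x_t) > tol).tolist()) == true_set
        for x_t in x_est_seq
    ]
    return 100.0 * sum(hits) / len(hits)
```

Applied to the last 75 estimates of a run, this gives a percentage between 0 and 100 of the kind plotted in Figure 4-4b. Note that the support S = {1, 2, 3} of this section corresponds to the zero-based indices [0, 1, 2].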

Furthermore, the SKF's accuracy of support estimation is surprisingly low. Figure 4-4a, however, shows that its MSE(x_t) is not increased for large values of N. This indicates that sparsity is not sufficiently applied: coefficients outside the support are not shrunk exactly to zero. The result is poor support estimation without a drastic increase in MSE(x_t). This is confirmed by Figure 4-4a, where the SKF's MSE(x_t) is close to that of the sparsity-agnostic KF.

Figure 4-4 panels (horizontal axis: N, the number of measurements in A_t):

(a) MSE(x_t) of the LASSO, KF, KF-LASSO, SKF, Weighted-SKF and GA-KF. Each data point is the MSE(x_t) averaged over the last 75 time steps, P = 25, S = 3. For N ≤ 15 the performance of the LASSO starts to deteriorate drastically. The other methods take dynamic information into account and retain smaller errors for low N. The Weighted-SKF outperforms all methods and is nearly equal to the GA-KF (KF over the given, true support).

(b) Percentage of correctly estimated supports for the LASSO, KF-LASSO, SKF and Weighted-SKF, i.e. how many times (out of 75 in total) the support is estimated correctly. For N ≤ 15, the number of correct support estimates decreases as N decreases.

Figure 4-4: (a) The average MSE(x_t) and (b) the percentage of how many times (out of 75 in total) the support is estimated correctly, both as a function of N.

Finally, the Weighted-SKF outperforms the other methods in terms of support estimation. Even for N = 1, the support is estimated correctly at nearly all time steps. This indicates (i) that the Weighted-SKF finds the true support and (ii) that, once the support is found, the IEN is a good measure for deciding whether to re-estimate the support. The IEN prevents unnecessary re-estimation of the support and thereby avoids errors in support estimation. Moreover, in contrast with the SKF, the Weighted-SKF uses its weight vector w_t to apply shrinkage to the correct coefficients. The improvements of the Weighted-SKF over the SKF are discussed in more detail next.
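The gating role of the IEN can be illustrated with a generic innovation test. The sketch does not reproduce the IEN as defined earlier in this thesis. Instead, as an assumption, it uses the mean Euclidean norm of the innovation y_t − A_t x̂_{t|t−1} over a sliding window and compares it with a threshold. The class name, the window length and the threshold are all hypothetical.

```python
from collections import deque

import numpy as np


class InnovationGate:
    """Hypothetical re-estimation trigger based on a windowed innovation norm."""

    def __init__(self, window=10, threshold=1.0):
        self.norms = deque(maxlen=window)  # most recent innovation norms
        self.threshold = threshold

    def update(self, y, A, x_pred):
        """Store the new innovation norm and return True if the support
        should be re-estimated."""
        innovation = y - A @ x_pred
        self.norms.append(float(np.linalg.norm(innovation)))
        return bool(np.mean(self.norms) > self.threshold)
```

As long as the predicted measurements explain y_t well, `update` returns False and the current support is kept. Only a sustained rise of the innovation triggers a new support estimate, which avoids the re-estimation errors mentioned above.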

Weighted-SKF vs. SKF

Figure 4-4b showed that the SKF estimates the support with limited accuracy. A straightforward remedy would be to increase λ to induce more sparsity, since with the current settings the coefficients are not shrunk to zero. However, this yields heavily biased nonzero coefficients, as shown in Figure 4-5, which illustrates the shrinkage effect of the SKF. Figures 4-5a and 4-5b show a nonzero and a zero coefficient, respectively, together with the KF estimates. These estimates are already rather accurate, but the zero coefficient is not set exactly to zero.

Therefore, the SKF is employed to induce sparsity. Figures 4-5c and 4-5d show the SKF estimates with λ = 10. They illustrate why λ is difficult to tune. Although λ is already high, the zero coefficient is exactly zero at only a few time instances. Meanwhile, the nonzero coefficient is already heavily biased, and increasing λ would only make this bias worse. This explains the limited accuracy of support estimation observed in Figure 4-4b. The performance of the SKF is thus inherently limited, regardless of how λ is tuned.
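The trade-off can be reproduced with uniform soft-thresholding, used here as a simplified stand-in for the shrinkage in the SKF. This is an assumption; the SKF update itself is not restated. A single threshold τ sets the estimate of the zero coefficient exactly to zero only at time steps where its magnitude is at most τ. At the same time, it pulls every nonzero coefficient toward zero by τ. The estimates below are hypothetical numbers, not simulation results.

```python
import numpy as np


def soft_threshold(v, tau):
    """Uniform soft-thresholding: shrink every entry toward zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def threshold_tradeoff(nonzero_est, zero_est, taus):
    """For each threshold, count the time steps at which the zero coefficient is
    set exactly to zero and measure the bias introduced on the nonzero one."""
    results = []
    for tau in taus:
        zeros_hit = int(np.sum(soft_threshold(zero_est, tau) == 0.0))
        bias = float(nonzero_est - soft_threshold(nonzero_est, tau))
        results.append((tau, zeros_hit, bias))
    return results


# Hypothetical estimates: a nonzero coefficient near 1.0 and a zero coefficient
# fluctuating around 0 over four time steps.
zero_est = np.array([0.03, -0.05, 0.04, -0.02])
for tau, zeros_hit, bias in threshold_tradeoff(1.0, zero_est, [0.01, 0.03, 0.05]):
    print(f"tau={tau:.2f}: zero coefficient exactly 0 at {zeros_hit}/4 steps, bias {bias:.2f}")
```

To set the zero coefficient exactly to zero at every step, τ must be at least as large as its largest fluctuation. The nonzero coefficient then carries a bias of the same size, so with a single λ the two goals cannot be decoupled.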

Figures 4-5e and 4-5f, on the other hand, show that the Weighted-SKF finds an accurate solution in which the zero coefficient is shrunk exactly to zero. As explained in Section 3-5-2, the Weighted-SKF forms w_t by exploiting the fact that the mean of the KF estimates is close to the mean of the true coefficient. The figures show that w_t ensures that shrinkage is applied only to the zero coefficients, so that the true support is identified. Moreover, the results show that this also yields more accurate estimates of the nonzero coefficients.
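Weighted soft-thresholding shows why a per-coefficient weight removes this conflict. The weights below follow the iteratively reweighted ℓ1 idea, w_p = 1/(|mean of the KF estimates of coefficient p| + ε). This is a substitute chosen for illustration and not necessarily how Section 3-5-2 constructs w_t. The estimates, ε and λ are hypothetical.

```python
import numpy as np


def soft_threshold(v, tau):
    """Elementwise soft-thresholding; tau may differ per coefficient."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def weights_from_kf_mean(kf_estimates, eps=1e-2):
    """Hypothetical weights: large for coefficients whose mean KF estimate is
    near zero, small for coefficients with a clearly nonzero mean."""
    mean_est = np.mean(kf_estimates, axis=0)
    return 1.0 / (np.abs(mean_est) + eps)


# Hypothetical KF estimates over four time steps for two coefficients:
# column 0 is a nonzero coefficient (around 1.0), column 1 a zero coefficient.
kf_est = np.array([[0.98, 0.03],
                   [1.02, -0.02],
                   [1.01, 0.01],
                   [0.99, -0.02]])
w = weights_from_kf_mean(kf_est)
lam = 0.002
shrunk = soft_threshold(kf_est[-1], lam * w)  # weighted shrinkage of the last estimate
```

With these numbers the zero coefficient gets a weight of about 100 and is set exactly to zero. The nonzero coefficient gets a weight of about 0.99 and is shrunk by only about 0.002. A single uniform threshold would need to be at least 0.02 to zero the last estimate of −0.02.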

Figure 4-5 panels (horizontal axis: time; each panel shows x_t, the estimate and, in (b), (d) and (f), the mean of the estimate):

(a) Nonzero coefficient x_{t,1} and estimate x̂^KF_{t,1}.

(b) Zero coefficient x_{t,2} and estimate x̂^KF_{t,2}.

(c) Nonzero coefficient x_{t,1} and estimate x̂^SKF_{t,1}.

(d) Zero coefficient x_{t,2} and estimate x̂^SKF_{t,2}.

(e) Nonzero coefficient x_{t,1} and estimate x̂^WSKF_{t,1}.

(f) Zero coefficient x_{t,2} and estimate x̂^WSKF_{t,2}.

Figure 4-5: Nonzero coefficient and its estimate by (a) KF, (c) SKF and (e) Weighted-SKF. Zero coefficient and its estimates by (b) KF, (d) SKF and (f) Weighted-SKF. The SKF estimate of the nonzero coefficient is heavily biased, while this bias is compensated in the Weighted-SKF.

In document Social Media Aided Stock Market Predictions by Sparsity Induced Regression (Page 64-69)
