

5.1.8 Comparison to practical algorithms

In this section, we compare the lower and upper bounds on sample complexity and the upper bounds on the probability of error derived in the preceding sections with the performance of practical recovery algorithms. We consider setups with IID and with correlated sensing matrix columns, as described in Sections 5.1.1 and 5.1.2.

We define the parameters for the setup of (Aeron et al., 2010) as described in Section 5.1.1. For all experiments and evaluations of the bounds, we set $K = 32$ and $D = 512$. The variables $X$ and observations $Y$ are generated according to the normalized model given by (5.1), where we choose the support $S$ uniformly at random and draw $\beta_S \in \{-1, 1\}^K$ with uniform probability. We denote by $N_n = N/(K \log(D/K))$ the normalized number of measurements.
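A minimal sketch of how one random problem instance in this setup could be drawn. The support and sign distributions follow the description above; the IID Gaussian sensing matrix, the noise scaling, the use of the natural logarithm in $N_n$, and the helper names `num_measurements` and `generate_instance` are our own assumptions (model (5.1) is not restated here and may normalize differently).

```python
import numpy as np

def num_measurements(n_norm, K, D):
    """Invert N_n = N / (K log(D/K)); natural log assumed, rounded up to an integer."""
    return int(np.ceil(n_norm * K * np.log(D / K)))

def generate_instance(N, K, D, snr, rng):
    """Draw one random instance (hypothetical helper, assumed normalization)."""
    # Support S: K distinct indices drawn uniformly at random from {0, ..., D-1}.
    S = np.sort(rng.choice(D, size=K, replace=False))
    # Nonzero values beta_S drawn uniformly from {-1, +1}^K.
    beta = np.zeros(D)
    beta[S] = rng.choice([-1.0, 1.0], size=K)
    # ASSUMPTION: IID N(0, 1) sensing entries and noise variance K / snr,
    # so that E[(x_i^T beta)^2] / sigma^2 = ||beta||^2 / sigma^2 = snr.
    X = rng.standard_normal((N, D))
    noise = rng.standard_normal(N) * np.sqrt(K / snr)
    Y = X @ beta + noise
    return X, Y, beta, S

# Usage: the measurement count for N_n = 8 with K = 32, D = 512.
rng = np.random.default_rng(0)
N = num_measurements(8, K=32, D=512)  # → 710
X, Y, beta, S = generate_instance(N, 32, 512, snr=100.0, rng=rng)
```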

We compare our bounds for independent and correlated sensing elements with Lasso (Candès and Plan, 2009; Wainwright, 2009b), as defined in (Candès and Plan, 2009). Formally, Lasso gives the solution to the following optimization problem:

$$\beta^\star = \arg\min_{\beta} \frac{1}{2}\|Y - X\beta\|_2^2 + \lambda\|\beta\|_1.$$
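Lasso is a convex problem that many solvers handle; the text does not state which solver produced the reported results. A minimal sketch using proximal gradient descent (ISTA), our choice for illustration, which alternates a gradient step on the quadratic term with soft-thresholding for the $\ell_1$ term. The names `soft_threshold` and `lasso_ista` are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (t may be a scalar or a per-coordinate array)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, Y, lam, n_iter=500, tol=1e-8):
    """Minimize 0.5 * ||Y - X b||_2^2 + lam * ||b||_1 by ISTA (a sketch)."""
    # Step size 1/L, where L = ||X||_2^2 (largest singular value squared)
    # is the Lipschitz constant of the gradient of the quadratic term.
    L = np.linalg.norm(X, 2) ** 2
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - Y)
        b_new = soft_threshold(b - grad / L, lam / L)
        # Stop when the iterate no longer changes appreciably.
        if np.linalg.norm(b_new - b) <= tol * max(1.0, np.linalg.norm(b)):
            return b_new
        b = b_new
    return b
```

With an orthonormal design ($X = I$) the Lasso solution is exactly the soft-thresholded observation, which makes a convenient sanity check.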

We set the regularization parameter to $\lambda = 2\sqrt{2\log D}/\sqrt{\mathrm{SNR}}$, as suggested in (Candès and Plan, 2009). We also investigated different values; however, we have not

We also investigate the performance of a non-convex iterative Lasso variant, the iteratively reweighted Lasso. This method was proposed in (Candès et al., 2008) for the noiseless recovery problem; we use an extension to the noisy case, which solves the following optimization problem at each iteration $l$:

$$\beta^{(l)} = \arg\min_{\beta} \frac{1}{2}\|Y - X\beta\|_2^2 + \lambda_r \sum_{k=1}^{D} w_k^{(l)} |\beta_k|, \qquad w_k^{(l)} = \frac{1}{|\beta_k^{(l-1)}| + \epsilon}.$$

This optimization is the same as Lasso except for the individual weights $w_k^{(l)}$ on each component $\beta_k$, which depend on the output of the previous iteration. Here $\epsilon$ is a suitably small constant that stabilizes the weights when $|\beta_k|$ is close to zero. Setting $\beta^{(0)}$ to the solution of regular Lasso, the algorithm iterates until $\|\beta^{(l)} - \beta^{(l-1)}\|$ falls below a tolerance constant or a maximum number of iterations is reached.
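The iteration just described can be sketched as follows. The inner weighted-Lasso solver (ISTA with per-coordinate soft-thresholding), the default $\epsilon = 0.1$, the tolerances, and the function names are our assumptions for illustration, not the implementation used for the reported results.

```python
import numpy as np

def weighted_lasso_ista(X, Y, lam_w, b0, n_iter=500, tol=1e-8):
    """Minimize 0.5*||Y - X b||^2 + sum_k lam_w[k] * |b_k| by ISTA, warm-started at b0."""
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the quadratic term's gradient
    b = b0.copy()
    for _ in range(n_iter):
        z = b - X.T @ (X @ b - Y) / L
        # Weighted l1 prox: soft-threshold each coordinate by its own threshold.
        b_new = np.sign(z) * np.maximum(np.abs(z) - lam_w / L, 0.0)
        if np.linalg.norm(b_new - b) <= tol * max(1.0, np.linalg.norm(b)):
            return b_new
        b = b_new
    return b

def reweighted_lasso(X, Y, lam, lam_r, eps=0.1, max_outer=10, tol=1e-4):
    """Iteratively reweighted Lasso (sketch; eps and stopping rules are hypothetical)."""
    D = X.shape[1]
    # beta^(0): ordinary Lasso solution, i.e. all weights equal to 1 with penalty lam.
    beta = weighted_lasso_ista(X, Y, lam * np.ones(D), np.zeros(D))
    for _ in range(max_outer):
        w = 1.0 / (np.abs(beta) + eps)  # w_k^(l) = 1 / (|beta_k^(l-1)| + eps)
        beta_new = weighted_lasso_ista(X, Y, lam_r * w, beta)
        # Stop once ||beta^(l) - beta^(l-1)|| drops below the tolerance.
        if np.linalg.norm(beta_new - beta) < tol:
            return beta_new
        beta = beta_new
    return beta
```

With $X = I$ the effect is easy to see: small entries receive weights near $1/\epsilon$ and stay at zero, while large entries are shrunk far less than by plain Lasso.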

Reweighted Lasso aims to produce a sparser estimate of $\beta$ than regular Lasso. At each iteration, it assigns larger penalty weights to coefficients of small magnitude, pushing them toward zero, while reducing the penalty on coefficients of large magnitude, which allows more sensitivity in identifying the remaining variables. The authors in (Candès et al., 2008) justify the algorithm by noting that iteratively solving the reweighted $\ell_1$ problem is a Majorization-Minimization (MM) algorithm for the log-sum penalty problem, where the penalty is defined as $\sum_{k=1}^{D} \log(|\beta_k| + \epsilon)$. The sparsity-encouraging properties of this method can be intuitively explained by the fact that $\log(|\beta_k| + \epsilon)$ approximates the $\ell_0$ penalty much better than $\ell_1$ does. Note, however, that the log-sum penalty is non-convex; therefore, the iterative reweighted minimization is not guaranteed to converge to its global minimum. Furthermore, (Candès et al., 2008) notes that small values of $\epsilon$ (which give a better approximation of the $\ell_0$ penalty) make it more likely that the algorithm gets stuck at undesirable local minima.
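The MM interpretation can be checked in one step. Assuming the noisy log-sum objective $F(\beta) = \frac{1}{2}\|Y - X\beta\|_2^2 + \lambda_r \sum_{k=1}^{D} \log(|\beta_k| + \epsilon)$ (our reading of the noisy extension; Candès et al. (2008) state the noiseless version), concavity of the logarithm gives a tangent upper bound at the previous iterate:

```latex
\log\big(|\beta_k| + \epsilon\big) \;\le\; \log\big(|\beta_k^{(l-1)}| + \epsilon\big)
  + \frac{|\beta_k| - |\beta_k^{(l-1)}|}{|\beta_k^{(l-1)}| + \epsilon}
\quad\Longrightarrow\quad
F(\beta) \;\le\; \frac{1}{2}\|Y - X\beta\|_2^2
  + \lambda_r \sum_{k=1}^{D} w_k^{(l)} |\beta_k| + C^{(l-1)},
\qquad w_k^{(l)} = \frac{1}{|\beta_k^{(l-1)}| + \epsilon}
```

Here $C^{(l-1)}$ does not depend on $\beta$, and equality holds at $\beta = \beta^{(l-1)}$. Minimizing the right-hand side is exactly the reweighted Lasso step, so $F(\beta^{(l)}) \le F(\beta^{(l-1)})$: the log-sum objective never increases, although it may stall at a local minimum.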

[Figure 5·4 panels. (a) Measurement cutoffs: probability of support recovery vs. $N_n = N/(K\log(D/K))$ for different SNR/log(D) (10, 15 and 20 dB), showing the achievable bound, Lasso, and reweighted Lasso (RWLasso). (b) Correlation cutoffs: probability of support recovery vs. correlation $\rho$ for different SNR/log(D) (15 and 20 dB), showing the achievable bound, Lasso, and RWLasso.]

Figure 5·4: Comparison of information-theoretic bound vs. Lasso and reweighted Lasso.

We have chosen reweighted Lasso for comparison with the information-theoretic bound because, for CS, the optimal ML decoder can be equivalently written as an $\ell_0$-constrained least squares minimization for fixed $\beta_S$. We would therefore expect a method like reweighted Lasso, which successively approximates the $\ell_0$ penalty while remaining computationally efficient, to approach the achievable bound more closely than Lasso. Our simulation results below confirm this.
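To make the ML decoder concrete, a brute-force sketch for toy sizes only. It assumes IID Gaussian noise (so maximizing the likelihood amounts to minimizing the residual norm) and treats both the support and the $\pm 1$ signs as unknown; the function name `ml_decoder_exhaustive` is hypothetical. The search visits $\binom{D}{K} 2^K$ candidates, which is infeasible at $K = 32$, $D = 512$ and is why tractable surrogates such as Lasso and reweighted Lasso are used.

```python
import itertools
import numpy as np

def ml_decoder_exhaustive(X, Y, K):
    """Brute-force ML support estimate for beta_S in {-1,+1}^K under Gaussian noise."""
    D = X.shape[1]
    best_S, best_cost = None, np.inf
    for S in itertools.combinations(range(D), K):
        cols = X[:, list(S)]
        for signs in itertools.product((-1.0, 1.0), repeat=K):
            # Squared residual ||Y - X_S beta_S||^2 for this support and sign pattern.
            cost = np.sum((Y - cols @ np.array(signs)) ** 2)
            if cost < best_cost:
                best_S, best_cost = S, cost
    return best_S
```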

Figure 5·4a plots the recovery bound for IID variables against the simulated performance of Lasso and reweighted Lasso for different numbers of measurements $N$. The recovery probabilities for both Lasso variants are computed over 40 trials. Compared to Lasso, our information-theoretic (IT) bound has a much sharper transition and is also tight, closely matching our lower bound obtained from Theorem 5.1.1 (vertical line for SNR/log D = 20 dB). Interestingly, reweighted Lasso nearly achieves our performance bound at high SNR; at low SNR, however, it performs poorly, similarly to Lasso. Note that the theoretical results for Lasso in (Wainwright, 2009b; Wainwright, 2009a) are not strictly comparable, since they require a sufficiently high SNR regime. Furthermore, the performance gap grows without bound as $K$ approaches $D$, implying that Lasso works only in the sublinear regime.
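The recovery probabilities in Figure 5·4 are empirical frequencies over repeated random trials. A minimal Monte Carlo harness could look as follows; the success criterion (exact match of the estimated support with $S$), the Gaussian design and noise model, and the helper names are our assumptions. Any support estimator can be plugged in as `recover`; to keep the sketch self-contained we use simple correlation thresholding as a stand-in, not the Lasso variants shown in the figure.

```python
import numpy as np

def empirical_recovery_prob(recover, N, K, D, n_trials, rng, noise_std=0.0):
    """Fraction of random trials in which `recover` returns exactly the true support."""
    successes = 0
    for _ in range(n_trials):
        # S uniform over K-subsets, beta_S uniform on {-1, +1}^K.
        S = np.sort(rng.choice(D, size=K, replace=False))
        beta = np.zeros(D)
        beta[S] = rng.choice([-1.0, 1.0], size=K)
        # ASSUMPTION: IID N(0, 1) design and white Gaussian noise with std noise_std.
        X = rng.standard_normal((N, D))
        Y = X @ beta + noise_std * rng.standard_normal(N)
        S_hat = np.sort(np.asarray(recover(X, Y, K)))
        successes += int(np.array_equal(S_hat, S))
    return successes / n_trials

def correlation_threshold(X, Y, K):
    """Stand-in estimator (NOT Lasso): indices of the K largest |x_j^T Y|."""
    return np.argsort(np.abs(X.T @ Y))[-K:]
```

For a Lasso-type estimator, `recover` would typically return the indices of the $K$ largest-magnitude entries of $\hat\beta$.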

Figure 5·4b shows our probability-of-error bound vs. Lasso performance for different values of the correlation coefficient $\rho$, with $N_n = 8$. The recovery probabilities for both Lasso variants are computed over 50 trials. This plot clearly demonstrates that while our bounds tolerate correlation up to a constant approaching 1 (as argued in Section 5.1.2), Lasso tolerates at most $\rho = 0.5$ for exact recovery in this scenario, even with very high SNR and $N$. Note that the strongest results, due to (Candès and Plan, 2009), require correlations to decay asymptotically to zero as $1/\log(D)$. Reweighted Lasso performs better than Lasso; however, there is still a significant gap between the achievable correlation bound and the reweighted Lasso performance, especially at 15 dB SNR.
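Reproducing the correlated setting requires a design whose columns have correlation $\rho$. The exact construction belongs to Section 5.1.2 and is not restated here; as an assumption, a sketch using equicorrelated columns, i.e. IID rows $x_i \sim \mathcal{N}(0, (1-\rho)I + \rho\mathbf{1}\mathbf{1}^\top)$, generated from one shared Gaussian factor per row. The name `equicorrelated_design` is hypothetical.

```python
import numpy as np

def equicorrelated_design(N, D, rho, rng):
    """Rows x_i ~ N(0, (1 - rho) I + rho 11^T); ASSUMED correlation structure."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1] for this construction")
    Z = rng.standard_normal((N, D))  # independent part of each entry
    g = rng.standard_normal((N, 1))  # one shared factor per row, broadcast over columns
    # Var = (1 - rho) + rho = 1; Cov between distinct columns = rho.
    return np.sqrt(1.0 - rho) * Z + np.sqrt(rho) * g
```

The shared-factor form avoids building and factorizing the $D \times D$ covariance matrix, which is convenient at $D = 512$.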