Optimization Performance - Simulation Studies

3.3 Simulation Studies

3.3.3 Optimization Performance

Footnote 3: The sample size effect could be caused by greater inherent difficulty in the n-dimensional optimization, or by the presence of more distant outliers in the larger t3 samples.

Table 3.1: Convergence and run time results.

                          Proportion converging       Median run time (s)
Density  Bandwidth  n     Greedy  SQP    Combined     Greedy  SQP  Combined
t3       0.75hSJ    25    1       0.956  1            0.061   9.6  2.0
t3       0.75hSJ    50    1       0.880  1            0.13    19   4.8
t3       0.75hSJ    100   1       0.844  0.992        0.31    41   16
t3       hSJ        25    1       0.964  1            0.044   3.9  2.0
t3       hSJ        50    1       0.936  0.996        0.092   15   4.7
t3       hSJ        100   1       0.916  0.996        0.21    38   15
mixture  0.75hSJ    25    1       1      1            0.12    4.4  3.6
mixture  0.75hSJ    50    1       1      1            0.28    9.7  8.4
mixture  0.75hSJ    100   1       0.992  1            0.74    31   28
mixture  hSJ        25    1       1      1            0.060   3.3  3.1
mixture  hSJ        50    1       1      1            0.14    8.2  7.7
mixture  hSJ        100   1       1      1            0.38    28   26

The three optimization methods were run on the same data sets, so each method's sharpening distances can be directly compared. Figure 3.8 shows the objective function values for the greedy and SQP methods plotted against each other, for the 2872 pairs of optimizations where SQP converged. Cases based on each of the two target distributions are plotted with different markers. The 1:1 line is also shown on the plot. Points below the line represent runs where the greedy method outperformed SQP, and points above the line represent runs where SQP found the better solution. The figure suggests that the greedy method had good relative performance. While most of the runs had similar results for the two methods, there were also a large number of runs where the SQP objective value greatly exceeded the greedy value. These are runs where SQP stopped at a particularly poor local minimum. Interestingly, most of these poor SQP results arose in problems based on the mixture distribution, where convergence was not a problem. There were also some data sets where SQP greatly outperformed the greedy method, but such cases were much less frequent.
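The figure of 2872 SQP-converged pairs can be cross-checked against the SQP convergence proportions in Table 3.1. The sketch below assumes 250 simulated data sets per case. That replicate count is not stated in this section; it is inferred here only because it turns every tabulated proportion into a whole number, and the names `SQP_CONVERGED` and `RUNS_PER_CASE` are illustrative.

```python
# SQP convergence proportions from Table 3.1, keyed by (density, bandwidth, n).
SQP_CONVERGED = {
    ("t3", "0.75hSJ", 25): 0.956,
    ("t3", "0.75hSJ", 50): 0.880,
    ("t3", "0.75hSJ", 100): 0.844,
    ("t3", "hSJ", 25): 0.964,
    ("t3", "hSJ", 50): 0.936,
    ("t3", "hSJ", 100): 0.916,
    ("mixture", "0.75hSJ", 25): 1.0,
    ("mixture", "0.75hSJ", 50): 1.0,
    ("mixture", "0.75hSJ", 100): 0.992,
    ("mixture", "hSJ", 25): 1.0,
    ("mixture", "hSJ", 50): 1.0,
    ("mixture", "hSJ", 100): 1.0,
}

# ASSUMPTION: 250 replicates per simulation case (not stated in this section).
RUNS_PER_CASE = 250


def converged_counts(proportions, runs_per_case):
    """Convert convergence proportions into whole-number run counts."""
    return {case: round(p * runs_per_case) for case, p in proportions.items()}


counts = converged_counts(SQP_CONVERGED, RUNS_PER_CASE)
total_converged = sum(counts.values())
print(total_converged)  # 2872 under the assumed 250 runs per case
```

Under that assumption the per-case counts sum to the 2872 pairs quoted in the text, which supports reading the table proportions as fractions of a common number of runs.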

Figure 3.8: Scatter plot of sharpening distances across all simulation runs. Circles and crosses denote cases based on the t3 and mixture distributions, respectively.
[Plot not reproduced. Horizontal axis: SQP L1 distance. Vertical axis: greedy L1 distance.]

To facilitate a more detailed comparison, objective function values for the greedy and combined methods were normalized relative to the SQP sharpening distance for the same sample and bandwidth. The normalized sharpening distance is the ratio of that method's L1 distance to the SQP value. Figure 3.9 shows box plots of the normalized sharpening distance for both the greedy and combined methods, for all 12 simulation cases. Boxes show locations of the first, second, and third quartiles, and whiskers extend to the most extreme values differing from the median by less than 1.5 times the interquartile range.
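The normalization and the box plot summary described above can be sketched as follows. This is a minimal illustration, not the code used for the study: the function names are hypothetical, the quartile convention (`method="inclusive"`) is an assumption, and the example distances are made up. The whisker rule follows the description in the text, measuring the 1.5 times IQR reach from the median.

```python
import statistics


def normalized_distances(method_l1, sqp_l1):
    """Ratio of a method's L1 sharpening distance to the SQP value, per data set."""
    return [m / s for m, s in zip(method_l1, sqp_l1)]


def box_summary(values):
    """Quartiles plus whiskers at the most extreme values lying within
    1.5 * IQR of the median (the rule described in the text)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    reach = 1.5 * (q3 - q1)
    # Fall back to the median itself if no value is strictly inside the reach.
    inside = [v for v in values if abs(v - q2) < reach] or [q2]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }


# Hypothetical L1 distances for five data sets (illustrative values only).
greedy_l1 = [4.0, 6.0, 9.0, 5.0, 30.0]
sqp_l1 = [5.0, 8.0, 10.0, 5.0, 10.0]

ratios = normalized_distances(greedy_l1, sqp_l1)
summary = box_summary(ratios)
share_below_one = sum(r < 1 for r in ratios) / len(ratios)
print(summary, share_below_one)
```

A ratio below 1 means the method beat SQP on that data set, so a third quartile below 1 implies the method was better in at least roughly three quarters of the runs.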

Normalized objective function values less than 1 indicate performance better than SQP. The box plots for the greedy method show that it strongly outperformed SQP on the t3 problems, and was roughly equivalent to SQP on the mixture problems. For the t3 cases, all but one of the cases have their third quartiles less than one, indicating that the greedy result was better than the SQP result more than 75% of the time. The improvement over SQP is also more pronounced for larger sample sizes. In the mixture cases, SQP outperformed greedy when the bandwidth was smaller, while neither method was clearly superior for the larger bandwidth.

Looking at the combined-method cases in Figure 3.9, it is clear that the combined method performed better than the default SQP with x as its starting point. Starting at the greedy optimum had a pronounced effect for the more difficult t3 cases, but only a negligible effect on the mixture cases. Note that using the greedy starting point does not always improve the performance of SQP. The best starting point for SQP is sample-dependent and one could not expect any rule to provide the best start for all cases.

Figure 3.9: Box plots of normalized L1 sharpening distance, for both the greedy and combined search methods. The labels at left give the simulation case.
[Plot not reproduced. Two panels, Greedy and Combined, each with one box per simulation case (density, bandwidth 0.75h or h, and n = 25, 50, 100). Common axis: normalized L1 distance.]

As a further illustration of the performance of the methods, Figure 3.10 gives plots of the density estimates for nine randomly-selected simulation data sets. Each plot gives the unsharpened density as well as the sharpened density based on both the SQP and greedy search. Plots 1–3 show cases where SQP found the better result, and plots 4–9 show cases where the greedy algorithm found the better result. These examples were sampled from only those cases where the relative difference in sharpening distance was large (the worse method’s sharpening distance being at least 50% larger than the better method’s).
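The selection rule for these examples (the worse method's sharpening distance at least 50% larger than the better method's) can be written as a small filter. This is a sketch only: the function name, the `min_excess` parameter, and the example distances are hypothetical, and how the nine data sets were then drawn from the eligible cases is not specified here.

```python
def large_difference_winner(greedy_l1, sqp_l1, min_excess=0.5):
    """Return 'greedy' or 'SQP' when the worse L1 distance is at least
    (1 + min_excess) times the better one; otherwise return None."""
    if greedy_l1 == sqp_l1:
        return None  # a tie has no winner
    better = min(greedy_l1, sqp_l1)
    worse = max(greedy_l1, sqp_l1)
    if worse < (1.0 + min_excess) * better:
        return None  # relative difference too small to qualify
    return "greedy" if greedy_l1 < sqp_l1 else "SQP"


# Hypothetical (greedy, SQP) distance pairs, for illustration only.
print(large_difference_winner(10.0, 16.0))  # greedy: 16 is at least 1.5 * 10
print(large_difference_winner(12.0, 8.0))   # SQP: 12 is at least 1.5 * 8
print(large_difference_winner(10.0, 14.0))  # None: 14 is below 1.5 * 10
```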

Figure 3.10: Comparing the unsharpened estimate (thick grey line), greedy estimate (thin solid line), and SQP estimate (dashed line) for nine simulation data sets. Plot labels in the upper right indicate which method had a smaller L1 sharpening distance.
[Plots not reproduced. Panels 1 to 3 are labelled SQP; panels 4 to 9 are labelled Greedy.]

The examples show that for cases when the unsharpened estimate is nearly unimodal (as in plots 1, 3, 6, and 9), there is little qualitative difference between the greedy and SQP solutions despite the large relative difference in L1(y,x). When the original estimate does have outliers or other large deviations from unimodality, the differences in the estimate are more pronounced, and typically the SQP estimate is inferior (as in plots 4, 5, 7, and 8). The greedy estimate matches the unsharpened curve exactly at points away from the unwanted modes, while the SQP estimate may be poor everywhere if the algorithm converges to a low-quality local optimum. Plot 2 in Figure 3.10 is something of a special case. The data happened to arise in such a way that the original estimate consisted of two modes of nearly equal height. In this case neither method could estimate the density well, and the results of either method would be highly sensitive to the initial solution provided.

Figure 3.11 provides some further justification for the claim that the greedy algorithm produces reasonable density estimates, by comparing the greedy- and SQP-based estimates across all the generated data sets for which SQP was able to converge. The plot shows the ECDF of the total variation distance (equation 2.8) between the greedy- and SQP-based estimates, based on two groups of cases: the 1117 data sets for which SQP was better than greedy (in the L1(y,x) sense), and the 1755 data sets for which it was worse.
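As an illustration of the two quantities involved, the sketch below approximates a total variation distance between two densities on a grid and evaluates an empirical CDF. It assumes the common definition TV(f, g) = (1/2) ∫ |f(x) − g(x)| dx; equation 2.8 is not reproduced in this section, so the factor of 1/2 is an assumption. The two normal densities and the small sample of TV values are hypothetical stand-ins for kernel density estimates and simulation output.

```python
import math


def total_variation(f_vals, g_vals, dx):
    """Trapezoid-rule approximation of 0.5 * integral |f - g| on an
    evenly spaced grid with spacing dx."""
    diffs = [abs(f - g) for f, g in zip(f_vals, g_vals)]
    area = dx * (sum(diffs) - 0.5 * (diffs[0] + diffs[-1]))
    return 0.5 * area


def ecdf(sample, t):
    """Proportion of sample values less than or equal to t."""
    return sum(v <= t for v in sample) / len(sample)


def normal_pdf(x, mean):
    """Density of a unit-variance normal distribution with the given mean."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


# Stand-in densities: N(0, 1) and N(1, 1) evaluated on a grid over [-10, 11].
dx = 0.01
xs = [-10.0 + i * dx for i in range(2101)]
f_vals = [normal_pdf(x, 0.0) for x in xs]
g_vals = [normal_pdf(x, 1.0) for x in xs]
tv = total_variation(f_vals, g_vals, dx)
print(round(tv, 4))

# Hypothetical TV values from four runs; ECDF evaluated at 0.05.
tv_sample = [0.01, 0.02, 0.20, 0.04]
print(ecdf(tv_sample, 0.05))
```

For this pair of normal densities the exact value is 2Φ(0.5) − 1, about 0.383, which the grid approximation should reproduce closely.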

Figure 3.11: Empirical CDFs of the total variation distances between SQP and greedy density estimates.
[Plot not reproduced. Horizontal axis: TV (total variation between greedy and SQP estimates). Vertical axis: F(TV). Two curves: cases where SQP had the better L1 distance and cases where SQP had the worse L1 distance.]

The figure shows that for those cases where SQP was better than the greedy algorithm in sharpening-distance terms, the density estimates did not differ by much. Over 95% of those cases had TV(y_SQP, y_greedy) < 0.05, and only about 1% of them had TV > 0.1. Conversely, when SQP performed worse than the greedy method, the estimates had more pronounced differences. Only about 70% of the SQP-worse cases had TV < 0.05, and about 13% of the runs had TV > 0.1. In other words, when the greedy estimate loses, it does not lose by much, but when it wins, it can win by a wide margin. This is in agreement with the observations made from the sample of results illustrated in Figure 3.10.
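To put the quoted percentages on an absolute scale, they can be converted into approximate run counts using the group sizes given earlier (1117 SQP-better and 1755 SQP-worse data sets). The percentages are stated only approximately in the text, so the counts below are indicative rather than exact; the helper name is hypothetical.

```python
# Group sizes stated in the text.
N_SQP_BETTER = 1117
N_SQP_WORSE = 1755


def approx_count(group_size, percent):
    """Whole-number count for an integer percentage of a group (rounded down)."""
    return group_size * percent // 100


# Runs with TV > 0.1, using the approximate percentages quoted in the text.
large_gap_sqp_better = approx_count(N_SQP_BETTER, 1)   # about 1% of 1117
large_gap_sqp_worse = approx_count(N_SQP_WORSE, 13)    # about 13% of 1755
print(large_gap_sqp_better, large_gap_sqp_worse)
```

On these rough figures, clearly different estimates (TV > 0.1) arise about twenty times as often in runs where the greedy method won as in runs where SQP won.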

In document Methods for Shape-Constrained Kernel Density Estimation (Page 80-85)