Maximizing a Pseudo-Likelihood - Bandwidth Selection

2.4 Bandwidth Selection

2.4.2 Maximizing a Pseudo-Likelihood

Despite the preceding justification for selecting the bandwidth prior to shape adjustment, a bandwidth selection procedure designed specifically for shape-constrained estimators would be welcome. Optimal bandwidth selection for this situation is an open problem. Rather than try to solve the problem rigorously here, a bandwidth choice with some intuitive and practical appeal is proposed. The new bandwidth is denoted $h_{ML}$, and is the maximizer of a quantity resembling a likelihood.

The proposal may be motivated by starting with the likelihood cross-validation bandwidth (2.14). There are two attributes of $h_{LCV}$ that are particularly important:

1. It promotes density estimates that place higher density on the observed data points.

2. It severely penalizes any density estimate that places small probability mass on an observed data point, because any h value yielding a negligible density over a single point will drive the product in equation (2.14) close to zero.

Attribute 1 is reasonable, and is one of the motivations behind maximum likelihood estimation in general. Attribute 2 has both positive and negative consequences. Its positive consequence is preventing any density estimates that place negligible probability mass near an observed value. Its negative consequence is sensitivity to outliers (Scott and Factor, 1981). Because the density estimate must not be too close to zero at any data point, outliers will have disproportionate influence on $h_{LCV}$, tending to cause larger $h$ values to be selected.
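The outlier sensitivity described above can be illustrated numerically. The following is a minimal sketch, assuming a Gaussian kernel and a simple grid search over $h$; the function names `loo_log_likelihood` and `lcv_bandwidth`, the grid, and the simulated data are illustrative assumptions, not part of the original procedure. It maximizes the logarithm of the leave-one-out product rather than the product itself, which has the same maximizer but is numerically more stable.

```python
import numpy as np

def loo_log_likelihood(x, h):
    """Leave-one-out log-likelihood of a Gaussian KDE with bandwidth h."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Pairwise kernel values K((x_i - x_j) / h) / h for a Gaussian kernel.
    z = (x[:, None] - x[None, :]) / h
    k = np.exp(-0.5 * z**2) / (h * np.sqrt(2.0 * np.pi))
    np.fill_diagonal(k, 0.0)         # withhold x_i when evaluating at x_i
    f_loo = k.sum(axis=1) / (n - 1)  # leave-one-out density at each x_i
    with np.errstate(divide="ignore"):
        # A single negligible density drives the sum towards -inf (attribute 2).
        return np.log(f_loo).sum()

def lcv_bandwidth(x, grid):
    """Grid-search maximizer of the leave-one-out log-likelihood."""
    scores = np.array([loo_log_likelihood(x, h) for h in grid])
    return grid[int(np.argmax(scores))]

# Illustrative data: 100 standard normal draws, with and without one outlier.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
grid = np.geomspace(0.05, 5.0, 200)

h_clean = lcv_bandwidth(x, grid)
h_outlier = lcv_bandwidth(np.append(x, 10.0), grid)
print(h_clean, h_outlier)
```

In this sketch a single point at 10 forces a larger selected bandwidth, because the leave-one-out density at the outlier must be kept away from zero.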

If one were to apply likelihood cross-validation to the shape-adjusted density estimator $\hat{f}_q$, the following bandwidth selector would suggest itself:

$$h_{LCVa} = \operatorname*{argmax}_{h \ge 0} \prod_{i=1}^{n} \hat{f}_{q_{-i}}(x_i; h), \tag{2.15}$$

where the notation $q_{-i}$ indicates that the $i$th data point is withheld before determining the adjustment. Implementing (2.15) would be computationally intensive. In a line search over the possible values of $h$, the adjustment procedure would need to be carried out $n$ times for each candidate $h$. Outlier sensitivity similar to that of $h_{LCV}$ could be expected, since a larger $h$ value would still be required for the case when the outlying $x$ value is left out.

The proposed bandwidth selector for shape-adjusted KDEs attempts to retain the desirable characteristics of $h_{LCV}$, with reduced outlier sensitivity and reduced computational burden relative to $h_{LCVa}$. The proposed selector is

$$h_{ML} = \operatorname*{argmax}_{h \ge 0} \prod_{i=1}^{n} \hat{f}_{q}(x_i; h), \tag{2.16}$$

where $\hat{f}_q(x_i; h)$ is the shape-constrained estimator with bandwidth $h$. The product on the right-hand side of (2.16) is the likelihood of $x$ under the density $\hat{f}_q$, if we take $q$ to be a fixed vector (rather than what it truly is, a function of $x$ and $h$). This resemblance to a maximum likelihood estimate motivates the notation $h_{ML}$.

The product in (2.16) does not involve the leave-one-out approach; the estimate $\hat{f}_q$ only needs to be worked out once per candidate $h$ value. The existence of the shape constraint makes it unnecessary to withhold points to obtain a reasonable bandwidth, because for most constraints of practical interest, the product $\prod_{i=1}^{n} \hat{f}_q(x_i; h)$ approaches zero, not infinity, as $h \to 0$ when the constraint is enforced. Eliminating the cross-validation element from the selector causes an approximately $n$-fold reduction in computation versus $h_{LCVa}$, and should also reduce outlier sensitivity because the outlying points never need to be “left out.”
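The contrast between zero and infinity can be made explicit for the unconstrained estimator. As a sketch, assume the standard kernel density estimator with a nonnegative kernel $K$ satisfying $K(0) > 0$; this form and the symbol $K$ are assumptions here, since the estimator is defined outside this section. Keeping only the $j = i$ term of the sum gives a lower bound at every data point:

```latex
\hat{f}(x_i; h) = \frac{1}{nh} \sum_{j=1}^{n} K\!\left(\frac{x_i - x_j}{h}\right) \ge \frac{K(0)}{nh}
\quad\Longrightarrow\quad
\prod_{i=1}^{n} \hat{f}(x_i; h) \ge \left(\frac{K(0)}{nh}\right)^{n} \to \infty
\quad \text{as } h \to 0.
```

Without withholding points or imposing a constraint, the maximizer of the product would therefore degenerate to $h = 0$, which is why the unadjusted selector relies on the leave-one-out device.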

Appendix A provides details of a simulation study that compares $h_{ML}$ to $h_{SJ}$ and $h_{LCVa}$. The results suggest that $h_{ML}$ provides a reasonable bandwidth choice. While it still has some sensitivity to outliers, this sensitivity is considerably reduced relative to likelihood cross-validation.

A Greedy Algorithm for Data Sharpening

Heuristic optimizers operate by iteratively updating one or more candidate solutions. Each update is a move that shifts a solution from one location to another in the solution space. A major task of algorithm design is to define the set of possible moves a candidate solution can make at any stage of the search, and a means of selecting one move over the others. An algorithm is called greedy if, at each iteration, the move that causes maximal improvement in the objective function is selected.

Greedy algorithms are a convenient first choice when developing heuristics, because they are often conceptually simple and computationally fast. The use of locally optimal moves at each iteration maximizes the short-term improvement of the search, but also makes the search prone to be trapped in local optima. It is usually possible to improve the overall performance of a greedy heuristic by searching less aggressively for good solutions at each move.
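The trade-off between speed and entrapment in local optima can be seen in a small example. The following is a minimal sketch of a generic greedy search, assuming a minimization problem on a toy one-dimensional solution space; the names `greedy_search`, `objective`, and `neighbors`, and the table of objective values, are hypothetical and chosen only for illustration.

```python
def greedy_search(objective, start, neighbors, max_iter=1000):
    """Greedy search: at each iteration take the move with maximal improvement."""
    current = start
    for _ in range(max_iter):
        # Among all permitted moves, pick the one with the lowest objective value.
        best = min(neighbors(current), key=objective)
        if objective(best) >= objective(current):
            break  # no move improves the objective: a local optimum
        current = best
    return current

# Toy objective on the integers 0..20 (assumed values, to be minimized):
# a local minimum at index 4 and the global minimum at index 15.
values = [9, 7, 5, 3, 2, 4, 6, 8, 9, 8, 6, 5, 3, 2, 1, 0, 2, 4, 6, 8, 9]

def objective(i):
    return values[i]

def neighbors(i):
    # Permitted moves: one step left or right, staying inside the table.
    return [j for j in (i - 1, i + 1) if 0 <= j < len(values)]

print(greedy_search(objective, 0, neighbors))   # stops at the local minimum, index 4
print(greedy_search(objective, 10, neighbors))  # reaches the global minimum, index 15
```

Started from index 0, the search stops at the local minimum because every available move worsens the objective; only a start on the other side of the ridge reaches the global minimum.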

This chapter introduces a greedy algorithm for shape-constrained density estimation by data sharpening. It is a deterministic algorithm that executes quickly, but owing to its greedy design it only works well for less stringent constraints like unimodality. The algorithm is described below, and its properties are examined through examples and simulations. Afterwards it is shown how it can be incorporated into a metaheuristic known as iterated local search (ILS), which adds randomness to the search and should make the algorithm capable of solving more complex problems.

3.1 The improve Algorithm

The new algorithm described in this chapter applies to data sharpening problems, where the target is the data $x$, and the candidate solution is a sharpened data vector $y$. It carries out moves of the solution $y$ (a point in the $n$-dimensional solution space) by sequentially moving its elements $y_i$ (scalar points in the data space). Each move of a sharpened data point $y_i$ is done in a greedy manner.

The algorithm is called improve, because it takes a feasible guess solution as input and returns another feasible solution with improved objective function as output.

In document Methods for Shape-Constrained Kernel Density Estimation (Page 62-66)