CHAPTER 3. MODEL FORMULATION

3.3.2 Implementation two

Consider the formulation of instance selection defined by the IP in (7) through (9). To implement a Type II column generation procedure, the integrality constraints of (9) must be relaxed. As will now be proven, this relaxation still allows an integer optimal solution to be found.

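Lemma 1. Given that $x^*$ is optimal to $\max \{ \sum_{j \in J} c_j x_j : \sum_{j \in J} x_j \le 1,\ 0 \le x_j \le 1\ \forall j \in J \}$, $\sum_{j \in J} x_j^* = 1$ when $c_j > 0\ \forall j \in J$.

Proof: Assume $\sum_{j \in J} x_j^* < 1$. Then there exists $k$ such that $x_k^* < 1$. Construct a new solution $x'$ such that $x_k' = x_k^* + 1 - \sum_{j \in J} x_j^*$, and $x_j' = x_j^*\ \forall j \ne k$. Since $1 - \sum_{j \in J} x_j^* \le 1 - x_k^*$, then $x_k' - x_k^* \le 1 - x_k^*$ and it follows that $x_k' \le 1$. Additionally, since $\sum_{j \in J} x_j^* = \sum_{j \ne k} x_j^* + x_k^*$, then $\sum_{j \in J} x_j' = \sum_{j \in J} x_j^* + 1 - \sum_{j \in J} x_j^*$, and again it follows that $\sum_{j \in J} x_j' \le 1$. Finally, because $\sum_{j \in J} c_j x_j' = \sum_{j \in J} c_j x_j^* + c_k (1 - \sum_{j \in J} x_j^*)$, and since $c_k (1 - \sum_{j \in J} x_j^*) > 0$, it is true that $\sum_{j \in J} c_j x_j' > \sum_{j \in J} c_j x_j^*$. There is a contradiction, $x^*$ is not the optimal solution. ∎

Lemma 2. If $c_k < \max_{j \in J} c_j$ then $x_k^* = 0$, where $x^*$ is the optimal solution to $\max \{ \sum_{j \in J} c_j x_j : \sum_{j \in J} x_j \le 1,\ 0 \le x_j \le 1\ \forall j \in J \}$, when $c_j > 0\ \forall j \in J$.

Proof: Assume there exists $k$ such that $c_k < \max_{j \in J} c_j$ and $x_k^* > 0$. Given $i$ such that $c_i = \max_{j \in J} c_j$, define a new solution $x'$ where $x_i' = x_i^* + x_k^*$, $x_k' = 0$, and $x_j' = x_j^*\ \forall j \ne i, k$. Since $x_i^* + x_k^* \le \sum_{j \in J} x_j^*$ and $\sum_{j \in J} x_j^* \le 1$, then $x_i' \le 1$. Additionally, since $\sum_{j \in J} x_j' = \sum_{j \ne i, k} x_j^* + x_i^* + x_k^* = \sum_{j \in J} x_j^*$, and $\sum_{j \in J} x_j^* \le 1$, then $\sum_{j \in J} x_j' \le 1$. Next, because $c_k < c_i$ and $x_k^* > 0$, it is known that $\sum_{j \in J} c_j x_j^* < \sum_{j \ne k} c_j x_j^* + c_i x_k^*$. Finally, it can be shown that $\sum_{j \in J} c_j x_j' = \sum_{j \ne k} c_j x_j^* + c_i x_k^*$. There is a contradiction, $x^*$ is not the optimal solution. ∎

Theorem 1. Given that $c_j > 0\ \forall j \in J$, $\max \{ \sum_{j \in J} c_j x_j : \sum_{j \in J} x_j \le 1,\ 0 \le x_j \le 1\ \forall j \in J \}$ has an optimal solution such that $x_j^* \in \{0, 1\}\ \forall j \in J$.

Proof: Assume there is no optimal solution such that $x_j^* \in \{0, 1\}\ \forall j \in J$. Because of Lemma 1 it is known that $\sum_{j \in J} x_j^* = 1$. Because of Lemma 2 it is known that for any $j$ with $x_j^* > 0$, $c_j$ equals the maximum accuracy $\max_{l \in J} c_l$, say $k$. It can then be shown that $\sum_{j \in J} c_j x_j^* = k \sum_{j \in J} x_j^*$. Define a new solution $x'$. Choose some $i$ such that $c_i = k$. Then define $x_i' = 1$, and $x_j' = 0\ \forall j \ne i$. Then $\sum_{j \in J} c_j x_j' = k \sum_{j \in J} x_j'$. Because both $\sum_{j \in J} x_j'$ and $\sum_{j \in J} x_j^*$ equal one, it is then known that $\sum_{j \in J} c_j x_j' = \sum_{j \in J} c_j x_j^*$. The new solution $x'$ is therefore optimal and integral, which contradicts the assumption. ∎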
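The integrality result of Theorem 1 can be illustrated numerically. The Python sketch below is not part of the original procedure. It assumes the column accuracies are held in a plain list, and the function names and the sampling scheme are hypothetical. It checks that placing all weight on one highest-accuracy column is never beaten by randomly sampled feasible fractional solutions.

```python
import random

def objective(acc, x):
    """Objective value: sum over columns of accuracy times selected portion."""
    return sum(c * v for c, v in zip(acc, x))

def integral_optimum(acc):
    """Integral solution that selects the whole of one highest-accuracy column."""
    best = max(range(len(acc)), key=lambda j: acc[j])
    x = [0.0] * len(acc)
    x[best] = 1.0
    return x

def random_feasible(n, rng):
    """Random point with 0 <= x_j <= 1 and sum of x_j <= 1 (illustrative sampling)."""
    w = [rng.random() + 1e-12 for _ in range(n)]  # strictly positive weights
    scale = rng.random() / sum(w)                 # total mass lies in [0, 1)
    return [v * scale for v in w]

# Hypothetical accuracies; columns 1 and 2 are tied for the highest value.
acc = [0.71, 0.84, 0.84, 0.65]
x_int = integral_optimum(acc)
rng = random.Random(0)
never_beaten = all(
    objective(acc, random_feasible(len(acc), rng)) <= objective(acc, x_int) + 1e-12
    for _ in range(1000)
)
```

With tied columns, any split of the unit weight between them attains the same objective value, which is why an integral optimum always exists.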

The required relaxation of (7) through (9) is permissible because, as Theorem 1 shows, there is always an integral optimal solution to the relaxed IP. This relaxation makes intuitive sense because it would be suboptimal to select a portion of any column that does not have the highest available accuracy. If two or more columns are tied for the highest accuracy, it would be possible to select non-integer portions of each of them, but an optimal integer solution is still obtained by selecting the entirety of one column. As a result, columns generated in the Type II column generation procedure will have higher accuracy than any of the existing columns. The resulting MP is,

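\begin{align}
\max \quad & \sum_{j \in J} c_j x_j \tag{21} \\
\text{s.t.} \quad & \sum_{j \in J} x_j \le 1 \tag{22} \\
& x_j \ge 0, \quad \forall j \in J \tag{23} \\
& x_j \le 1, \quad \forall j \in J, \tag{24}
\end{align}

and the associated RMP is,

\begin{align}
\max \quad & \sum_{j \in J'} c_j x_j \tag{25} \\
\text{s.t.} \quad & \sum_{j \in J'} x_j \le 1 \tag{26} \\
& x_j \ge 0, \quad \forall j \in J' \tag{27} \\
& x_j \le 1, \quad \forall j \in J'. \tag{28}
\end{align}

The POP is then,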

. ∗ ∗ ₂₉

. 30

0,1 , ∀ , 31 where we let denote the feasible region defined by (30) and (31).

The POP formulation is identical to (18) through (20) except that the objective function also subtracts the last dual variable. This change simply enforces constraint (26) in the RMP. As before, this POP is not solvable because no closed-form function is known for evaluating the accuracy of a classifier from the contents of its training data. Section 3.4 presents approximation methods for solving the POP.

3.4 Approximating the price out problem

The price out problems defined in (18) through (20) and (29) through (31) are not solvable because there is no known closed-form function for calculating the accuracy of a classifier based solely on the contents of its training data. However, if some approximation of classifier accuracy is substituted for actual accuracy, approximations of the price out problems can be solved. The objective then becomes to generate a column that maximizes an estimate of reduced cost. Any generated column can then be checked for truly positive reduced cost by building the associated classifier and using its accuracy in the reduced cost calculation. If the column is indeed beneficial to the RMP, it can be added; if it is not, a new column can be generated and considered.
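The generate-and-verify step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work. It assumes a column is a 0/1 list over the training instances, that the true reduced cost is the classifier accuracy minus the duals of the included instances minus the last dual variable, and that the caller supplies `propose` and `build_and_score`, both hypothetical names.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Column = List[int]  # 0/1 membership flag per training instance (assumed encoding)

def true_reduced_cost(column: Sequence[int], accuracy: float,
                      duals: Sequence[float], last_dual: float = 0.0) -> float:
    """Assumed form: accuracy minus duals of included instances minus the last dual."""
    return accuracy - sum(p * a for p, a in zip(duals, column)) - last_dual

def find_improving_column(propose: Callable[[], Column],
                          build_and_score: Callable[[Column], float],
                          duals: Sequence[float],
                          last_dual: float = 0.0,
                          max_tries: int = 10,
                          tol: float = 1e-9) -> Optional[Tuple[Column, float]]:
    """Propose columns from the estimated POP and keep the first one whose
    true reduced cost, computed from the built classifier, is positive."""
    for _ in range(max_tries):
        column = propose()                   # column from the approximate POP
        accuracy = build_and_score(column)   # build the classifier, measure accuracy
        if true_reduced_cost(column, accuracy, duals, last_dual) > tol:
            return column, accuracy          # beneficial: add it to the RMP
    return None                              # no improving column found

# Usage with stand-in callables (values are made up for illustration).
_candidates = iter([[1, 0, 0], [0, 1, 1]])
result = find_improving_column(
    propose=lambda: next(_candidates),
    build_and_score=lambda col: 0.5 + 0.2 * sum(col),
    duals=[0.8, 0.1, 0.1],
    max_tries=2,
)
```

Capping the number of attempts is a design choice of this sketch. The text only states that a new column is generated whenever the previous one is not beneficial.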

Generating improving columns with an estimated POP requires a good method for predicting the usefulness of a classifier from its training data. Two methods that have been successful in preliminary tests are presented in this section. Both methods find a ranking of the instances and operate under the assumption that the more high-ranking instances a training dataset contains, the higher the accuracy of the resultant classifier. Note that the sum of the ranks of the instances included in a classifier’s training data does not represent a prediction of the classifier’s accuracy. Rather, the sum represents an indication of the classifier’s potential usefulness.

Because ranking instances does not result in a prediction of classifier accuracy, but rather a prediction of usefulness, the rank and dual variables used in the estimated POPs are scaled between zero and one. This scaling ensures a fair comparison when considering whether or not to include an instance in a new column. For example, if an instance is ranked relatively high, but has an associated dual variable that is also relatively high, it may not be a good instance to include in the generated column. Of course, it is not ideal to require that the rank of instances and the optimal dual variables be scaled, but it is necessary until some linear approximation of classifier accuracy is developed.
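One simple way to place ranks and dual variables on a common zero-to-one scale is min-max scaling. The text does not state which scaling is used, so the sketch below is an assumption, and `scale_unit` is a hypothetical helper name.

```python
def scale_unit(values):
    """Min-max scale a sequence to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical raw instance ranks and optimal dual variables.
scaled_ranks = scale_unit([2, 4, 6])
scaled_duals = scale_unit([0.0, 0.3, 0.1])
```

After scaling, an instance's rank and dual variable can be compared directly when deciding whether to include the instance in a new column.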

The two methods used to rank instances are discussed in the following subsections. The first relies on information theory and some minor pieces of empirical evidence. The second relies solely on empirical evidence. The ranking procedures allow the POP defined in (18) through (20) to be replaced by,

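\[
\max \left\{ \sum_{i \in I} (r_i - \pi_i^*)\, a_i : a \in \Delta \right\}, \tag{32}
\]

and the POP defined in (29) through (31) can be replaced by,

\[
\max \left\{ \sum_{i \in I} (r_i - \pi_i^*)\, a_i - \pi_{|I|+1}^* : a \in \Delta \right\}. \tag{33}
\]

In both (32) and (33), $r$ represents a scaled ranking of the instances, obtained using the first or second method, and $\pi^*$ are the scaled optimal dual variables. Here $a_i$ indicates whether instance $i$ is included in the generated column, $\Delta$ is the feasible region of the corresponding POP, and $\pi_{|I|+1}^*$ is the last dual variable.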
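The estimated POPs in (32) and (33) can be sketched in Python under two loudly stated assumptions: that the objective sums the scaled rank minus the scaled dual over the included instances, and that the feasible region imposes only the binary restriction on each instance. Under those assumptions the problem separates by instance, so an instance is included exactly when its scaled rank exceeds its scaled dual. The name `solve_estimated_pop` is hypothetical.

```python
def solve_estimated_pop(ranks, duals, last_dual=0.0):
    """Maximize the sum of (rank - dual) over included instances, minus last_dual.

    Assumes binary inclusion decisions with no further constraints, so each
    instance can be decided independently.
    """
    column = [1 if r > p else 0 for r, p in zip(ranks, duals)]
    value = sum((r - p) * a for r, p, a in zip(ranks, duals, column)) - last_dual
    return column, value

# Hypothetical scaled ranks and scaled dual variables for three instances.
column, value = solve_estimated_pop([0.9, 0.4, 0.7], [0.2, 0.6, 0.7])
```

Passing a nonzero `last_dual` mimics the extra term subtracted in (33). It shifts the objective value but does not change which instances are selected. Any additional constraints on the column would require a proper integer programming solver instead of this instance-by-instance rule.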

In document Instance selection for model-based classifiers (Page 36-41)