3.5.3 How to construct the initial columns (J’) of the RMP

J’ is the set of initial columns used to populate the restricted master problem (RMP) in the column generation procedure. The number of iterations that column generation requires is known to depend on this initial population. Even though column generation for instance selection will never be run to true optimality, the quality of the generated columns still depends on the initial columns of the RMP. One clear reason for this dependency is that both methods provided to estimate reduced cost rely on a subset of the columns in the RMP, so in the early iterations of column generation the approximation of reduced cost is inextricably linked to the initial J’.

Preliminary experimentation has shown that good initial columns of J’ should be diverse and contain beneficial instances (indicated by high column accuracy). However, it is not clear exactly how diverse the columns should be and how sensitive the procedure is to the inclusion of columns with non-beneficial instances. Five ways to create initial columns are summarized below, along with the positives and negatives of each.

The Backward Selection algorithm with the allowed accuracy loss equal to zero (B0) is the first method considered to create initial columns. B0 creates one column for each training instance. Initially, a column contains all of the original training instances, and then each instance (except the instance the column is created for) is considered for removal in a random order. An instance is removed from the column if it is unhelpful, meaning its removal does not change the accuracy of the column. With B0, diversity comes from the random order in which instances are considered for removal, and helpful instances are purposefully kept in the column.

The positive of B0 columns is that some unnecessary instances are removed from each subset. The negative is that the columns still contain a fairly large number of instances, large enough that they are expected to retain unhelpful instances. These unhelpful instances may make the estimates of reduced cost ineffective and may encourage the inclusion of harmful instances in the columns generated by the price out problem.

The Backward Selection algorithm with the allowed accuracy loss equal to ten percentage points (B10) is the second method considered to create initial columns. These columns are created identically to the columns of B0, except that an instance is now removed if doing so does not decrease the column’s original accuracy by more than ten percentage points. As with B0, B10 introduces diversity through the random order in which instances are considered for removal. With B10, some of the helpful instances are still purposefully kept in the column. However, some marginally helpful instances can be removed, as controlled by the allowed loss of ten percentage points from the original column accuracy.
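The backward selection procedure behind B0 and B10 can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `backward_selection`, the `tolerance` argument (expressed as a fraction, so 0.10 stands for ten percentage points), and the `accuracy` callable are all hypothetical, and the acceptance rule compares against the column's original accuracy and treats an accuracy increase as acceptable. The toy accuracy function exists only to make the sketch runnable.

```python
import random
from typing import Callable, Optional, Set


def backward_selection(
    n_instances: int,
    protected: int,
    accuracy: Callable[[Set[int]], float],
    tolerance: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Set[int]:
    """Build one column for the instance `protected` by backward selection."""
    rng = rng or random.Random()
    column = set(range(n_instances))   # start from all training instances
    original = accuracy(column)        # accuracy before any removal
    candidates = [i for i in column if i != protected]
    rng.shuffle(candidates)            # random order is the source of diversity
    for i in candidates:
        trial = column - {i}
        # Assumption: a removal is accepted when accuracy stays within
        # `tolerance` of the original accuracy (an increase is also accepted).
        if accuracy(trial) >= original - tolerance:
            column = trial
    return column


# Toy accuracy, for illustration only: instances 0 and 3 are "helpful".
HELPFUL = {0, 3}


def toy_accuracy(column: Set[int]) -> float:
    return len(column & HELPFUL) / len(HELPFUL)


# tolerance=0.0 mimics B0, tolerance=0.10 mimics B10.
b0_column = backward_selection(6, protected=1, accuracy=toy_accuracy, tolerance=0.0)
b10_column = backward_selection(6, protected=1, accuracy=toy_accuracy, tolerance=0.10)
```

In this toy setting both calls keep the two helpful instances and the protected instance, because removing a helpful instance costs fifty percentage points. A larger tolerance would allow helpful instances to be dropped, which is the trade-off described for B10.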

The positive of B10 columns is that the columns contain fewer instances than the columns of B0, possibly having removed more of the unnecessary instances. The negative, indicated by the loss of training accuracy, is that helpful and necessary instances may have also been removed from the columns.

The Forward then Backward Selection Algorithm with the allowed accuracy loss equal to zero (FB) is the third method considered to create initial columns. In this procedure, again, a column is made for each instance. This time the initial column contains only the instance for which it is created. In the forward phase, the remaining instances are considered in a random order for inclusion in the column, and an instance is included if doing so increases the accuracy of the column. After the forward phase, the instances in the column are considered in a random order for removal, and an instance is removed if doing so does not hurt the accuracy of the column. FB introduces diversity to the columns through the random order in which instances are considered for addition and removal, and extra effort is made to include only helpful instances.
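A minimal sketch of the forward then backward procedure is given below. The names `forward_backward_selection` and `accuracy` are hypothetical, and two details are assumptions because the text does not settle them: the instance the column is created for is never removed in the backward phase, and both phases compare against the column's current accuracy. The toy accuracy function, in which instances 2 and 4 are redundant with each other, is for illustration only.

```python
import random
from typing import Callable, Optional, Set


def forward_backward_selection(
    n_instances: int,
    seed_instance: int,
    accuracy: Callable[[Set[int]], float],
    rng: Optional[random.Random] = None,
) -> Set[int]:
    """Build one column for `seed_instance` by forward then backward selection."""
    rng = rng or random.Random()
    column = {seed_instance}           # start from the seed instance only
    current = accuracy(column)

    # Forward phase: add an instance only if it increases accuracy.
    others = [i for i in range(n_instances) if i != seed_instance]
    rng.shuffle(others)
    for i in others:
        gained = accuracy(column | {i})
        if gained > current:
            column.add(i)
            current = gained

    # Backward phase: remove an instance if doing so does not hurt accuracy.
    # Assumption: the seed instance itself is never removed.
    members = sorted(column - {seed_instance})
    rng.shuffle(members)
    for i in members:
        reduced = accuracy(column - {i})
        if reduced >= current:
            column.remove(i)
            current = reduced
    return column


# Toy accuracy, for illustration only: instance 0 is helpful, and either
# instance 2 or instance 4 supplies the remaining half of the accuracy.
def toy_accuracy(column: Set[int]) -> float:
    return (int(0 in column) + int(bool(column & {2, 4}))) / 2


fb_column = forward_backward_selection(6, seed_instance=1, accuracy=toy_accuracy)
```

Which of the two redundant instances ends up in the column depends on the random order, so repeated runs give different but equally accurate columns. Because only strictly improving instances are added, the columns stay small, which is consistent with the limited diversity noted for FB.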

The positive of the FB columns is that the columns have been carefully constructed to contain more helpful and fewer unnecessary instances than in previous methods. The negative is that they may not have enough diversity to help the price out problem find improving columns, as indicated in preliminary experiments.

The two final methods use the Random Selection Algorithm to create columns that contain randomly chosen instances. As with the other methods, the number of random columns equals the number of original training instances. The positive of these types of columns is that they are quite diverse. The negative is that helpful instances are included in columns only by random chance. Additionally, the user must decide how many instances should be included in each column.

For the first type of random columns (RB), the number of random instances included in each column is equal to the average number of instances included in the best columns created by each of the three previous methods. The reasoning for including this number of instances is that the previous methods found good columns by including a certain number of instances, and the subset sizes are similar enough to warrant the average. It may be possible to recreate this prior success with the random inclusion of instances, the result of which would be quite diverse columns that contain helpful instances.
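The RB construction can be sketched as below. The function names `rb_column_size` and `random_columns` are hypothetical, rounding the average to a whole number and sampling without replacement are assumptions, and the three sizes passed in the example are made-up placeholders rather than results from the experiments.

```python
import random
from statistics import mean
from typing import List, Optional, Sequence, Set


def rb_column_size(best_column_sizes: Sequence[int]) -> int:
    """Average size of the best columns from the earlier methods, rounded."""
    return round(mean(best_column_sizes))


def random_columns(
    n_instances: int,
    column_size: int,
    n_columns: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> List[Set[int]]:
    """Create columns of `column_size` instances drawn without replacement."""
    rng = rng or random.Random()
    # One column per training instance unless told otherwise.
    n_columns = n_instances if n_columns is None else n_columns
    return [
        set(rng.sample(range(n_instances), column_size))
        for _ in range(n_columns)
    ]


# Placeholder sizes of the best B0, B10 and FB columns (illustrative only).
size = rb_column_size([12, 9, 6])
rb_columns = random_columns(n_instances=20, column_size=size)
```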

For the second type of random columns (RU), the number of random instances included in each column is decided by first creating a large number of randomly sized columns. Specifically, a large number of columns containing random instances are created for each possible number of instances (one instance through n instances, where n is the number of original instances). Then the accuracy of the best column of each size is plotted, and the user selects the size at which the column accuracy appears to peak or level off. The reasoning for this is again to create diverse columns, but to do so with column sizes that seem to lead to high accuracy columns, or columns with helpful instances.
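The size search behind RU can be sketched as follows. The function name `best_accuracy_by_size`, the `columns_per_size` argument, and the `accuracy` callable are hypothetical, and the toy accuracy function is for illustration only. The sketch only computes the curve of best accuracy per column size; plotting it and choosing where it peaks or levels off is left to the user, as described above.

```python
import random
from typing import Callable, Dict, Optional, Set


def best_accuracy_by_size(
    n_instances: int,
    accuracy: Callable[[Set[int]], float],
    columns_per_size: int = 50,
    rng: Optional[random.Random] = None,
) -> Dict[int, float]:
    """For each column size 1..n, return the best accuracy among random columns."""
    rng = rng or random.Random()
    curve = {}
    for size in range(1, n_instances + 1):
        # Draw several random columns of this size and keep the best accuracy.
        curve[size] = max(
            accuracy(set(rng.sample(range(n_instances), size)))
            for _ in range(columns_per_size)
        )
    return curve


# Toy accuracy, for illustration only: instances 0 and 3 are "helpful".
HELPFUL = {0, 3}


def toy_accuracy(column: Set[int]) -> float:
    return len(column & HELPFUL) / len(HELPFUL)


curve = best_accuracy_by_size(8, toy_accuracy)
for size, best in curve.items():
    print(size, best)
```

The selected size can then be used as the fixed number of instances per column when the random columns for RU are drawn.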

In document Instance selection for model-based classifiers (Page 48-51)