
4.4 Parametric approaches

4.4.2 Newton-like method algorithm

The second approach that we employ to find the root of problem (4.15) is based on a Newton-like method [17, 31, 60], described as follows. Suppose that at the beginning of iteration $i$ a lower bound $t_i$ on $\lambda^\star$ is known, which can be obtained, e.g., by computing the fractional objective function at any feasible solution. If $v(t_i) = 0$, then $t_i = \lambda^\star$; otherwise, the algorithm updates $t_{i+1} = h(x_i)$, where $x_i$ is an optimal solution of $v(t_i)$, and proceeds to the next iteration. The formal pseudo-code is given in Algorithm 2.

Note that at each iteration of Algorithm 2 we can stop the optimization of problem (4.15) in line 6 as soon as a feasible solution with an objective function value greater than both rel $\cdot\, |t_i|$ and abs is found. Based on the discussion in Section 4.4.1, this can result in more iterations but better overall performance of the algorithm.

Algorithm 2 Newton-like method algorithm

1: Input: rel, relative gap parameter; abs, absolute gap parameter
2: Output: $x$; if $x_j = 1$, then feature $j$ is selected
3: $i \leftarrow$
4: Compute $t_i$ ▷ e.g., $t_i = h(1')$
5: while time limit not exceeded do
6:     Solve problem (4.15) for $t_i$ and obtain $v(t_i)$ and its optimal solution $x_i$
7:     if $v(t_i) >$ rel $\cdot\, |t_i|$ and $v(t_i) >$ abs then
8:         $t_{i+1} \leftarrow h(x_i)$
9:     else
10:         return $x_i$ ▷ Solution found within either the relative or the absolute gap
11:     end if
12:     $i \leftarrow i + 1$
13: end while
14: return $x_i$ ▷ Best solution found within the time limit
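The iteration of Algorithm 2 can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in this section: the parametric problem is taken to be $v(t) = \max_x \{ f(x) - t\, g(x) \}$ with $h(x) = f(x)/g(x)$ and $g(x) > 0$, the feasible set is taken to be all nonzero binary vectors, the subproblem is solved by brute-force enumeration rather than by a MILP/BQP solver, and an iteration cap stands in for the time limit. The names `newton_like`, `f`, `g` and `max_iter` are hypothetical.

```python
from itertools import product

def newton_like(f, g, n, rel=0.01, abs_tol=0.001, max_iter=100):
    """Newton-like (Dinkelbach-type) iteration for max f(x)/g(x) over nonzero x in {0,1}^n.

    Assumes g(x) > 0 on the feasible set and v(t) = max_x f(x) - t * g(x).
    """
    feasible = [x for x in product((0, 1), repeat=n) if any(x)]
    h = lambda x: f(x) / g(x)           # fractional objective (assumed form)

    x = (1,) * n                        # any feasible point yields a lower bound
    t = h(x)
    for _ in range(max_iter):           # iteration cap stands in for the time limit
        # Parametric subproblem, solved here by enumeration (illustration only).
        x = max(feasible, key=lambda y: f(y) - t * g(y))
        v = f(x) - t * g(x)
        if v > rel * abs(t) and v > abs_tol:
            t = h(x)                    # update t_{i+1} = h(x_i)
        else:
            break                       # within the relative or the absolute tolerance
    return x, h(x)

# Toy data (hypothetical): f(x) = a.x and g(x) = 1 + b.x
a, b = [3, 1, 2], [1, 1, 4]
f = lambda x: sum(ai * xi for ai, xi in zip(a, x))
g = lambda x: 1 + sum(bi * xi for bi, xi in zip(b, x))
print(newton_like(f, g, n=3))           # (1, 0, 0) 1.5 for this toy instance
```

On this toy instance the iterates are $t = 6/7$, $4/3$, $3/2$, and the method stops once $v(t) = 0$.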

Recall the relative and absolute optimality gaps defined in (4.17). Following the proofs of similar results in [79] and [37, Proposition 4], if the time limit is not reached, then Algorithm 2 terminates with a feasible solution satisfying either $\mathrm{gap}_{\mathrm{rel}} \leq$ rel or $\mathrm{gap}_{\mathrm{abs}} \leq$ abs. If the time limit is reached after the $i$-th iteration of Algorithm 2, then we compute approximations of the relative and absolute gaps by

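$$\mathrm{gap}_{\mathrm{rel}} \simeq \frac{v(t_i)}{|t_i| \cdot g(x_i)}, \quad \text{and} \quad \mathrm{gap}_{\mathrm{abs}} \simeq \frac{v(t_i)}{g(x_i)}. \tag{4.19}$$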
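As a small illustration, the approximations in (4.19) can be evaluated as follows. The function name `approx_gaps` and its arguments are hypothetical: `v_t` stands for $v(t_i)$, `t` for $t_i$ and `g_x` for $g(x_i)$, with $t_i \neq 0$ and $g(x_i) > 0$ assumed.

```python
def approx_gaps(v_t, t, g_x):
    """Return (gap_rel, gap_abs) following (4.19); assumes t != 0 and g_x > 0."""
    gap_abs = v_t / g_x
    gap_rel = v_t / (abs(t) * g_x)
    return gap_rel, gap_abs

print(approx_gaps(0.5, 2.0, 4.0))  # (0.0625, 0.125)
```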

4.5 Computational results

The aim of our computational study is to evaluate the performance of the MILP reformulations provided in Section 4.3 versus the parametric approaches of Section 4.4. In Section 4.5.1, we outline the real-life test instances and the settings used for the computational experiments. Then we present our results in Section 4.5.2.

4.5.1 Computational environment and test instances

In all of the computational test instances, we solve MILPs and BQPs (in each iteration of the parametric Algorithms 1 and 2) using CPLEX 12.7.1 [47]. We run the experiments on a computer with a 2.90 GHz CPU, allocating 4 threads and 16 GB of RAM to each individual experiment. We use a time limit of one hour (3600 seconds). To avoid out-of-memory difficulties, we use the "node-file storage" feature of CPLEX to store parts of the branch-and-cut tree on disk when the size of the tree exceeds the allocated memory. Furthermore, to compute the mutual information and the correlation between a feature and the target class or between two features, as well as the classification accuracy score, we use the scikit-learn package [72] and Python 3.7.3 [78].

Test instances. We consider various real-world instances obtained from the UCI machine learning repository [5] and the ASU feature selection repository [55], available at https://archive.ics.uci.edu and http://featureselection.asu.edu, respectively. Table 13 provides the list of instances as well as their sizes and their key characteristics.

Linearization bounds. In both MILP1 and MILP2, we let $y^{\ell} = 0$ and $y^{u} = 1$. Moreover, for the MILP2 reformulation of mRMR we let $M^b_j = \sum_{k \in J} |I(f_j, C) - I(f_j, f_k)|$ and $M^d_j = n$, for all $j \in J$. For the MILP2 reformulation of CFS we set $M^b_j = \sum_{k \in J} \rho(f_j, C) \cdot \rho(f_k, C)$ and $M^d_j = \sum_{k \in J, k \neq j} 2\rho(f_j, f_k)$, for all $j \in J$. Finally, we consider $M = \sum_{j \in J} \sum_{k \in J} |I(f_j, C) - I(f_j, f_k)|$ in MILP4.
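The bounds above are simple sums over precomputed information and correlation values, which the following sketch makes concrete. The function names and input layout are hypothetical: `mi_class[j]` holds $I(f_j, C)$, `mi_pair[j][k]` holds $I(f_j, f_k)$, `rho_class[j]` holds $\rho(f_j, C)$ and `rho_pair[j][k]` holds $\rho(f_j, f_k)$, all assumed to be computed beforehand.

```python
def mrmr_bounds(mi_class, mi_pair):
    """Return (M^b, M^d, M) for mRMR: per-feature bounds for MILP2 and the scalar M for MILP4."""
    n = len(mi_class)
    m_b = [sum(abs(mi_class[j] - mi_pair[j][k]) for k in range(n)) for j in range(n)]
    m_d = [n] * n                       # M^d_j = n for every feature j
    big_m = sum(m_b)                    # double sum over j and k
    return m_b, m_d, big_m

def cfs_bounds(rho_class, rho_pair):
    """Return (M^b, M^d) for the MILP2 reformulation of CFS."""
    n = len(rho_class)
    total = sum(rho_class)
    m_b = [rho_class[j] * total for j in range(n)]   # sum_k rho(f_j, C) * rho(f_k, C)
    m_d = [sum(2 * rho_pair[j][k] for k in range(n) if k != j) for j in range(n)]
    return m_b, m_d
```

For instance, with two features, `mi_class = [0.5, 0.2]` and `mi_pair = [[0.9, 0.1], [0.1, 0.8]]`, the first function gives $M^b \approx (0.8, 0.7)$, $M^d = (2, 2)$ and $M \approx 1.5$, up to floating-point rounding.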

Gaps. We consider rel = 0.01 and abs = 0.001 in both Algorithms 1 and 2. If the time limit is reached, then $\mathrm{gap}_{\mathrm{rel}}$ and $\mathrm{gap}_{\mathrm{abs}}$ are computed using the formulas given in (4.18) and (4.19) for Algorithms 1 and 2, respectively. Similarly, when solving the MILPs we set the relative and absolute optimality gaps in the solver to 0.01 and 0.001, respectively. These gaps are computed as $\mathrm{gap}_{\mathrm{rel}} = \left| \frac{UB - LB}{LB} \right|$ and $\mathrm{gap}_{\mathrm{abs}} = |UB - LB|$, where $UB$ and $LB$ are the upper and lower bounds, respectively, on the optimal objective function value at the termination of the solver.

Table 13: The sizes of the considered instances, including the number of features, $n$, and the number of samples, $m$. Additionally, we provide some characteristics of the data instances, such as the type of the feature values and the type of the target class variable; if $|\mathcal{C}| = 2$, then the target class is binary, otherwise it is multi-class.

| Instance | n | m | Data type | Class type |
|---|---|---|---|---|
| banknote authentication¹ | 4 | 1,372 | continuous | binary |
| Breast cancer¹ | 9 | 286 | discrete | binary |
| Letter Recognition¹ | 16 | 20,000 | discrete | multi |
| Zoo¹ | 17 | 101 | discrete | multi |
| Breast Cancer Wisconsin (Diagnostic)¹ | 31 | 569 | continuous | binary |
| SPECTF Heart Data¹ | 44 | 267 | continuous | binary |
| Lung Cancer¹ | 56 | 32 | discrete | binary |
| Sports articles for objectivity analysis¹ | 59 | 1,000 | discrete | binary |
| Connectionist¹ | 60 | 208 | continuous | binary |
| Optical Recognition¹ | 62 | 3,823 | discrete | multi |
| Hill-Valley¹ | 100 | 606 | continuous | binary |
| Urban Land Cover¹ | 147 | 168 | continuous | multi |
| Epileptic Seizure Recognition¹ | 178 | 11,500 | discrete | multi |
| SCADI¹ | 205 | 70 | discrete | multi |
| Semeion Handwritten Digit¹ | 256 | 1,593 | discrete | multi |
| USPS² | 256 | 9,298 | continuous | multi |
| lung discrete² | 325 | 73 | discrete | multi |
| Madelon¹,² | 500 | 2,000 | continuous | binary |
| ISOLET¹,² | 617 | 7,797 | continuous | multi |
| Parkinson's Disease¹ | 754 | 756 | continuous | binary |
| CNAE-9¹ | 856 | 1,080 | discrete | multi |
| Yale 32x32² | 1,024 | 165 | continuous | multi |
| ORL 32x32² | 1,024 | 400 | continuous | multi |
| colon² | 2,000 | 62 | discrete | binary |
| PCMAC² | 3,289 | 1,943 | discrete | binary |

Classification accuracy score. Given a sample, the accuracy of a subset of features in predicting the true class of the sample can be evaluated by the classification accuracy. To evaluate the accuracy of a subset of features, we use the well-known Naive Bayes classifier (commonly used in the related literature, see, e.g., [67, 68, 73]), described below, together with 5-fold cross-validation.

Recall that $\mathcal{C}$ denotes the set of possible values for the target class variable, i.e., $C \in \mathcal{C}$. Let $S$ be a subset of features and $A$ be a vector of size $|S|$, where $A_j$ is the value of feature $f_j \in S$ in the sample. To classify sample $A$ using the features in $S$, under the assumption that the features are independent, the Naive Bayes classifier uses the following equation to find the class $C_A$ of the sample:

$$C_A = \operatorname*{argmax}_{c_k \in \mathcal{C}} \; P(c_k) \prod_{A_j \in A} P(A_j \mid c_k), \tag{4.20}$$

where the probabilities $P(c_k)$ and $P(A_j \mid c_k)$ are computed from the training data set. Equation (4.20) implies that the most probable class is assigned as the class of sample $A$.
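The decision rule (4.20) can be sketched in a few lines of plain Python for discrete features. This is an illustration under stated assumptions, not the scikit-learn implementation used in the experiments: probabilities are estimated by relative frequencies without smoothing, a feature value never observed with a class gets probability zero, and the function names `fit_naive_bayes` and `predict` are hypothetical.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(samples, labels):
    """Estimate P(c_k) and P(A_j = a | c_k) by relative frequencies (no smoothing)."""
    class_count = Counter(labels)
    prior = {c: cnt / len(labels) for c, cnt in class_count.items()}
    counts = defaultdict(int)           # (feature index, value, class) -> frequency
    for row, c in zip(samples, labels):
        for j, a in enumerate(row):
            counts[(j, a, c)] += 1
    likelihood = {key: cnt / class_count[key[2]] for key, cnt in counts.items()}
    return prior, likelihood

def predict(prior, likelihood, sample):
    """Return the class maximizing P(c_k) * prod_j P(A_j | c_k), as in (4.20)."""
    def score(c):
        p = prior[c]
        for j, a in enumerate(sample):
            p *= likelihood.get((j, a, c), 0.0)   # unseen (value, class) pair -> 0
        return p
    return max(prior, key=score)

# Toy training set (hypothetical): two binary features, two classes
X = [(1, 0), (1, 1), (0, 1), (0, 0), (1, 1), (0, 0)]
y = ["a", "a", "b", "b", "a", "b"]
prior, likelihood = fit_naive_bayes(X, y)
print(predict(prior, likelihood, (1, 1)))   # a
print(predict(prior, likelihood, (0, 0)))   # b
```

In practice the product of many small probabilities is usually replaced by a sum of logarithms to avoid numerical underflow; the direct product is kept here to mirror (4.20).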

In document Reformulation Techniques and Solution Approaches for Fractional 0-1 Programs and Applications (Page 106-110)