FVS-LDA WITH GAUSSIAN KERNEL RESULTS - Investigation into text classification with kernel based

1. Introduction

This section discusses results obtained with the FVS-LDA scheme when Gaussian kernels, rather than polynomial kernels, are used in the nonlinear projection step. Gaussian kernels require the selection of a parameter σ, as illustrated earlier in Equation (3.39). Seven values of σ were considered in this study: 8.37, 26.46, 83.67, 264.58, 836.66, 2645.75, and 8366.60. The experimental results discussed below are restricted to those obtained with σ equal to 264.58, which gave the best macro-averaged and micro-averaged F1-based performance when averaged over all TDMs. The classical algorithm chosen was the LDA approach. Four FVS-LDA experiments with a Gaussian kernel were conducted, in which the amount of dimensional reduction was varied. The amount of dimensional reduction was measured by the number, k, of FVs selected for the FVS-LDA classifier, where a smaller k corresponds to a larger dimensional reduction. In other words, k out of the 611 training documents were selected, with k taking the values 50, 100, 200, and 400.
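To make the role of σ concrete, the following is a minimal sketch of a Gaussian kernel matrix computation. Equation (3.39) is not reproduced in this section, so the sketch assumes the common form exp(-||x - y||² / (2σ²)); the thesis may use a different scaling of σ. The function name, the random document matrix, and its dimensions are illustrative assumptions. Note that the seven σ values quoted above are spaced by a factor of about √10, so σ² grows by roughly a factor of 10 at each step (from about 70 to about 7 × 10^7).

```python
import numpy as np

# Sigma grid quoted in the text; consecutive values differ by about sqrt(10).
SIGMAS = [8.37, 26.46, 83.67, 264.58, 836.66, 2645.75, 8366.60]


def gaussian_kernel_matrix(X, Y, sigma):
    """Return K with K[i, j] = exp(-||X[i] - Y[j]||^2 / (2 * sigma^2)).

    The 2 * sigma^2 scaling is an assumption, not taken from Equation (3.39).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * (X @ Y.T)
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    return np.exp(-sq_dists / (2.0 * sigma**2))


# Hypothetical data: 6 documents described by 10 term weights each.
rng = np.random.default_rng(0)
docs = rng.random((6, 10))
K = gaussian_kernel_matrix(docs, docs, sigma=264.58)
```

With a large σ relative to the typical distance between documents, all kernel entries approach 1; with a small σ, the matrix approaches the identity. This is one reason the choice of σ matters for the projection step.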

2. Macro-Averaged Results

As mentioned earlier, macro-averaging gives an equal weight to each category, and is often dominated by the system’s performance on smaller classes [3]. Figure 16 shows the macro-averaged results for all 60 TDMs, a Gaussian kernel with σ equal to 264.58, and the four choices considered for k. A few comments can be made regarding the macro-averaged results.

• F1-based performance results obtained when k is 200 are very close to those obtained for k = 400, as the average absolute difference across all TDM configurations is equal to 0.0288 in that k range.

• Macro-averaged F1 performances range roughly between 0.56 and 0.98 for all TDM configurations except when the global factor selected is Normal. In such a case, F1 performance results degrade to the range between 0.14 and 0.67.

• Results show that the TDM type drives F1 classification performance. For example, a TDM choice of either “bgc” (Index 20) or “ngx” (Index 55) generates results between 0.90 and 0.98 for all k values.
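As a reminder of how a macro-averaged F1 value is formed, the sketch below computes per-class F1 from true-positive, false-positive, and false-negative counts and then averages the per-class scores with equal weight. The class names and counts are hypothetical and are not taken from the experiments; the function names are illustrative.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the class is empty."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts):
    """Average the per-class F1 scores, giving each class equal weight.

    counts maps a class name to a (tp, fp, fn) tuple.
    """
    scores = [f1_from_counts(tp, fp, fn) for tp, fp, fn in counts.values()]
    return sum(scores) / len(scores)


# Hypothetical counts: one large, well-classified class and one small,
# poorly classified class.
counts = {"large": (90, 5, 10), "small": (4, 6, 4)}
macro = macro_f1(counts)
```

Because each class contributes equally, the weak score on the small class pulls the macro average down substantially, which is the sense in which macro-averaging is dominated by performance on smaller classes.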

[Plot: F1 performance measure (0.0 to 1.0) versus TDM type (index 0 to 60), with one series each for k = 50, k = 100, k = 200, and k = 400.]

Figure 16. Macro-Averaged Results for the FVS-LDA Classifier with a Gaussian Kernel with σ Equal to 264.58 for Varying Dimension Reduction k Values.

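Next, we selected a specific TDM configuration and compared the resulting F1-based performance results obtained with the FVS-LDA approach using a Gaussian kernel with σ equal to 264.58 for the four k values. The bgc configuration was chosen because it performed well on our database. A few observations can be made regarding Figure 17.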

• Note that the F1-based performance values observed are quite close.

• The dotted line passes through the points corresponding to k values from 50 to 400; it is the least-squares line fit obtained from these four points. Note that the line slope is nearly equal to zero, indicating that results for k = 400 are representative of the maximum performance for the FVS-LDA classifier with a Gaussian kernel with σ equal to 264.58.
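A least-squares line through four (k, F1) points can be obtained as sketched below. The F1 values used here are hypothetical placeholders, since the individual data points are not listed in the text; only the four k values come from the study. The sketch uses NumPy's `polyfit` with degree 1, which is one of several ways to fit such a line.

```python
import numpy as np


def fit_line(k_values, f1_scores):
    """Return (slope, intercept) of the least-squares line f1 = slope*k + intercept."""
    slope, intercept = np.polyfit(k_values, f1_scores, deg=1)
    return float(slope), float(intercept)


k_values = np.array([50.0, 100.0, 200.0, 400.0])  # k values from the study
f1_scores = np.array([0.93, 0.95, 0.94, 0.97])    # hypothetical F1 values

slope, intercept = fit_line(k_values, f1_scores)
print(f"y = {slope:.4f}x + {intercept:.4f}")
```

A slope on the order of 1e-4 means that adding 100 feature vectors changes the fitted F1 by only about 0.01, which is why the fit is read as nearly flat.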

[Plot: F1 performance measure (0.0 to 1.0) versus number of feature vectors k (0 to 450), with the least-squares line fit y = 0.0001x + 0.9286.]

Figure 17. Macro-Averaged FVS-LDA with a Gaussian Kernel with σ Equal to 264.58 Results for TDM (bgc) Versus Number of Feature Vectors.

3. Micro-Averaged Results

As mentioned earlier, micro-averaging gives an equal weight to each document and is often dominated by the system’s performance on larger classes [3]. Figure 18 shows the micro-averaged results for all 60 TDMs, a Gaussian kernel with σ equal to 264.58, and the four choices considered for k. A few comments can be made regarding these micro-averaged results.

• F1-based performance results obtained when k is 200 are very close to those obtained for k = 400, as the average absolute difference across all TDM configurations is equal to 0.0299 in that k range.

• Micro-averaged F1 performances range roughly between 0.59 and 0.98 for all TDM configurations except when the global factor selected is Normal. In such a case, F1 performance results degrade to the range between 0.20 and 0.76.

• Results show that the TDM type drives F1 classification performance. For example, a TDM choice of either “bgc” (Index 20) or “ngx” (Index 55) generates results between 0.91 and 0.98 for all k values.
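For comparison with macro-averaging, a micro-averaged F1 value pools the true-positive, false-positive, and false-negative counts over all classes before computing a single F1 score, so every document carries equal weight. The sketch below illustrates this with hypothetical class names and counts that are not taken from the experiments; the function name is illustrative.

```python
def micro_f1(counts):
    """Pool (tp, fp, fn) over all classes, then compute one F1 score.

    counts maps a class name to a (tp, fp, fn) tuple.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts: one large, well-classified class and one small,
# poorly classified class.
counts = {"large": (90, 5, 10), "small": (4, 6, 4)}
micro = micro_f1(counts)
```

Here the pooled counts are dominated by the large class, so the micro-averaged score stays high even though the small class is classified poorly. When macro-averaged and micro-averaged values are close, as reported in this section, it suggests that small and large classes are handled comparably well.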

[Plot: F1 performance measure (0.0 to 1.0) versus TDM type (index 0 to 60), with one series each for k = 50, k = 100, k = 200, and k = 400.]

Figure 18. Micro-Averaged Results for the FVS-LDA Classifier with a Gaussian Kernel with σ Equal to 264.58 for Varying Dimension Reduction k Values.

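Next, we selected a specific TDM configuration, bgc, and compared the resulting F1-based performance results obtained with the FVS-LDA approach using a Gaussian kernel with σ equal to 264.58 for the four k values. A few observations can be made regarding Figure 19.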

• Note that the F1-based performance values observed are quite close.

• The dotted line passes through the points corresponding to k values from 50 to 400; it is the least-squares line fit obtained from these four points. Note that the line slope is nearly equal to zero, indicating that results for k = 400 are representative of the maximum performance for the FVS-LDA classifier with a Gaussian kernel and σ equal to 264.58.
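The flatness of the fit can be quantified directly from the least-squares line reported with Figure 19, y = 0.0001x + 0.9375. The short sketch below evaluates that line at the smallest and largest k values; the function name is illustrative, and the coefficients are used as printed, so they carry whatever rounding was applied in the figure.

```python
def fitted_f1(k, slope=0.0001, intercept=0.9375):
    """Evaluate the reported fit y = slope * k + intercept at k feature vectors."""
    return slope * k + intercept


low = fitted_f1(50)    # fitted value at the strongest dimension reduction
high = fitted_f1(400)  # fitted value at the weakest dimension reduction
rise = high - low      # total change of the fitted line over the k range
print(f"k=50: {low:.4f}, k=400: {high:.4f}, rise: {rise:.4f}")
```

Under these coefficients, the fitted F1 value changes by about 0.035 when k grows eightfold from 50 to 400, and the fitted value at k = 400 is close to the maximum of about 0.98 reported for this classifier.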

[Plot: F1 performance measure (0.0 to 1.0) versus number of feature vectors k (0 to 450), with the least-squares line fit y = 0.0001x + 0.9375.]

Figure 19. Micro-Averaged FVS-LDA with a Gaussian Kernel with σ Equal to 264.58 Results for TDM (bgc) Versus Number of Feature Vectors.

4. Summary

Results show that macro-averaged and micro-averaged performances are similar. As mentioned earlier, macro-averaging is often dominated by the system’s performance on smaller classes, while micro-averaging is often dominated by the system’s performance on larger classes. Thus, these results lead to the observation that the smallest testing group, containing only 24 documents, was sufficiently large to be properly classified using the FVS-LDA scheme with a Gaussian kernel for σ equal to 264.58.

Results also show that the TDM type dominates the classification performance of the FVS-LDA classifier with a Gaussian kernel, and that the maximum F1 classification performance obtained for the FVS-LDA classifier with a Gaussian kernel for σ equal to 264.58 is around 0.98.

In document Investigation into text classification with kernel based schemes (Page 77-82)