1. Introduction
This section discusses results obtained with the FVS-LDA scheme when Gaussian
kernels are used in the nonlinear projection step, instead of polynomial kernels. Gaussian
kernels require the selection of a parameter σ, as illustrated earlier in Equation (3.39).
Seven different values for σ were considered in this study: 8.37, 26.46, 83.67, 264.58,
836.66, 2645.75, and 8366.60. The experimental results discussed below are restricted to
those obtained with σ equal to 264.58, which gave the best macro-averaged and
micro-averaged F1-based performance when averaged over all TDMs. The classical
algorithm chosen was the LDA approach. Four FVS-LDA experiments with a Gaussian
kernel were conducted, where the amount of dimensional reduction was varied. The
amount of dimensional reduction was measured by the number, k, of FVs selected for the
FVS-LDA classifier, where a smaller k corresponds to a larger dimensional reduction. In
other words, k out of the 611 training documents were selected, where the k values
considered are 50, 100, 200, and 400.
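As a reminder of the role σ plays, the following is a minimal sketch of a Gaussian kernel evaluation. It assumes the common form exp(-||x - y||^2 / (2σ^2)); Equation (3.39) is not reproduced in this section and may scale σ differently. The two document vectors and the function name are hypothetical and serve only as an illustration.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Gaussian kernel between two vectors.

    Assumes the common form exp(-||x - y||^2 / (2 * sigma^2)); the exact
    parameterization of Equation (3.39) may differ.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


# The seven sigma values considered in this study.
SIGMAS = [8.37, 26.46, 83.67, 264.58, 836.66, 2645.75, 8366.60]

# Two hypothetical document vectors (illustrative values only).
doc_a = [3.0, 0.0, 1.0]
doc_b = [0.0, 4.0, 1.0]

for sigma in SIGMAS:
    value = gaussian_kernel(doc_a, doc_b, sigma)
    print(f"sigma={sigma:>8.2f}  k(a, b)={value:.6f}")
```

Under this assumed form, larger σ values flatten the kernel, so the kernel value for a fixed pair of documents moves toward 1 as σ grows.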
2. Macro-Averaged Results
As mentioned earlier, macro-averaging gives an equal weight to each category,
and is often dominated by the system’s performance on smaller classes [3]. Figure 16
shows the macro-averaged results for all 60 TDMs, a Gaussian kernel with σ equal to
264.58, and the four choices considered for k. A few comments can be made regarding
the macro-averaged results.
• F1-based performance results obtained for k = 200 are very close to
those obtained for k = 400, as the average absolute difference across all
TDM configurations is equal to 0.0288 over that k range.
• Macro-averaged F1 performances range roughly between 0.56 and 0.98 for
all TDM configurations, except when the global factor selected is Normal.
In that case, F1 performance degrades to the range between 0.14
and 0.67.
• Results show that the TDM type drives F1 classification performance.
For example, a TDM choice of either “bgc” (Index 20) or “ngx” (Index
55) generates results between 0.90 and 0.98 for all k values.
[Figure 16 plot: F1 performance measure (0.0 to 1.0) versus TDM type (0 to 60), with series for k = 50, 100, 200, and 400.]
Figure 16. Macro-Averaged Results for the FVS-LDA Classifier with a Gaussian Kernel
with σ Equal to 264.58 for Varying Dimension Reduction k Values.
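The macro-averaging described above can be sketched as follows: an F1 value is computed for each category separately, and the per-category values are then averaged with equal weight. The function names and the (tp, fp, fn) counts are hypothetical and are not taken from the experiments.

```python
def f1_score(tp, fp, fn):
    """F1 from true positive, false positive, and false negative counts."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def macro_f1(per_class_counts):
    """Average the per-category F1 values, giving each category equal weight."""
    scores = [f1_score(tp, fp, fn) for tp, fp, fn in per_class_counts]
    return sum(scores) / len(scores)


# Hypothetical (tp, fp, fn) counts: one large category and one small category.
counts = [(90, 5, 10), (2, 1, 6)]

print("per-category F1:", [round(f1_score(*c), 4) for c in counts])
print("macro-averaged F1:", round(macro_f1(counts), 4))
```

Because each category contributes equally to the average, a poorly classified small category pulls the macro-averaged value down as much as a large one would.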
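Next, we selected a specific TDM configuration and compared the F1-based
performance results obtained with the FVS-LDA approach and a Gaussian kernel with σ
equal to 264.58 for the different k values. The TDM (bgc) was selected
because it performed well on our database. A few observations can be made regarding
Figure 17.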
• Note that the F1-based performance values observed are quite close to one another.
• The dotted line going through the points corresponds to k values from 50 to
400. It is the least squares line fit obtained from these 4 points. Note that
the line slope is nearly equal to zero, indicating that results for k = 400
are representative of the maximum performance for the FVS-LDA
classifier with a Gaussian kernel with σ equal to 264.58.
[Figure 17 plot: F1 performance measure (0.0 to 1.0) versus number of feature vectors k (0 to 450), with the least squares line fit y = 0.0001x + 0.9286.]
Figure 17. Macro-Averaged FVS-LDA with a Gaussian Kernel with σ Equal to 264.58
Results for TDM (bgc) Versus Number of Feature Vectors.
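The least squares line fit through the four (k, F1) points can be sketched as below. The four F1 values in this sketch are hypothetical placeholders, not the measured values behind Figure 17, so the fitted coefficients are not expected to match the line reported in the figure. The function name is also an assumption made for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# The four k values used in the experiments.
k_values = [50, 100, 200, 400]
# Hypothetical F1 values (NOT the measured results), chosen only for illustration.
f1_values = [0.93, 0.94, 0.95, 0.96]

slope, intercept = least_squares_line(k_values, f1_values)
print(f"y = {slope:.4f}x + {intercept:.4f}")
```

A slope close to zero over the range k = 50 to 400 is what supports reading the k = 400 result as representative of the maximum performance.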
3. Micro-Averaged Results
As mentioned earlier, micro-averaging gives an equal weight to each document
and is often dominated by the system’s performance on larger classes [3]. Figure 18
shows the micro-averaged results for all 60 TDMs, a Gaussian kernel with σ equal to
264.58, and the four choices considered for k. A few comments can be made regarding
these micro-averaged results.
• F1-based performance results obtained for k = 200 are very close to
those obtained for k = 400, as the average absolute difference across all
TDM configurations is equal to 0.0299 over that k range.
• Micro-averaged F1 performances range roughly between 0.59 and 0.98 for
all TDM configurations, except when the global factor selected is Normal. In
that case, F1 performance degrades to the range between 0.20 and
0.76.
• Results show that the TDM type drives F1 classification performance.
For example, a TDM choice of either “bgc” (Index 20) or “ngx” (Index
55) generates results between 0.91 and 0.98 for all k values.
[Figure 18 plot: F1 performance measure (0.0 to 1.0) versus TDM type (0 to 60), with series for k = 50, 100, 200, and 400.]
Figure 18. Micro-Averaged Results for the FVS-LDA Classifier with a Gaussian Kernel
with σ Equal to 264.58 for Varying Dimension Reduction k Values.
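Micro-averaging, by contrast, pools the classification decisions over all categories before a single F1 value is computed, so each document carries equal weight. A minimal sketch follows; the function name and the (tp, fp, fn) counts are hypothetical and are not taken from the experiments.

```python
def micro_f1(per_class_counts):
    """Pool the counts over all categories, then compute a single F1 value."""
    tp = sum(counts[0] for counts in per_class_counts)
    fp = sum(counts[1] for counts in per_class_counts)
    fn = sum(counts[2] for counts in per_class_counts)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical (tp, fp, fn) counts: one large category and one small category.
pooled_example = [(90, 5, 10), (2, 1, 6)]

print("micro-averaged F1:", round(micro_f1(pooled_example), 4))
```

Because the counts are pooled, the large category supplies most of the true positives, and the pooled value stays close to that category's own F1 even when the small category is classified poorly.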
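Next, we selected a specific TDM configuration and compared the F1-based
performance results obtained with the FVS-LDA approach and a Gaussian kernel with σ
equal to 264.58 for the different k values, using the TDM (bgc). A few observations
can be made regarding Figure 19.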
• Note that the F1-based performance values observed are quite close to one another.
• The dotted line going through the points corresponds to k values from 50 to
400. It is the least squares line fit obtained from these 4 points. Note that
the line slope is nearly equal to zero, indicating that results for k = 400
are representative of the maximum performance for the FVS-LDA
classifier with a Gaussian kernel and σ equal to 264.58.
[Figure 19 plot: F1 performance measure (0.0 to 1.0) versus number of feature vectors k (0 to 450), with the least squares line fit y = 0.0001x + 0.9375.]
Figure 19. Micro-Averaged FVS-LDA with a Gaussian Kernel with σ Equal to 264.58
Results for TDM (bgc) Versus Number of Feature Vectors.
4. Summary
Results show that the macro-averaged and micro-averaged performances are similar.
As mentioned earlier, macro-averaging is often dominated by the system’s performance
on smaller classes, while micro-averaging is often dominated by the system’s
performance on larger classes. Thus, these results lead to the observation that the
smallest testing group, containing only 24 documents, was sufficiently large to be properly
classified using the FVS-LDA scheme with a Gaussian kernel for σ equal to 264.58.
Results also show that the TDM type dominates the classification performance of the
FVS-LDA classifier with a Gaussian kernel, and that the maximum F1 classification performance
obtained for the FVS-LDA classifier with a Gaussian kernel for σ equal to 264.58 is
around 0.98.