5.4 Implementation and Experimental Study
5.4.2 Experimental Results and Analysis
In this section, two experiments are conducted using the overall procedure described in subsection 5.4.1. The purpose of the first experiment was to compare the average recall-precision values achieved by the proposed TF-ATO, with and without the DA, against those achieved by TF-IDF. This experiment treated the test collection as static. Two collections were used, together with their query sets (outlined in Table 3.3): Ohsumed, the largest test collection created under the Cranfield paradigm, and LATIMES, a test collection created under the pooling paradigm. The experiments were run on a PC with a 3.60 GHz Intel(R) Core(TM) i7-3820 CPU and 8 GB of RAM, and the implementation was written in Java using NetBeans under Windows 7 Enterprise Edition. The indexing run-times for TF-ATO, TF-ATO with DA and TF-IDF were 25, 31 and 40 minutes respectively on the Ohsumed collection, and 24, 29 and 43 minutes respectively on the LATIMES collection.
Tables 5.1 and 5.2 present the results of the first experiment on Ohsumed and LATIMES. The tables show the Average Precision (AvgP) value obtained by each TWS method at nine recall values, as well as the corresponding Mean Average Precision (MAP) value. Overall, the proposed weighting scheme TF-ATO is more effective than TF-IDF. TF-ATO without the DA does not achieve better precision than TF-IDF at every recall value, but when the DA is used TF-ATO outperforms TF-IDF at all recall values. Considering all the recall values, the average improvement in precision on the Ohsumed collection (given by the MAP value) achieved by TF-ATO without DA is 6.94%, while the improvement achieved by TF-ATO with the DA is 41%. On the LATIMES collection, the average improvement in precision achieved by TF-ATO without DA is 2.2%, while the improvement achieved by TF-ATO with the DA is 29.4%.
Table 5.1: Average Recall-Precision and MAP for Static Experiment Applied on Ohsumed collection.
          AvgP and MAP for static document experiment
Recall    TF-IDF    TF-ATO without DA    TF-ATO with DA
0.1       0.648     0.713                0.816
0.2       0.445     0.47                 0.61
0.3       0.343     0.361                0.472
0.4       0.253     0.259                0.362
0.5       0.216     0.196                0.288
0.6       0.176     0.153                0.24
0.7       0.156     0.13                 0.199
0.8       0.136     0.114                0.154
0.9       0.123     0.108                0.13
MAP       0.277     0.278                0.364
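The MAP row of Table 5.1 can be rechecked from the nine AvgP values. The following is a minimal sketch in Python (not the Java implementation used for the experiments), and it assumes that the MAP row is the plain arithmetic mean of the AvgP values over the nine recall levels. Under that assumption the computed means lie within 0.001 of the tabulated MAP values. The function name is illustrative only.

```python
from statistics import mean

# AvgP at recall 0.1 ... 0.9, copied from Table 5.1 (Ohsumed, static experiment).
table_5_1 = {
    "TF-IDF": [0.648, 0.445, 0.343, 0.253, 0.216, 0.176, 0.156, 0.136, 0.123],
    "TF-ATO without DA": [0.713, 0.47, 0.361, 0.259, 0.196, 0.153, 0.13, 0.114, 0.108],
    "TF-ATO with DA": [0.816, 0.61, 0.472, 0.362, 0.288, 0.24, 0.199, 0.154, 0.13],
}


def mean_over_recall_levels(avg_precisions):
    """Arithmetic mean of the AvgP values (the assumed definition of the MAP row)."""
    return mean(avg_precisions)


for scheme, values in table_5_1.items():
    print(f"{scheme}: {mean_over_recall_levels(values):.4f}")
```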
Table 5.2: Average Recall-Precision and MAP for Static Experiment Applied on LATIMES collection.
          AvgP and MAP for static document experiment
Recall    TF-IDF    TF-ATO without DA    TF-ATO with DA
0.1       0.528     0.563                0.764
0.2       0.431     0.441                0.658
0.3       0.392     0.393                0.51
0.4       0.345     0.348                0.41
0.5       0.305     0.32                 0.329
0.6       0.261     0.29                 0.268
0.7       0.172     0.172                0.222
0.8       0.158     0.158                0.201
0.9       0.126     0.126                0.196
MAP       0.305     0.312                0.395

The results of this first experiment indicate that the proposed TF-ATO weighting scheme gives better effectiveness (higher average precision values) than TF-IDF on static test collections. Furthermore, using the document centroid as a DA with the proposed weighting scheme brings a further improvement. The proposed DA also reduces the size of the documents in the test collections by removing non-discriminative terms and less significant weights from each document. These reduction ratios are illustrated in Section 5.5.2.
The purpose of the second experiment was to compare the average recall-precision values achieved by the proposed TF-ATO with the DA against those achieved by TF-IDF, this time treating the test collection as dynamic. To simulate a dynamic collection, the document sets in the test collections are split into parts. The first part is taken as the initial test collection, and steps 1-8 of the procedure described in Section 5.4.1 are applied to it. This yields the IDF values of the index terms and the document centroid vector of term weights for each collection. The test collections are then updated by adding the other parts, but without updating the IDF values or the document centroid weights computed for the initial collections. Thus, no recalculation is done even after a large number of documents (the remaining parts) has been added to the initial collections. The reason is that re-computing the IDF values and assigning new weights (to update the documents in the collection) would have a computational cost of $O(N^2 \cdot M \cdot \log M)$, where $N$ is the number of documents in the collection and $M$ is the number of index terms in the term space (Reed et al., 2006b). Updating the IDF and document centroid values therefore carries a cost, whereas using the existing values to assign term weights carries no extra cost.
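The idea of weighting newly added documents with statistics computed only on the initial part can be sketched as follows. This is an illustration in Python under stated assumptions, not the Java implementation used in the experiments: it uses the textbook form log(N/df) for IDF, which may differ from the variant used here, and it gives weight 0 to terms that never occurred in the initial part, which is a choice made purely for the sketch. The document centroid would be frozen in the same way, but it is not shown because its definition lies outside this section. The names `fit_idf` and `weight_document` are hypothetical.

```python
import math
from collections import Counter


def fit_idf(initial_docs):
    """Compute IDF once, on the initial part of the collection only."""
    n_docs = len(initial_docs)
    doc_freq = Counter()
    for tokens in initial_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    # Assumed textbook form: idf(t) = log(N / df(t)).
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def weight_document(tokens, frozen_idf):
    """TF-IDF weights for one document, using the frozen IDF table.

    Terms absent from the initial part get weight 0 (an assumption of this sketch).
    """
    term_freq = Counter(tokens)
    return {term: count * frozen_idf.get(term, 0.0) for term, count in term_freq.items()}


# Toy initial part: three tokenised documents.
initial_part = [["heart", "disease"], ["heart", "surgery"], ["liver", "disease"]]
frozen_idf = fit_idf(initial_part)

# A document that arrives later is weighted without touching frozen_idf.
later_doc = ["heart", "heart", "transplant"]
weights = weight_document(later_doc, frozen_idf)
print(weights)
```

Because `frozen_idf` is never recomputed, the cost of adding a document is limited to weighting that document, at the price of IDF values that drift away from the true collection statistics as the collection grows.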
To determine how many parts to split the test collections into, some preliminary experiments were conducted. The document sets in the test collections were split into 2, 5, 10 and 30 parts, and it was observed that with few parts the variation in MAP values was small and less significant. That is, the effect of a dynamic data stream was better simulated by splitting the collection into a larger number of parts. Thus, for the second experiment, the document sets were split into 30 parts, i.e. the ratio between the initial document set and the documents subsequently added to the collection was 1:29.
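The 30-part split can be illustrated with a short Python sketch. The text does not state how documents were assigned to parts, so contiguous parts of near-equal size are an assumption here, and the collection of 3000 document identifiers is hypothetical.

```python
def split_into_parts(doc_ids, n_parts):
    """Split doc_ids into n_parts contiguous chunks of near-equal size."""
    base, extra = divmod(len(doc_ids), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        parts.append(doc_ids[start:start + size])
        start += size
    return parts


doc_ids = list(range(3000))  # hypothetical collection of 3000 documents
parts = split_into_parts(doc_ids, 30)

initial, updates = parts[0], parts[1:]  # first part is indexed, the rest is streamed in
added = sum(len(part) for part in updates)
print(len(initial), added)  # 100 initial and 2900 added documents, i.e. 1:29
```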
Table 5.3: Average Recall-Precision Using TF-IDF and TF-ATO with DA in Dynamic Experiment for Ohsumed.
          AvgP and MAP for Dynamic Experiment
Recall    TF-IDF    TF-ATO with DA
0.1       0.516     0.776
0.2       0.329     0.561
0.3       0.26      0.402
0.4       0.202     0.283
0.5       0.159     0.213
0.6       0.138     0.17
0.7       0.126     0.146
0.8       0.117     0.125
0.9       0.111     0.11
MAP       0.217     0.309
Table 5.4: Average Recall-Precision Using TF-IDF and TF-ATO with DA in Dynamic Experiment for LATIMES.
          AvgP and MAP for Dynamic Experiment
Recall    TF-IDF    TF-ATO with DA
0.1       0.403     0.663
0.2       0.217     0.449
0.3       0.15      0.292
0.4       0.101     0.182
0.5       0.1       0.132
0.6       0.059     0.109
0.7       0.05      0.061
0.8       0.041     0.055
0.9       0.035     0.03
MAP       0.128     0.219

Tables 5.3 and 5.4 present the results of the second experiment on the Ohsumed and LATIMES collections. The tables show the results obtained with TF-IDF and with TF-ATO with DA in the dynamic simulation experiment, in which more documents are added to the document set without recomputing either the IDF values or the DA. They report the average precision obtained by each TWS method at nine recall values, as well as the corresponding MAP value.
Figure 5.1: Illustrating the Average Precision performance for Static/Dynamic Experiments on Ohsumed Collection.

From these tables, it is observed that effectiveness drops compared to the static case. The drop in MAP from static to dynamic for TF-IDF and TF-ATO with DA on the Ohsumed collection is 21.8% and 15% respectively, while on the LATIMES collection it is 57.9% and 44.5% respectively. This suggests that a large change in collection size, caused by adding a large number of documents to the document set (the index file) without updating the weights, can degrade IR system effectiveness. However, the proposed weighting scheme TF-ATO with DA still gives a higher MAP than TF-IDF: its average improvement in precision over TF-IDF is 42.38% on the Ohsumed collection and 70.7% on the LATIMES collection.
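The static-to-dynamic MAP percentages and the improvement percentages quoted above can be rechecked from the MAP rows of Tables 5.1 to 5.4. The Python sketch below assumes that each percentage is a simple relative change between two MAP values. Under that assumption the results agree with the quoted figures to within about half a percentage point, and the remaining differences are presumably due to rounding of the tabulated MAP values. The helper name `relative_change` is illustrative.

```python
# MAP values copied from Tables 5.1 to 5.4: (static, dynamic) per collection and scheme.
map_values = {
    ("Ohsumed", "TF-IDF"): (0.277, 0.217),
    ("Ohsumed", "TF-ATO with DA"): (0.364, 0.309),
    ("LATIMES", "TF-IDF"): (0.305, 0.128),
    ("LATIMES", "TF-ATO with DA"): (0.395, 0.219),
}


def relative_change(reference, value):
    """Percentage change of value with respect to reference."""
    return 100.0 * (value - reference) / reference


# Loss in MAP when moving from the static to the dynamic setting.
for (collection, scheme), (static_map, dynamic_map) in map_values.items():
    drop = -relative_change(static_map, dynamic_map)
    print(f"{collection}, {scheme}: static-to-dynamic MAP drop {drop:.1f}%")

# Gain of TF-ATO with DA over TF-IDF within the dynamic setting.
for collection in ("Ohsumed", "LATIMES"):
    baseline = map_values[(collection, "TF-IDF")][1]
    proposed = map_values[(collection, "TF-ATO with DA")][1]
    gain = relative_change(baseline, proposed)
    print(f"{collection}: dynamic MAP gain over TF-IDF {gain:.1f}%")
```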
Figures 5.1 and 5.2 show bar charts of the static and dynamic results on Ohsumed and LATIMES reported in the tables above. In these figures, higher values correspond to better performance. The figures show that the TF-ATO with DA TWS exhibits the best overall performance. In addition, the p-values of the paired t-tests for the experiments are shown in Table 5.5. They indicate that the improvements obtained with TF-ATO with DA over TF-IDF are statistically significant.
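For readers unfamiliar with the paired t-test, its test statistic can be sketched in a few lines of Python. The text does not state which observations were paired for Table 5.5, so pairing the two AvgP columns of Table 5.3 across the nine recall levels is an assumption made only for illustration, and the result is not expected to reproduce Table 5.5. The sketch returns the t statistic and the degrees of freedom. Turning these into a p-value additionally requires the Student t distribution, which is omitted here.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) of a paired t-test."""
    if len(sample_a) != len(sample_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    # t = mean of the differences divided by its standard error.
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1


# AvgP at recall 0.1 ... 0.9 from Table 5.3 (Ohsumed, dynamic experiment).
tf_ato_with_da = [0.776, 0.561, 0.402, 0.283, 0.213, 0.17, 0.146, 0.125, 0.11]
tf_idf = [0.516, 0.329, 0.26, 0.202, 0.159, 0.138, 0.126, 0.117, 0.111]

t_stat, dof = paired_t_statistic(tf_ato_with_da, tf_idf)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```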