4.4 Results
4.4.2 Experiment 2: Learning 5 parameters
Table 4.4 outlines the results for all the parameters specified in the methodology, that is:
• sigT, the significance threshold value, as before.
• P, the prediction formula option, an integer in the range [0, 3] indicating the prediction formula used (top-N or correlation thresholding, with means calculated over co-rated items only or over all rated items).
• N, the top-N value when top-N is selected (i.e. when P, the prediction option, is 1 or 3).
• corrT, the correlation threshold value when top-N is not used (i.e. when P, the prediction option, is 0 or 2).
• sim, an integer in the range [0, 2] indicating the similarity measure used: Spearman rank correlation (0), Pearson correlation (1) or cosine similarity (2).

Table 4.4: Experiment 2: Learning 5 parameters.
Dataset        sigT         P    N      corrT       sim    GA MAE
MovieLens      67           1    164    n/a         1      0.635
bookcrossing   2 or 3       2    n/a    0.000621    2      4.81
last.fm        4 or 15      0    n/a    0.00118     2      0.589
Epinions       0, 1 or 2    2    n/a    0.00265     2      1.774
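The five parameters listed above can be pictured as a single candidate solution of the genetic algorithm. The following is a minimal sketch only: the class `Individual`, its field names and the validation rules are hypothetical and are not taken from the implementation used in the experiments. In particular, it assumes that only one of N and corrT is meaningful for a given value of P, whereas an actual encoding may carry both genes and ignore the inactive one.

```python
from dataclasses import dataclass
from typing import Optional

# Option codes follow the parameter descriptions in the text.
SIMILARITY_MEASURES = {0: "spearman", 1: "pearson", 2: "cosine"}
TOP_N_OPTIONS = {1, 3}            # P values for which N applies
CORR_THRESHOLD_OPTIONS = {0, 2}   # P values for which corrT applies


@dataclass(frozen=True)
class Individual:
    """Hypothetical container for one candidate parameter set."""
    sig_t: int                      # significance threshold
    p: int                          # prediction formula option, 0..3
    sim: int                        # similarity measure option, 0..2
    n: Optional[int] = None         # top-N value, used when p is 1 or 3
    corr_t: Optional[float] = None  # correlation threshold, used when p is 0 or 2

    def __post_init__(self) -> None:
        if self.p not in TOP_N_OPTIONS | CORR_THRESHOLD_OPTIONS:
            raise ValueError("P must be an integer in the range 0..3")
        if self.sim not in SIMILARITY_MEASURES:
            raise ValueError("sim must be an integer in the range 0..2")
        if self.sig_t < 0:
            raise ValueError("sigT must be non-negative")
        if self.p in TOP_N_OPTIONS and self.n is None:
            raise ValueError("N is required when P is 1 or 3")
        if self.p in CORR_THRESHOLD_OPTIONS and self.corr_t is None:
            raise ValueError("corrT is required when P is 0 or 2")

    def uses_top_n(self) -> bool:
        return self.p in TOP_N_OPTIONS


# Two rows of Table 4.4 expressed with this sketch.
movielens_best = Individual(sig_t=67, p=1, sim=1, n=164)
lastfm_best = Individual(sig_t=15, p=0, sim=2, corr_t=0.00118)
```

Making the dependency between P, N and corrT explicit in this way mirrors the "n/a" entries in Table 4.4, where the unused parameter is not reported.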
For this experiment, a population size of 200 is used and results are presented after 50 generations. As in the previous experiment, the rating threshold (the number of ratings a user must have in order to be included as a test user) remains constant for each dataset (10 for the MovieLens, last.fm and Epinions datasets, and 1 for the bookcrossing dataset).
Table 4.4 shows the results for the four datasets. For all but the MovieLens dataset, the same (or very similar) parameter values exist in the best six or ten individuals in the population. There was more variability in the six best individuals for the MovieLens dataset and therefore the result reported is the average of the best two solutions. For the last.fm dataset, four of the six best solutions had sigT = 15 and two had sigT = 4; all six had the same values for all remaining parameters. For the Epinions dataset, although very low values for sigT were chosen, there was no consistent value chosen across the ten best solutions.
The prediction formula chosen was correlation thresholding (option 0 or 2) for all but the MovieLens dataset. Where correlation thresholding is chosen, the threshold values for all three datasets are very low (0.000621, 0.00118 and 0.002665). For the bookcrossing and Epinions datasets, means are calculated over all the items a user has rated, and not just the co-rated items (option 2), whereas for the last.fm dataset a user's mean is calculated over the co-rated items (option 0).
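The difference between the two neighbour selection strategies, and between the two ways of calculating a user's mean, can be illustrated with a short sketch. The function names and the toy data are hypothetical, and the strict "greater than" comparison against corrT is an assumption, since the exact comparison is not stated here.

```python
from typing import Dict, List, Optional, Set, Tuple


def select_top_n(similarities: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Keep the n most similar users (prediction options 1 and 3)."""
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


def select_by_threshold(similarities: Dict[str, float],
                        corr_t: float) -> List[Tuple[str, float]]:
    """Keep every user whose similarity exceeds corr_t (options 0 and 2).

    Assumption: the comparison is strict.
    """
    return [(user, s) for user, s in similarities.items() if s > corr_t]


def user_mean(ratings: Dict[str, float],
              co_rated: Optional[Set[str]] = None) -> float:
    """Mean rating over all rated items, or over co-rated items only."""
    if co_rated is None:
        values = list(ratings.values())
    else:
        values = [r for item, r in ratings.items() if item in co_rated]
    if not values:
        raise ValueError("no ratings available to average")
    return sum(values) / len(values)


# Toy data, for illustration only.
sims = {"u1": 0.9, "u2": 0.001, "u3": -0.2, "u4": 0.0005}
top_two = select_top_n(sims, 2)
above = select_by_threshold(sims, 0.000621)

ratings = {"a": 4.0, "b": 2.0, "c": 3.0}
mean_all = user_mean(ratings)                   # over all rated items
mean_co_rated = user_mean(ratings, {"a", "c"})  # over co-rated items only
```

With a threshold as low as the learned values, thresholding in this sketch retains almost every user with a positive similarity, so the neighbourhood size is governed by the data rather than by a fixed N.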
For the MovieLens dataset, for the best two solutions, top-N is chosen as the prediction formula with a high value for N (N = 164), as was found previously and shown in Table 4.2. However, correlation thresholding was chosen for some of the best 10 solutions, which suggests that both prediction formulae perform similarly well for this dataset.
Table 4.5: Experiment 2: Evaluation of the Best set of GA Parameters.
Dataset                    GA MAE    Avg. MAE    Avg. Coverage    F1        Precision-at-1
MovieLens                  0.635     0.7333      99.5%            0.6392    1
bookcrossing (sigT = 2)    4.81      1.4384      15.72%           0.0814    1
last.fm (sigT = 15)        0.589     0.6194      99.79%           0.918     1
Epinions (sigT = 1)        1.774     0.8729      36.59%           0.2284    1
Pearson correlation (similarity option 1) is chosen only for the MovieLens dataset. For the remaining three datasets, cosine similarity (option 2) was chosen consistently across the best solutions for each dataset.
For the significance threshold parameter, sigT, a small value was mostly chosen, except for the MovieLens dataset, which has the highest threshold value across all the datasets (67). Some of the best solutions for the last.fm dataset also had a higher threshold value of 15. This was the one parameter that did not converge to a common value across the top 10 best solutions.
The GA MAEs are similar to or slightly lower than those in Table 4.2; notably, the GA MAEs for both the bookcrossing and Epinions datasets are lower. This could be due to the larger population size and the larger number of generations, in addition to the extra parameter used (sim).
Table 4.5 shows results averaged over ten runs in comparison to the best MAE found by the genetic algorithm (using the best set of solutions found per dataset). Where more than one option existed for a parameter value, the parameter and value chosen are listed (e.g., sigT = 15 is chosen for the last.fm dataset). It can be seen from Table 4.5 that in two cases (bookcrossing and Epinions) the average MAE over ten runs is much lower than the GA MAE. However, in both of these cases the average coverage is quite low. As noticed in the previous results with four parameters (Table 4.3), the average MAE for the MovieLens dataset is higher than the GA MAE and the average coverage is high. The same is true for the last.fm dataset, where the average MAE over ten runs is higher than the GA MAE but coverage is high.
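The relationship between MAE and coverage can be made concrete with a small sketch. It assumes, as is common practice but not confirmed here, that MAE is calculated only over the test ratings for which a prediction could be made, and that coverage is the fraction of test ratings that received a prediction. The function name and the toy data are hypothetical.

```python
from typing import List, Optional, Tuple


def mae_and_coverage(actual: List[float],
                     predicted: List[Optional[float]]) -> Tuple[Optional[float], float]:
    """Return (MAE over the predictions that were made, coverage).

    A value of None in `predicted` marks a test rating for which no
    prediction could be generated.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = [(a, p) for a, p in zip(actual, predicted) if p is not None]
    coverage = len(pairs) / len(actual) if actual else 0.0
    if not pairs:
        return None, coverage
    mae = sum(abs(a - p) for a, p in pairs) / len(pairs)
    return mae, coverage


# Toy example: only two of the four test ratings receive a prediction.
mae, coverage = mae_and_coverage([4.0, 2.0, 5.0, 1.0], [3.5, None, None, 1.5])
```

Under these assumptions, an MAE reported alongside low coverage describes only the subset of ratings that could be predicted, which is why the bookcrossing and Epinions averages should be read together with their coverage figures.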
Table 4.6 compares the MAEs found when using the four parameter values learned (Experiment 1), when using the five parameter values learned (Experiment 2), and when a common set of parameters was used across all datasets (results from Table 3.2 in Chapter 3). All results are generated with a 10% split of test data and a 90% split of training data. It can be seen that, apart from the last.fm dataset when learning four parameters, comparable (and mostly better) performance is found when using the parameter values learned by the genetic algorithm approach. It should be noted that the parameter values learned for the MovieLens dataset in Experiment 2 (Table 4.4) are very similar to those used in the experiments in Chapter 3, where Pearson correlation was used to find similar users and neighbour selection was by a top-N approach (with N = 60). This explains the similar MAEs across experiments.

Table 4.6: Comparing MAEs across Experiments.
Dataset         Common Parameters    Exp. 1 Parameters    Exp. 2 Parameters
MovieLens       0.733                0.721                0.733
bookcrossing    1.529                1.46                 1.438
last.fm         0.699                0.802                0.619
Epinions        0.9                  0.884                0.8729
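The comparison in Table 4.6 can also be expressed as a percentage change in MAE relative to the common parameter set. The sketch below is illustrative only: the values are copied from Table 4.6, while the helper `relative_change` and the dictionary layout are hypothetical.

```python
# MAEs from Table 4.6: (common parameters, Exp. 1 parameters, Exp. 2 parameters).
TABLE_4_6 = {
    "MovieLens": (0.733, 0.721, 0.733),
    "bookcrossing": (1.529, 1.46, 1.438),
    "last.fm": (0.699, 0.802, 0.619),
    "Epinions": (0.9, 0.884, 0.8729),
}


def relative_change(baseline: float, value: float) -> float:
    """Percentage change of value relative to baseline (negative means lower MAE)."""
    return (value - baseline) / baseline * 100.0


# For each dataset: (change with Exp. 1 parameters, change with Exp. 2 parameters).
changes = {
    dataset: (relative_change(common, exp1), relative_change(common, exp2))
    for dataset, (common, exp1, exp2) in TABLE_4_6.items()
}

for dataset, (exp1_change, exp2_change) in changes.items():
    print(f"{dataset:<13} Exp. 1: {exp1_change:+.1f}%   Exp. 2: {exp2_change:+.1f}%")
```

A positive value marks a case where the learned parameters gave a higher MAE than the common parameter set; in this table that occurs only for the last.fm dataset with the Experiment 1 parameters.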