Results on LibraryThing.com - Extensions to EAP and LAP

9.11 Extensions to EAP and LAP

9.12.1 Results on LibraryThing.com

We performed a series on experiments on the dataset crawled from LibraryThing.com. About a third of all edges were removed from the graph and were re-inserted again during our experiments. The re-insertion of edges simulated the creation of new edges over time. Next, we randomly chose a user among the ones with at least one outgoing edge in the full dataset to query for her top-k best friends and measured wall clock times (RT) and counted the various number of I/O operations (i.e. #SA, #MRG and #OL) performed until the top-k results could be correctly found.

EAP Approach With EAP, we chose top-k = 200 and queries were submitted sequentially by a randomly chosen user from LibraryThing.com who had at least one outgoing edge to a friend.

Figure 22a shows the wall-clock runtimes and Figure 22b the number of sequential accesses to the database measured by experiments with our algorithm implementing EAP which merges complete friendship lists. In total 63,572 queries were executed and every 100 queries one update was applied to the graph. With EAP, the runtimes are not affected that much from the number of merge operations (#MRG), or the number of opened friendship lists (#OL) but mainly from the total number of read operations on list entries which correspond to sequential accesses to the database (#SA). In the worst case, when each of the top-k friends are found in different lists by merge operations, 200 lists are opened with top-k=200. Indeed, it can be observed by the experiments, as depicted in Figure 22c, that for the EAP approach on LibraryThing.com on average around 180 lists have to be opened. This can be taken as evidence that the graph is tightly connected and a single update already influences many shortest path distances. Moreover, with EAP, #MRG=#OL, since friendship lists are opened only when merge operations are applied and then all entries are read such that complete lists are merged. For completeness, the Figures 22c and 22d visualise this property of our algorithm.

Finally, from the Figures 22a and 22b it easily can be seen that the runtimes are I/O driven and increase rapidly with the number of list entries fetched from the database. It takes over 20s to sequentially read around 60k entries from the involved friendship lists. The number of sequential reads is so large since the lists of transitive neighbours involved in a merge operation are long and, by the characteristics of the static algorithm, have to be read completely during a merge operation.

fixed-size EAP: Restricting EAP to a fixed number of top-k When always the same constant number of top-k friends are queried for users in some system, we can modify our EAP approach by restricting the number of list entries considered in merge operations to the value of top-k, denoting this restricted EAP approach as fixed-size EAP (see Section 9.9.6). fixed-size EAP makes sense since merging of friendship list with a huge number of transitive friends is expensive and merge operations only on prefixes of friendship lists still allow us to retrieve the correct top-k results as long as no query requires to fetch more than this fixed number of entries from any friendship list.

In the following experiment, we fixed the lists’ lengths in merge operations to the number of retrieved top-k friends with top-k=200 and for all queries, always 200 friends were retrieved, if available.

The results for RT are shown in the Figures 23a-23c and for #SA in the Fig- ures 23e-23g. The update ratio varies in the corresponding experiments from 1 update per 1, 10 and 100 queries and, depending on the update ratio, 17, 438, 93, 007 or218, 910 queries, respectively, were submitted in the context of a randomly chosen user.

It can be observed that the different update ratios have got only little effect on RT and #SA. For a query with an update ratio of 1/1, the maximum value for RT and #SA is693ms or 2925 sequential accesses to the DB, respectively, and with an update ratio of 1/100, the maximum values are slightly over723ms and 2130 sequential reads from friendship lists. The reason that RA is slightly higher in the latter case although #SA is lower is most likely caused by network latency issues during the experiments.

The charts additionally visualise the moving average over those (worst-case) queries that indeed received the total requested top-k friends within intervals of size 1000. Hence, the average RT and #SA is fairly similar for each update ratio, too. The drop in RT and #SA in the Figures 23a-23b and 23e-23f after 8,456 or 84,560 queries, respectively, happens because all available edge updates have been applied to the graph at that

(a) RT, max-k=top-k, upd-ratio 1/1 (b) RT, max-k=top-k, upd-ratio 1/10

(e) #SA, max-k=top-k, upd-ratio 1/1 (f) SA, max-k=top-k, upd-ratio 1/10

(g) #SA, max-k=top-k, upd-ratio 1/100 (h) #SA, max-k=500, upd-ratio 1/100

Figure 23: Runtime and #SA measurements of EAP on LibraryThing.com, randomly selected querying users, 1 update per varying number of queries: 1, 10 or 100, top- k=200, only top-k prefixes are merged. for (a), (b), (c), and (e), (f), (g). In (d) and (h) upd-ratio 1/100 and top-k=200 but merge operations on prefixes of size max-k=500.

time and, finally, the performance further improves when there are no more updates for a large number of queries. These experiments demonstrate that EAP, when restricting queries and merge operations to top-k list entries, can handle different update loads and benefits quickly from long periods without updates.

fixed-size EAP: Restricting EAP to a maximum number max-k To allow querying for a varied number of top-k friends, yet limiting the number of list entries involved in a merge operation, we introduces a constant max-k which specifies the maximum value allowed for top-k. In this way, a merge operation with fixed-size EAP merges always list prefixes of fixed-size max-k and the requested number of friends issued by a query can vary up to max-k while the correctness of the retrieved results still can be ensured by our algorithm.

The Figures 23d and 23h show the results for RT and #SA of our experiments with max-k=500 and 1 update per 100 queries. In order to compare the evaluation with the results shown in the Figures23c and 23g, the value for top-k has been kept equal to200 for each query. As expected, RT and #SA increase compared to the modification which restricts queries and merge operations to a constant value of top-k entries because larger prefixes of lists need to be merged. However, the average #RA/#SA keeps fairly stable at around850ms or 2800 sequential accesses to the DB, respectively.

For completeness, in Figure 24 the number of opened lists, #OL, and number of merge operations, #MRG, are shown for both versions of our fixed-size EAP approach.

(a) #OL, max-k=top-k, upd-ratio 1/100 (b) #MRG, max-k=top-k, upd-ratio 1/100

Figure 24: Experiments for EAP on LibraryThing.com, randomly selected querying users, 100 queries per update, top-k=200, merges only of top-k prefixes in (a), (b) of size 200 and in (c), (d) of size 500

(a) RT, max-k=500, upd-ratio 1/1 (b) #RT, max-k=500, upd-ratio 1/10

Figure 25: Experiments for EAP on LibraryThing.com, randomly selected querying users, top-k=200, and 1 update per 1 query in (a), (c), and 1 update per 10 queries in (b) and (d) with merges of size max-k=500.

By the characteristics of EAP, it follows #OL=#MRG which is confirmed by our experiments shown in the Figures24a-24b and 24c-24d. Furthermore, since only larger list prefixes are involved in merge operations but top-k is also not varied in our experiments with max-k=500, of course, #OL and #MRG are exactly the same as in the previous experiments since the same merge calls have to be done for retrieving the correct top-k friends. Hence, the corresponding charts are equal.

Additionally, the Figures 25a, 25c and 25b, 25d show in comparison with Fig- ure 23d and 23h again that at higher update ratios RT and #SA keeps fairly stable in our experiments with max-k=500, too. The drop in RT and #SA after8, 456 or 84, 560 queries, respectively, occurs again because all edge updates have been applied to the graph at that time.

LAP Approach For the experiments with the LAP approach of our algorithm, we set top-k= 200, too, in order to compare the results with EAP. Moreover, we varied the update ratio of new friendship edges in the same way as with EAP.

With LAP the runtimes, RT, and the number of sequential accesses to the database, #SA, depend on the number of opened lists, #OL, and the number of merge operations, #MRG, since only single list entries are merged on demand.

The Figures 26a-26d show the results for our experiments with the LAP approach of our algorithm. New friendship edges are inserted at an update ratio of1 update each100 queries. It can be observed that #SA on average is fairly low although #MRG

(a) RT, top-k=200, upd-ratio 1/100 (b) #SA, top-k=200, upd-ratio 1/100

(e) RT, top-k=200, upd-ratio 1/10 (f) RT, top-k=200, upd-ratio 1/1

(g) #SA, top-k=200, upd-ratio 1/10 (h) #SA, top-k=200, upd-ratio 1/1

Figure 26: Experiments for LAP on LibraryThing.com, randomly selected querying users, top-k=200, 1 update per 100 queries in 26a-(d), and and 1 update per query for RT in (f) and #SA in (h), and 1 update per 10 queries for RT in (e) and #SA in (g).

is much higher than with the EAP approach, even though the same number of lists are opened (#OL). The reason is that more merge operations have to be applied until all updates from other lists are merged since only a single entry is merged with each merge operation. In addition, for #SA is even lower as in the best case of EAP that restricts merge operations on only short prefixes of the size of some constant top-k, it also shows that indeed on average shorter prefixes of lists (<200) have to be read until the top-k nearest friends can be correctly identified.

However, when comparing the average RT of LAP shown in 26a with the results of fixed-size EAP shown in 23c and 23d, we can see that RT is slightly worse than the fixed-size EAPwith a constant top-k value of200 but much better than with the one with max-k=500. However, in any case, #SA is better with the LAP approach. There are two reasons why RT is not better than both versions of fixed-size EAP although #SA is: first, maintaining the queue of candidates with LAP causes additional costs—again the dominating factor is I/O, not CPU time. Second, although the queue of candidates is small (no more than top-k friends can be candidates), it is initialised with each query by fetching each single candidate from DB. A more efficient maintenance that allows to retrieve the lists with only one access to the DB could additionally reduce the costs of maintenance.

From the Figures 26e-26h it easily can be observed that LAP is more sensitive to the update ratio. For higher update ratios, both RT and #SA increase. The higher sensitivity becomes especially visible in the Figures 26f and 26h after8, 456 queries,

(a) #OL, top-k=200, upd-ratio 1/10 (b) #OL, top-k=200, upd-ratio 1/1

Figure 27: Experiments for LAP on LibraryThing.com, randomly selected querying users, top-k=200, 1 update per 10 queries for #OL in (a) and #MRG in (c), and 1 update per query for #OL in (b) and #MRG in (d).

and Figures 26e and 26g after84, 560 queries, when all friendship updates have been applied. The drop in RA and #SA is much more abrupt and steep than with EAP as depicted in the Figures 23a-23b and 23e-23f.

For completeness, the Figures 27a-27d depict the number of opened lists and number of merge operations applied with LAP at an update ratio of 1 update per 10 queries and 1 update per query. As expected, in comparison with Figure 26d, it can be observed that with higher update ratios, the number of merge operations increases. However, the average number of opened lists stays the same since the final top-k friends, of course, are found in the same lists. Again, although the number of merge operation increases with higher update ratios, the number of sequential accesses to the database does not. This can be taken as another indication that the graph is tightly connected and shorter prefixes of involved lists (compared to EAP) have to be read until the top-k friends can be identified.

In document Socially enhanced search and exploration in social tagging networks (Page 154-161)