Evaluation - Distance Oracles With Linear Space

Chapter 3 Distance Oracles With Linear Space

3.6 Evaluation

In this section, we evaluate the performance of our stretch-3 scheme of Theorem 3 on large-scale synthetic and real-world network topologies. We first present our methodology, followed by a summary of the evaluation results, and conclude with a detailed discussion of the results.

3.6.1 An optimization

Although the worst-case stretch for our distance oracle is 4k−1, we can apply simple heuristics to improve the stretch in practice. Recall that the worst-case stretch in our oracles occurs for vertex pairs s, t for which B⋆(s) ∩ B(t) = ∅; the query may return a path, for instance s → ℓ(s) → ℓ(t) → t, that is of stretch 3. The main observation is that for such vertex pairs, there may exist a w ∈ B⋆(s) for which the path s → w → ℓ(w) → ℓ(t) → t is shorter than the path s → ℓ(s) → ℓ(t) → t. The oracle can then answer the query with the minimum of the distances retrieved by checking all w ∈ B⋆(s) (see §3.6 for implementation details). Since checking the length of the paths s → w → ℓ(w) → ℓ(t) → t for all w ∈ B⋆(s) takes (asymptotically) the same time as checking the ball-vicinity intersection, the heuristic does not increase the query time while potentially improving the stretch of the retrieved paths. Indeed, this optimization not only improves the average stretch but also increases the number of vertex pairs for which our oracle returns the exact shortest paths.
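The heuristic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the evaluation: the lookup tables `vicinity`, `ball`, `nearest`, `to_landmark` and `between` are hypothetical names for the distances an oracle of this kind would store, and we assume that s itself belongs to B⋆(s) at distance 0, so that the unoptimized path s → ℓ(s) → ℓ(t) → t is one of the candidates.

```python
import math

def query(s, t, vicinity, ball, nearest, to_landmark, between):
    """Distance estimate for (s, t) with the shortcut heuristic.

    Assumed (hypothetical) tables:
      vicinity[s][w]  = d(s, w) for every w in B*(s), including s itself
      ball[t][w]      = d(t, w) for every w in B(t)
      nearest[v]      = l(v), the landmark closest to v
      to_landmark[v]  = d(v, l(v))
      between[(a, b)] = d(a, b) for landmarks a, b
    """
    # If the vicinity of s meets the ball of t, route through a common vertex.
    common = vicinity[s].keys() & ball[t].keys()
    if common:
        return min(vicinity[s][w] + ball[t][w] for w in common)

    # Otherwise try s -> w -> l(w) -> l(t) -> t for every w in B*(s).
    best = math.inf
    for w, d_sw in vicinity[s].items():
        via_w = d_sw + to_landmark[w] + between[(nearest[w], nearest[t])] + to_landmark[t]
        best = min(best, via_w)
    return best

# Toy path graph: A -2- s -1- w -1- B -3- t, with landmarks A and B.
nearest = {"s": "A", "w": "B", "t": "B"}
to_landmark = {"s": 2, "w": 1, "t": 3}
between = {("A", "A"): 0, ("B", "B"): 0, ("A", "B"): 4, ("B", "A"): 4}
ball = {"t": {"t": 0}}

with_shortcut = query("s", "t", {"s": {"s": 0, "w": 1}}, ball,
                      nearest, to_landmark, between)
without_shortcut = query("s", "t", {"s": {"s": 0}}, ball,
                         nearest, to_landmark, between)
print(with_shortcut, without_shortcut)
```

In this toy instance the unoptimized path has length 2 + 4 + 3 = 9, while the path through w has length 1 + 1 + 0 + 3 = 5, which is the exact distance between s and t. The loop touches each vertex of B⋆(s) once, which is the same work as the intersection test.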

3.6.2 Methodology

We evaluate four schemes: the stretch-3 TZ scheme with landmarks selected uniformly at random, the stretch-3 TZ scheme with landmarks selected using our scheme, and two versions of our stretch-3 scheme: the stretch-3 scheme (for k = 1) from the last oracle with α = √n, with and without the optimization discussed above. For the TZ scheme, we sampled each vertex (for set L) with probability √(log n/n). For our stretch-3 scheme, each vertex was sampled with probability √(n log n) × deg(v)/log2n. All the constants in the big-O notation were set to 1. All these schemes were evaluated using a static simulator, assuming static graph topologies, which we describe next.
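The two sampling rules can be illustrated as follows. This is a minimal sketch under assumptions: the base of the logarithm is taken to be e, and for the degree-proportional rule the normalization is left as a hypothetical parameter `scale` rather than fixed to a specific formula. The function names are ours.

```python
import math
import random

def uniform_probability(n):
    # Per-vertex probability for the TZ landmark set (natural log assumed).
    return math.sqrt(math.log(n) / n)

def sample_uniform(n, rng):
    # Each vertex joins L independently with the same probability.
    p = uniform_probability(n)
    return [v for v in range(n) if rng.random() < p]

def sample_by_degree(degrees, scale, rng):
    # Each vertex joins L with probability proportional to its degree,
    # capped at 1. `scale` is a hypothetical normalization constant.
    return [v for v, d in enumerate(degrees)
            if rng.random() < min(1.0, scale * d)]

rng = random.Random(0)
landmarks = sample_uniform(16384, rng)
expected_size = 16384 * uniform_probability(16384)
print(len(landmarks), round(expected_size))
```

Under the natural-log assumption, the uniform rule yields roughly 400 landmarks in expectation for n = 16384. The degree-proportional rule biases L towards high-degree vertices, which matters most on topologies with a skewed degree distribution such as the Internet graph.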

We present evaluation results for three topologies: (1) G(n, m) random graphs, i.e., n = 16384 nodes with m uniform-random edges, with m set so that the average degree is 6; (2) geometric random graphs with n = 16384 nodes and average degree 6; and (3) a 33,014-node AS-level map of the Internet (referred to as the Internet graph in this section) [46].

For G(n, m) graphs and the Internet graph, link weights are 1; for geometric random graphs, a link's weight is the Euclidean distance between the positions of its two endpoints. For G(n, m) graphs and for geometric random graphs, we generated 10 different topologies with the same parameters, and our results are the average over the evaluations of these topologies. For geometric random graphs, we sampled a set of "source" vertices and evaluated the performance of the schemes from these sources to all the destinations. We found [81] that sampling 1/4 of the nodes as sources provided accurate results.
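As a worked example of the parameters above, an average degree of 6 on n = 16384 nodes corresponds to m = 6n/2 = 49152 edges. The sketch below generates both synthetic topologies under assumptions of our own: graphs are simple (no self-loops or parallel edges), geometric nodes are placed uniformly in the unit square, and two nodes are linked when they lie within a hypothetical connection radius `radius`. The function names are illustrative and not taken from the simulator.

```python
import math
import random

def gnm_edges(n, avg_degree, rng):
    # m distinct uniform-random edges with unit weights; m = n * avg_degree / 2.
    m = n * avg_degree // 2
    edges = set()
    while len(edges) < m:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.add((min(u, v), max(u, v)))
    return {e: 1 for e in edges}

def geometric_edges(n, radius, rng):
    # Nodes in the unit square; link weight = Euclidean distance.
    # Quadratic in n, so only suitable for small illustrative instances.
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    edges = {}
    for u in range(n):
        for v in range(u + 1, n):
            d = math.dist(pos[u], pos[v])
            if d <= radius:
                edges[(u, v)] = d
    return pos, edges

gnm = gnm_edges(16384, 6, random.Random(0))
pos, geo = geometric_edges(200, 0.1, random.Random(0))
print(len(gnm), 2 * len(gnm) / 16384)
```

Ignoring boundary effects, a node in the unit square has about nπr² neighbours within radius r, so a radius near √(6/(πn)) would give an average degree close to 6; this choice is an assumption and not a parameter stated in the text.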

3.6.3 Results and Discussions

Fig. 3.2 shows the performance of the four schemes for various graph topologies (TZ is the original TZ scheme; the TZ∗ scheme is discussed in more detail below). The most notable result of this evaluation is that our stretch-3 scheme allows retrieval of exact shortest paths for nearly all source-destination pairs: more than 98.4% in the G(n, m) graph, and more than 99.9% in the Internet graph. Though G(n, m) graphs and the Internet graph have highly different structures, these graphs have a common feature: for nearly all source-destination pairs, the two vicinities intersect, thus providing a shortest path. In the G(n, m) graph (in which 96.2% of source-destination pairs have intersecting vicinities), this occurs since, with high probability, the diameter of the graph is roughly at most twice the vicinity radius. In the Internet graph (in which 96.8% of source-destination pairs have intersecting vicinities), vicinity intersection likely occurs at the "core" networks of the Internet. Since the TZ scheme does not exploit the vicinity intersection, its performance is significantly worse than that of our schemes (only 34.4% of the source-destination pairs retrieved shortest paths).

[Figure 3.2, three panels. x-axis: Stretch (1 to 3); y-axis: CCDF (1e-06 to 1, logarithmic scale); curves: TZ, TZ*, Stretch3, Stretch3*.]

Figure 3.2: Comparison of the stretch for our stretch-3 oracle and the stretch-3 oracle of Thorup and Zwick [41] for the G(n, m) random graph (top), the geometric random graph (middle) and the AS-level Internet map (bottom). As described in §3.6.2, TZ∗ is the scheme which uses the set of landmarks constructed by the algorithm of §3.4; Stretch-3∗ is the scheme which uses the optimization discussed in §3.3.
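The quantities plotted in Fig. 3.2 can be computed from per-pair measurements as in the following sketch. It assumes that, for each evaluated source-destination pair, we have the distance returned by the oracle and the exact shortest-path distance; the function names are ours, and we take the CCDF at x to be the fraction of pairs with stretch strictly greater than x, which is an assumption about the convention used in the figure.

```python
def stretches(estimates, exact):
    # Stretch of a pair = returned distance / exact distance (always >= 1).
    return [e / d for e, d in zip(estimates, exact)]

def ccdf(values, thresholds):
    # Fraction of pairs whose stretch strictly exceeds each threshold.
    n = len(values)
    return [sum(1 for v in values if v > x) / n for x in thresholds]

def exact_fraction(values, tol=1e-9):
    # Fraction of pairs for which the oracle returned a shortest path.
    return sum(1 for v in values if v <= 1 + tol) / len(values)

# Four illustrative pairs: three answered exactly, one with stretch 3.
sample = stretches([2, 3, 6, 4], [2, 3, 2, 4])
print(exact_fraction(sample), ccdf(sample, [1, 2, 3]))
```

For the four illustrative pairs, three out of four are exact, so the CCDF is 0.25 at thresholds 1 and 2 and drops to 0 at threshold 3, the worst-case stretch.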

The surprising difference between the performance of the two schemes may be due to the difference in how these schemes construct the landmark set L. We evaluated a modified version of the TZ scheme that uses the same set L as used by our schemes (see TZ∗ in Fig. 3.2). Although this improves the performance of the TZ scheme (74.2% of the source-destination pairs now retrieve shortest paths), it is still much worse than our stretch-3 scheme. We, hence, believe that the high performance of our schemes is indeed due to the vicinity intersection idea.

For geometric random graphs, our stretch-3 scheme allows retrieval of shortest paths for only 19.2% of the source-destination pairs, in comparison to 42.9% for the TZ scheme; indeed, only 4.8% of the source-destination pairs have intersecting vicinities. However, while the TZ scheme performs better than our stretch-3 scheme on average for the geometric random graph, the worst-case stretch for the TZ scheme is consistently worse than that of our stretch-3 scheme. We believe that this is due to the P&S optimization, which allows many source-destination pairs to retrieve shorter paths through short-cutting.
