Efficiency of Dijkstra’s Algorithm - Graph Databases. Ian Robinson, Jim Webber, and Emil Eifrem

Dijkstra’s algorithm is quite efficient because it computes only the lengths of a relatively small subset of the possible paths through the graph. When we’ve solved a node, the shortest path from the start node is then known, allowing all subsequent paths to safely build on that knowledge.

In fact, the fastest known worst-case implementation of Dijkstra’s algorithm has a performance of O(|R| + |N| log |N|). That is, the algorithm runs in time proportional to the number of relationships in the graph, plus the number of nodes multiplied by the log of the number of nodes. The original was O(|N|^2), meaning it ran in time proportional to the square of the number of nodes in the graph.

Dijkstra’s algorithm is often used to find real-world shortest paths (e.g., for navigation). Here’s an example. In Figure 7-2 we see a logical map of Australia. Our challenge is to discover the shortest driving route between Sydney on the east coast (marked SYD) and Perth (marked PER), which is a continent away on the west coast. The other major towns and cities are marked with their respective airport codes; we’ll discover many of them along the way.

Figure 7-2. A logical representation of Australia and its arterial road network

Starting at the node representing Sydney in Figure 7-3, we know the shortest path to Sydney is 0 hours, because we’re already there. In terms of Dijkstra’s algorithm, Sydney is now solved insofar as we know the shortest path from Sydney to Sydney. Accordingly, we’ve grayed out the node representing Sydney, added the path length (0), and thickened the node’s border—a convention that we’ll maintain throughout the remainder of this example.

Moving one level out from Sydney, our candidate cities are Brisbane, which lies to the north by 9 hours, Canberra, Australia’s capital city, which lies 4 hours to the west, and Melbourne, which is 12 hours to the south.

The shortest path we can find is Sydney to Canberra, at 4 hours, and so we consider Canberra to be solved, as shown in Figure 7-4.
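The selection step we have just performed can be sketched in a few lines of Python. This is only an illustration: the dictionary name `tentative` and the helper `closest_unsolved` are our own, and the driving times are the ones quoted above.

```python
# Tentative driving times (hours) from Sydney to its direct neighbors,
# as quoted in the walkthrough.
tentative = {"Brisbane": 9, "Canberra": 4, "Melbourne": 12}


def closest_unsolved(tentative):
    """Return the (city, hours) pair with the smallest tentative time."""
    return min(tentative.items(), key=lambda item: item[1])


city, hours = closest_unsolved(tentative)
print(city, hours)  # Canberra 4
```

Whichever candidate has the smallest tentative time becomes the next solved node; every other candidate stays open for later improvement.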

Figure 7-3. The shortest path from Sydney to Sydney is unsurprisingly 0 hours

Figure 7-4. Canberra is the closest city to Sydney

The next nodes out from our solved nodes are Melbourne, at 10 hours from Sydney via Canberra, or 12 hours from Sydney directly, as we’ve already seen. We also have Alice Springs, which is 15 hours from Canberra and therefore 19 hours from Sydney, and Brisbane, which is 9 hours direct from Sydney.

Accordingly, we explore the shortest path, which is 9 hours from Sydney to Brisbane, and consider Brisbane solved at 9 hours, as shown in Figure 7-5.

Figure 7-5. Brisbane is the next closest city

The next neighboring nodes from our solved ones are Melbourne, which is 10 hours via Canberra or 12 hours direct from Sydney along a different road; Cairns, which is 31 hours from Sydney via Brisbane; and Alice Springs, which is 40 hours via Brisbane or 19 hours via Canberra.

Accordingly, we choose the shortest path, which is Melbourne, being 10 hours from Sydney via Canberra. This is shorter than the existing 12 hours direct link. We now consider Melbourne solved, as shown in Figure 7-6.

Figure 7-6. Reaching Melbourne, the third-closest city to the start node of Sydney

In Figure 7-7, the next layer of neighboring nodes from our solved ones are Adelaide at 18 hours from Sydney (via Canberra and Melbourne); Cairns, at 31 hours from Sydney (via Brisbane); and Alice Springs, at 19 hours from Sydney via Canberra, or 40 hours via Brisbane. We choose Adelaide and consider it solved at a cost of 18 hours.

We don’t consider the path Melbourne→Sydney because its destination is a solved node—in fact, in this case, it’s the start node, Sydney.
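The two rules at work so far, keeping a new path only when it is shorter than the one already known and never revisiting a solved node, can be sketched as follows. The function name `relax` and the data layout are our own choices for illustration, and the individual road segments are derived from the cumulative times in the walkthrough (for example, Canberra to Melbourne is 10 - 4 = 6 hours).

```python
import math

# Road segments (hours) leaving three of the cities, derived from the
# cumulative driving times quoted in the walkthrough.
roads = {
    "Canberra": {"Sydney": 4, "Melbourne": 6, "Alice Springs": 15},
    "Brisbane": {"Sydney": 9, "Cairns": 22, "Alice Springs": 31},
    "Melbourne": {"Sydney": 12, "Canberra": 6, "Adelaide": 8},
}


def relax(city, distance, solved, roads):
    """Improve the tentative times of the unsolved neighbors of `city`."""
    for neighbor, hours in roads[city].items():
        if neighbor in solved:
            continue  # a solved node already has its final shortest path
        candidate = distance[city] + hours
        if candidate < distance.get(neighbor, math.inf):
            distance[neighbor] = candidate


# State just after Sydney's neighbors are known and Canberra is solved.
distance = {"Sydney": 0, "Canberra": 4, "Brisbane": 9, "Melbourne": 12}
solved = {"Sydney", "Canberra"}

relax("Canberra", distance, solved, roads)   # Melbourne 12 -> 10, Alice Springs 19
solved.add("Brisbane")
relax("Brisbane", distance, solved, roads)   # Cairns 31; Alice Springs stays 19, not 40
solved.add("Melbourne")
relax("Melbourne", distance, solved, roads)  # Sydney and Canberra skipped; Adelaide 18

print(distance)
```

Note that the path from Melbourne back to Sydney is never evaluated, because Sydney is already in the solved set.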

The next layer of neighboring nodes from our solved ones are Perth, our final destination, which is 50 hours from Sydney via Adelaide; Alice Springs, which is 19 hours from Sydney via Canberra or 33 hours via Adelaide; and Cairns, which is 31 hours from Sydney via Brisbane.

We choose Alice Springs in this case because it has the current shortest path, even though with a god’s eye view we know that actually it’ll be shorter in the end to go from Adelaide to Perth—just ask any passing bushman. Our cost is 19 hours, as shown in Figure 7-8.

Figure 7-7. Solving Adelaide

In Figure 7-9, the next layer of neighboring nodes from our solved ones are Cairns at 31 hours via Brisbane or 43 hours via Alice Springs, or Darwin at 34 hours via Alice Springs, or Perth via Adelaide at 50 hours. So we’ll take the route to Cairns via Brisbane and consider Cairns solved with a shortest driving time from Sydney at 31 hours.

Figure 7-9. Back to Cairns on the east coast

The next layer of neighboring nodes from our solved ones are Darwin, at 34 hours via Alice Springs or 61 hours via Cairns, and Perth, at 50 hours via Adelaide. Accordingly, we choose the path to Darwin via Alice Springs at a cost of 34 hours and consider Darwin solved, as we can see in Figure 7-10.

Finally, the only neighboring node left is Perth itself, as we can see in Figure 7-11. It is accessible via Adelaide at a cost of 50 hours or via Darwin at a cost of 82 hours. Accordingly, we choose the route via Adelaide and consider Perth from Sydney solved at a shortest path of 50 hours.
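The whole walkthrough can be reproduced with a short Python sketch. It rests on some assumptions: the segment lengths in `ROADS` are derived from the cumulative times quoted above (for example, Adelaide to Perth is 50 - 18 = 32 hours), the names `build_graph` and `dijkstra` are our own, and the goal is assumed to be reachable from the start. The queue is Python’s `heapq` binary heap, which gives O((|R| + |N|) log |N|) rather than the best known bound.

```python
import heapq

# Undirected road segments in hours, derived from the cumulative driving
# times in the walkthrough (an assumption of this sketch).
ROADS = [
    ("Sydney", "Canberra", 4), ("Sydney", "Brisbane", 9),
    ("Sydney", "Melbourne", 12), ("Canberra", "Melbourne", 6),
    ("Canberra", "Alice Springs", 15), ("Brisbane", "Cairns", 22),
    ("Brisbane", "Alice Springs", 31), ("Melbourne", "Adelaide", 8),
    ("Adelaide", "Alice Springs", 15), ("Adelaide", "Perth", 32),
    ("Alice Springs", "Cairns", 24), ("Alice Springs", "Darwin", 15),
    ("Cairns", "Darwin", 30), ("Darwin", "Perth", 48),
]


def build_graph(roads):
    """Turn the segment list into an adjacency dictionary."""
    graph = {}
    for a, b, hours in roads:
        graph.setdefault(a, {})[b] = hours
        graph.setdefault(b, {})[a] = hours
    return graph


def dijkstra(graph, start, goal):
    """Return (hours, path, solve order); assumes goal is reachable."""
    distance = {start: 0}
    previous = {}
    solved = set()
    order = []  # cities in the order they were solved
    queue = [(0, start)]
    while queue:
        hours, city = heapq.heappop(queue)
        if city in solved:
            continue  # stale entry left over from an earlier, longer path
        solved.add(city)
        order.append(city)
        if city == goal:
            break
        for neighbor, leg in graph[city].items():
            candidate = hours + leg
            if neighbor not in solved and candidate < distance.get(neighbor, float("inf")):
                distance[neighbor] = candidate
                previous[neighbor] = city
                heapq.heappush(queue, (candidate, neighbor))
    # Walk the predecessor links back from the goal to rebuild the route.
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return distance[goal], path[::-1], order


hours, path, order = dijkstra(build_graph(ROADS), "Sydney", "Perth")
print(hours)  # 50
print(path)   # ['Sydney', 'Canberra', 'Melbourne', 'Adelaide', 'Perth']
print(order)
```

The solve order matches the walkthrough: Sydney, Canberra, Brisbane, Melbourne, Adelaide, Alice Springs, Cairns, Darwin, and finally Perth. All nine cities are solved before the search stops, even though only five of them lie on the final route.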

Figure 7-10. To Darwin in Australia’s “top-end”

Dijkstra’s algorithm works well, but because its exploration is undirected, there are some pathological graph topologies that can cause worst-case performance problems. In these situations, we explore more of the graph than is intuitively necessary, up to the entire graph. Because each possible node is considered one at a time in relative isolation, the algorithm necessarily follows paths that intuitively will never contribute to the final shortest path.

Despite Dijkstra’s algorithm having successfully computed the shortest path between Sydney and Perth, anyone with any intuition about map reading would likely not have chosen to explore the route northward from Adelaide because it feels longer. If we had some heuristic mechanism to guide us, as in a best-first search (e.g., prefer to head west over east, prefer south over north) we might have avoided the side-trips to Brisbane, Cairns, Alice Springs, and Darwin in this example. But best-first searches are greedy, and try to move toward the destination node even if there is an obstacle (e.g., a dirt track) in the way. We can do better.

Figure 7-11. Finally reaching Perth, a mere 50 driving hours from Sydney
