The time-critical nature of the route calculation coupled with the high throughput of the parcel network impose strict demands on the route calculation engine. As long as the individual query latencies are low enough, it’s always possible to scale horizontally for increased throughput. The Cypher-based solution is fast, but with such high sus‐ tained throughput, every millisecond impacts the cluster footprint. For this reason, Global Post adopted an alternative approach: calculating routes using Neo4j’s traversal framework.
A traversal-based implementation of the route calculation engine must solve two prob‐ lems: finding shortest paths, and filtering paths based on time period. We’ll look at filtering paths based on time period first.
Traversals should follow only relationships that are valid for the specified delivery pe‐ riod. In other words, as the traversal progresses through the graph, it should only be presented with relationships whose periods of validity, as defined by their start_date and end_date properties, contain the specified delivery period.
We implement this relationship filtering using a PathExpander. Given a path from a traversal’s start node to the node where it is currently positioned, a PathExpander’s expand() method returns the relationships that can be used to traverse further. This method is called by the traversal framework each time the framework advances another node into the graph. If needed, the client can supply some initial state that flows along each branch; expand() can then use (and even change) this state in the course of de‐ ciding which relationships to return. The route calculator’s ValidPathExpander imple‐ mentation uses this branch state to supply the delivery period to the expander.
Here’s the code for the ValidPathExpander:
private static class IntervalPathExpander implements PathExpander<Interval> {
private final RelationshipType relationshipType; private final Direction direction;
private IntervalPathExpander( RelationshipType relationshipType, Direction direction )
{
this.relationshipType = relationshipType; this.direction = direction;
}
@Override
public Iterable<Relationship> expand( Path path,
BranchState<Interval> deliveryInterval ) {
List<Relationship> results = new ArrayList<Relationship>(); for ( Relationship r : path.endNode()
.getRelationships( relationshipType, direction ) ) {
Interval relationshipInterval = new Interval( (Long) r.getProperty( "start_date" ), (Long) r.getProperty( "end_date" ) );
if ( relationshipInterval.contains( deliveryInterval.getState() ) ) { results.add( r ); } } return results; } }
The IntervalPathExpander’s constructor takes two arguments: a relationshipType and a direction. This allows the expander to be reused for different types of relation‐ ships: in the case of the Global Post graph, the expander will be used to filter both CONNECTED_TO and DELIVERY_ROUTE relationships.
The expander’s expand() method takes the path to the current node, and the deliver yInterval as supplied by the client. Each time it is called, expand() iterates the relevant relationships on the current node (the current node is given by path.endNode()). For each relationship, the method then compares the relationship’s interval with the delivery interval. If the relationship’s interval contains the delivery interval, the relationship is added to the results.
Having looked at the IntervalPathExpander, we can now turn to the ParcelRouteCal culator itself. This class encapsulates all the logic necessary to calculate a route between the point where a parcel enters the network and the final delivery destination. It employs
a similar strategy to the Cypher query we’ve already looked at: it works its way up the graph from both the start node and the end node in two separate traversals, until it finds a delivery base for each leg. It then performs a shortest weighted path search that joins these two delivery bases.
Here’s the beginning of the ParcelRouteCalculator class:
public class ParcelRouteCalculator
{
private static final PathExpander<Interval> DELIVERY_ROUTE_EXPANDER = new IntervalPathExpander( withName( "DELIVERY_ROUTE" ), Direction.INCOMING );
private static final PathExpander<Interval> CONNECTED_TO_EXPANDER = new IntervalPathExpander( withName( "CONNECTED_TO" ), Direction.BOTH);
private static final TraversalDescription DELIVERY_BASE_FINDER = Traversal.description()
.depthFirst()
.evaluator( new Evaluator() {
private final RelationshipType DELIVERY_ROUTE = withName( "DELIVERY_ROUTE");
@Override
public Evaluation evaluate( Path path ) {
if ( isDeliveryBase( path ) ) {
return Evaluation.INCLUDE_AND_PRUNE; }
return Evaluation.EXCLUDE_AND_CONTINUE; }
private boolean isDeliveryBase( Path path ) {
return !path.endNode().hasRelationship( DELIVERY_ROUTE, Direction.INCOMING ); }
} );
private static final CostEvaluator<Double> COST_EVALUATOR = CommonEvaluators.doubleCostEvaluator( "cost" ); private final Index<Node> locationIndex;
public ParcelRouteCalculator( GraphDatabaseService db ) {
this.locationIndex = db.index().forNodes( "location" ); }
... }
Here we define two expanders—one for DELIVERY_ROUTE relationships, another for CONNECTED_TO relationships—and the traversal that will find the two legs of our route. This traversal terminates whenever it encounters a node with no incoming DELIV ERY_ROUTE relationships. Because each delivery base is at the root of a delivery route tree, we can infer that a node without any incoming DELIVERY_ROUTE relationships represents a delivery base in our graph.
The constructor for ParcelRouteCalculator accepts the current database instance. From this it obtains the location index, which it stores in a member variable.
Each route calculation engine maintains a single instance of this route calculator. This instance is capable of servicing multiple requests. For each route to be calculated, the client calls the calculator’s calculateRoute() method, passing in the names of the start and end points, and the interval for which the route is to be calculated:
public Iterable<Node> calculateRoute( String start, String end,
Interval interval ) {
TraversalDescription deliveryBaseFinder = createDeliveryBaseFinder( interval );
Path upLeg = findRouteToDeliveryBase( start, deliveryBaseFinder ); Path downLeg = findRouteToDeliveryBase( end, deliveryBaseFinder ); Path topRoute = findRouteBetweenDeliveryBases(
upLeg.endNode(), downLeg.endNode(), interval );
return combineRoutes( upLeg, downLeg, topRoute ); }
calculateRoute() first obtains a deliveryBaseFinder for the specified interval, which it then uses to find the routes for the two legs. Next, it finds the route between the delivery bases at the top of each leg, these being the last nodes in each leg’s path. Finally, it combines these routes to generate the final results.
The createDeliveryBaseFinder() helper method creates a traversal description con‐ figured with the supplied interval:
private TraversalDescription createDeliveryBaseFinder( Interval interval ) {
return DELIVERY_BASE_FINDER.expand( DELIVERY_ROUTE_EXPANDER,
new InitialBranchState.State<Interval>( interval, interval ) ); }
This traversal description is built by expanding the ParcelRouteCalculator’s static DELIVERY_BASE_FINDER traversal description using the DELIVERY_ROUTE_EXPANDER. The branch state for the expander is initialized at this point with the interval supplied by the client. This enables us to use the same base traversal description instance (DE LIVERY_BASE_FINDER) for multiple requests. This base description is expanded and parameterized for each request.
Properly configured with an interval, the traversal description is then supplied to fin dRouteToDeliveryBase(), which looks up a starting node in the location index, and then executes the traversal:
private Path findRouteToDeliveryBase( String startPosition,
TraversalDescription deliveryBaseFinder ) {
Node startNode = locationIndex.get( "name", startPosition ).getSingle(); return deliveryBaseFinder.traverse( startNode ).iterator().next(); }
That’s the two legs taken care of. The last part of the calculation requires us to find the shortest path between the delivery bases at the top of each of the legs. calculate Route() takes the last node from each leg’s path, and supplies these two nodes together with the client-supplied interval to findRouteBetweenDeliveryBases(). Here’s the implementation of findRouteBetweenDeliveryBases():
private Path findRouteBetweenDeliveryBases( Node deliveryBase1, Node deliveryBase2, Interval interval ) {
PathFinder<WeightedPath> routeBetweenDeliveryBasesFinder = GraphAlgoFactory.dijkstra(
CONNECTED_TO_EXPANDER,
new InitialBranchState.State<Interval>( interval, interval ), COST_EVALUATOR );
return routeBetweenDeliveryBasesFinder
.findSinglePath( deliveryBase1, deliveryBase2 ); }
Rather than use a traversal description to find the shortest route between two nodes, this method uses a shortest weighted path algorithm from Neo4j’s graph algorithm li‐ brary—in this instance, we’re using the Dijkstra algorithm (see “Path-Finding with Dijkstra’s Algorithm” on page 164 for more details of the Dijkstra algorithm). This algo‐ rithm is configured with ParcelRouteCalculator’s static CONNECTED_TO_EXPANDER, which in turn is initialized with the client-supplied branch state interval. The algorithm is also configured with a cost evaluator (another static member), which simply identifies the property on a relationship representing that relationship’s weight or cost. A call to findSinglePath on the Dijkstra path finder returns the shortest path between the two delivery bases.
That’s the hard work done. All that remains is to join these routes to form the final results. This is relatively straightforward, the only wrinkle being that the down leg’s path must be reversed before being added to the results (the leg was calculated from final destination upward, whereas it should appear in the results delivery base downward):
private Set<Node> combineRoutes( Path upLeg, Path downLeg, Path topRoute ) {
LinkedHashSet<Node> results = new LinkedHashSet<Node>(); results.addAll( IteratorUtil.asCollection( upLeg.nodes() )); results.addAll( IteratorUtil.asCollection( topRoute.nodes() )); results.addAll( IteratorUtil.asCollection( downLeg.reverseNodes() )); return results;
}
Summary
In this chapter we’ve looked at some common real-world use cases for graph databases, and described in detail three case studies that show how graph databases have been used to build a social network, implement access control, and manage complex logistics calculations.
In the next chapter we dive deeper into the internals of a graph database. In the con‐ cluding chapter we look at some analytical techniques and algorithms for processing graph data.
CHAPTER 6
Graph Database Internals
In this chapter we peek under the hood and discuss the implementation of graph data‐ bases, showing how they differ from other means of storing and querying complex, semi-structured, densely connected data. Although it is true that no single universal architecture pattern exists, even among graph databases, this chapter describes the most common architecture patterns and components you can expect to find inside a graph database.
We illustrate the discussion in this chapter using the Neo4j graph database, for several reasons. Neo4j is a graph database with native processing capabilities as well as native graph storage (see Chapter 1 for a discussion of native graph processing and storage). In addition to being the most common graph database in use at the time of writing, it has the transparency advantage of being open source, making it easy for the adven‐ turesome reader to go a level deeper and inspect the code. Finally, it is a database the authors know well.