The previous section summarized a very large problem briefly, perhaps too briefly. The main point to keep in mind is that no intolerable performance difference should appear whether hierarchical data is managed with a content repository or with a relational database.
The following benchmark was run to verify this assumption.
Four products are included in this benchmark. CRX is a native implementation of the JCR specification.
It persists items with a proprietary technology that is based on tar files (19) and implemented in Java. H2 and Derby are two open source relational databases written in Java. MySQL is one of the most widely used open source databases.
A simple wrapper has been defined for this benchmark. It offers basic functions to create trees made of nodes and properties. The CRX wrapper directly uses the functionality provided by the JCR API. The SQL wrapper uses a simple database schema: one table stores the nodes and another table stores the properties. The associations between items are managed with a parent foreign key, and the default indexes of the database are used on all fields. Through JDBC, the wrapper executes its queries as prepared statements, so the SQL statements do not have to be parsed on every call.
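The SQL wrapper described above could look roughly like the following sketch. It is not the code used in the benchmark: the class name `SqlTreeWrapper`, the table names `node` and `property`, and all column names are assumptions made for illustration. It only shows the two-table layout with a parent foreign key and the reuse of prepared statements.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.Types;

/** Hypothetical sketch of the SQL wrapper; all table and column names are assumptions. */
final class SqlTreeWrapper implements AutoCloseable {

    // One table for nodes; parent_id is the parent foreign key (NULL for the root).
    static final String CREATE_NODE_TABLE =
        "CREATE TABLE node ("
        + "id BIGINT NOT NULL PRIMARY KEY, "
        + "parent_id BIGINT, "
        + "name VARCHAR(255) NOT NULL, "
        + "FOREIGN KEY (parent_id) REFERENCES node (id))";

    // One table for properties, each attached to its owning node.
    static final String CREATE_PROPERTY_TABLE =
        "CREATE TABLE property ("
        + "id BIGINT NOT NULL PRIMARY KEY, "
        + "node_id BIGINT NOT NULL, "
        + "name VARCHAR(255) NOT NULL, "
        + "content VARCHAR(255), "
        + "FOREIGN KEY (node_id) REFERENCES node (id))";

    static final String INSERT_NODE =
        "INSERT INTO node (id, parent_id, name) VALUES (?, ?, ?)";
    static final String INSERT_PROPERTY =
        "INSERT INTO property (id, node_id, name, content) VALUES (?, ?, ?, ?)";

    private final PreparedStatement insertNode;
    private final PreparedStatement insertProperty;
    private long nextId = 1;

    SqlTreeWrapper(Connection connection) throws SQLException {
        try (Statement ddl = connection.createStatement()) {
            ddl.executeUpdate(CREATE_NODE_TABLE);
            ddl.executeUpdate(CREATE_PROPERTY_TABLE);
        }
        // Prepared once, executed many times: the SQL text is not re-parsed per call.
        insertNode = connection.prepareStatement(INSERT_NODE);
        insertProperty = connection.prepareStatement(INSERT_PROPERTY);
    }

    /** Creates a node under parentId (null for the root) and returns its id. */
    long createNode(Long parentId, String name) throws SQLException {
        long id = nextId++;
        insertNode.setLong(1, id);
        if (parentId == null) {
            insertNode.setNull(2, Types.BIGINT);
        } else {
            insertNode.setLong(2, parentId);
        }
        insertNode.setString(3, name);
        insertNode.executeUpdate();
        return id;
    }

    /** Attaches a property to an existing node and returns the property id. */
    long createProperty(long nodeId, String name, String content) throws SQLException {
        long id = nextId++;
        insertProperty.setLong(1, id);
        insertProperty.setLong(2, nodeId);
        insertProperty.setString(3, name);
        insertProperty.setString(4, content);
        insertProperty.executeUpdate();
        return id;
    }

    @Override
    public void close() throws SQLException {
        insertNode.close();
        insertProperty.close();
    }
}
```

To run it, a JDBC driver for the chosen database must be on the classpath and a `Connection` obtained from `DriverManager` must be passed to the constructor. No explicit index is created, which corresponds to relying on the indexes the database builds by default for keys.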
The benchmark is composed of four parts, each of which measures the time required to perform an operation on hierarchies of different sizes. Each node of these base hierarchies has 5 sub-nodes and 5 properties, except the leaves, which have only 5 properties. The first hierarchy has one level, and each following hierarchy adds one more level. The tests were run 5 times on a Dell Latitude D820 running Windows XP (processor: Intel Core Duo 2.00 GHz, virtual memory: 2.00 GB). The average result is used in the following diagrams.
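The item counts on the horizontal axis of the result charts (36, 186, 936, 4686, 23436) follow from this description. The sketch below reproduces them under one assumption: "one level" is read as a root node plus one level of sub-nodes, which is the reading that yields 36 items. The class and method names are hypothetical.

```java
/** Counts the items (nodes plus properties) of the base hierarchies. */
final class HierarchySize {

    /** Nodes in a complete tree: 1 + b + b^2 + ... + b^levels. */
    static long nodeCount(int levels, int branching) {
        long nodes = 0;
        long nodesOnLevel = 1; // level 0 holds only the root
        for (int level = 0; level <= levels; level++) {
            nodes += nodesOnLevel;
            nodesOnLevel *= branching;
        }
        return nodes;
    }

    /** Every node, leaf or not, counts as one item plus its properties. */
    static long itemCount(int levels, int branching, int propertiesPerNode) {
        return nodeCount(levels, branching) * (1 + propertiesPerNode);
    }

    public static void main(String[] args) {
        for (int levels = 1; levels <= 5; levels++) {
            System.out.println(levels + " level(s): " + itemCount(levels, 5, 5) + " items");
        }
    }
}
```

For example, the one-level hierarchy has 1 + 5 = 6 nodes, each carrying 5 properties, which gives 6 × 6 = 36 items.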
Writing the hierarchy
This test measures the time required to create the base hierarchy. The throughputs correspond to the average time needed to write one item of the hierarchy. While the differences between the products seem huge, the throughput of each product remains constant as the hierarchy grows.
The assumption that native implementations of JCR and relational databases should be equivalent in terms of performance holds in this case.
MySQL cannot be embedded in the application, which has a strong impact on its result. H2 does not appear in the chart because its write times are too small to be visible.
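The per-item times reported in these tests can be obtained along the following lines. This is only a sketch of the measurement principle (time per item, averaged over several runs). The `ThroughputTimer` class is hypothetical, and the original harness may well have differed, for example in how runs were isolated from each other.

```java
import java.util.function.IntSupplier;

/** Hypothetical timing helper: average time per item over several runs. */
final class ThroughputTimer {

    /**
     * Runs the operation `runs` times. The operation returns the number of
     * items it processed; the result is the average time per item in milliseconds.
     */
    static double averageMillisPerItem(int runs, IntSupplier operation) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        double totalMillisPerItem = 0.0;
        for (int run = 0; run < runs; run++) {
            long start = System.nanoTime();
            int items = operation.getAsInt();
            long elapsedNanos = System.nanoTime() - start;
            if (items <= 0) {
                throw new IllegalArgumentException("operation must process at least one item");
            }
            totalMillisPerItem += (elapsedNanos / 1_000_000.0) / items;
        }
        return totalMillisPerItem / runs;
    }
}
```

A call such as `averageMillisPerItem(5, () -> writeHierarchy())`, where `writeHierarchy` is a placeholder that builds the tree and returns its item count, would mirror the five runs described earlier.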
Reading the hierarchy
This test reads every item of the base hierarchy once, from the root to the leaves. The throughputs displayed in the chart correspond to the average time needed to read one item of the hierarchy. For most databases the results seem to be constant. Derby is simply out of range: when recursive queries are performed on this database, the results are not tolerable.
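The text does not detail how the recursive read is issued. One plausible reading, assumed here, is that the wrapper asks for the children of each node in turn, so a full read needs one child lookup per node. The sketch below uses an in-memory map as a stand-in for the node table, with hypothetical names throughout, and counts those lookups. It illustrates why the cost of a single child query matters so much in this test, not how any particular database behaves.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for the node table; counts the child lookups of a full read. */
final class RecursiveReadSketch {
    private final Map<Long, List<Long>> childrenByParent = new HashMap<>();
    private long nextId = 1;
    long lookups = 0;

    /** Creates a node under parentId (null for the root) and returns its id. */
    long createNode(Long parentId) {
        long id = nextId++;
        if (parentId != null) {
            childrenByParent.computeIfAbsent(parentId, k -> new ArrayList<>()).add(id);
        }
        return id;
    }

    /** Builds a complete tree with the given branching factor below parentId. */
    void build(long parentId, int levels, int branching) {
        if (levels == 0) {
            return;
        }
        for (int i = 0; i < branching; i++) {
            build(createNode(parentId), levels - 1, branching);
        }
    }

    /** Stands in for a query such as "SELECT id FROM node WHERE parent_id = ?". */
    List<Long> findChildren(long parentId) {
        lookups++;
        return childrenByParent.getOrDefault(parentId, Collections.emptyList());
    }

    /** Reads the subtree from nodeId down to the leaves; returns the nodes visited. */
    long readSubtree(long nodeId) {
        long visited = 1;
        for (long child : findChildren(nodeId)) {
            visited += readSubtree(child);
        }
        return visited;
    }
}
```

With this access pattern the number of lookups equals the number of nodes visited, so any fixed overhead per query is paid once for every node of the hierarchy.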
[Two charts. Vertical axis: milliseconds (0.00 to 18.00 in the first chart, 0.00 to 0.20 in the second). Horizontal axis: number of items (36, 186, 936, 4686, 23436). Series: crx, h2, mysql, derby.]
Randomly writing the hierarchy
This test randomly writes 100 sub-hierarchies into the base hierarchy. Each sub-hierarchy has a depth of 2 levels; each of its nodes has two properties, and each non-leaf node has two sub-nodes.
Thus, each sub-hierarchy is composed of 21 items (7 nodes and 14 properties). The throughputs relate to the average time required to create all the items of one sub-hierarchy. The results are quite similar to those of the first test.
The good point is that the results of all the databases remain constant as the base hierarchy grows.
Randomly reading the hierarchy
This test randomly reads 100 nodes of the base hierarchy, each together with its descendants on two levels. The throughput relates to the average time required to read one node and its descendants. As in the second test, Derby is simply out of range: the same problem is encountered with recursive queries. CRX appears to be well optimized for these situations. To be truly conclusive, this test should be run on larger hierarchies. However, the difference between the results is constant, and the other relational databases do not show extremely bad performance for recursive queries.
6.3 Synthesis
As shown in this chapter, performance should not be used as the main argument to choose one technology over another. The aspects mentioned in the previous chapters are more important. The choice should relate to the nature of the problem which has to be solved and not to the nature of the product.
The assumption that relational databases are able to manage hierarchical data effectively holds. However, this does not mean that Java content repositories should be implemented as a layer over relational databases. Some basic concepts of the two models do not match, and a relational schema for JCR that covers every aspect of the specification would look unsuitable. More modularity (3) in the database world could make it possible to benefit from both approaches. Until this goal is achieved, a native implementation of JCR is probably the better of the proposed solutions.
[Two charts. Vertical axis: milliseconds (0.00 to 350.00 in the first chart, 0.00 to 10.00 in the second). Horizontal axis: number of items (36, 186, 936, 4686, 23436). Series: crx, h2, mysql, derby.]
7 Scenario Analysis
The following overview synthesizes the main aspects pointed out during the whole comparison process. Four use cases characterized by different features will then be briefly analyzed with regard to their respective requirements and to the presented approaches.
| Aspect | JCR | RDBMS |
| --- | --- | --- |
| Data Model Level | | |
| Structure | Unstructured; semi-structured; structured | Structured |
| Integrity | Entity integrity; domain integrity; referential integrity; transitive integrity in hierarchies | Entity integrity; domain integrity; referential integrity; tools to manage data coherency |
| Operations and Queries | Selection; equi-join operations; full text search operation; transitive queries on hierarchies | Selection; projection; rename; join operations; domain operation; create, read, update, delete statements |
| Navigation | Navigation API; traversal access; direct access; write access | Not supported |
| Specification Level | | |
| Inheritance | Node types inheritance; node inheritance | Not supported |
| Access control | Record level | Table and column level; record level not supported |
| Observation | Record level; un-persisted event listeners; application interaction supported | Table level; persisted triggers; application interaction not supported |
| Version control | Supported | Not supported |
| Project Level | | |
| Schema understandability | DataGuides or graphs; summarize the architecture; not impacted by many-to-many associations | Entity Relationship; represent the whole architecture; impacted by many-to-many associations |
| Code complexity | Simple for navigation; complex for operations | Complex for navigation; simple for operations |
| Changeability | More agile; decoupled from the application | More rigid; coupled with the application |