Benchmark - JCR or RDBMS: why, when, how?

The previous section summarized a large problem very briefly. The main point to keep in mind is that no intolerable performance difference should appear, whether hierarchical data is managed with a content repository or with a relational database.

The following benchmark was carried out to verify this assumption.

Four products are included in this benchmark. CRX is a native implementation of the JCR specification; it persists items with a proprietary technology, implemented in Java, that is based on tar files (19). H2 and Derby are two open source relational databases written in Java. MySQL is one of the most widely used open source databases.

A simple wrapper was defined for this benchmark. It offers basic functions to create trees made of nodes and properties. The CRX wrapper directly uses the functionality provided by the JCR API. The SQL wrapper uses a simple database schema: one table stores the nodes and another table stores the properties. Associations between items are managed with a parent foreign key, and the databases' default indexes are used on all fields. The SQL wrapper accesses the databases through JDBC and uses prepared statements to avoid parsing the SQL statements each time.
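The two-table layout can be pictured with a small in-memory sketch. It assumes a NODE table with a nullable parent key and a PROPERTY table keyed by its owning node; the class, table and column names are hypothetical, and plain lists stand in for the JDBC inserts that the real SQL wrapper issues through prepared statements.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical in-memory stand-in for the SQL wrapper's two tables:
 * NODE(id, parent_id, name) and PROPERTY(id, node_id, name, value).
 * A real wrapper would execute JDBC INSERT statements instead of appending to lists.
 */
final class TwoTableStore {
    record NodeRow(long id, Long parentId, String name) {}
    record PropertyRow(long id, long nodeId, String name, String value) {}

    final List<NodeRow> nodes = new ArrayList<>();
    final List<PropertyRow> properties = new ArrayList<>();
    private long nextId = 1;

    /** Inserts a node row; parentId plays the role of the parent foreign key (null for the root). */
    long addNode(Long parentId, String name) {
        long id = nextId++;
        nodes.add(new NodeRow(id, parentId, name));
        return id;
    }

    /** Inserts a property row that references its owning node. */
    void addProperty(long nodeId, String name, String value) {
        properties.add(new PropertyRow(nextId++, nodeId, name, value));
    }

    /** Builds a tree: every node gets `props` properties, inner nodes get `fanOut` sub-nodes. */
    void build(Long parentId, String name, int levels, int fanOut, int props) {
        long id = addNode(parentId, name);
        for (int p = 0; p < props; p++) addProperty(id, "p" + p, "v" + p);
        if (levels == 0) return; // leaves only carry properties
        for (int c = 0; c < fanOut; c++) build(id, name + "/n" + c, levels - 1, fanOut, props);
    }

    public static void main(String[] args) {
        TwoTableStore store = new TwoTableStore();
        store.build(null, "root", 1, 5, 5);
        // One level below the root: 6 node rows and 30 property rows, i.e. 36 items.
        System.out.println(store.nodes.size() + " nodes, " + store.properties.size() + " properties");
    }
}
```

Keeping properties in their own table mirrors the wrapper described above; whether the original schema stored values as text or typed columns is not stated, so the `String` value column is a simplification.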

The benchmark is composed of four parts, each of which measures the time required to perform an operation on hierarchies of different sizes. Each node of these base hierarchies has 5 sub-nodes and 5 properties, except the leaves, which only have 5 properties. The first hierarchy has one level, and each following hierarchy adds one more level, giving hierarchies of 36, 186, 936, 4686 and 23436 items. Each test was run 5 times on a Dell Latitude D820 running Windows XP (processor: Intel Core Duo 2.00 GHz, virtual memory: 2.00 GB), and the diagrams report the average result.
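The hierarchy sizes follow directly from this description. The sketch below, built around a hypothetical `items` helper, counts nodes level by level and then adds the properties of each node; for one to five levels with 5 sub-nodes and 5 properties per node, it reproduces the item counts on the charts' horizontal axis.

```java
/** Sketch: number of items (nodes + properties) in a uniform hierarchy. */
final class HierarchySize {
    private HierarchySize() {}

    /**
     * Counts items in a tree with `levels` levels below the root, where every inner node
     * has `fanOut` sub-nodes and every node (leaves included) has `props` properties.
     */
    static long items(int levels, int fanOut, int props) {
        long nodes = 0;
        long levelWidth = 1; // nodes on the current level, starting with the root
        for (int level = 0; level <= levels; level++) {
            nodes += levelWidth;
            levelWidth *= fanOut;
        }
        return nodes * (1 + props); // each node plus its properties
    }

    public static void main(String[] args) {
        for (int levels = 1; levels <= 5; levels++) {
            // Prints 36, 186, 936, 4686 and 23436 items for levels 1 to 5.
            System.out.println(levels + " level(s): " + items(levels, 5, 5) + " items");
        }
    }
}
```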

Writing the hierarchy

This test measures the time required to create the base hierarchy. The reported throughput is the average time needed to write one item of the hierarchy. Although the differences between the products seem large, each product's time per item stays constant as the hierarchy grows.
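A minimal timing harness makes this measurement explicit. It assumes that each run is timed as a whole with `System.nanoTime`, that the 5 runs are averaged, and that the mean is divided by the number of items written; the `Throughput` class and its `millisPerItem` method are hypothetical, not the benchmark's actual code.

```java
/**
 * Hypothetical timing harness: times `runs` executions of an operation, averages
 * them and divides the mean by the number of items the operation handles.
 */
final class Throughput {
    private Throughput() {}

    static double millisPerItem(Runnable operation, long items, int runs) {
        long totalNanos = 0;
        for (int run = 0; run < runs; run++) {
            long start = System.nanoTime();
            operation.run();
            totalNanos += System.nanoTime() - start;
        }
        double meanMillis = totalNanos / 1_000_000.0 / runs; // mean duration of one run
        return meanMillis / items;                            // share of that time per item
    }

    public static void main(String[] args) {
        long items = 36; // size of the one-level base hierarchy
        // Placeholder workload standing in for "write the base hierarchy" (assumption).
        Runnable writeHierarchy = () -> {
            StringBuilder sink = new StringBuilder();
            for (int i = 0; i < items; i++) sink.append("item").append(i);
        };
        System.out.printf("%.6f ms per item%n", millisPerItem(writeHierarchy, items, 5));
    }
}
```

Dividing by the item count is what makes results for hierarchies of different sizes comparable: a flat curve means the cost per item does not grow with the hierarchy.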

The assumption that native implementations of JCR and relational databases should be equivalent in terms of performance holds in this case.

MySQL cannot be embedded in the application, which has a strong impact on its result. H2 does not appear in the chart because its write times are too small to be visible at this scale.

Reading the hierarchy

This test reads every item of the base hierarchy once, from the root to the leaves. The throughputs displayed in the chart correspond to the average time needed to read one item of the hierarchy. For most databases the results seem to be constant. Derby is simply out of range: when recursive queries are performed on this database, the results are not tolerable.
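The source does not show how the SQL wrapper formulated its recursive reads. One plausible pattern, sketched here under that assumption, fetches the children of each node with a separate lookup on the parent key, so reading a whole hierarchy costs one query per node. A map stands in for the NODE table and its index, and the `RecursiveRead` class is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of a naive recursive read over a NODE table keyed by parent id.
 * Each call to childrenOf stands for one "SELECT id FROM node WHERE parent_id = ?" round trip.
 */
final class RecursiveRead {
    private final Map<Long, List<Long>> childrenByParent = new HashMap<>();
    int queries = 0;

    void addNode(long id, long parentId) {
        childrenByParent.computeIfAbsent(parentId, k -> new ArrayList<>()).add(id);
    }

    /** Simulates one query returning the children of a node. */
    List<Long> childrenOf(long id) {
        queries++;
        return childrenByParent.getOrDefault(id, List.of());
    }

    /** Reads a node and, recursively, all its descendants; returns the number of nodes read. */
    long readSubtree(long id) {
        long count = 1;
        for (long child : childrenOf(id)) count += readSubtree(child);
        return count;
    }

    public static void main(String[] args) {
        RecursiveRead table = new RecursiveRead();
        long nextId = 1; // id 0 is the root
        List<Long> level = List.of(0L);
        for (int depth = 0; depth < 2; depth++) { // two levels of 5 sub-nodes per node
            List<Long> next = new ArrayList<>();
            for (long parent : level) {
                for (int c = 0; c < 5; c++) {
                    table.addNode(nextId, parent);
                    next.add(nextId++);
                }
            }
            level = next;
        }
        // Prints "31 nodes read with 31 queries": leaves also cost one (empty) lookup.
        System.out.println(table.readSubtree(0) + " nodes read with " + table.queries + " queries");
    }
}
```

With this pattern the number of round trips grows with the number of nodes, so how well a database handles many small indexed lookups directly shapes the reading results.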

[Chart: milliseconds (axis 0 to 18) against number of items (36, 186, 936, 4686, 23436); series: crx, h2, mysql, derby]

[Chart: milliseconds (axis 0 to 0.20) against number of items (36, 186, 936, 4686, 23436); series: crx, h2, mysql, derby]

Randomly writing the hierarchy

This test randomly writes 100 sub-hierarchies into the base hierarchy. Each sub-hierarchy has a depth of 2 levels, and each of its nodes has two sub-nodes and two properties, except the leaves, which only have two properties. Each sub-hierarchy is therefore composed of 21 items. The throughputs give the average time required to create all the items of one sub-hierarchy. The results are quite similar to those of the first test.
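The count of 21 items can be checked with a short calculation. Writing b for the number of sub-nodes per node, d for the depth and p for the number of properties per node (labels introduced here), and assuming that leaves carry properties but no sub-nodes:

```latex
\begin{align*}
N(b,d) &= \sum_{k=0}^{d} b^{k}, & I(b,d,p) &= (1+p)\,N(b,d) \\
N(2,2) &= 1 + 2 + 4 = 7, & I(2,2,2) &= 3 \cdot 7 = 21 \\
N(5,1) &= 1 + 5 = 6, & I(5,1,5) &= 6 \cdot 6 = 36
\end{align*}
```

The last line applies the same formula to the first base hierarchy, which is why both tests can be expressed as a time per item or per fixed-size group of items.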

The positive point is that all the products show constant results.

Randomly reading the hierarchy

This test randomly reads 100 nodes of the base hierarchy, each together with its descendants down to two levels. The throughput is the average time required to read one node and its descendants. As in the second test, Derby is out of range because of the same problem with recursive queries. CRX appears to be well optimized for this kind of access. To be truly conclusive, this test should be run on larger hierarchies. Nevertheless, the difference between the results stays constant, and the relational databases do not show extremely poor performance for recursive queries.
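The random read can be sketched as a depth-limited traversal started from randomly chosen nodes. The in-memory child map, the uniform choice among all nodes, the fixed seed and the `RandomRead` class are assumptions for illustration; the tree is a small base hierarchy of three levels (156 nodes) and properties are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Sketch of the random read test: read sampled nodes plus two levels of descendants. */
final class RandomRead {
    private final Map<Integer, List<Integer>> children = new HashMap<>();

    void addChild(int parent, int child) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    /** Reads `node` and its descendants up to `depth` levels below it; returns the nodes read. */
    int readDown(int node, int depth) {
        int read = 1;
        if (depth == 0) return read;
        for (int child : children.getOrDefault(node, List.of())) read += readDown(child, depth - 1);
        return read;
    }

    public static void main(String[] args) {
        RandomRead tree = new RandomRead();
        int nodeCount = 156; // root + 3 levels with 5 sub-nodes per node
        // Node ids in breadth-first order: the parent of node id is (id - 1) / 5.
        for (int id = 1; id < nodeCount; id++) tree.addChild((id - 1) / 5, id);
        Random random = new Random(42);
        int totalRead = 0;
        for (int i = 0; i < 100; i++) totalRead += tree.readDown(random.nextInt(nodeCount), 2);
        System.out.println("average nodes read per sample: " + totalRead / 100.0);
    }
}
```

Because sampled nodes near the leaves have few descendants, the amount of work per sample varies; averaging over 100 samples smooths this out, which matches the test's use of an average time per node read.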

6.3 Synthesis

As this chapter has shown, performance should not be the main argument for choosing one technology over the other. The aspects discussed in the previous chapters matter more: the choice should depend on the nature of the problem to be solved, not on the nature of the product.

The assumption that relational databases can manage hierarchical data effectively holds. However, this does not mean that Java content repositories should be implemented as a layer over relational databases. Some base concepts of the two specifications do not match, so a relational schema for JCR that covers every aspect of the specification would look unsuitable. More modularity (3) in the database world could benefit both approaches. Until this goal is achieved, a native implementation of JCR is probably the better of the proposed solutions.

[Chart: milliseconds (axis 0 to 350) against number of items (36, 186, 936, 4686, 23436); series: crx, h2, mysql, derby]

[Chart: milliseconds (axis 0 to 10) against number of items (36, 186, 936, 4686, 23436); series: crx, h2, mysql, derby]

7 Scenario Analysis

The following diagram summarizes the main aspects identified during the comparison. Four use cases with different characteristics are then briefly analyzed with regard to their respective requirements and to the presented approaches.

JCR compared with RDBMS

Data Model Level

  Structure
    JCR:   Unstructured; semi-structured; structured
    RDBMS: Structured

  Integrity
    JCR:   Entity integrity; domain integrity; referential integrity; transitive integrity in hierarchies
    RDBMS: Entity integrity; domain integrity; referential integrity; tools to manage data coherency

  Operations and Queries
    JCR:   Selection; equi-join operations; full text search operation; transitive queries on hierarchies
    RDBMS: Selection; projection; rename; join operations; domain operation; create, read, update, delete statements

  Navigation
    JCR:   Navigation API; traversal access; direct access; write access
    RDBMS: Not supported

Specification Level

  Inheritance
    JCR:   Node type inheritance; node inheritance
    RDBMS: Not supported

  Access control
    JCR:   Record level
    RDBMS: Table and column level; record level not supported

  Observation
    JCR:   Record level; unpersisted event listeners; application interaction supported
    RDBMS: Table level; persisted triggers; application interaction not supported

  Version control
    JCR:   Supported
    RDBMS: Not supported

Project Level

  Schema understandability
    JCR:   DataGuides or graphs; summarize the architecture; not impacted by many-to-many associations
    RDBMS: Entity relationship; represent the whole architecture; impacted by many-to-many associations

  Code complexity
    JCR:   Simple for navigation; complex for operations
    RDBMS: Complex for navigation; simple for operations

  Changeability
    JCR:   More agile; decoupled from the application
    RDBMS: More rigid; coupled with the application

In document JCR or RDBMS: why, when, how? (Page 36-40)