Size - Large Scale Tests - Using low latency storage to improve RDF store performance

7.4 Large Scale Tests

7.4.1 Size

Figure 7.20: Index sizes (GB) of the SPO, POS and OSP indexes for B+Tree, AHRI(bp), AHRI(hb) and AHRI(hbwa), over 350 million triples of BSBM data (lower is better)

Figure 7.20 shows the sizes of each of the AHRI variants, compared against B+Tree, over a 350 million triple BSBM dataset. The relative sizes of AHRI and B+Tree remain similar to those seen for the 5 million triple BSBM dataset. This is expected: BSBM is built on repeating patterns, so the characteristics of the dataset do not change fundamentally when it is scaled up. While the size of the B+Tree does increase superlinearly with the size of the dataset, it does so at a rate of log n, where n is the average node size. This factor is too small to have a substantial influence.
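As an illustration of how slowly a logarithmic factor grows, the sketch below counts the levels a B+Tree needs for n keys when each node holds up to b entries, i.e. the smallest h with b^h ≥ n. The fan-out values 100 and 1000 are assumptions chosen for illustration; they are not taken from the implementation measured here.

```java
/**
 * Sketch: number of levels a B+Tree needs to index n keys when every node
 * holds up to b entries. The fan-out b is an assumed parameter, not a
 * figure taken from the implementation measured in this chapter.
 */
final class BTreeHeight {

    /** Smallest h such that b^h >= n, i.e. ceil(log_b n), computed without floating point. */
    static int height(long fanout, long n) {
        if (fanout < 2 || n < 1) {
            throw new IllegalArgumentException("need fanout >= 2 and n >= 1");
        }
        int levels = 1;
        long capacity = fanout;      // keys reachable with `levels` levels
        while (capacity < n) {
            capacity *= fanout;      // each extra level multiplies the reach by b
            levels++;
        }
        return levels;
    }

    public static void main(String[] args) {
        long[] sizes = {5_000_000L, 350_000_000L};
        for (long n : sizes) {
            System.out.printf("n = %,d: height %d (b = 100), %d (b = 1000)%n",
                    n, height(100, n), height(1000, n));
        }
    }
}
```

With either assumed fan-out, growing the dataset seventy-fold, from 5 to 350 million triples, adds at most one level.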

As before, AHRI-hb is the smallest of the indexes, followed closely by AHRI-bp and trailed substantially by AHRI-hbwa. These results are mirrored for the full DBpedia set, shown in Figure 7.21, although the results for this dataset are somewhat closer, with the B+Tree achieving a particularly good packing factor on its SPO index. Because DBpedia subjects are of relatively high cardinality, AHRI gains relatively little advantage from the small size of its FixedBuckets, which increases its overheads somewhat. In general, however, AHRI performs strongly and remains significantly smaller than the B+Tree.

Figure 7.21: Index sizes (GB) of the SPO, POS and OSP indexes for B+Tree, AHRI(bp), AHRI(hb) and AHRI(hbwa), over the full 230 million triple DBpedia dataset (lower is better)

Figure 7.22: Load rate (triples loaded per unit time, normalised against B+Tree) of AHRI-bp and AHRI-hb for the SPO, POS and OSP orderings, plotted against dataset size (triples), with increasing BSBM dataset size (higher is better)

7.4.2 Load

Figure 7.22 shows AHRI’s load rate for BSBM data, relative to B+Tree, as the size of the dataset increases. Note, again, that both indexes are built using incremental insertion, so these results indicate the cost of updating as well as the cost of bulk loading.

The load rate of the SPO-ordered index is particularly noticeable here: AHRI’s performance relative to B+Tree increases dramatically as the dataset size increases. In reality, this is an artifact of B+Tree’s insert rate dropping while AHRI’s stays approximately constant. AHRI’s best insertion rate occurred on the 30 million triple dataset, averaging 11.5 million triples per second. This high performance is a result of AHRI’s fast FixedBucket structures: the only operations that have to be performed are a direct-mapped lookup in an array, followed by a copy and insertion into a very small array of data. This strategy is particularly effective because data is inserted in SPO order: almost all lookups in the L1 index will be cached, as will the FixedBuckets that are used. B+Tree suffers by comparison due to its relatively wide nodes and long lookup times.
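The insert path described above can be sketched as follows. This is a hypothetical simplification, not AHRI’s code: the class name DirectMappedIndex, the dense integer subject IDs, the long-encoded entries and the doubling growth policy are all assumptions. The point is that an insert is one array index plus a binary search and a short System.arraycopy within a small array; when triples arrive in SPO order, consecutive inserts hit the same L1 slot and bucket, which therefore tend to stay in cache.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of a direct-mapped L1 array indexed by a dense integer
 * subject ID, where each slot points at a small sorted array ("bucket") of
 * encoded entries. Names, encoding and growth policy are illustrative only.
 */
final class DirectMappedIndex {
    private final long[][] buckets; // L1: slot i holds the bucket for subject ID i
    private final int[] sizes;      // number of live entries in each bucket

    DirectMappedIndex(int maxSubjectId) {
        buckets = new long[maxSubjectId + 1][];
        sizes = new int[maxSubjectId + 1];
    }

    /** Inserts key into the bucket for subjectId; returns false if it is already present. */
    boolean insert(int subjectId, long key) {
        long[] bucket = buckets[subjectId];       // direct-mapped lookup, no search
        int n = sizes[subjectId];
        if (bucket == null) {
            bucket = new long[4];                 // assumed small initial capacity
            buckets[subjectId] = bucket;
        }
        int pos = Arrays.binarySearch(bucket, 0, n, key);
        if (pos >= 0) {
            return false;                         // duplicate entry
        }
        pos = -pos - 1;                           // insertion point
        if (n == bucket.length) {                 // full: copy into a larger array
            bucket = Arrays.copyOf(bucket, n * 2);
            buckets[subjectId] = bucket;
        }
        System.arraycopy(bucket, pos, bucket, pos + 1, n - pos); // shift the tail right
        bucket[pos] = key;
        sizes[subjectId] = n + 1;
        return true;
    }

    /** Returns a sorted copy of the entries for subjectId (empty if none). */
    long[] entries(int subjectId) {
        long[] bucket = buckets[subjectId];
        return bucket == null ? new long[0] : Arrays.copyOf(bucket, sizes[subjectId]);
    }
}
```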

The performance of AHRI’s POS indexes is still substantially better than that of B+Tree, although they do not reach the same heights. The POS-ordered structure is heavily influenced by the choice of L3 index: AHRI-bp inserts at a rate of 1.5-2 times that of B+Tree, compared to 2.5-3.5 times for AHRI-hb. The rate for AHRI-bp could be substantially improved by reducing the width of AHRI’s L3 B+Tree index nodes, which are currently very wide, at 1000 elements. This would trade off a small amount of space and read performance.
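A simple model shows why node width matters for insertion. Assume, purely for this illustration, that each new entry lands at a uniformly random position in a node currently holding $m$ sorted entries, with $m \le w$ for node width $w$. Inserting at position $i$ requires shifting the $m - i$ entries after it:

$$
\mathbb{E}[\text{entries shifted}] = \frac{1}{m+1}\sum_{i=0}^{m}(m - i) = \frac{1}{m+1}\cdot\frac{m(m+1)}{2} = \frac{m}{2} \le \frac{w}{2}
$$

With the current width $w = 1000$ this is up to 500 entry moves per insertion; a hypothetical width of $w = 100$ would cap it at 50, at the cost of roughly ten times as many nodes, each with its own overhead, which is the space and read-performance trade-off described above.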

Finally, the insert rate of AHRI’s OSP-ordered structures is between 2 and 2.5 times that of B+Tree. The rates are extremely similar for the different L3 indexes, as BSBM makes no use of them on the OSP ordering.

Figure 7.23: Load rate (normalised against B+Tree) of B+Tree, AHRI(bp) and AHRI(hb) for the SPO, POS and OSP orderings, for the full 230 million triple DBpedia dataset (higher is better)

Figure 7.23 shows the insertion rate for DBpedia, a dataset that makes much less use of AHRI’s FixedBuckets due to the relatively high cardinality of its Subjects. Performance on SPO-ordered data is still substantially higher than that of B+Tree, but it does not reach the same heights: the insertion rate for this index is just 5 million triples per second. By contrast, the relative load rates for the POS and OSP-ordered indexes are similar to those seen in the BSBM load phase.

7.4.3 Query

This section describes the query (read) performance for each index type, over the two large datasets. Testing was performed as in Section 7.3.3, with the exception that data was retrieved using an iterated method. Instead of passing in an array and writing all results into that array, the index structures returned an object that knew how to iterate over the data structures. This makes it possible to perform element-at-a-time retrieval, necessary to support pipelined operations. This mode of operation is more representative of the conditions found in modern database systems. Tables including the raw figures for each of these tests can be found in Appendix D.
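The difference between the two retrieval styles can be sketched as below. SortedRun and its methods are hypothetical names, not AHRI’s interface; a contiguous sorted range of long-encoded entries stands in for whatever structure actually holds the matches.

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

/**
 * Hypothetical sketch of the two retrieval styles: fill() copies every match
 * into a caller-supplied array in one call, while iterator() hands results
 * back one element per call, as a pipelined query engine needs.
 */
final class SortedRun {
    private final long[] data; // sorted entries held by the index
    private final int from;    // first matching position (inclusive)
    private final int to;      // end of the matching range (exclusive)

    SortedRun(long[] data, int from, int to) {
        if (from < 0 || to > data.length || from > to) {
            throw new IllegalArgumentException("bad range");
        }
        this.data = data;
        this.from = from;
        this.to = to;
    }

    /** Array-passing style: copies up to out.length matches, returns how many were written. */
    int fill(long[] out) {
        int n = Math.min(out.length, to - from);
        System.arraycopy(data, from, out, 0, n);
        return n;
    }

    /** Iterated style: one element per nextLong() call. */
    PrimitiveIterator.OfLong iterator() {
        return new PrimitiveIterator.OfLong() {
            private int pos = from;

            @Override
            public boolean hasNext() {
                return pos < to;
            }

            @Override
            public long nextLong() {
                if (pos >= to) {
                    throw new NoSuchElementException();
                }
                return data[pos++];
            }
        };
    }

    public static void main(String[] args) {
        SortedRun run = new SortedRun(new long[] {1, 2, 3, 4, 5}, 1, 4);
        PrimitiveIterator.OfLong it = run.iterator();
        while (it.hasNext()) {
            System.out.println(it.nextLong()); // the consumer can act on each result immediately
        }
    }
}
```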

Initial tests at this scale revealed flaws in the design of AHRI. In an iterated environment, where elements are returned from the index one at a time by calling a next() method, AHRI slows down significantly because of the cost of calling next(). Simple codebases that make little use of inheritance, like a B+Tree, can have many of their method calls inlined: instead of calling the method, the compiler replaces the call with a copy of the method’s code, eliminating the call cost. For methods that execute infrequently this makes little difference, but it is of great importance for methods that execute extremely frequently, such as next(). Unfortunately, AHRI is more complex than a B+Tree, and the code executed in the next() call is subject to inheritance, which prevents it from being inlined. Since next() is called for every single element returned, the sheer number of method calls slows execution considerably.

In order to overcome this issue, a version of AHRI that retrieves vectors of results at a time was implemented: effectively buffering calls to next(), and substantially reducing the number of method calls performed. Section 6.3.2.1 discusses the changes made to FixedBuckets to accommodate this change effectively. The alternative structure has negligible differences in terms of load times, and uses the same amount of space, so results for these factors are not discussed here.
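A minimal sketch of the vectored idea follows, assuming a hypothetical LongBatchSource interface (not AHRI’s actual API) and an arbitrary batch size of 64. The consumer makes one interface call per batch and then loops over a plain array, so the per-element call disappears from the hot loop.

```java
/**
 * Hypothetical sketch of vectored retrieval: instead of one next() call per
 * result, the consumer requests a block of results per call.
 */
interface LongBatchSource {
    /** Copies up to buf.length results into buf; returns the count, or 0 when exhausted. */
    int nextBatch(long[] buf);
}

final class ArrayBatchSource implements LongBatchSource {
    private final long[] data;
    private int pos;
    int calls; // number of nextBatch() invocations, to show how few calls are needed

    ArrayBatchSource(long[] data) {
        this.data = data;
    }

    @Override
    public int nextBatch(long[] buf) {
        calls++;
        int n = Math.min(buf.length, data.length - pos);
        System.arraycopy(data, pos, buf, 0, n);
        pos += n;
        return n;
    }
}

final class BatchedConsumer {
    /** Sums every result, making one interface call per batch rather than per element. */
    static long sum(LongBatchSource src, int batchSize) {
        long[] buf = new long[batchSize];
        long total = 0;
        int n;
        while ((n = src.nextBatch(buf)) > 0) {
            for (int i = 0; i < n; i++) {
                total += buf[i]; // tight loop over an array: no per-element method call
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long[] values = new long[1000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i + 1;
        }
        ArrayBatchSource src = new ArrayBatchSource(values);
        System.out.println(sum(src, 64) + " using " + src.calls + " calls");
    }
}
```

Summing the values 1 to 1000 this way makes 17 calls to nextBatch() (16 non-empty batches and a final empty one) rather than 1000 element-at-a-time calls; main prints `500500 using 17 calls`.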

7.4.3.1 BSBM

BSBM represents a relatively curated class of information, akin, for example, to the UniProt dataset. Subjects are of predictably low cardinality, while the dataset contains only a few predicates, each of which is of very high cardinality.

Figure 7.24 shows the query response rates for the SPO ordering of the BSBM dataset. As expected, AHRI is substantially faster than B+Tree for this ordering. Because the higher overheads of the iterated environment offset the speed advantage of AHRI’s lookups, AHRI achieves an average improvement of only 2.5 times over B+Tree. This is, however, still a very substantial improvement. Since Subject-related queries are always of low cardinality, and do not generally require an L3 index, there is little to choose between the AHRI variants.

Figure 7.24: Query performance (queries per unit time, normalised against B+Tree) of B+Tree, AHRI(bp), AHRI(hb), AHRI-vec(bp) and AHRI-vec(hb) for the spo(mixed), spo(val1), spo(val2) and spo(val3) query types, over the 350 million triple BSBM dataset using SPO ordering (higher is better)

Figure 7.25: Query performance (queries per unit time, normalised against B+Tree) of B+Tree, AHRI(bp), AHRI(hb), AHRI-vec(bp) and AHRI-vec(hb) for the pos(mixed), pos(val1), pos(val2) and pos(val3) query types, over the 350 million triple BSBM dataset using POS ordering (higher is better)

Results for the POS ordering (shown in Figure 7.25) reveal the real differences between the AHRI variants. As expected, the AHRI variants using a hash set L3 index perform poorly for retrievals limited by one or two attributes, because hash sets are slow to iterate over. They do, however, perform exceptionally well for find operations. AHRI variants using the B+tree L3 index perform well for retrievals limited by one or two attributes, thanks to very high iteration performance. The difference in iteration performance between the standard and vector models is clearly visible, with the vector model performing substantially better.
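The trade-off between the two L3 index types can be illustrated with a simple linear-probing hash set. This is an assumed stand-in, not AHRI’s L3 hash set: contains() typically inspects only a few slots, but iteration must scan every slot, empty or not, and returns keys in no useful order, whereas a sorted array (standing in for B+Tree leaves) would be traversed densely and in key order.

```java
import java.util.Arrays;
import java.util.function.LongConsumer;

/**
 * Hypothetical linear-probing hash set of long keys, used only to illustrate
 * the find-versus-iterate trade-off; it is not AHRI's L3 implementation.
 */
final class LinearProbingLongSet {
    private static final long EMPTY = Long.MIN_VALUE; // sentinel: this value cannot be stored
    private final long[] slots;
    private int size;

    LinearProbingLongSet(int capacity) {
        slots = new long[capacity];
        Arrays.fill(slots, EMPTY);
    }

    private int home(long key) {
        // Multiplicative hashing to spread keys, then reduce to a slot index.
        return Math.floorMod(Long.hashCode(key * 0x9E3779B97F4A7C15L), slots.length);
    }

    /** Find operation: probes from the home slot until the key or an empty slot is met. */
    boolean contains(long key) {
        int i = home(key);
        for (int probes = 0; probes < slots.length; probes++) {
            if (slots[i] == EMPTY) {
                return false;
            }
            if (slots[i] == key) {
                return true;
            }
            i = (i + 1) % slots.length;
        }
        return false;
    }

    boolean add(long key) {
        if (key == EMPTY) {
            throw new IllegalArgumentException("reserved value");
        }
        if (contains(key)) {
            return false;
        }
        if (size == slots.length) {
            throw new IllegalStateException("set is full");
        }
        int i = home(key);
        while (slots[i] != EMPTY) {
            i = (i + 1) % slots.length; // linear probing to the next free slot
        }
        slots[i] = key;
        size++;
        return true;
    }

    /** Iteration: visits every stored key, but must scan every slot; returns slots scanned. */
    int forEach(LongConsumer action) {
        for (long v : slots) {
            if (v != EMPTY) {
                action.accept(v);
            }
        }
        return slots.length;
    }
}
```

With four keys in sixteen slots, iteration scans all sixteen slots to return four keys, while a dense sorted array of the same keys would touch only four entries.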

Figure 7.26: Query performance (queries per unit time, normalised against B+Tree) of B+Tree, AHRI(bp), AHRI(hb), AHRI-vec(bp) and AHRI-vec(hb) for the osp(mixed), osp(val1), osp(val2) and osp(val3) query types, over the 350 million triple BSBM dataset using OSP ordering (higher is better)

Figure 7.26 depicts results for the OSP ordering, with AHRI again achieving substantial improvements over B+Tree. Since the OSP ordering makes no use of L3 indexes, there is predictably little difference between the AHRI-hb and AHRI-bp variants. Unexpectedly, the vector model causes a slight slowdown compared to the standard iterated model. Since queries over the OSP index are expected to be infrequent, this is not a substantial concern.

7.4.3.2 DBpedia

The DBpedia dataset represents a significant change from the BSBM one. It is more organic, less managed, and demonstrates substantially different characteristics with respect to the SPO index: a given subject might have a cardinality of several hundred elements, and the dataset exhibits a much greater variety of predicates, a few of which are of low cardinality.

Figure 7.27 shows the results for the SPO ordering. AHRI, while still exhibiting a significant improvement over the B+Tree, does not show the same improvement that it did for BSBM. This is a predictable consequence of the fact that Subjects have substantially higher cardinality in the DBpedia dataset: AHRI’s major advantage is in the find time, and iterating over more elements proportionally reduces the influence of this factor. A further consequence of this increased emphasis on iteration performance is the improved results for the vector implementations.
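A simple two-term cost model, assumed here for illustration rather than measured, makes this reasoning explicit. Suppose a query spends $f$ on the initial find and $c$ on each element iterated, and returns $k$ elements; subscripts $A$ and $B$ denote AHRI and B+Tree. The speed-up of AHRI over B+Tree is then

$$
S(k) = \frac{f_B + k\,c_B}{f_A + k\,c_A},
\qquad
S(0) = \frac{f_B}{f_A},
\qquad
\lim_{k \to \infty} S(k) = \frac{c_B}{c_A}
$$

As subject cardinality $k$ grows, $S(k)$ moves away from AHRI’s find advantage $f_B/f_A$ toward the ratio of per-element iteration costs $c_B/c_A$, which is why lowering $c_A$, as the vector implementations do, matters more on DBpedia.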

Figure 7.27: Query performance (queries per unit time, normalised against B+Tree) of B+Tree, AHRI(bp), AHRI(hb), AHRI-vec(bp) and AHRI-vec(hb) for the spo(mixed), spo(val1), spo(val2) and spo(val3) query types, over the full DBpedia dataset using SPO ordering (higher is better)

Figure 7.28: Query performance (queries per unit time, normalised against B+Tree) of B+Tree, AHRI(bp), AHRI(hb), AHRI-vec(bp) and AHRI-vec(hb) for the pos(mixed), pos(val1), pos(val2) and pos(val3) query types, over the full DBpedia dataset using POS ordering (higher is better)

Results for the POS index, shown in Figure 7.28, are very similar to those for the BSBM dataset. Again, iteration-focused improvements, such as using the B+tree L3 index and using a vector layout, provide substantial benefits.

Finally, the OSP ordering, depicted in Figure 7.29, also provides expected results. AHRI is substantially faster overall, with the vector implementations especially leaping ahead. The limited use of L3 indexes over the OSP ordering means that there is little to choose between the different L3 index implementations.

Figure 7.29: Query performance (queries per unit time, normalised against B+Tree) of B+Tree, AHRI(bp), AHRI(hb), AHRI-vec(bp) and AHRI-vec(hb) for the osp(mixed), osp(val1), osp(val2) and osp(val3) query types, over the full DBpedia dataset using OSP ordering (higher is better)

7.4.4 Discussion

Overall, AHRI maintains a substantial advantage over B+Tree for these large datasets. It offers a load rate between 2 and 6 times that of B+Tree, with a substantial reduction in size, and much better query throughput: for most operations AHRI can perform between 1.5 and 3 times as many queries per second as B+Tree.

The different L3 index types, B+tree and hash set, have significantly different behaviour. While they require approximately the same amount of space, AHRI-bp performs significantly better with respect to iteration over large quantities of data, while AHRI-hb is much better for load performance and find operations. The best choice of structure depends largely on workload.

Finally, the size of AHRI’s OSP-ordered index is again a noticeable issue. While it is somewhat smaller than the equivalent B+Tree structure, the saving is not especially large. Considering that the OSP index is generally used the least out of all the orderings, there is a clear need to work on a version of AHRI that provides better behaviour with respect to size, even at the cost of query performance.

7.5 Jena Plugin

Testing AHRI’s performance as part of a real RDF store is an important part of demonstrating AHRI’s real-world utility: while the results show that AHRI’s performance is substantially higher than that of alternative structures in virtually all cases, it must also be demonstrated that this improvement has a significant impact in the context of all the other operations that a store has to perform.

The AHRI Jena Plugin was created to test AHRI’s performance inside a real query engine, and to verify the belief that AHRI would have a substantial impact upon the overall performance of an RDF store. This section considers the size and load/query performance of AJP using the same hardware as that described in Section 7.4.

As well as considering the AHRI and B+Tree index types, performance is compared against two alternative systems that use the Jena framework: the Jena Tuple Database (TDB), a high performance disk-backed store, and the Jena Memory Model (JMM), an example of a current in-memory system. Since these systems all use the same toolkit and the same indexed nested loops join strategy, as many variables as possible have been eliminated. To ensure, as far as possible, similar query answering strategies, TDB’s statistics-based query optimiser was disabled. JMM was also modified to use the same parsing engine as TDB and AJP, an upgrade over its standard parser.

The only change to the configuration described in Section 7.4 was for TDB, where the Java options used were ‘-server -Xmx5000M’. This lower memory limit is used because the memory mapping that TDB performs does not count as part of the standard Java heap space. It is thus beneficial to restrict TDB to a smaller heap, so that the heap does not grow to contend with the memory available for memory mapping.

Two datasets are used in this section: a 65M triple BSBM file, and a 43.4M triple DBpedia subset, constructed from DBpedia version 3.5 using the files specified by the DBpedia benchmark (Becker, 2008): infoboxes.nt, geocoordinates.nt, and homepages.nt.

7.5.1 Size

Dataset            AHRI-bp   AHRI-hb   B+Tree   TDB     JMM
BSBM (65M)         11.55     11.52     13.38    12.62   31.02*
DBpedia (43.4M)    4.89      4.9       6.0      5.9     13.62

Table 7.1: Space consumed (GB) by different RDF stores, loading the BSBM and DBpedia datasets. *JMM proved unable to load the full BSBM dataset, running out of memory during garbage collection; this figure is linearly projected from a smaller, 30.5 million triple BSBM document.

Table 7.1 breaks down the space consumption of each of the different stores over the two datasets. Results for the vector implementations of AHRI are not included here, since they use the same amount of space as the non-vector versions. Amongst these results, JMM’s consumption is the noticeable outlier. The reasons for this are twofold:

• JMM does not perform full normalisation of string data (as described in Section 4.4), meaning that string data is often stored more than once.

• JMM’s triple index structure, similar to the Hash structure used in this chapter, generates a large number of small objects, wasting a substantial amount of space.

TDB’s space consumption was determined by examining the size of the data files that it generates and adding the amount of heap space it consumes. Since TDB uses memory mapping to read and write data, this provides a reasonably accurate figure. TDB’s relatively low memory consumption is notable: it is even smaller than that of AJP’s B+Tree implementation. This is an artifact of a kind of implicit compression performed by TDB: when writing to disk, strings are converted into UTF-8 form. Since both of these datasets largely require only one byte per character in UTF-8, this encoding saves a substantial amount of space compared to Java’s two-byte characters. The downside is that encoding and decoding the strings takes time.

Overall, the AHRI implementations are more compact than the alternatives, but there is an unexpectedly small difference in size between B+Tree and AHRI for the BSBM data. This is due to the size of the string dictionary, which consumes 8.5GB in the current uncompressed implementation. This is atypical: BSBM generates an unusually large quantity of text data, at 2.5 times as much per triple as the full DBpedia dataset. Figure 7.30(a) breaks down the space used for string data and triple indexes when loading the BSBM data into AJP, showing that string data consumes the majority of memory when loading the BSBM dataset, particularly when using AHRI. This clearly indicates that the next focus for saving space should be high-performance string compression, or other means of reducing the burden on memory space.
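The space saving from UTF-8 noted above can be checked directly. The sketch compares the UTF-8 byte count of a string with the two bytes per char implied by Java’s 16-bit char type; the class name and sample strings are illustrative and are not drawn from the measured datasets.

```java
import java.nio.charset.StandardCharsets;

/**
 * Sketch comparing the bytes needed to hold a string as UTF-8 with the bytes
 * implied by Java's 16-bit char representation.
 */
final class StringFootprint {
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static int utf16Bytes(String s) {
        return s.length() * Character.BYTES; // one 2-byte char per UTF-16 code unit
    }

    public static void main(String[] args) {
        String[] samples = {"http://dbpedia.org/resource/Berlin", "Z\u00fcrich"};
        for (String s : samples) {
            System.out.printf("%s: UTF-8 %d bytes, UTF-16 %d bytes%n",
                    s, utf8Bytes(s), utf16Bytes(s));
        }
    }
}
```

For the ASCII IRI, UTF-8 needs 34 bytes against 68; for "Zürich", whose ü takes two bytes in UTF-8, it needs 7 against 12.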

Figure 7.30: Total space consumed (GB) by AJP (AHRI-bp), broken down into node dictionary and triple indexes, for B+Tree and AHRI, loading (a) a 65M triple BSBM dataset and (b) a 43.4M triple DBpedia dataset (lower is better)

Figure 7.30(b) shows results for the 43.4M triple DBpedia dataset. Despite containing over two-thirds as many triples as the BSBM file, it consumes less than half as much space, by virtue of its smaller string dictionary. The advantage of using AHRI is quite significant with this dataset.

7.5.2 Load

Dataset            AHRI(bp)   AHRI(hb)   AHRI-vec(bp)   AHRI-vec(hb)   B+Tree   TDB    JMM
BSBM (65M)         539        513        546            518            603      3768   1221*
DBpedia (43.4M)    268        251        273            274            327      2888   316

Table 7.2: Load times in seconds for different RDF stores. *JMM proved unable to load the full BSBM dataset, running out of memory during garbage collection; this figure is linearly projected from a smaller, 30.5 million triple BSBM document.

Table 7.2 shows the time required to load data into each RDF store for each dataset. Note that TDB was timed using its included bulk loader. Compared to the results of earlier tests, the difference in performance between AJP using B+Tree and AJP using AHRI is relatively limited. This indicates that reading and parsing data from disk accounts for the majority of AJP’s load time. TDB is very substantially slower in this test, which is likely to be a consequence of having to occasionally flush data to disk.
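Converting the load times in Table 7.2 into throughput makes the comparison with Section 7.4.2 concrete. The sketch below uses the rounded triple counts given in the text (65M and 43.4M), so the resulting rates are approximate.

```java
/**
 * Sketch: converts load times from Table 7.2 into approximate triples per
 * second. Triple counts are the rounded figures quoted in the text.
 */
final class LoadRates {
    static long triplesPerSecond(long triples, long seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return triples / seconds; // integer division: rounds down
    }

    public static void main(String[] args) {
        long bsbm = 65_000_000L;
        long dbpedia = 43_400_000L;
        System.out.println("BSBM, AHRI(hb):    " + triplesPerSecond(bsbm, 513));
        System.out.println("BSBM, B+Tree:      " + triplesPerSecond(bsbm, 603));
        System.out.println("BSBM, TDB:         " + triplesPerSecond(bsbm, 3768));
        System.out.println("DBpedia, AHRI(hb): " + triplesPerSecond(dbpedia, 251));
    }
}
```

AJP with AHRI(hb) loads roughly 127,000 BSBM triples per second end to end, far below the millions of triples per second that the index structures alone sustained in Section 7.4.2, which is consistent with parsing rather than index insertion dominating load time.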

AHRI’s advantage should become more apparent in systems that are not disk bound, or that have greater disk bandwidth. These results indicate that work on faster parsers would be fruitful; one approach might be a parser that uses multiple CPUs.

7.5.3 Query

In order to demonstrate AHRI’s query performance, three benchmarks were performed. The first (described in Section 7.5.3.1) was the standard BSBM version 2.0 benchmark, which represents a workload for an e-commerce site: mostly OLTP in nature, with relatively few long running, analytical queries.

In order to demonstrate AHRI’s performance in a more analytical situation, four relatively complex custom queries were also created over the BSBM dataset. The BSBM test driver was modified to use these queries in a second round of tests, described in Section 7.5.3.2. For both BSBM tests, the benchmark software was configured to run 100 warmup and 200 normal iterations, ensuring good accuracy of results.

Finally, Section 7.5.3.3 explores AHRI’s performance using the DBpedia benchmark. This benchmark is largely of a more analytical nature.

For each test running TDB in this section, the data was reloaded into the store, and the Java object representing TDB kept alive. This is important for ensuring that the store has the best opportunity to cache all its data in RAM: if TDB were restarted after loading data, the only information reliably held in RAM would be that used during the warmup runs. By contrast, loading reliably touches all the data in the system, giving the best odds that all information is cached.

7.5.3.1 BSBM

The standard BSBM suite connects to repositories over HTTP. For the purposes of this
