SPARQL Feature Selection and Query Variability

6. DBpedia SPARQL Benchmark

6.4. SPARQL Feature Selection and Query Variability

Having detected and clustered similar queries, we now aim to select a number of frequently executed queries that cover most SPARQL features and allow us to assess the performance of queries with single features as well as with combinations of features. The SPARQL features we consider are:

the overall number of triple patterns contained in the query (|GP|), the graph pattern constructors UNION (UON), OPTIONAL (OPT), the solution sequences and modifiers DISTINCT (DST), as well as the filter conditions and operators FILTER (FLT), LANG (LNG), REGEX (REG) and STR (STR).

We pick different numbers of triple patterns in order to include the efficiency of JOIN operations in triplestores. The other features were selected because they frequently occurred in the query log. We rank the clusters by the sum of the frequency of all queries they contain. Thereafter, we select 25 queries as follows: For each of the features, we choose the highest ranked cluster containing queries having this feature. From that particular cluster we select the query with the highest frequency.
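The selection step described above can be sketched as follows. This is only an illustration under assumed data structures: the `Query` class, the feature labels attached to each query, and the sample clusters are hypothetical and are not taken from the benchmark code. The sketch also assumes that the query picked from a cluster must itself carry the feature, and it covers single features only, not feature combinations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    text: str
    frequency: int        # how often the query occurred in the log
    features: frozenset   # e.g. frozenset({"UON", "FLT"})


def select_queries(clusters, features):
    """For each feature, pick the most frequent query carrying that feature
    from the highest-ranked cluster that contains such a query."""
    # Rank clusters by the summed frequency of their queries, highest first.
    ranked = sorted(clusters,
                    key=lambda c: sum(q.frequency for q in c),
                    reverse=True)
    selected = {}
    for feature in features:
        for cluster in ranked:
            candidates = [q for q in cluster if feature in q.features]
            if candidates:
                selected[feature] = max(candidates, key=lambda q: q.frequency)
                break  # only the highest-ranked matching cluster is used
    return selected


# Hypothetical clusters; frequencies and features are made up.
clusters = [
    [Query("q1", 50, frozenset({"DST"})),
     Query("q2", 30, frozenset({"DST", "FLT"}))],
    [Query("q3", 100, frozenset({"UON"}))],
    [Query("q4", 10, frozenset({"FLT"}))],
]
selected = select_queries(clusters, ["UON", "FLT", "DST", "REG"])
```

A feature that occurs in no cluster (here `REG`) simply has no entry in the result.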

In order to convert the selected queries into query templates, we manually select a part of the query to be varied. This is usually an IRI, a literal or a filter condition. In Figure 6.1 those varying parts are indicated by %%var%% or, in the case of multiple varying parts, %%varn%%. We exemplify our approach to replacing varying parts of queries by using Query 9, which results in the query shown in Figure 6.1. This query selects a specific settlement along with the airport belonging to that settlement as indicated in Figure 6.1. The variability of this query template was determined by getting a list of all settlements using the query shown in Figure 6.2. By selecting suitable placeholders, we ensured that the variability is sufficiently high (≥ 1000 per query template). Note that the triplestore used for computing the variability was different from the triplestore that we later benchmarked in order to avoid potential caching effects.

    SELECT * WHERE {
      { ?var0 a dbp-owl:Settlement ;
              rdfs:label %%var%% .
        ?var1 a dbp-owl:Airport . }
      { ?var1 dbp-owl:city ?var0 . }
      UNION
      { ?var1 dbp-owl:location ?var0 . }
      { ?var1 dbp-prop:iata ?var2 . }
      UNION
      { ?var1 dbp-owl:iataLocationIdentifier ?var2 . }
      OPTIONAL
      { ?var1 foaf:homepage ?var3 . }
      OPTIONAL
      { ?var1 dbp-prop:nativename ?var4 . }
    }

Figure 6.1.: Sample query with placeholder.

    SELECT DISTINCT ?var WHERE {
      { ?var0 a dbp-owl:Settlement ;
              rdfs:label ?var .
        ?var1 a dbp-owl:Airport . }
      { ?var1 dbp-owl:city ?var0 . }
      UNION
      { ?var1 dbp-owl:location ?var0 . }
      { ?var1 dbp-prop:iata ?var2 . }
      UNION
      { ?var1 dbp-owl:iataLocationIdentifier ?var2 . }
      OPTIONAL
      { ?var1 foaf:homepage ?var3 . }
      OPTIONAL
      { ?var1 dbp-prop:nativename ?var4 . }
    } LIMIT 1000

Figure 6.2.: Sample auxiliary query returning potential values a placeholder can assume.

For the benchmarking we then used the list of concrete values retrieved in this way to replace the %%var%% placeholders within the query template. This method ensures that (a) the queries actually executed during the benchmarking differ, but (b) always return results. This change imposed on the original query avoids the effect of simple caching.
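The substitution of placeholders can be sketched in a few lines. The function name `instantiate`, the shortened template, and the three labels are hypothetical; in the benchmark the values would come from the result of the auxiliary query, which this sketch does not execute.

```python
def instantiate(template, values, placeholder="%%var%%"):
    """Yield one concrete query per value by substituting the placeholder."""
    for value in values:
        yield template.replace(placeholder, value)


# Shortened, illustrative template in the style of the sample query.
TEMPLATE = ("SELECT * WHERE { ?var0 a dbp-owl:Settlement ; "
            "rdfs:label %%var%% . }")

# Hypothetical labels standing in for the rows of the auxiliary query.
LABELS = ['"Leipzig"@en', '"Dresden"@en', '"Halle"@en']

queries = list(instantiate(TEMPLATE, LABELS))
```

Because every value stems from the dataset itself, each instantiated query differs from the others while still matching at least one resource.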

6.5. Experimental Setup

This section presents the setup we used when applying the DBPSB to four triplestores commonly used in Data Web applications. We first describe the triplestores and their configuration, followed by our experimental strategy and finally the obtained results. All experiments were conducted on a typical server machine with an AMD Opteron 6-core CPU at 2.8 GHz, 32 GB RAM and a 3 TB RAID-5 HDD, running Linux kernel 2.6.35-23-server with Java 1.6 installed. The benchmark program and the triplestore were run on the same machine to avoid network latency.

Triplestores Setup We carried out our experiments by using the triplestores Virtuoso [Erling and Mikhailov, 2007], Sesame [Broekstra et al., 2002], Jena-TDB [Owens et al., 2008b], and BigOWLIM [Bishop et al., 2011]. The configuration and the version of each triplestore were as follows:

1. Virtuoso: Open-Source Edition version 6.1.2: We set the following memory-related parameters: NumberOfBuffers = 1048576, MaxDirtyBuffers = 786432.

2. Sesame: Version 2.3.2 with Tomcat 6.0 as HTTP interface: We used the native storage layout and set the spoc, posc, opsc indices in the native storage configuration. We set the Java heap size to 8GB.

3. Jena-TDB: Version 0.8.7 with Joseki 3.4.3 as HTTP interface: We configured the TDB optimizer to use statistics. This mode is most commonly employed for the TDB optimizer, whereas the other modes are mainly used for investigating the optimizer strategy. We also set the Java heap size to 8GB.

4. BigOWLIM: Version 3.4, with Tomcat 6.0 as HTTP interface: We set the entity index size to 45,000,000 and enabled the predicate list. The rule set was empty. We set the Java heap size to 8GB.

In summary, we configured all triplestores to use 8GB of memory and used default values otherwise. On the one hand, this strategy aims at benchmarking each triplestore in a realistic context, since in a real environment a triplestore cannot claim the whole memory for itself. On the other hand, it ensures that the whole dataset cannot fit into memory, in order to avoid caching.

6.5.1. Benchmark Phases

Once the triplestores loaded the DBpedia datasets with different scale factors, the benchmark execution phase began. It comprised the following stages:

1. System Restart: Before running the experiment, the triplestore and its associated programs were restarted in order to clear memory caches.

2. Warm-up Phase: In order to measure the performance of a triplestore under normal operational conditions, a warm-up phase was used. In the warm-up phase, query mixes were posed to the triplestore. The queries posed during the warm-up phase were disjoint from the queries posed in the hot-run phase. For DBPSB, we used a warm-up period of 20 minutes.

Figure 6.3.: QMpH for all triplestores of DBPSB version 1. (Plot: dataset size, 10% to 200%, on the x-axis; QMpH on a logarithmic y-axis; one series each for Virtuoso, Sesame, Jena-TDB and BigOWLIM.)

3. Hot-run Phase: During this phase, the benchmark query mixes were sent to the tested store. We kept track of the average execution time of each query as well as the number of query mixes per hour (QMpH). The duration of the hot-run phase in DBPSB was 60 minutes.
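The three stages can be sketched as a small driver. This is an assumption-laden outline rather than the benchmark program itself: `run_mix` and `restart` are hypothetical callbacks supplied by the caller, the clock is injectable so that the logic can be exercised without waiting, and a mix that is still running when the time window closes is counted in full. Keeping the warm-up and hot-run queries disjoint is left to the caller.

```python
import time


def run_phase(mixes, run_mix, duration_s, clock=time.monotonic):
    """Cycle through the query mixes until duration_s has elapsed.

    Returns the number of query mixes that were executed.
    """
    start = clock()
    completed = 0
    while clock() - start < duration_s:
        run_mix(mixes[completed % len(mixes)])
        completed += 1
    return completed


def benchmark(warmup_mixes, hot_mixes, run_mix, restart,
              warmup_s=20 * 60, hot_s=60 * 60, clock=time.monotonic):
    """Restart, warm up, then measure; returns query mixes per hour."""
    restart()                                            # 1. system restart
    run_phase(warmup_mixes, run_mix, warmup_s, clock)    # 2. warm-up, unmeasured
    completed = run_phase(hot_mixes, run_mix, hot_s, clock)  # 3. hot run
    return completed * 3600.0 / hot_s
```

The default durations mirror the 20-minute warm-up and 60-minute hot run stated above.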

Since some benchmark queries did not respond within reasonable time, we specified a 180-second timeout after which a query was aborted. In that case, the maximum query time of 180 seconds was used as the runtime of the query, even though no results were returned. The benchmarking code along with the DBPSB queries is freely available.
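The timeout rule can be illustrated with a short sketch. The function `timed_query` and the `execute` callback are hypothetical. Note one deliberate simplification: a Python worker thread cannot be forcibly aborted, so the sketch merely stops waiting for the result, whereas a real benchmark driver would cancel the underlying request.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def timed_query(execute, query, timeout_s=180.0):
    """Run execute(query) and return (result, runtime in seconds).

    If no result arrives within timeout_s, the result is None and the
    full timeout is booked as the runtime of the query.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    future = pool.submit(execute, query)
    try:
        result = future.result(timeout=timeout_s)
        return result, time.monotonic() - start
    except TimeoutError:
        return None, timeout_s
    finally:
        # Do not block on a query that is still running.
        pool.shutdown(wait=False)
```

Booking the full timeout keeps slow queries in the overall assessment instead of silently dropping them.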

We have created 2 versions of DBPSB, with the following specifications:

DBPSB version 1: we used 4 different dataset sizes, i.e. 10%, 50%, 100%, and 200%, and 25 queries for performance evaluation. The query list of DBPSB version 1 can be found in Appendix A.

DBPSB version 2: we used 3 different dataset sizes, i.e. 10%, 50%, and 100%, and 20 queries for performance evaluation. The query list of DBPSB version 2 can also be found in Appendix A.

6.6. Benchmarking Results

We evaluated the performance of the triplestores with respect to two main metrics: their overall performance on the benchmark and their query-based performance.

Figure 6.4.: Geometric mean of QpS of DBPSB version 1.

Figure 6.5.: QMpH for all triplestores of DBPSB version 2. (Plot: dataset size on the x-axis; QMpH on a logarithmic y-axis; one series each for Virtuoso, Sesame, Jena-TDB and BigOWLIM.)

[Plot: mean by dataset size (10%, 50%, 100%) for Virtuoso, Sesame, Jena-TDB and BigOWLIM.]

Figure 6.7.: Queries per Second (QpS) of DBPSB version 1 for all triplestores for 10%, 50%, 100%, and 200%.

The overall performance of the triplestores was measured by computing their query mixes per hour (QMpH). The metric used for query-based performance evaluation is Queries per Second (QpS). QpS is computed by summing up the runtime of each query in each iteration, dividing it by the QMpH value and scaling it to seconds.
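As a rough illustration of the two metrics, the following sketch computes both from raw measurements. The function names are hypothetical. In particular, the sketch reads QpS as the number of executions of a query divided by its summed runtime; that reading is an assumption, since the description above leaves the exact normalisation open.

```python
def query_mixes_per_hour(completed_mixes, duration_s):
    """Completed query mixes, scaled to one hour of benchmark time."""
    return completed_mixes * 3600.0 / duration_s


def queries_per_second(runtimes_s):
    """Executions of one query divided by their total runtime in seconds.

    ASSUMPTION: this is one plausible reading of QpS, not necessarily
    the exact formula used by the benchmark.
    """
    total = sum(runtimes_s)
    if total <= 0:
        raise ValueError("total runtime must be positive")
    return len(runtimes_s) / total


# Hypothetical measurements: 120 mixes in a one-hour hot run, and three
# executions of the same query template.
qmph = query_mixes_per_hour(120, 3600)
qps = queries_per_second([0.25, 0.25, 0.5])
```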

Figure 6.8.: Queries per Second (QpS) of DBPSB version 2 for all triplestores for 10%, 50% and 100%. (Three panels: QpS for 10% dataset, QpS for 50% dataset, QpS for 100% dataset; query number 1 to 20 on the x-axis; QpS on the y-axis; one series each for Virtuoso, Sesame, Jena TDB and BigOWLIM.)

6.6.1. DBPSB Version 1

The query mixes per hour (QMpH) are shown in Figure 6.3. Please note that we used a logarithmic scale in this figure due to the high performance differences we observed. In general, Virtuoso was clearly the fastest triplestore, followed by BigOWLIM, Sesame and Jena-TDB. The highest observed ratio in QMpH between the fastest and slowest triplestore was 63.5, and it reached more than 10 000 for single queries. The scalability of stores did not vary as much as the overall performance. There was on average a linear decline in query performance with increasing dataset size.

We examined the queries that each triplestore failed to execute within the 180s timeout and noticed that even much larger timeouts would not have been sufficient for most of those queries. We did not exclude these queries completely from the overall assessment, since this would have affected a large number of the queries and unfairly penalized stores that complete queries within the time frame. We penalized failed queries with 180s, similar to what was done in SP²Bench [Schmidt et al., 2009]. Virtuoso was the only store that completed all queries in time. For Sesame and OWLIM, only a few particular queries timed out, and only rarely. Jena-TDB always had severe problems with queries 7, 10 and 20, as well as with queries 3, 9 and 12 for the larger two datasets.

Query-based performance is measured in Queries per Second (QpS), as defined above. The QpS results for all triplestores and for the 10%, 50%, 100%, and 200% datasets are depicted in Figure 6.7.

The outliers, i.e. queries with very low QpS, will significantly affect the mean value of QpS for each store. So, we additionally calculated the geometric mean of all the QpS timings of queries for each store. The main advantage of calculating the geometric mean is that the effect of outliers is weakened. The geometric mean for all triplestores is also depicted in Figure 6.4.
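To make the difference between the two averages concrete, the sketch below computes both for a list of QpS values. The sample values are hypothetical and chosen to span several orders of magnitude; they are not taken from the result tables.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product of n values, computed via logarithms.

    Requires strictly positive values.
    """
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical QpS values for three queries of one store.
sample_qps = [0.1, 10.0, 1000.0]
am = arithmetic_mean(sample_qps)
gm = geometric_mean(sample_qps)
```

For values that differ by orders of magnitude, the arithmetic mean is dominated by the largest value, whereas the geometric mean averages the orders of magnitude.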

Detailed results of DBPSB version 1 are given in Tables 6.2 and 6.3.

6.6.2. DBPSB1 Results Discussion

This section consists of three parts: First, we compare the general performance of the systems under test. Then we look at individual queries and the SPARQL features used within those queries in more detail to observe particular strengths and weaknesses of stores. Thereafter, we compare our results with those obtained with previous benchmarks and elucidate some of the main differences between them.

General Performance Figure 6.3 depicts the benchmark results for query mixes per hour for the four systems and dataset sizes. Virtuoso leads the field by a substantial margin, with double the performance of the second best system (BigOWLIM) for the 10% dataset and even quadruple for the other dataset sizes. While Sesame is able to keep up with BigOWLIM for the smaller two datasets, it loses considerable ground for the larger datasets. Jena-TDB generally cannot deliver competitive performance, being slower than the fastest system by a factor of 30 to 50.

If we look at the geometric mean of all QpS results in Figure 6.4, we gain similar insights. The spreading effect is weakened, since the geometric mean reduces the effect of outliers. Virtuoso is still the fastest system, although Sesame manages to get quite close for the 10% dataset. This shows that most, but not all, queries are fast in Sesame for small dataset sizes. For the larger datasets, BigOWLIM is the second best system and shows promising scalability, but it is still slower than Virtuoso by a factor of two.

10% dataset

Query | Virtuoso (QpS SD GM) | Sesame (QpS SD GM) | Jena-TDB (QpS SD GM) | BigOWLIM (QpS SD GM)
1 | 261.6 45.3 250.1 | 466.3 136.2 428.8 | 330.4 155.5 258.9 | 63 8.9 61.9
2 | 450.9 59 445.6 | 427.7 15.5 427.4 | 255.1 80.4 236.5 | 64.7 4.1 64.4
3 | 82.8 16.2 81.3 | 348.3 97.3 320.7 | 1.4 1.9 0.6 | 55.3 14.8 52.4
4 | 138.1 48.9 122.6 | 10 60 0.2 | 71.3 61.1 52.7 | 20.6 21.4 11.6
5 | 67.7 10.9 67 | 287.9 65.3 269.6 | 116.1 70.5 93.3 | 46.6 17.5 42.1
6 | 60.5 17.9 58 | 49.4 5.5 48.2 | 82.5 58.1 65.2 | 19.2 5.1 18.5
7 | 28.5 8.5 26.7 | 207.1 79 183.7 | 1.6 2.5 0.5 | 26.5 13.9 22.7
8 | 52.8 67.4 24.4 | 65.7 112.9 23.6 | 134 75.1 108 | 18 21.9 9.1
9 | 22.9 3.9 22.7 | 226.9 86.8 197.8 | 0.6 0.6 0.4 | 48.9 9.1 47.4
10 | 8.1 0.4 8.1 | 1.4 0.4 1.4 | 0.1 0.04 0.1 | 2.8 0.1 2.8
11 | 176 36.2 171 | 289.7 80 265.8 | 125.3 67 104.9 | 51.5 12.6 49.3
12 | 124.8 20.2 123.1 | 309.9 118 264.8 | 1 3 0.1 | 59.6 12.3 57.2
13 | 129.3 16.5 128.3 | 367.2 101.4 337.3 | 190.1 77.3 157.9 | 46.7 16.4 43
14 | 83.1 29.8 74.1 | 179.2 134.6 116 | 96.2 69 74 | 25.8 20.5 16.8
15 | 128.1 67.6 90.3 | 162.6 148.5 72.6 | 97.3 100.2 43.9 | 43.1 23.9 28.4
16 | 121.5 23.6 118.7 | 249.9 67.6 236.4 | 28.8 30.5 19.9 | 39.8 15.7 35.5
17 | 102.2 29.8 95.7 | 186.2 109.5 135.8 | 115.6 63.6 97.6 | 42.8 18.9 36.1
18 | 182.8 33.1 179.1 | 0.5 0.1 0.5 | 178.3 52.6 168.1 | 23.3 15.5 18.3
19 | 199.8 47.9 191.1 | 302.5 69.9 286.7 | 200.8 88.9 174.5 | 62.2 4.7 61.9
20 | 18.9 4.1 18.6 | 221 63.3 203.7 | 0.1 0.3 0.02 | 50.6 8.1 49.5
21 | 483 48 480.6 | 459.4 16.7 459.1 | 289.6 92.5 224.9 | 66.1 1.8 66.1
22 | 206.1 60.6 190.2 | 241.4 173.8 140.6 | 38 86 8.4 | 52.8 19.3 44.7
23 | 140.3 27.4 137.1 | 354.7 116.9 315.3 | 23.9 39.7 12.6 | 64.1 6.1 63.6
24 | 1.2 0.7 0.8 | 32.4 100.2 3.7 | 173 88.1 146.4 | 0.3 0.2 0.2
25 | 62.8 19 59.2 | 259.8 120.6 209.5 | 110.5 99.8 68 | 52.3 12.5 49.7

50% dataset

Query | Virtuoso (QpS SD GM) | Sesame (QpS SD GM) | Jena-TDB (QpS SD GM) | BigOWLIM (QpS SD GM)
1 | 264.5 76.1 242 | 153.5 136.3 116.9 | 64.8 13.6 63.2 | 56.7 10.1 55.5
2 | 22.4 3.2 22.2 | 137 98 109.9 | 87.6 87.9 68.6 | 27.7 12.7 25.6
3 | 55.4 19.7 51.8 | 81.1 92 52.6 | 0.1 0.1 0.1 | 29.4 18.9 23.3
4 | 44 62.7 17.6 | 0 0 0 | 41.7 19.5 38.4 | 3.9 3.3 3
5 | 60.3 14 58 | 46.6 70 24.8 | 40.2 32 30.6 | 21.5 15.4 16.9
6 | 14 7.3 12.3 | 6.2 2 5 | 162.5 69.8 141.3 | 5 2.3 4.3
7 | 23.1 7.9 21.8 | 62.2 52.3 42.3 | 0.2 0.5 0.02 | 10.7 5.8 9.2
8 | 32.5 46.9 5 | 15.6 39.4 2.7 | 88.2 67.3 67.8 | 8.2 13.9 1.7
9 | 20.4 5.3 19.9 | 42.3 35.3 29.1 | 2.3 3.8 0.8 | 27.2 15.1 22.4
10 | 1 0.1 1 | 0.2 0 0.2 | 4.3 3 1 | 0.28 0.02 0.28
11 | 97.7 56.9 73.9 | 45.2 48.1 36.3 | 37 7.1 36.3 | 27 10.1 24
12 | 62.3 22.2 57.6 | 56.3 58.2 42.4 | 0.02 0.02 0.02 | 34.1 18.9 28.7
13 | 105.1 52.1 91.7 | 97.5 60.4 86.8 | 105 94.3 44.9 | 19.9 10.4 17.9
14 | 53.9 40 38 | 20.4 17.6 12.3 | 42.8 13.2 40.9 | 7.9 10.6 3.3
15 | 51.9 32.7 33.4 | 33.5 66.1 11.5 | 43.1 38.9 32.8 | 8.6 13.7 2.2
16 | 106.7 25.4 103.4 | 87.7 75.4 64.8 | 73.2 48.3 63 | 38.7 12.9 36.7
17 | 33 11.2 30.6 | 20.9 19.9 13.3 | 31.9 9.5 30.5 | 10.3 8.5 6
18 | 203.9 57.9 190.4 | 0.1 0 0.1 | 57.6 9.6 56.9 | 29 12.7 26.4
19 | 106.6 53.6 93.5 | 46.1 38.1 39.7 | 26.8 10.2 25.3 | 33.7 16.9 29.9
20 | 15 4 14.5 | 37.7 30.6 28.9 | 0.01 0 0.01 | 20.5 16.2 15.2
21 | 189.1 138.3 135 | 105.6 87.3 84.3 | 50 16.2 46.4 | 28.1 13.4 25.7
22 | 109.1 43.9 90.3 | 29.5 41.4 14 | 1.2 1.5 0.7 | 15.7 15.9 6.1
23 | 81.2 31.4 74.4 | 78.3 97.4 47 | 1.5 2.4 0.4 | 32.4 18.2 27.9
24 | 0.2 0.1 0.2 | 0.9 0.4 0.7 | 53.7 39.3 43.6 | 0.06 0.05 0.04
25 | 35.6 10.9 33.3 | 45.2 57.4 26.4 | 37.1 16 32.4 | 11.5 7.6 8.6

Table 6.2.: Queries per second (QpS), geometric mean of query runtime in milliseconds (GM), and standard deviation of query runtime in milliseconds (SD), for the 10% dataset and the 50% dataset, respectively, of DBPSB version 1.


100% dataset

Query | Virtuoso (QpS SD GM) | Sesame (QpS SD GM) | Jena-TDB (QpS SD GM) | BigOWLIM (QpS SD GM)
1 | 245.9 30.9 240.9 | 112.7 47.1 103.9 | 54.5 5.9 54.2 | 58.3 6.1 57.9
2 | 3.6 0.1 3.6 | 81.1 45.9 69.8 | 67.1 40 60 | 31 11.8 28.7
3 | 42.8 21.8 37.8 | 32.1 14.5 28.9 | 0.02 0.03 0.01 | 23.9 7.7 22.8
4 | 34 50.2 14.4 | 0.03 0.01 0.02 | 4.6 9.1 0.3 | 29.4 22.4 18.8
5 | 47.9 19.7 41.7 | 10.7 10.6 7.6 | 12.5 4.9 11.5 | 16.1 14.5 11.1
6 | 8.6 2.3 8.3 | 4.2 2.4 2.8 | 45.5 46.9 9.6 | 4.9 2.3 4.1
7 | 21 12.1 18.2 | 21.5 37.4 13.8 | 0.04 0.03 0.03 | 7.4 4.5 6.1
8 | 38.3 47.7 4.8 | 17.8 33.7 2.7 | 27.1 36.7 1.3 | 8.3 14.5 1
9 | 17.8 5.7 17.1 | 23.3 15.9 19.5 | 0.01 0 0.01 | 15.9 6.3 14.9
10 | 1 0.1 0.9 | 0.1 0 0.1 | 0.01 0 0.01 | 0.16 0.01 0.16
11 | 115.7 31.7 100.2 | 48.5 41.9 39.4 | 2.9 5.7 0.2 | 36.2 14.4 33.2
12 | 47.1 25.4 41.7 | 25.4 12.9 21.7 | 0.01 0 0.01 | 24.6 9.7 22.9
13 | 89.5 73.6 64.6 | 43.1 41.4 28.6 | 0.2 0.2 0.1 | 38.1 17.9 33.6
14 | 25.1 37.3 6 | 3.4 5.2 1 | 26.3 15.7 23 | 28.1 26.8 8.5
15 | 48.4 33.2 31.1 | 3.1 4.7 1.3 | 0.04 0.04 0.02 | 7.9 12.2 2.5
16 | 137.3 14.4 136.7 | 98.8 49.4 89.7 | 29.3 30.8 6.2 | 48.5 10.4 47.1
17 | 32.8 9.3 31.2 | 6.3 6.5 2.7 | 21.7 8.1 20.3 | 21.5 14.8 12.3
18 | 208.3 27.1 205.6 | 0.1 0 0 | 44.8 34.8 9.2 | 47 8.3 46.1
19 | 99.2 67.3 81.8 | 40.3 20.6 35.8 | 46.5 24.6 40.9 | 34.9 7 34.2
20 | 14.1 4.1 13.5 | 8.2 6.6 6.8 | 0.01 0 0.01 | 15.1 8.3 13.5
21 | 98.8 76.2 78.3 | 85.7 69 69.2 | 0.1 0.1 0.1 | 31.3 10.5 29.7
22 | 115.1 52.9 93.6 | 5.5 8.5 2 | 9.5 13.3 0.6 | 9.5 10.6 4.6
23 | 76.7 38.9 67.8 | 41 17.8 37.6 | 20.2 34.4 0.8 | 29.1 8.4 28.2
24 | 0.2 0.1 0.1 | 0.7 0.4 0.4 | 17.4 15 3.7 | 0.05 0.05 0.03
25 | 30.1 11 27.1 | 16.5 14.5 10.3 | 18.1 22.5 1.2 | 14.3 9.4 11

200% dataset

Query | Virtuoso (QpS SD GM) | Sesame (QpS SD GM) | Jena-TDB (QpS SD GM) | BigOWLIM (QpS SD GM)
1 | 247.1 44.4 229.9 | 93.2 39.2 84.4 | 226.9 465.5 36.6 | 54.5 13.4 47.7
2 | 1.8 0.1 1.8 | 45.7 27.4 39.9 | 17.6 6.4 16.6 | 26.8 11.2 24.5
3 | 42.8 21.1 36.6 | 19.8 9.1 17.7 | 0.03 0.02 0.02 | 18.1 8.2 16.7
4 | 26.8 46.1 12.1 | 0 0 0 | 45.9 33.6 34.4 | 6.8 9 4.1
5 | 52.7 19.8 47.4 | 5.5 2.7 4.7 | 5.7 4.3 3 | 11.8 10.4 8.7
6 | 9.4 6.1 7.6 | 2.4 2 1.4 | 34.6 47 15.1 | 2.9 1.3 2.6
7 | 14.3 9.3 11.7 | 7.1 2.4 6.6 | 0.01 0 0.01 | 8.7 5.3 7
8 | 33.6 42.9 2.8 | 3.8 6.7 0.4 | 21.8 19.9 13.9 | 8.8 15.7 1
9 | 16.1 6 15.1 | 9.9 3.2 9.5 | 0.02 0.04 0.01 | 14.5 5.1 13.6
10 | 0.5 0 0.5 | 0.1 0 0 | 0.01 0 0.01 | 0.16 0.02 0.16
11 | 114.7 57.5 73 | 34.3 17.2 30.5 | 15.9 2.7 15.6 | 21 7.9 19.2
12 | 43.8 31.8 36.1 | 18 8.1 15.7 | 0.01 0 0.01 | 21.1 5.4 20.4
13 | 64.7 51.8 48.6 | 21.7 13.2 18.5 | 13.6 9.5 11.1 | 20.1 10.3 18.2
14 | 23.9 29.5 11.4 | 1.7 2.8 0.4 | 7.2 2.9 6.5 | 6.3 6.5 3.4
15 | 65.1 51.5 36.9 | 3.3 4.5 1.4 | 1.8 1.6 1.2 | 3.8 8.8 0.8
16 | 191.2 24.1 189.8 | 83.6 40.3 74.9 | 11.1 3.1 10.6 | 45.4 10.6 44
17 | 24.1 12.2 19.4 | 1.6 1.8 0.8 | 6.9 1.7 6.8 | 9.1 6.7 6.8
18 | 212 35.8 207.3 | 0 0 0 | 19.7 7.9 17.5 | 44 8.3 43.1
19 | 84.8 62.5 69 | 15.9 4.3 15.3 | 22.4 13 19.6 | 34 8.9 32.7
20 | 13.6 4.4 12.9 | 8.6 17.5 4.5 | 0.01 0 0.01 | 13.1 8.2 11.5
21 | 93.8 77 73.7 | 53.6 20.8 50 | 15.9 4.9 15.2 | 26.5 6.3 25.7
22 | 120 80.5 74.3 | 20.7 66.2 2.1 | 0.6 0.6 0.4 | 11.1 11.5 5.9
23 | 67.6 33.3 60.1 | 30.3 16.7 26.9 | 9.2 5.9 6.9 | 24.1 5.5 23.6
24 | 0.1 0 0.1 | 0.3 0.2 0.2 | 8.3 4.8 6.4 | 0.05 0.05 0.02
25 | 23.5 15.2 19 | 8.1 7.8 5.2 | 9.2 1.6 9 | 11.6 9.5 8.6

Table 6.3.: Queries per second (QpS), geometric mean of query runtime in milliseconds (GM), and standard deviation of query runtime in milliseconds (SD), for the 100% dataset, and 200% dataset of DBPSB version 1.

Scalability, Individual Queries and SPARQL Features Our first observation with respect to the individual performance of the triple stores is that Virtuoso demonstrates a good scaling factor on the DBPSB. When the dataset size changes by a factor of 5 (from 10% to 50%), the performance of the triple store only degrades by a factor of 3.12. Further dataset increases (i.e. the doubling to the 100% and 200% datasets) result in only relatively small performance decreases of 20% and 30%, respectively.

Virtuoso outperforms Sesame for all datasets. In addition, Sesame does not scale as well as Virtuoso for small dataset sizes, as its performance degrades sevenfold when the dataset size changes from 10% to 50%. However, when the dataset size doubles from the 50% to the 100% dataset and from 100% to 200% the performance degrades by just half.

The performance of Jena-TDB is the lowest of all triple stores and for all dataset sizes. The performance degradation factor of Jena-TDB is not as pronounced as that of Sesame and almost equal to that of Virtuoso when changing from the 10% to the 50% dataset. However, the performance of Jena-TDB only degrades by a factor of 2 for the transition between the 50% and 100% dataset, and reaches 0.8 between the 100% and 200% dataset, leading to a slight increase of its QMpH.

BigOWLIM is the second fastest triple store for all dataset sizes, after Virtuoso. BigOWLIM degrades by a factor of 7.2 in the transition from the 10% to the 50% dataset, but the degradation factor drops dramatically to 1.29 for the 100% dataset and eventually reaches 1.26 for the 200% dataset.
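The degradation factors quoted in this discussion can be reproduced from QMpH values along the following lines. The sketch assumes that a degradation factor is the QMpH of the smaller dataset divided by the QMpH of the next larger one; the function name and the sample QMpH values are hypothetical and do not correspond to any of the tested stores.

```python
def degradation_factors(qmph_by_size):
    """Ratio of QMpH between consecutive dataset sizes.

    A factor above 1 means the store became slower on the larger
    dataset; a factor below 1 means its QMpH increased.
    """
    sizes = list(qmph_by_size)  # relies on insertion order of the dict
    return {
        (small, large): qmph_by_size[small] / qmph_by_size[large]
        for small, large in zip(sizes, sizes[1:])
    }


# Hypothetical QMpH values, ordered by increasing dataset size.
sample_qmph = {"10%": 1000.0, "50%": 250.0, "100%": 200.0, "200%": 250.0}
factors = degradation_factors(sample_qmph)
```

In this made-up example the last factor is below 1, which corresponds to a slight increase in QMpH on the largest dataset.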

Due to the high diversity in the performance of different SPARQL queries, we also computed the geometric mean of the QpS values of all queries as described in the previous section and illustrated in Figure 6.4. By using the geometric mean, the resulting values are less prone to be dominated by a few outliers (slow queries) compared to standard QMpH values. This allows for some interesting observations in DBPSB by comparing Figures 6.3 and 6.4. For instance, it is evident that Virtuoso has the best QpS values for all dataset sizes.

With respect to Virtuoso, query 10 performs quite poorly. This query involves the features FILTER, DISTINCT, as well as OPTIONAL. Also, the well performing query 1 involves the DISTINCT feature. Query 3 involves an OPTIONAL, resulting in worse performance. Query 2, which involves a FILTER condition, results in the worst performance of all of them. This indicates that using a complex FILTER in conjunction with additional OPTIONAL and DISTINCT adversely affects the overall runtime of the query.

Regarding Sesame, queries 4 and 18 are the slowest queries. Query 4 includes UNION along with several free variables, which indicates that using UNION with several free variables causes problems for Sesame. Query 18 involves the features UNION, FILTER, STR and LANG. Query 15 involves the features UNION, FILTER, and LANG, and its performance is also pretty slow, which leads to the conclusion that introducing this combination of features is difficult for Sesame. Adding the STR feature to that feature combination affects the performance dramatically and prevents the query from being successfully executed.

Figure 6.9.: Comparison of triple store scalability between BSBM V2, BSBM V3, DBPSB. (Three panels: BSBM V2 scalability for 1M, 25M and 100M triples with Sesame, Jena TDB and Virtuoso; BSBM V3 scalability for 100M and 200M triples with Jena TDB, Virtuoso and BigOWLIM; DBPSB scalability for the 10%, 50%, 100% and 200% datasets with Sesame, Jena TDB, Virtuoso and BigOWLIM. The x-axis shows the number of triples, the y-axis the relative performance.)

For Jena-TDB, there are several queries that time out with large dataset sizes, but queries 10 and 20 always time out. The problem with query 10 was already discussed for Virtuoso. Query 20 contains FILTER, OPTIONAL, UNION, and LANG. Query 2 contains FILTER only, query 3 contains OPTIONAL, and query 4 contains UNION only. All of those queries run smoothly with Jena-TDB, which indicates that using the LANG feature along with those features affects the runtime dramatically.

For BigOWLIM, queries 10 and 15 are slow queries. Query 10 was already

In document Efficient Extraction and Query Benchmarking of Wikipedia Data (Page 78-94)