2.4 Semantic Web Technologies

2.4.3 SPARQL Protocol and RDF Query Language

The "SPARQL Protocol and RDF Query Language" (SPARQL) is a family of W3C specifications that enable Web systems to access, query and modify RDF datasets. SPARQL includes:

– A query language, based on a graph matching approach.

– An update language, to manipulate RDF Datasets.

– A service description vocabulary, in order to describe SPARQL services in RDF.

– A federated querying system.

– Query result syntax for tabular views, in XML and JSON formats.

– Entailment regimes, defining how SPARQL query execution engines can produce inferred knowledge by exploiting other specifications such as RDFS, OWL or RIF (in what follows we assume queries to be interpreted within a Simple Entailment regime).

– A query protocol that describes the behaviour of a so-called SPARQL endpoint. A SPARQL endpoint is a Web API meant to interact with an RDF Dataset.

– A graph store protocol, which is a CRUD API to enable direct access to RDF graphs following the REST principles.
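To make the query protocol and the JSON result syntax listed above more concrete, the following sketch builds a GET request URL and flattens a results document. The endpoint address, the helper names and the sample response are assumptions made for illustration; only the `query` request parameter and the `head`/`results`/`bindings` layout are taken from the W3C documents. No network call is made.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint address, used only for illustration.
ENDPOINT = "http://example.org/sparql"

def build_query_url(endpoint, query):
    """Encode a SPARQL query as a GET request URL (the 'query' parameter)."""
    return endpoint + "?" + urlencode({"query": query})

def rows_from_json(document):
    """Flatten a SPARQL JSON results document into {variable: value} dicts."""
    data = json.loads(document)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # A variable may be missing from a binding when it is unbound.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows

query = "SELECT ?x WHERE { ?x a <http://xmlns.com/foaf/0.1/Person> }"
url = build_query_url(ENDPOINT, query)

# Decoding the URL again gives back the original query string.
decoded = parse_qs(urlparse(url).query)["query"][0]

# Hand-written sample response, shaped like the JSON results format.
sample = json.dumps({
    "head": {"vars": ["x"]},
    "results": {"bindings": [
        {"x": {"type": "uri",
               "value": "http://data.open.ac.uk/person/0fe416343d163a372e32910118bdbe76"}}
    ]},
})
rows = rows_from_json(sample)
print(url)
print(rows)
```

A client would normally also send an `Accept: application/sparql-results+json` header to request this result syntax.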

SPARQL is used several times in our work. In what follows we focus on the basic characteristics of the query language, therefore only covering the way RDF data can be read with SPARQL. The reader can consult the W3C documents for the other parts of the specifications and for details of the language we are leaving out [The W3C SPARQL Working Group (2013)].

The SPARQL language is based on a graph matching approach. At its core is the notion of triple pattern. A triple pattern is like an RDF triple in which the subject, predicate or object can be a variable. A triple pattern matches a portion of an RDF graph when RDF terms from that graph can be substituted for the variables in the given positions. The pattern with its variables substituted is then equivalent to the matched subgraph. Taking as example the RDF graph in Listing 2.1, the triple pattern

?x rdf:type foaf:Person

would match the triple

per:0fe416343d163a372e32910118bdbe76 rdf:type foaf:Person

while the triple pattern

<http://data.open.ac.uk/organization/kmi> ?y ?z

would not have matches, as there is no triple with the RDF term <http://data.open.ac.uk/organization/kmi> in the subject position.
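The matching of a single triple pattern can be sketched in a few lines of Python. This is only an illustration of the idea, not how a SPARQL engine is implemented: terms are plain strings, variables are strings starting with `?`, and the three triples are an assumed excerpt in the spirit of Listing 2.1.

```python
PERSON = "per:0fe416343d163a372e32910118bdbe76"

# Assumed sample graph; prefixed names are kept as plain strings.
GRAPH = [
    (PERSON, "rdf:type", "foaf:Person"),
    (PERSON, "rdfs:label", '"Enrico Daga"@en'),
    (PERSON, "foaf:workInfoHomepage", "<http://www.open.ac.uk/people/ed4565>"),
]

def is_var(term):
    """Variables are written with a leading question mark."""
    return term.startswith("?")

def match_pattern(pattern, triple):
    """Return the bindings that turn `pattern` into `triple`, or None."""
    bindings = {}
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable used twice in the pattern must bind to the same term.
            if bindings.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return bindings

def solutions(pattern, graph):
    """Collect one binding set per matching triple."""
    found = []
    for triple in graph:
        bindings = match_pattern(pattern, triple)
        if bindings is not None:
            found.append(bindings)
    return found

people = solutions(("?x", "rdf:type", "foaf:Person"), GRAPH)
nothing = solutions(("<http://data.open.ac.uk/organization/kmi>", "?y", "?z"), GRAPH)
print(people)
print(nothing)
```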

A set of triple patterns is called a Basic Graph Pattern (BGP). Matching a BGP against an RDF graph yields a solution sequence, which may contain zero, one, or multiple solutions. For example, the pattern

?x rdf:type foaf:Person . ?x rdfs:label ?y

will have a single solution:

?x=per:0fe416343d163a372e32910118bdbe76 ?y="Enrico Daga"@en

while the pattern

?x rdf:type foaf:Person . ?x foaf:workInfoHomepage ?y

will have two solutions:

?x=per:0fe416343d163a372e32910118bdbe76 ?y=<http://www.open.ac.uk/people/ed4565>
?x=per:0fe416343d163a372e32910118bdbe76
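Evaluating a BGP amounts to joining the solutions of its triple patterns on their shared variables. The sketch below does this naively, one pattern at a time. The graph is an assumed excerpt: the second homepage IRI is borrowed from the dataset shown later in Listing 2.2, and the function names are hypothetical.

```python
PERSON = "per:0fe416343d163a372e32910118bdbe76"

# Assumed sample graph with two homepage values for the same person.
GRAPH = [
    (PERSON, "rdf:type", "foaf:Person"),
    (PERSON, "rdfs:label", '"Enrico Daga"@en'),
    (PERSON, "foaf:workInfoHomepage", "<http://www.open.ac.uk/people/ed4565>"),
    (PERSON, "foaf:workInfoHomepage", "<http://kmi.open.ac.uk/people/member/enrico-daga>"),
]

def extend(solution, pattern, triple):
    """Extend `solution` so that `pattern` matches `triple`; None on conflict."""
    extended = dict(solution)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended

def evaluate_bgp(patterns, graph):
    """Join the triple patterns one at a time, starting from the empty solution."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            for triple in graph:
                extended = extend(solution, pattern, triple)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

labels = evaluate_bgp(
    [("?x", "rdf:type", "foaf:Person"), ("?x", "rdfs:label", "?y")], GRAPH)
homepages = evaluate_bgp(
    [("?x", "rdf:type", "foaf:Person"), ("?x", "foaf:workInfoHomepage", "?y")], GRAPH)
print(labels)
print(homepages)
```

Because every pattern must match for a solution to survive, each variable of the BGP is bound in every returned solution.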

The last two BGPs are examples of a conjunction of triple patterns. It is worth noting that all the variables used in a BGP must be bound in every solution. Similarly, it is possible to design queries on RDF datasets by extending the BGP notion to also include named graphs. Multigraph datasets can be represented in N-Quads syntax [Carothers (2014)]. N-Quads follows the same principle as N-Triples, but each statement can carry an additional IRI after the object, identifying the named graph that contains the triple. Considering the RDF Dataset in Listing 2.2:

Listing 2.2: An example RDF Dataset in N-Quads syntax.

<http://data.open.ac.uk/person/0fe416343d163a372e32910118bdbe76> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> <http://data.open.ac.uk/context/people/profiles> .
<http://data.open.ac.uk/person/0fe416343d163a372e32910118bdbe76> <http://www.w3.org/ns/org#memberOf> <http://data.open.ac.uk/organization/kmi> <http://data.open.ac.uk/context/people/profiles> .
<http://data.open.ac.uk/person/d9734c68df46924452ff25a4174ab758> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> <http://data.open.ac.uk/context/people/profiles> .
<http://data.open.ac.uk/person/d9734c68df46924452ff25a4174ab758> <http://xmlns.com/foaf/0.1/topic_interest> <http://data.open.ac.uk/topic/web_science> <http://data.open.ac.uk/context/people/profiles> .
<http://data.open.ac.uk/person/d9734c68df46924452ff25a4174ab758> <http://xmlns.com/foaf/0.1/topic_interest> <http://data.open.ac.uk/theme/software_engineering_and_design> <http://data.open.ac.uk/context/people/profiles> .
<http://data.open.ac.uk/person/d9734c68df46924452ff25a4174ab758> <http://xmlns.com/foaf/0.1/topic_interest> <http://data.open.ac.uk/topic/semantics_and_ontologies> <http://data.open.ac.uk/context/people/profiles> .
<http://data.open.ac.uk/person/d9734c68df46924452ff25a4174ab758> <http://xmlns.com/foaf/0.1/weblog> <http://www.enridaga.net> <http://data.open.ac.uk/context/people/profiles> .
<http://data.open.ac.uk/person/d9734c68df46924452ff25a4174ab758> <http://xmlns.com/foaf/0.1/workInfoHomepage> <http://www.open.ac.uk/people/ed4565> <http://data.open.ac.uk/context/people/profiles> .
<http://data.open.ac.uk/person/d9734c68df46924452ff25a4174ab758> <http://xmlns.com/foaf/0.1/workInfoHomepage> <http://kmi.open.ac.uk/people/member/enrico-daga> <http://data.open.ac.uk/context/people/profiles> .
<http://data.open.ac.uk/oro/42047> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/bibo/AcademicArticle> <http://data.open.ac.uk/context/oro> .
<http://data.open.ac.uk/oro/42047> <http://www.w3.org/2000/01/rdf-schema#label> "Dealing with diversity in a smart-city datahub" <http://data.open.ac.uk/context/oro> .
<http://data.open.ac.uk/oro/42047> <http://purl.org/dc/elements/1.1/creator> <http://data.open.ac.uk/person/0fe416343d163a372e32910118bdbe76> <http://data.open.ac.uk/context/oro> .
<http://data.open.ac.uk/oro/42047> <http://purl.org/dc/elements/1.1/creator> <http://data.open.ac.uk/person/0e5d4257051894026ea74b7ed55557e7> <http://data.open.ac.uk/context/oro> .
<http://data.open.ac.uk/oro/41070> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/bibo/Book> <http://data.open.ac.uk/context/oro> .
<http://data.open.ac.uk/oro/41070> <http://www.w3.org/2000/01/rdf-schema#label> "Educational Technology Topic Guide" <http://data.open.ac.uk/context/oro> .
<http://data.open.ac.uk/oro/41070> <http://purl.org/dc/elements/1.1/creator> <http://data.open.ac.uk/person/d9734c68df46924452ff25a4174ab758> <http://data.open.ac.uk/context/oro> .

the graph pattern

prefix [...]
graph <http://data.open.ac.uk/context/people/profiles> { ?x org:memberOf <http://data.open.ac.uk/organization/kmi> } .
graph <http://data.open.ac.uk/context/oro> { ?y dc:creator ?x }

will have the following solution:

?x=per:0fe416343d163a372e32910118bdbe76 ?y=<http://data.open.ac.uk/oro/42047>
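The effect of the graph keyword can be sketched by restricting each triple pattern to the triples of one named graph while still joining on shared variables. The dictionary below holds only a subset of Listing 2.2, with IRIs shortened by prefixes; the data layout and function names are assumptions made for this illustration.

```python
PROFILES = "<http://data.open.ac.uk/context/people/profiles>"
ORO = "<http://data.open.ac.uk/context/oro>"
KMI = "<http://data.open.ac.uk/organization/kmi>"
P1 = "per:0fe416343d163a372e32910118bdbe76"
P2 = "per:d9734c68df46924452ff25a4174ab758"

# Subset of Listing 2.2, grouped by named graph.
DATASET = {
    PROFILES: [
        (P1, "rdf:type", "foaf:Person"),
        (P1, "org:memberOf", KMI),
        (P2, "rdf:type", "foaf:Person"),
    ],
    ORO: [
        ("<http://data.open.ac.uk/oro/42047>", "dc:creator", P1),
        ("<http://data.open.ac.uk/oro/41070>", "dc:creator", P2),
    ],
}

def extend(solution, pattern, triple):
    """Extend `solution` so that `pattern` matches `triple`; None on conflict."""
    extended = dict(solution)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended

def evaluate(graph_patterns, dataset):
    """Each entry is (graph IRI, triple pattern), matched only inside that graph."""
    solutions = [{}]
    for graph_iri, pattern in graph_patterns:
        next_solutions = []
        for solution in solutions:
            for triple in dataset.get(graph_iri, []):
                extended = extend(solution, pattern, triple)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

result = evaluate(
    [(PROFILES, ("?x", "org:memberOf", KMI)),
     (ORO, ("?y", "dc:creator", "?x"))],
    DATASET,
)
print(result)
```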

The SPARQL query language provides four query types, each handling query solutions in a different way: ASK, SELECT, CONSTRUCT, and DESCRIBE. An ASK query returns a boolean value: true when the graph pattern has at least one solution, false otherwise. For example, the following query over the RDF Dataset in Listing 2.2 will return true:

prefix [...]
ASK {
  ?x rdf:type foaf:Person .
  ?x foaf:workInfoHomepage []
}

In the previous example, the symbol [] represents a blank node. (In graph patterns, blank nodes act as existential variables, therefore any RDF term can match them.)
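An ASK query only needs to know whether at least one solution exists, so a search can stop at the first one. The sketch below illustrates this with a small backtracking search in which `[]` is treated as a wildcard that binds nothing. The three triples are an assumed subset of Listing 2.2 and the function names are hypothetical.

```python
P1 = "per:0fe416343d163a372e32910118bdbe76"
P2 = "per:d9734c68df46924452ff25a4174ab758"

# Assumed subset of Listing 2.2, with prefixed names as plain strings.
GRAPH = [
    (P1, "rdf:type", "foaf:Person"),
    (P2, "rdf:type", "foaf:Person"),
    (P2, "foaf:workInfoHomepage", "<http://www.open.ac.uk/people/ed4565>"),
]

def term_matches(p_term, t_term, bindings):
    """Match one pattern term against one RDF term, updating `bindings`."""
    if p_term == "[]":
        # Blank node in a pattern: any term matches, nothing is bound.
        return True
    if p_term.startswith("?"):
        return bindings.setdefault(p_term, t_term) == t_term
    return p_term == t_term

def ask(patterns, graph, bindings=None):
    """Return True as soon as one solution for all the patterns is found."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        return True
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(bindings)
        if all(term_matches(p, t, candidate) for p, t in zip(first, triple)):
            if ask(rest, graph, candidate):
                return True
    return False

has_homepage = ask(
    [("?x", "rdf:type", "foaf:Person"), ("?x", "foaf:workInfoHomepage", "[]")], GRAPH)
has_phone = ask(
    [("?x", "rdf:type", "foaf:Person"), ("?x", "foaf:phone", "[]")], GRAPH)
print(has_homepage, has_phone)
```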

SELECT queries return the query solutions projected as tabular data. The following query:

prefix [...]
SELECT ?y WHERE {
  ?x foaf:topic_interest ?y
}

will have the following matching solutions:

?x=per:0fe416343d163a372e32910118bdbe76 ?y=<http://data.open.ac.uk/topic/web_science>
?x=per:0fe416343d163a372e32910118bdbe76 ?y=<http://data.open.ac.uk/theme/software_engineering_and_design>
?x=per:0fe416343d163a372e32910118bdbe76 ?y=<http://data.open.ac.uk/topic/semantics_and_ontologies>

but only one variable will be projected to the result set:

------------------------------------------------------------------
| y                                                              |
------------------------------------------------------------------
| <http://data.open.ac.uk/topic/web_science>                     |
| <http://data.open.ac.uk/theme/software_engineering_and_design> |
| <http://data.open.ac.uk/topic/semantics_and_ontologies>        |
------------------------------------------------------------------
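Projection can be pictured as keeping only the selected variables of each solution. The sketch below applies it to the three solutions listed above, written as Python dictionaries; the `select` helper is a hypothetical name used for illustration.

```python
PERSON = "per:0fe416343d163a372e32910118bdbe76"

# The three solutions from the text, one dictionary per solution.
solutions = [
    {"?x": PERSON, "?y": "<http://data.open.ac.uk/topic/web_science>"},
    {"?x": PERSON, "?y": "<http://data.open.ac.uk/theme/software_engineering_and_design>"},
    {"?x": PERSON, "?y": "<http://data.open.ac.uk/topic/semantics_and_ontologies>"},
]

def select(solutions, variables):
    """Project each solution onto `variables`, one row per solution."""
    return [tuple(solution.get(v) for v in variables) for solution in solutions]

rows = select(solutions, ["?y"])
for (y,) in rows:
    print(y)
```

Note that plain projection keeps one row per solution, so duplicate rows are possible unless DISTINCT is requested.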

SELECT queries also support aggregate functions (COUNT, SUM, AVG, MIN, MAX, ...), for example:

prefix ...
SELECT ?x (COUNT(?y) AS ?topics) WHERE {
  ?x rdf:type foaf:Person .
  ?x foaf:topic_interest ?y
} GROUP BY ?x

returning:

-------------------------------------------------
| x                                    | topics |
-------------------------------------------------
| per:0fe416343d163a372e32910118bdbe76 | 3      |
-------------------------------------------------
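Grouping and counting can be sketched over the same kind of solution sequence. The solutions below are the ones the text associates with this person, and `count_by` is a hypothetical helper that mimics GROUP BY with COUNT for a single grouping variable.

```python
PERSON = "per:0fe416343d163a372e32910118bdbe76"

# Solutions of the graph pattern, as given in the text for this person.
solutions = [
    {"?x": PERSON, "?y": "<http://data.open.ac.uk/topic/web_science>"},
    {"?x": PERSON, "?y": "<http://data.open.ac.uk/theme/software_engineering_and_design>"},
    {"?x": PERSON, "?y": "<http://data.open.ac.uk/topic/semantics_and_ontologies>"},
]

def count_by(solutions, group_var, counted_var):
    """Group solutions by `group_var` and count those where `counted_var` is bound."""
    counts = {}
    for solution in solutions:
        key = solution[group_var]
        counts.setdefault(key, 0)
        if counted_var in solution:
            counts[key] += 1
    return counts

topics = count_by(solutions, "?x", "?y")
print(topics)
```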

SPARQL can also be used to generate a new RDF graph by instantiating a graph template with the solutions of a graph pattern matched against an RDF dataset. This is done using the CONSTRUCT query type:

prefix ...
CONSTRUCT {
  ?y rdf:type dbowl:topic .
  ?org rdf:type org:Organization .
  ?org foaf:topic_interest ?y
} WHERE {
  ?x foaf:topic_interest ?y .
  ?x org:memberOf ?org
}

which, executed over the RDF dataset in Listing 2.2, will produce the following RDF graph:

<http://data.open.ac.uk/topic/web_science> rdf:type dbowl:topic .
<http://data.open.ac.uk/topic/semantics_and_ontologies> rdf:type dbowl:topic .
<http://data.open.ac.uk/theme/software_engineering_and_design> rdf:type dbowl:topic .
<http://data.open.ac.uk/organization/kmi> rdf:type org:Organization .
<http://data.open.ac.uk/organization/kmi> foaf:topic_interest <http://data.open.ac.uk/topic/web_science> .
<http://data.open.ac.uk/organization/kmi> foaf:topic_interest <http://data.open.ac.uk/topic/semantics_and_ontologies> .
<http://data.open.ac.uk/organization/kmi> foaf:topic_interest <http://data.open.ac.uk/theme/software_engineering_and_design> .
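The way a CONSTRUCT template is filled in can be sketched as a substitution of each solution into each template triple, collecting the results in a set so that repeated triples appear once. The three input solutions are an assumption: they are the WHERE clause solutions implied by the output graph above, and `construct` is a hypothetical helper.

```python
KMI = "<http://data.open.ac.uk/organization/kmi>"
PERSON = "per:0fe416343d163a372e32910118bdbe76"
TOPICS = [
    "<http://data.open.ac.uk/topic/web_science>",
    "<http://data.open.ac.uk/topic/semantics_and_ontologies>",
    "<http://data.open.ac.uk/theme/software_engineering_and_design>",
]

# Assumed solutions of the WHERE clause, one per topic.
solutions = [{"?x": PERSON, "?y": topic, "?org": KMI} for topic in TOPICS]

TEMPLATE = [
    ("?y", "rdf:type", "dbowl:topic"),
    ("?org", "rdf:type", "org:Organization"),
    ("?org", "foaf:topic_interest", "?y"),
]

def construct(template, solutions):
    """Instantiate every template triple with every solution."""
    graph = set()
    for solution in solutions:
        for triple in template:
            try:
                graph.add(tuple(
                    solution[term] if term.startswith("?") else term
                    for term in triple
                ))
            except KeyError:
                # A template triple with an unbound variable is skipped.
                continue
    return graph

graph = construct(TEMPLATE, solutions)
for triple in sorted(graph):
    print(*triple, ".")
```

The organization typing triple is generated once per solution but stored once, which is why the result has seven triples rather than nine.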

Finally, a DESCRIBE query requests an RDF graph describing the given terms or projected variables. Examples of DESCRIBE queries are the following:

DESCRIBE <http://data.open.ac.uk/topic/web_science>

DESCRIBE ?org WHERE {
  ?x foaf:topic_interest <http://data.open.ac.uk/topic/web_science> .
  ?x org:memberOf ?org
}

The returned graphs would include all the triples having the declared term or projected values as subject (however, the semantics of DESCRIBE may vary between implementations, some of them also returning triples having the term in the object position).
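The two behaviours mentioned above can be sketched with a simple filter over the triples. This is one possible reading of DESCRIBE, not a definition: the `include_object` flag and the `rdfs:label` triple are hypothetical and added only for illustration.

```python
WEB_SCIENCE = "<http://data.open.ac.uk/topic/web_science>"
P2 = "per:d9734c68df46924452ff25a4174ab758"

# Small assumed graph; the label triple is invented for this example.
GRAPH = [
    (P2, "foaf:topic_interest", WEB_SCIENCE),
    (WEB_SCIENCE, "rdfs:label", '"Web Science"'),
    (P2, "rdf:type", "foaf:Person"),
]

def describe(term, graph, include_object=False):
    """Return the triples about `term`: as subject, and optionally as object."""
    return [
        triple for triple in graph
        if triple[0] == term or (include_object and triple[2] == term)
    ]

subject_only = describe(WEB_SCIENCE, GRAPH)
both_positions = describe(WEB_SCIENCE, GRAPH, include_object=True)
print(subject_only)
print(both_positions)
```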

The SPARQL query language includes many features not mentioned here. These include RDF term manipulation functions, string manipulation functions, basic mathematical functions, and boolean and conditional operators. Moreover, Basic Graph Patterns can be extended with sophisticated constructs allowing fine-grained matching criteria, using operators such as FILTER( ... ) or OPTIONAL { ... }, including existence tests (FILTER EXISTS { ... }), union of query solutions ({ ... } UNION { ... }), and negative matching (FILTER NOT EXISTS { ... }). SPARQL also supports graph traversal constructs (so-called property paths). Query nesting is another useful feature, and it is the basis for supporting query federation through the SERVICE clause. Some implementations also extend SPARQL with additional features, for example adding operators and datatypes for handling geospatial data [Battle and Kolas (2012)]. For more details on the language, we refer to the W3C specification [The W3C SPARQL Working Group (2013)].