SPARQL Tutorial | AllegroGraph 4.7

Introduction

This is not a semantics document, or a general description of SPARQL, or a standards document; for these, we advise the reader to look elsewhere, starting with the SPARQL Reference.

Note that this document sometimes omits details or complicating factors, entirely or until later. The tutorial uses the AGWebView client for AllegroGraph (see WebView).

To run this tutorial, start an AllegroGraph server and log into AllegroGraph WebView with your browser. See Server Installation and WebView for details.

Once you have logged in, create a sparqltutorial repository (type sparqltutorial into the Name field under Create a repository and click the Create button). The result is shown in the first picture (you may have additional repositories, of course). In this and later illustrations, red text, arrows, etc. are added and will not be seen on your AGWebView display.

sparql-tut-1

You need to define three namespaces, for prefixes ex, foaf. and xsd. First click on the sparqltutorial repository name to get to the page for that repository. To add namespaces, choose new from the Query menu (shown in the illustration):

sparql-tut-2

That displays an Edit Query area.

sparql-tut-3

Click on add a namespace (shown to the right in the illustration, which is cropped; it is actually in the middle of the browser display). An editable text box is displayed asking for the prefix and the URI. The illustration shows the first prefix/URI pair (prefix ex, URI http://example.com/). Clicking the OK button on the right will define the namespace. We also want to define two more namespaces (prefix foaf to http://xmlns.com/foaf/0.1/, and prefix xsd to http://www.w3.org/2001/XMLSchema#).

sparql-tut-4

RDF and querying

RDF is an elegant formalism for describing graphs. These graphs can encode almost anything, given the right vocabulary — the model is a superset of the relational model (so you can encode conventional databases) and trees (so you can encode anything that can be expressed in XML, for example). AllegroGraph gives you several options for extracting data from these graphs.

Most basic is the API itself, working with individual triples. A logic view is offered by Prolog. SPARQL, the subject of this tutorial, more closely resembles SQL, and offers a relational, pattern-based approach to retrieving data from a store.

Bindings

Consider a graph, G. G contains triples that share objects or subjects:

  john  knows  karen  
  karen knows  alex  
  karen name   "Karen"  
  alex  name   "Alex"

SPARQL's approach to selecting values is to take triples and allow them to contain variables (denoted by a ? or $ before a string). These structures — triple patterns — match against real triples in the store, or inferred triples if you wish to use a reasoner. Every time a triple pattern matches against a triple, it produces a binding for each variable.

For example, the triple pattern

  john knows ?y

produces one binding for ?y: karen.

The pattern

  ?x knows ?y

produces a richer table of bindings:

  |   x   |   y   |  
  =================  
1 | john  | karen |  
-------------------  
2 | karen |  alex |

Each row in this table is a result for the query.

Multiple triple patterns

Variables can occur in multiple patterns that together comprise a query. Patterns that overlap in variables narrow down the results, while those that do not expand them.

To extend the earlier example:

  ?x knows ?y  
  ?y name  ?name

produces the following results:

  |   x   |   y   |  name   |  
  ===========================  
1 | john  | karen | "Karen" |  
-----------------------------  
2 | karen |  alex | "Alex"  |

Adding an additional triple to the store:

  alex name "Alexander"

yields the following:

  |   x   |   y   |  name        |  
  ================================  
1 | john  | karen | "Karen"      |  
----------------------------------  
2 | karen |  alex | "Alex"       |  
----------------------------------  
3 | karen |  alex | "Alexander"  |

This should tell you something interesting: a row exists in the results for every possible substitution of values into the query that would yield a set of triples that exist in the graph. Each row can contain only one binding, so Alex's two names fork the results.

SPARQL syntax

SPARQL borrows Turtle's syntax for triple patterns (Turtle is described here). A variable is a string starting with a ? or a $, but otherwise things are much the same. The above query pattern, borrowing the FOAF vocabulary and assigning it the prefix foaf, would be written as

  ?x foaf:knows ?y .  
  ?y foaf:name  ?name .

You'll be shown more syntax as you progress through this tutorial.

Every triple pattern in SPARQL lives inside a graph pattern (as can other graph patterns!). Graph patterns are denoted by curly brackets, so our query would look like

{  
  ?x foaf:knows ?y .  
  ?y foaf:name  ?name .  
}

Verbs and variables

SPARQL doesn't just do results querying — it can also ask questions, describe resources, and construct new graphs. One can also specify which columns to select from the result table. This SPARQL query includes a verb and a single-element list of variables:

SELECT ?name WHERE {  
  ?x <http://xmlns.com/foaf/0.1/knows> ?y .  
  ?y <http://xmlns.com/foaf/0.1/name>  ?name .  
}

As you can see, a full URI can be specified in angle brackets.

Now we want to add some data, starting with four triples. We use SPARQL's INSERT DATA command, which gets pasted into the query area, as shown in the illustration. (SPARQL commands are case-insensitive but we usually use uppercase).

INSERT DATA { ex:john foaf:knows ex:karen .   
	  ex:karen foaf:knows ex:alex .   
	  ex:karen foaf:name "Karen" .   
	  ex:alex foaf:name "Alex" .   
        }

Once the statement is entered, click on the Execute button.

sparql-tut-5

The system will display No results but the data is now available.

Now that we have some data in our repository, we can start making queries. For, example, who is known by at least one other? The SPARQL querying command is SELECT (meaning select items which satisfy the statements in the SELECT body). The SELECT statement to ask ' who is known by at least one other?' is:

SELECT ?name WHERE {   
  ?x <http://xmlns.com/foaf/0.1/knows> ?y .   
  ?y <http://xmlns.com/foaf/0.1/name>  ?name .   
}

x? and y? are variables. This statement says 'select all names of items where the item appears as the object in a triple where <http://xmlns.com/foaf/0.1/knows> is the predicate'. Note we use the complete URI of the predicates (instead of the abbreviated foaf:knows). We are just illustrating that either can be used at any time.

Back in the browser window, choose New from the Query menu and enter the query. Click the Execute button to send the query to AllegroGraph, which displays the results in a new field below the Query field. The results are "Karen" and "Alex" because we have triples

ex:john foaf:knows ex:karen   
ex:karen foaf:knows ex:alex    
ex:karen foaf:name "Karen"    
ex:alex foaf:name "Alex"

so both ex:karen and ex:alex are objects of triples which have the predicate foaf:knows, and "Alex" is the name of ex:alex, "Karen" the name of ex:karen.

sparql-tut-6

Notice the Save button at the bottom of the Query field. If you type a name into the text box next that button (like query1) and click on Save, the query will be saved and can be retrieved with the Saved option on the Queries menu. (Recent queries can also be retrieved with the Recent option on that menu, but those will not survive restarting the server while saved queries will).

Download in the Result field allows you to save a result in a selected format. The result will be saved to a file. Here is the file saved when clicking on Download with the format Sparql XML results format:

<?xml version="1.0"?>  
<sparql xmlns="http://www.w3.org/2005/sparql-results#">  
  <head>  
<variable name="name"/>  
  </head>  
  <results>  
<result>  
  <binding name="name">  
    <literal>Karen</literal>  
  </binding>  
</result>  
<result>  
  <binding name="name">  
    <literal>Alex</literal>  
  </binding>  
</result>  
  </results>  
</sparql>

Congratulations! Your first SPARQL query.

We will not be showing as many pictures of the browser for the rest of this tutorial. Instead, we will often just say where queries, statements, etc. should go and what buttons you should click.

Patterns

SPARQL, being a fully-fledged query language, doesn't just have basic graph patterns. You can also:

Nest patterns:

{  
  ?x ?y "Jim" .  
  { ?y rdfs:subPropertyOf ex:someProperty }  
}

Mark patterns as optional
Filter results based on built-in or custom predicates
Find alternative matches using UNION patterns.

Basic and optional patterns

All triples and all other basic patterns inside a basic pattern must match. So, in the nested pattern above, ?y would bind to properties that were direct subproperties of ex:someProperty.

This is not the case for optional patterns. An optional pattern will not cause a result to fail if it does not match with the current bindings. This manifests itself as an empty (unbound) cell in the results table.

Example:

SELECT ?name ?email WHERE {  
  ?x foaf:knows ?y .  
  ?y foaf:name ?name .  
  OPTIONAL { ?y foaf:mbox ?email }  
}

If you run this on the data so far, you'll get two results in the output with no bindings shown for ?email.

sparql-tut-7

Try taking out the word "OPTIONAL": you get no results.

sparql-tut-8

Filters

Matching and comparing data is a very common operation in a query language. SPARQL has a full suite of comparisons. A common one is regex testing:

SELECT ?name WHERE {  
  ?x foaf:knows ?y .  
  ?y foaf:name ?name .  
  FILTER regex(?name, '^K.*')  
}

… returning only "Karen".

You can even define your own, referencing them by URI, though this is a more involved topic, left for later discussion.

Filters and optionals

It's important to note the interaction between filter patterns and optional patterns. Remember that an optional pattern contributes to the results if it matches, and leaves the results unchanged if it does not. An optional pattern can do some interesting things to the results when combined with a filter.

Try the following:

INSERT DATA { ex:book1 ex:title "Cheap Book" .   
	  ex:book1 ex:price "30"^^xsd:integer .   
	  ex:book2 ex:title "Expensive Book" .   
	  ex:book2 foaf:name "90"^^xsd:integer .   
        }

Note we are using the xsd prefix we defined with the other namespaces above. Call this data data2. Now lets ask a query (we put in a redundant PREFIX statement for a namespace already defined, just to show how you can define prefixes on the fly):

PREFIX ex: <http://example.com/>  
SELECT ?title ?price WHERE {  
  ?x ex:title ?title .  
  OPTIONAL {  
    ?x ex:price ?price .  
    FILTER ( ?price < 40 )  
  }  
}

You should get something like:

sparql-tut-9

The price for the expensive book is not returned, because it wasn't under 40. If you move the price triple pattern to outside the optional, you get the expensive book's price in the results. If you add another book without a listed price, the optional will also fail to match, so expensive books and books with no price are indistinguishable.

Remember this when you write queries!

Combining filters

You can combine filters with boolean operators, parentheses, and so on:

SELECT ?title ?price WHERE {  
  ?x ex:title ?title .  
  OPTIONAL {  
    ?x ex:price ?price .  
  }  
  FILTER ( bound(?price) && ?price < 40 )  
}

… matching only books where the OPTIONAL matches, providing a price, and the price is less than 40. On the example data, this returns one result: the cheap book and its price, 30.

UNION

As well as using optional patterns to extend data, SPARQL allows you to bind variables using alternatives. Using UNION you can specify a number of graph patterns, separated by the UNION keyword, that can each contribute to the query result. The union pattern matches if any of its graph patterns match, and all of them have a chance to contribute. Try the following:

INSERT DATA { ex:a ex:b ex:c .   
	  ex:a ex:d ex:e .  
        }

And try this query:

SELECT ?third {  
  { <http://example.com/a> <http://example.com/b> ?third }  
  UNION  
  { <http://example.com/a> <http://example.com/d> ?third }  
}

You'll see ex:c and ex:e under the heading third.

That the union pattern can match as a whole, but the sub-patterns do not have to contain the same variables, can be useful; the sub-patterns can contain optionals to contribute additional information, or bind to differently-named variables to track which branch was applied.

Ordering and slicing results

SPARQL can post-process result sets.

DISTINCT and REDUCED

A SELECT query can optionally be specified to return unique results for each row. Query patterns often return duplicate bindings, and implementations must not eliminate duplicates unless explicitly instructed. (See delete-duplicate-triples for information on duplicate deletion.)

Simply add the DISTINCT keyword:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>  
SELECT DISTINCT ?x ?y WHERE {  
  ?x foaf:knows [ foaf:knows ?y ] .  
}

Note that this removal of duplicates occurs after the results are refined down to the specified variable list.

In SPARQL, the ... [ ... ] syntax represents anonymous blank nodes. The full details can be found in the SPARQL reference (here and here) but the general idea is that:

[] specifies a blank node,
[ ?p ?o ] is the same as [] ?p ?o and is equivalent to writing out a blank node in longhand as in _:b25 ?p ?o,
The blank node can be used in multiple triple-patterns. For example, this pattern [ :p "v" ] :q "w" is equivalent to these two combined patterns:
```
 _:b23 :p "v" .  
 _:b23 :q "w" . 
```

and this pattern :x :q [ :p "v" ] . is the same as

:x  :q _:b19 .  
_:b19 :p "v" .

See the SPARQL reference guide for more details and other useful abbreviations.

If you do not need duplicates to be removed, but you do not need the redundant entries, either — which would be the case if you are relying on counts to be correct, for example — then you can specify REDUCED instead of DISTINCT. This allows AllegroGraph to discard duplicate values if it's advantageous to do so.

ORDER BY

Ordering directives can be appended to a SELECT query. These allow you to impose a sorted order on a results set.

Naturally, the results of a query can be ordered by any combination of variables in the results, in ascending or descending order. Furthermore, multiple sorting criteria can be specified to break ties.

Criteria can be:

Simple variables:
```
ORDER BY ?firstName 
```

Arbitrary functions, named by URI:

ORDER BY ex:calculateImportance(?person)

Nested SPARQL expressions, surrounded by parentheses, including arbitrary boolean operations (e.g., && and ||; see this section of the SPARQL specification for more):
```
ORDER BY (REGEX('Mr.*', ?title) && BOUND(?employer)) ASC(?surname) 
```

This expression will sort the results into two partitions: one containing people whose title begins with "Mr" and are employed, and its counterpart. Each category will then be sorted alphabetically by surname.

These criteria can be optionally annotated with ascending/descending (ascending by default):

ORDER BY DESC(?age) ASC(?lastName)

The values generated by the criteria (e.g., the bound value of a variable) are compared according to a strict set of rules to yield an ordering. If a criterion does not yield an ordering, the next criterion is applied, and so on until an ordering is achieved. If an ordering is never achieved, then the order is unspecified.

LIMIT and OFFSET

Once solutions are ordered, it makes sense to be able to return 'slices' of a sequence of results. LIMIT is an upper bound on the number of results returned. E.g., LIMIT 5 will return no more than 5 results for the query. OFFSET n causes the first n results to be discarded. For example:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>  
SELECT ?name WHERE {  
  ?person foaf:name ?name .  
}  
ORDER BY ?name  
LIMIT 10  
OFFSET 20

implements the usual ten-per-page style of results, starting on page 3 — results 21–30 inclusive. If the number of results is smaller than OFFSET, no results are returned. If LIMIT is 0, no results are returned.

It is technically possible to apply LIMIT and OFFSET to unordered results, but this is largely pointless, as the values returned in each slice are unpredictable.

Other verbs

The most common use of SPARQL is to return results bindings from queries — SELECT. As previously mentioned, though, there are three other things it can do.

DESCRIBE

In many situations you simply do not have enough information to properly query a store for information about a resource -- you might not know which properties it has, for example. DESCRIBE allows you to provide a list of resources or variables that you wish to be described; the variables can be bound by an implicit SELECT query.

DESCRIBE <http://example.com/fish> ?x WHERE {  
  ?x ?y <http://example.com/fish>  
}

This query asks for a description of fish, and any resource directly related to fish. The implementation of DESCRIBE uses Concise Bounded Descriptions as a formalism for descriptions; informally, this is the smallest useful section of outward-facing graph around each resource.

DESCRIBE returns a collection of triples, not a set of bindings, and these are ordinarily serialized in RDF/XML.

Example:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>  
PREFIX ex: <http://example.com/>  
DESCRIBE ex:karen ?friend {  
  ex:karen foaf:knows ?friend .  
}

displays

sparql-tut-10

If you download as N-Triples (as suggested in the Download widget), you get a file containing (slightly reformatted):

<http://example.com/karen> <http://xmlns.com/foaf/0.1/knows> <http://example.com/alex> .  
<http://example.com/karen> <http://xmlns.com/foaf/0.1/name> "Karen" .  
<http://example.com/alex> <http://xmlns.com/foaf/0.1/name> "Alex" .

CONSTRUCT

Much of the time, your purpose for querying an RDF store is to construct a new set of triples. For example, you can do a limited amount of inference this way:

INSERT DATA { ex:bill ex:mother ex:doris .   
	  ex:doris ex:brother ex:billsuncle .  
              ex:fred ex:brother ex:billsotheruncle .  
              ex:bill ex:father ex:fred .  
        }  
 
 
 
PREFIX ex: <http://example.com/>  
SELECT * WHERE {  
  { ?x ex:mother [ ex:brother ?uncle ] }  
  UNION  
  { ?x ex:father [ ex:brother ?uncle ] }  
}

Which produces the following

sparql-tut-10

CONSTRUCT allows you to yield triples directly from the query:

PREFIX ex: <http://example.com/>  
      CONSTRUCT {  
  ?x ex:uncle ?uncle  
}  
WHERE {  
  { ?x ex:mother [ ex:brother ?uncle ] }  
  UNION  
  { ?x ex:father [ ex:brother ?uncle ] }  
}

produces these triples:

sparql-tut-10

These triples have not been added to the store. That can be done with a similar query, replaces CONSTRUCT with INSERT:

PREFIX ex: <http://example.com/>  
      INSERT {  
  ?x ex:uncle ?uncle  
}  
WHERE {  
  { ?x ex:mother [ ex:brother ?uncle ] }  
  UNION  
  { ?x ex:father [ ex:brother ?uncle ] }  
}

What the tutorial hasn't covered

There are a lot of things! SPARQL can also:

Restrict queries to a set of graphs: FROM, FROM NAMED
Match graphs: GRAPH
Serialize to many different formats
Perform aggregation on result sets
Ask boolean queries: ASK
Using AllegroGraph's geospatial support with SPARQL (see Geospatial support in SPARQL queries for details)

Take a look at the reference for more details.

You can use other clients with SPARQL. See for example SPARQL Tutorial using the Lisp Client.

AllegroGraph 4.7 SPARQL tutorial

Table of Contents

Introduction

RDF and querying

Bindings

Multiple triple patterns

SPARQL syntax

Verbs and variables

Patterns

Basic and optional patterns

Filters

UNION

Ordering and slicing results

DISTINCT and REDUCED

ORDER BY

LIMIT and OFFSET

Other verbs

DESCRIBE

CONSTRUCT

What the tutorial hasn't covered

Introduction

RDF and querying

Bindings

Multiple triple patterns

SPARQL syntax

Verbs and variables

Patterns

Basic and optional patterns

Filters

Filters and optionals

Combining filters

UNION

Ordering and slicing results

DISTINCT and REDUCED

ORDER BY

LIMIT and OFFSET

Other verbs

DESCRIBE

CONSTRUCT

What the tutorial hasn't covered