Introduction
This document aims to be a simple, results-driven tutorial for SPARQL, particularly the implementation employed by AllegroGraph. It is not a semantics document, a description of the logical interpretation, or a standards document; for these, we advise the reader to look elsewhere.
As a tutorial, this document sometimes elides details or complicating factors, entirely or until later. The educated reader should be aware of this. For example, indexing of triples is entirely ignored; for these trivial data sizes indexing is not important, but it is vital for non-trivial datasets.
To follow along with this tutorial, you should start mlisp
and evaluate the following:
(require :agraph)
(enable-!-reader)
(use-package :db.agraph)
(use-package :db.agraph.sparql)
(create-triple-store "sparqltutorial")
(register-namespace "ex" "http://example.com/")
(register-namespace "foaf" "http://xmlns.com/foaf/0.1/")
RDF and querying
RDF is an elegant formalism for describing graphs. These graphs can encode almost anything, given the right vocabulary — the model is a superset of the relational model (so you can encode conventional databases) and trees (so you can encode anything that can be expressed in XML, for example). AllegroGraph gives you several options for extracting data from these graphs.
Most basic is the API itself, working with individual triples. A logic view is offered by Prolog. SPARQL, the subject of this tutorial, more closely resembles SQL, and offers a relational, pattern-based approach to retrieving data from a store.
Bindings
Consider a graph, G. G contains triples that share objects or subjects:
john knows karen
karen knows alex
karen name "Karen"
alex name "Alex"
SPARQL's approach to selecting values is to take triples and allow them to contain variables (denoted by a ?
or $
before a string). These structures — triple patterns — match against real triples in the store, or inferred triples if you wish to use a reasoner. Every time a triple pattern matches against a triple, it produces a binding for each variable.
For example, the triple pattern
john knows ?y
produces one binding for ?y
: karen.
The pattern
?x knows ?y
produces a richer table of bindings:
| x | y |
=================
1 | john | karen |
-------------------
2 | karen | alex |
Each row in this table is a result for the query.
Multiple triple patterns
Variables can occur in multiple patterns that together comprise a query. Patterns that overlap in variables narrow down the results, while those that do not expand them.
To extend the earlier example:
?x knows ?y
?y name ?name
produces the following results:
| x | y | name |
===========================
1 | john | karen | "Karen" |
-----------------------------
2 | karen | alex | "Alex" |
Adding an additional triple to the store:
alex name "Alexander"
yields the following:
| x | y | name |
================================
1 | john | karen | "Karen" |
----------------------------------
2 | karen | alex | "Alex" |
----------------------------------
3 | karen | alex | "Alexander" |
This should tell you something interesting: a row exists in the results for every possible substitution of values into the query that would yield a set of triples that exist in the graph. Each row can contain only one binding, so Alex's two names fork the results.
SPARQL syntax
SPARQL borrows Turtle's syntax for triple patterns. A variable is a string starting with a ?
or a $
, but otherwise things are much the same. The above query pattern, borrowing the FOAF vocabulary and assigning it the prefix foaf
, would be written as
?x foaf:knows ?y .
?y foaf:name ?name .
You'll be shown more syntax as you progress through this tutorial.
Every triple pattern in SPARQL lives inside a graph pattern (as can other graph patterns!). Graph patterns are denoted by curly brackets, so our query would look like
{
?x foaf:knows ?y .
?y foaf:name ?name .
}
Verbs and variables
SPARQL doesn't just do results querying — it can also ask questions, describe resources, and construct new graphs. It also makes sense to be able to specify which columns to select from the result table. These two pieces of information — that SPARQL needs a verb and a list of columns — allow us to build our first valid SPARQL query, including the verb and a single-element list of variables:
SELECT ?name WHERE {
?x <http://xmlns.com/foaf/0.1/knows> ?y .
?y <http://xmlns.com/foaf/0.1/name> ?name .
}
As you can see, a full URI can be specified in angle brackets.
If you evaluate the following in a running AllegroGraph REPL:
(add-triple !ex:john !foaf:knows !ex:karen)
(add-triple !ex:karen !foaf:knows !ex:alex)
(add-triple !ex:karen !foaf:name !"Karen")
(add-triple !ex:alex !foaf:name !"Alex")
(run-sparql "
SELECT ?name WHERE {
?x <http://xmlns.com/foaf/0.1/knows> ?y .
?y <http://xmlns.com/foaf/0.1/name> ?name .
}")
You'll see the SPARQL XML results format for two bindings printed to the console:
<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
<head>
<variable name="name" />
</head>
<results ordered="false" distinct="false">
<result>
<binding name="name"><literal>Karen</literal></binding>
</result>
<result>
<binding name="name"><literal>Alex</literal></binding>
</result>
</results>
</sparql>
t
:select
(?name)
Congratulations! Your first SPARQL query.
The three returned values are the output (ignore this for now: the real output got printed out as XML), the SPARQL verb (this was a SELECT
query, so you get :select
back), and the list of variables that were selected. You might find these useful later.
Patterns
SPARQL, being a fully-fledged query language, doesn't just have basic graph patterns. You can also:
- Nest patterns:
{ ?x ?y "Jim" . { ?y rdfs:subPropertyOf ex:someProperty } }
- Mark patterns as optional
- Filter results based on built-in or custom predicates
- Do basic flexible queries using union patterns.
Basic and optional patterns
All triples and all other basic patterns inside a basic pattern must match. So, in the nested pattern above, ?y
would bind to properties that were direct subproperties of ex:someProperty
.
This is not the case for optional patterns. An optional pattern will not cause a result to fail if it does not match with the current bindings. This manifests itself as an empty (unbound) cell in the results table.
Example:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?email WHERE {
?x foaf:knows ?y .
?y foaf:name ?name .
OPTIONAL { ?y foaf:mbox ?email }
}
If you run this on the data so far, you'll get two results in the output with no bindings shown for ?email
. (In former versions of SPARQL XML, <unbound/>
elements would be included.) Try taking out the word "OPTIONAL
": you get no results.
Filters
Matching and comparing data is a very common operation in a query language. SPARQL has a full suite of comparisons, mostly supported by twinql. A common one is regex testing:
(run-sparql "
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
?x foaf:knows ?y .
?y foaf:name ?name .
FILTER regex(?name, '^K.*')
}")
… returning only "Karen".
You can even define your own, referencing them by URI, though this is a more involved topic, left for later discussion.
Filters and optionals
It's important to note the interaction between filter patterns and optional patterns. Remember that an optional pattern contributes to the results if it matches, and leaves the results unchanged if it does not. An optional pattern can do some interesting things to the results when combined with a filter.
Try the following:
(add-triple !ex:book1 !ex:title !"Cheap Book")
(add-triple !ex:book1 !ex:price !"30"^^xsd:integer)
(add-triple !ex:book2 !ex:title !"Expensive Book")
(add-triple !ex:book2 !ex:price !"90"^^xsd:integer)
(run-sparql "
PREFIX ex: <http://example.com/>
SELECT ?title ?price WHERE {
?x ex:title ?title .
OPTIONAL {
?x ex:price ?price .
FILTER ( ?price < 40 )
}
}")
You should get something like:
<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
<head>
<variable name="title"/>
<variable name="price"/>
</head>
<results ordered="false" distinct="false">
<result>
<binding name="title">
<literal>Cheap Book</literal>
</binding>
<binding name="price">
<literal datatype="http://www.w3.org/2001/XMLSchema#integer">30</literal>
</binding>
</result>
<result>
<binding name="title">
<literal>Expensive Book</literal>
</binding>
</result>
</results>
</sparql>
t
:select
(?title ?price)
The price for the expensive book is not returned, because it wasn't under 40. If you move the price triple pattern to outside the optional, you get the expensive book's price in the results. If you add another book without a listed price, the optional will also fail to match, so expensive books and books with no price are indistinguishable.
Remember this when you write queries!
Combining filters
You can combine filters with boolean operators, parentheses, and so on:
(run-sparql "
PREFIX ex: <http://example.com/>
SELECT ?title ?price WHERE {
?x ex:title ?title .
OPTIONAL {
?x ex:price ?price .
}
FILTER ( bound(?price) && ?price < 40 )
}")
… matching only books where the OPTIONAL
matches, providing a price, and the price is less than 40. On the example data, this returns one result: the cheap book and its price, 30.
UNION
As well as using optional patterns to extend data, SPARQL allows you to bind variables using alternatives. Using UNION
you can specify a number of graph patterns, separated by the UNION
keyword, that can each contribute to the query result. The union pattern matches if any of its graph patterns match, and all of them have a chance to contribute. Try the following:
(add-triple !ex:a !ex:b !ex:c)
(add-triple !ex:a !ex:d !ex:e)
(run-sparql "
SELECT ?third {
{ <http://example.com/a> <http://example.com/b> ?third }
UNION
{ <http://example.com/a> <http://example.com/d> ?third }
}")
You'll see this:
<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
<head>
<variable name="third" />
</head>
<results ordered="false" distinct="false">
<result>
<binding name="third"><uri>http://example.com/c</uri></binding>
</result>
<result>
<binding name="third"><uri>http://example.com/e</uri></binding>
</result>
</results>
</sparql>
t
:select
(?third)
That the union pattern can match as a whole, but the sub-patterns do not have to contain the same variables, can be useful; the sub-patterns can contain optionals to contribute additional information, or bind to differently-named variables to track which branch was applied.
Ordering and slicing results
SPARQL supports four post-processing operations on a results set.
DISTINCT
and REDUCED
A SELECT
query can optionally be specified to return unique results for each row. Query patterns often return duplicate bindings, and implementations must not eliminate duplicates unless explicitly instructed.
Simply add the DISTINCT
keyword:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?x ?y WHERE {
?x foaf:knows [ foaf:knows ?y ] .
}
Note that this removal of duplicates occurs after the results are refined down to the specified variable list.
In SPARQL, the ... [ ... ]
syntax represents anonymous blank nodes. The full details can be found in the SPARQL reference (here and here) but the general idea is that:
[]
specifies a blank node,[ ?p ?o ]
is the same as[] ?p ?o
and is equivalent to writing out a blank node in longhand as in_:b25 ?p ?o
,- The blank node can be used in multiple triple-patterns. For example, this pattern
[ :p "v" ] :q "w"
is equivalent to these two combined patterns:_:b23 :p "v" . _:b23 :q "w" .
and this pattern :x :q [ :p "v" ] .
is the same as
:x :q _:b19 .
_:b19 :p "v" .
See the SPARQL reference guide for more details and other useful abbreviations.
If you do not need duplicates to be removed, but you do not need the redundant entries, either — which would be the case if you are relying on counts to be correct, for example — then you can specify REDUCED
instead of DISTINCT
. This allows AllegroGraph to discard duplicate values if it's advantageous to do so.
ORDER BY
Ordering directives can be appended to a SELECT
query. These allow you to impose a sorted order on a results set.
Naturally, the results of a query can be ordered by any combination of variables in the results, in ascending or descending order. Furthermore, multiple sorting criteria can be specified to break ties.
Criteria can be:
Simple variables:
ORDER BY ?firstName
Arbitrary functions, named by URI:
ORDER BY ex:calculateImportance(?person)
Nested SPARQL expressions, surrounded by parentheses, including arbitrary boolean operations (e.g., && and ||; see this section of the SPARQL specification for more):
ORDER BY (REGEX('Mr.*', ?title) && BOUND(?employer)) ASC(?surname)
This expression will sort the results into two partitions: one containing people whose title begins with "Mr" and are employed, and its counterpart. Each category will then be sorted alphabetically by surname.
These criteria can be optionally annotated with ascending/descending (ascending by default):
ORDER BY DESC(?age) ASC(?lastName)
The values generated by the criteria (e.g., the bound value of a variable) are compared according to a strict set of rules to yield an ordering. If a criterion does not yield an ordering, the next criterion is applied, and so on until an ordering is achieved. If an ordering is never achieved, then the order is unspecified.
LIMIT
and OFFSET
Once solutions are ordered, it makes sense to be able to return 'slices' of a sequence of results. LIMIT
is an upper bound on the number of results returned. E.g., LIMIT 5
will return no more than 5 results for the query. OFFSET
causes results to be discarded up to that offset. For example:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
?person foaf:name ?name .
}
ORDER BY ?name
LIMIT 10
OFFSET 20
implements the usual ten-per-page style of results, starting on page 3 — results 21–30 inclusive. If the number of results is smaller than OFFSET
, no results are returned. If LIMIT
is 0, no results are returned.
It is technically possible to apply LIMIT
and OFFSET
to unordered results, but this is largely pointless, as the values returned in each slice are unpredictable.
Other verbs
The most common use of SPARQL is to return results bindings from queries — SELECT
. As previously mentioned, though, there are three other things it can do.
DESCRIBE
In many situations you simply do not have enough information to properly query a store for information about a resource -- you might not know which properties it has, for example. DESCRIBE allows you to provide a list of resources or variables that you wish to be described; the variables can be bound by an implicit SELECT
query.
DESCRIBE <http://example.com/fish> ?x WHERE {
?x ?y <http://example.com/fish>
}
This query asks for a description of fish, and any resource directly related to fish. twinql's implementation of DESCRIBE uses Concise Bounded Descriptions as a formalism for descriptions; informally, this is the smallest useful section of outward-facing graph around each resource.
DESCRIBE returns a collection of triples, not a set of bindings, and these are ordinarily serialized in RDF/XML.
Example:
(run-sparql "
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex: <http://example.com/>
DESCRIBE ex:karen ?friend {
ex:karen foaf:knows ?friend .
}" :rdf-format :ntriples)
prints
<http://example.com/alex> <http://xmlns.com/foaf/0.1/name> "Alex" .
<http://example.com/karen> <http://xmlns.com/foaf/0.1/knows> <http://example.com/alex> .
<http://example.com/karen> <http://xmlns.com/foaf/0.1/name> "Karen" .
CONSTRUCT
Much of the time, your purpose for querying an RDF store is to construct a new set of triples. For example, you can do a limited amount of inference this way:
(add-triple !ex:bill !ex:mother !ex:doris)
(add-triple !ex:doris !ex:brother !ex:billsuncle)
(add-triple !ex:fred !ex:brother !ex:billsotheruncle)
(add-triple !ex:bill !ex:father !ex:fred)
(run-sparql "
PREFIX ex: <http://example.com/>
SELECT ?x ?uncle WHERE {
{ ?x ex:mother [ ex:brother ?uncle ] }
UNION
{ ?x ex:father [ ex:brother ?uncle ] }
}")
<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
<head>
<variable name="x"/>
<variable name="uncle"/>
</head>
<results ordered="false" distinct="false">
<result>
<binding name="x">
<uri>http://example.com/bill</uri>
</binding>
<binding name="uncle">
<uri>http://example.com/billsuncle</uri>
</binding>
</result>
<result>
<binding name="x">
<uri>http://example.com/bill</uri>
</binding>
<binding name="uncle">
<uri>http://example.com/billsotheruncle</uri>
</binding>
</result>
</results>
</sparql>
t
:select
(?x ?uncle)
… perhaps substituting ?x
and ?uncle
into an ex:uncle
triple, which we add to the store:
(loop for (person uncle) in
(run-sparql "
PREFIX ex: <http://example.com/>
SELECT ?x ?uncle WHERE {
{ ?x ex:mother [ ex:brother ?uncle ] }
UNION
{ ?x ex:father [ ex:brother ?uncle ] }
}"
:results-format :lists) do
(add-triple person !ex:uncle uncle :g !<http://example.com/inferred>))
CONSTRUCT
allows you to yield triples directly from the query in one of several RDF serialization formats:
(progn
(run-sparql "
PREFIX ex: <http://example.com/>
CONSTRUCT {
?x ex:uncle ?uncle
}
WHERE {
{ ?x ex:mother [ ex:brother ?uncle ] }
UNION
{ ?x ex:father [ ex:brother ?uncle ] }
}"
:rdf-format :ntriples)
nil)
<http://example.com/bill> <http://example.com/uncle> <http://example.com/billsotheruncle> .
<http://example.com/bill> <http://example.com/uncle> <http://example.com/billsuncle> .
nil
though it bears mentioning that these triples have not been added to the store. It is trivial to do so if desired.
What we haven't covered
There are a lot of things! SPARQL and twinql can also:
- Restrict queries to a set of graphs:
FROM
,FROM NAMED
- Match graphs:
GRAPH
- Serialize to many different formats
- Ask boolean queries:
ASK
- Parse queries into s-expressions: see db.agraph.sparql:parse-sparql
Take a look at the reference for more details.