Introduction

This document is a simple, results-driven tutorial for SPARQL in AllegroGraph using the Lisp API. See SPARQL Tutorial using AGWebView for a similar tutorial using WebView.

This is not a semantics document, or a general description of SPARQL, or a standards document; for these, we advise the reader to look elsewhere, starting with the SPARQL Reference.

Note that this document sometimes omits details or complicating factors, entirely or until later. The tutorial uses the Lisp client for AllegroGraph.

Start an AllegroGraph server and run an AllegroGraph client in the Lisp environment. See the installation guide and the Lisp Quick-Start guide for more details. In brief, start mlisp and, if you want the IDE, start it with (require :ide) (ide:start-ide). Then evaluate:

  (require :agraph)  
  (enable-!-reader)  
  (use-package :db.agraph)  
  (use-package :db.agraph.sparql)  
  (create-triple-store "sparqltutorial")  
  (register-namespace "ex" "http://example.com/")  
  (register-namespace "foaf" "http://xmlns.com/foaf/0.1/") 

RDF and querying

RDF is an elegant formalism for describing graphs. These graphs can encode almost anything, given the right vocabulary — the model is a superset of the relational model (so you can encode conventional databases) and trees (so you can encode anything that can be expressed in XML, for example). AllegroGraph gives you several options for extracting data from these graphs.

Most basic is the API itself, working with individual triples. A logic view is offered by Prolog. SPARQL, the subject of this tutorial, more closely resembles SQL, and offers a relational, pattern-based approach to retrieving data from a store.

Bindings

Consider a graph, G. G contains triples that share objects or subjects:

  john  knows  karen  
  karen knows  alex  
  karen name   "Karen"  
  alex  name   "Alex" 

SPARQL's approach to selecting values is to take triples and allow them to contain variables (denoted by a ? or $ before a string). These structures — triple patterns — match against real triples in the store, or inferred triples if you wish to use a reasoner. Every time a triple pattern matches against a triple, it produces a binding for each variable.

For example, the triple pattern

  john knows ?y 

produces one binding for ?y: karen.

The pattern

  ?x knows ?y 

produces a richer table of bindings:

  |   x   |   y   |  
  =================  
1 | john  | karen |  
-------------------  
2 | karen |  alex | 

Each row in this table is a result for the query.
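
To make the matching step concrete, here is a minimal sketch in plain Python (not the AllegroGraph Lisp API) that models graph G as a list of tuples and produces one row of bindings per matching triple. The names GRAPH, match_pattern and query are invented for this illustration.

```python
# Illustrative model only: tuples stand in for the triples of graph G.
GRAPH = [
    ("john", "knows", "karen"),
    ("karen", "knows", "alex"),
    ("karen", "name", "Karen"),
    ("alex", "name", "Alex"),
]


def match_pattern(pattern, triple):
    """Return the bindings that make pattern equal to triple, or None."""
    bindings = {}
    for p_term, t_term in zip(pattern, triple):
        if p_term[0] in "?$":  # a variable: record (or re-check) its value
            if bindings.setdefault(p_term[1:], t_term) != t_term:
                return None
        elif p_term != t_term:  # a constant: must be identical
            return None
    return bindings


def query(pattern, graph=GRAPH):
    """Produce one row of bindings for every triple the pattern matches."""
    rows = []
    for triple in graph:
        row = match_pattern(pattern, triple)
        if row is not None:
            rows.append(row)
    return rows


print(query(("john", "knows", "?y")))  # [{'y': 'karen'}]
print(query(("?x", "knows", "?y")))
```

A real store uses indices rather than scanning every triple, so this shows only what is matched, not how AllegroGraph does it.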

Multiple triple patterns

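Variables can occur in multiple patterns that together comprise a query. Patterns that share variables narrow down the results, because a shared variable must take the same value in each of them. Patterns that share no variables expand the results, because every combination of their matches becomes a row.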

To extend the earlier example:

  ?x knows ?y  
  ?y name  ?name 

produces the following results:

  |   x   |   y   |  name   |  
  ===========================  
1 | john  | karen | "Karen" |  
-----------------------------  
2 | karen |  alex | "Alex"  |    

Adding an additional triple to the store:

  alex name "Alexander" 

yields the following:

  |   x   |   y   |  name        |  
  ================================  
1 | john  | karen | "Karen"      |  
----------------------------------  
2 | karen |  alex | "Alex"       |  
----------------------------------  
3 | karen |  alex | "Alexander"  | 

This should tell you something interesting: a row exists in the results for every possible substitution of values into the query that would yield a set of triples that exist in the graph. Each row can contain only one binding per variable, so Alex's two names fork the results.
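
The forking behaviour can be reproduced with a small sketch in plain Python (an illustration of the semantics, not of how AllegroGraph evaluates queries). Each pattern extends every row found so far with every compatible triple; the helper names extend and solve are assumptions made for this example.

```python
# Graph G from the text, including the extra "Alexander" triple.
GRAPH = [
    ("john", "knows", "karen"),
    ("karen", "knows", "alex"),
    ("karen", "name", "Karen"),
    ("alex", "name", "Alex"),
    ("alex", "name", "Alexander"),
]


def extend(row, pattern, triple):
    """Extend a row of bindings so that pattern equals triple, or return None."""
    new_row = dict(row)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # A variable bound by an earlier pattern must keep its value.
            if new_row.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new_row


def solve(patterns, graph=GRAPH):
    """Join the patterns left to right, starting from one empty row."""
    rows = [{}]
    for pattern in patterns:
        next_rows = []
        for row in rows:
            for triple in graph:
                extended = extend(row, pattern, triple)
                if extended is not None:
                    next_rows.append(extended)
        rows = next_rows
    return rows


rows = solve([("?x", "knows", "?y"), ("?y", "name", "?name")])
for row in rows:
    print(row["?x"], row["?y"], row["?name"])
```

Because ?y is shared, the second pattern only extends rows whose ?y already agrees. Two patterns with no variable in common would instead produce every combination of their matches.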

SPARQL syntax

SPARQL borrows Turtle's syntax for triple patterns (see the Turtle specification). A variable is a string starting with a ? or a $, but otherwise things are much the same. The above query pattern, borrowing the FOAF vocabulary and assigning it the prefix foaf, would be written as

  ?x foaf:knows ?y .  
  ?y foaf:name  ?name . 

You'll be shown more syntax as you progress through this tutorial.

Every triple pattern in SPARQL lives inside a graph pattern (as can other graph patterns!). Graph patterns are denoted by curly brackets, so our query would look like

{  
  ?x foaf:knows ?y .  
  ?y foaf:name  ?name .  
} 

Verbs and variables

SPARQL doesn't just do results querying — it can also ask questions, describe resources, and construct new graphs. It also makes sense to be able to specify which columns to select from the result table. So, here is our first valid SPARQL query, which includes a verb (SELECT) and a single-element list of variables (?name):

SELECT ?name WHERE {  
  ?x <http://xmlns.com/foaf/0.1/knows> ?y .  
  ?y <http://xmlns.com/foaf/0.1/name>  ?name .  
} 

As you can see, a full URI can be specified in angle brackets.

If you evaluate the following in a running AllegroGraph REPL:

(add-triple !ex:john !foaf:knows !ex:karen)  
(add-triple !ex:karen !foaf:knows !ex:alex)  
(add-triple !ex:karen !foaf:name !"Karen")  
(add-triple !ex:alex !foaf:name !"Alex")  
 
(run-sparql "  
SELECT ?name WHERE {  
  ?x <http://xmlns.com/foaf/0.1/knows> ?y .  
  ?y <http://xmlns.com/foaf/0.1/name>  ?name .  
}")    

You'll see the SPARQL XML results format for two bindings printed to the console:

<?xml version="1.0"?>  
<sparql xmlns="http://www.w3.org/2005/sparql-results#">  
  <head>  
    <variable name="name" />   
  </head>  
  <results ordered="false" distinct="false">  
    <result>  
      <binding name="name"><literal>Karen</literal></binding>  
    </result>  
    <result>  
      <binding name="name"><literal>Alex</literal></binding>  
    </result>  
  </results>  
</sparql>  
t  
:select  
(?name)    

Congratulations! Your first SPARQL query.

The three returned values are the output (ignore this for now: the real output got printed out as XML), the SPARQL verb (this was a SELECT query, so you get :select back), and the list of variables that were selected. You might find these useful later.

Patterns

SPARQL, being a fully-fledged query language, doesn't just have basic graph patterns. You can also:

Basic and optional patterns

All triples and all other basic patterns inside a basic pattern must match. So, in the nested pattern above, ?y would bind to properties that were direct subproperties of ex:someProperty.

This is not the case for optional patterns. An optional pattern will not cause a result to fail if it does not match with the current bindings. This manifests itself as an empty (unbound) cell in the results table.

Example:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>  
SELECT ?name ?email WHERE {  
  ?x foaf:knows ?y .  
  ?y foaf:name ?name .  
  OPTIONAL { ?y foaf:mbox ?email }  
} 

If you run this on the data so far, you'll get two results in the output with no bindings shown for ?email. (In former versions of SPARQL XML, <unbound/> elements would be included.) Try taking out the word "OPTIONAL": you get no results.
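
The difference between an optional and a required pattern can be sketched as a left join versus an inner join. The following plain-Python model is only an illustration; the row dictionaries, the function names, and the mailto address are hypothetical and not part of the tutorial's store.

```python
# Rows produced by the required part: ?x foaf:knows ?y . ?y foaf:name ?name .
required = [
    {"y": "ex:karen", "name": "Karen"},
    {"y": "ex:alex", "name": "Alex"},
]


def left_join(rows, optional_rows, shared):
    """OPTIONAL: keep every row, adding optional bindings where they agree."""
    result = []
    for row in rows:
        matches = [opt for opt in optional_rows if opt[shared] == row[shared]]
        if matches:
            result.extend({**row, **opt} for opt in matches)
        else:
            result.append(dict(row))  # no match: the row survives, ?email unbound
    return result


def inner_join(rows, other_rows, shared):
    """Without OPTIONAL: a row survives only if the extra pattern matches."""
    return [
        {**row, **other}
        for row in rows
        for other in other_rows
        if other[shared] == row[shared]
    ]


no_mbox = []  # the store so far contains no foaf:mbox triples
print(left_join(required, no_mbox, "y"))   # both rows, no email binding
print(inner_join(required, no_mbox, "y"))  # []

# Hypothetical mailbox for Alex, to show the optional contributing a binding.
one_mbox = [{"y": "ex:alex", "email": "mailto:alex@example.com"}]
print(left_join(required, one_mbox, "y"))
```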

Filters

Matching and comparing data is a very common operation in a query language. SPARQL has a full suite of comparisons. A common one is regex testing:

(run-sparql "  
PREFIX foaf: <http://xmlns.com/foaf/0.1/>  
SELECT ?name WHERE {  
  ?x foaf:knows ?y .  
  ?y foaf:name ?name .  
  FILTER regex(?name, '^K.*')  
}")    

… returning only "Karen".

You can even define your own filter functions, referencing them by URI, though this is a more involved topic, left for later discussion.
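
As a rough model of what the regex filter does, the sketch below applies a regular expression to each row in plain Python. SPARQL's regex matches anywhere in the string unless the pattern is anchored, which re.search mirrors; the function name regex_filter is an assumption for this example.

```python
import re

# Rows as they would look before the FILTER is applied.
rows = [{"name": "Karen"}, {"name": "Alex"}]


def regex_filter(rows, variable, pattern):
    """Keep only the rows whose value for `variable` matches `pattern`."""
    compiled = re.compile(pattern)
    return [row for row in rows if compiled.search(row[variable])]


print(regex_filter(rows, "name", "^K.*"))  # [{'name': 'Karen'}]
```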

Filters and optionals

It's important to note the interaction between filter patterns and optional patterns. Remember that an optional pattern contributes to the results if it matches, and leaves the results unchanged if it does not. An optional pattern can do some interesting things to the results when combined with a filter.

Try the following:

(add-triple !ex:book1 !ex:title !"Cheap Book")  
(add-triple !ex:book1 !ex:price !"30"^^xsd:integer)  
(add-triple !ex:book2 !ex:title !"Expensive Book")  
(add-triple !ex:book2 !ex:price !"90"^^xsd:integer)  
 
(run-sparql "  
PREFIX ex: <http://example.com/>  
SELECT ?title ?price WHERE {  
  ?x ex:title ?title .  
  OPTIONAL {  
    ?x ex:price ?price .  
    FILTER ( ?price < 40 )  
  }  
}") 

You should get something like:

<?xml version="1.0"?>  
<sparql xmlns="http://www.w3.org/2005/sparql-results#">  
  <head>  
    <variable name="title"/>  
    <variable name="price"/>  
  </head>  
  <results ordered="false" distinct="false">  
    <result>  
      <binding name="title">  
        <literal>Cheap Book</literal>  
      </binding>  
      <binding name="price">  
        <literal datatype="http://www.w3.org/2001/XMLSchema#integer">30</literal>  
      </binding>  
    </result>  
    <result>  
      <binding name="title">  
        <literal>Expensive Book</literal>  
      </binding>  
    </result>  
  </results>  
</sparql>  
t  
:select  
(?title ?price)    

The price for the expensive book is not returned, because it wasn't under 40. If you move the price triple pattern to outside the optional, you get the expensive book's price in the results. If you add another book without a listed price, the optional will also fail to match, so expensive books and books with no price are indistinguishable.

Remember this when you write queries!
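
The effect of placing the filter inside the optional can be modelled in a few lines of plain Python. This is a sketch of the semantics only; the third, unpriced book is hypothetical and is included to show why the two cases look the same.

```python
books = [
    {"title": "Cheap Book", "price": 30},
    {"title": "Expensive Book", "price": 90},
    {"title": "Unpriced Book"},  # hypothetical book with no ex:price triple
]


def optional_price_under(books, limit):
    """Model OPTIONAL { ?x ex:price ?price . FILTER (?price < limit) }."""
    rows = []
    for book in books:
        row = {"title": book["title"]}
        price = book.get("price")
        # The filter sits inside the optional, so failing it only drops the
        # optional binding; the row itself is always kept.
        if price is not None and price < limit:
            row["price"] = price
        rows.append(row)
    return rows


for row in optional_price_under(books, 40):
    print(row)
```

The expensive book and the unpriced book both come back as a title with no price.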

Combining filters

You can combine filters with boolean operators, parentheses, and so on:

(run-sparql "  
PREFIX ex: <http://example.com/>  
SELECT ?title ?price WHERE {  
  ?x ex:title ?title .  
  OPTIONAL {  
    ?x ex:price ?price .  
  }  
  FILTER ( bound(?price) && ?price < 40 )  
}") 

… matching only books where the OPTIONAL matches, providing a price, and the price is less than 40. On the example data, this returns one result: the cheap book and its price, 30.
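
For comparison, here is a plain-Python sketch of the same test applied outside the optional, where a failing filter removes the whole row. The rows and the helper names bound and cheap are assumptions for illustration; the unpriced book is hypothetical.

```python
# Rows after the OPTIONAL has run: the price is present only where it matched.
rows = [
    {"title": "Cheap Book", "price": 30},
    {"title": "Expensive Book", "price": 90},
    {"title": "Unpriced Book"},  # hypothetical: the optional did not match
]


def bound(row, variable):
    """Model of bound(?var): true when the row has a value for the variable."""
    return variable in row


def cheap(row):
    """Model of FILTER ( bound(?price) && ?price < 40 )."""
    return bound(row, "price") and row["price"] < 40


result = [row for row in rows if cheap(row)]
print(result)  # [{'title': 'Cheap Book', 'price': 30}]
```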

UNION

As well as using optional patterns to extend data, SPARQL allows you to bind variables using alternatives. Using UNION you can specify a number of graph patterns, separated by the UNION keyword, that can each contribute to the query result. The union pattern matches if any of its graph patterns match, and all of them have a chance to contribute. Try the following:

(add-triple !ex:a !ex:b !ex:c)  
(add-triple !ex:a !ex:d !ex:e)  
(run-sparql "  
SELECT ?third {  
  { <http://example.com/a> <http://example.com/b> ?third }  
  UNION  
  { <http://example.com/a> <http://example.com/d> ?third }  
}") 

You'll see this:

<?xml version="1.0"?>  
<sparql xmlns="http://www.w3.org/2005/sparql-results#">  
  <head>  
    <variable name="third" />   
  </head>  
  <results ordered="false" distinct="false">  
    <result>  
      <binding name="third"><uri>http://example.com/c</uri></binding>  
    </result>  
    <result>  
      <binding name="third"><uri>http://example.com/e</uri></binding>  
    </result>  
  </results>  
</sparql>  
t  
:select  
(?third)    

The union pattern matches as a whole, but its sub-patterns do not have to contain the same variables. This can be useful: the sub-patterns can contain optionals to contribute additional information, or bind differently-named variables to track which branch was applied.
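
A union can be pictured as concatenating the rows of each branch. The plain-Python sketch below models this with prefixed names standing in for the full URIs; match is a deliberately simple helper written for this example (it does not handle a variable repeated within one pattern), and ?via_b and ?via_d are hypothetical variable names.

```python
GRAPH = [
    ("ex:a", "ex:b", "ex:c"),
    ("ex:a", "ex:d", "ex:e"),
]


def match(pattern, graph=GRAPH):
    """Return one row of bindings per triple that the pattern matches."""
    rows = []
    for triple in graph:
        row = {}
        for p_term, t_term in zip(pattern, triple):
            if p_term.startswith("?"):
                row[p_term] = t_term
            elif p_term != t_term:
                break  # constant mismatch: this triple does not match
        else:
            rows.append(row)
    return rows


# Same variable in both branches: every branch contributes to ?third.
same = match(("ex:a", "ex:b", "?third")) + match(("ex:a", "ex:d", "?third"))
print(same)

# Different variables per branch: the bound variable shows which branch matched.
tagged = match(("ex:a", "ex:b", "?via_b")) + match(("ex:a", "ex:d", "?via_d"))
print(tagged)
```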

Ordering and slicing results

SPARQL supports four post-processing operations on a results set.

DISTINCT and REDUCED

A SELECT query can optionally be specified to return only unique rows. Query patterns often return duplicate rows of bindings, and implementations must not eliminate duplicates unless explicitly instructed.

Simply add the DISTINCT keyword:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>  
SELECT DISTINCT ?x ?y WHERE {  
  ?x foaf:knows [ foaf:knows ?y ] .  
}    

Note that this removal of duplicates occurs after the results are refined down to the specified variable list.
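
The order of operations matters: two rows that differ only in a variable that is not selected become duplicates once that variable is dropped. A minimal plain-Python sketch, with hypothetical data in which John reaches Alex through two different intermediaries:

```python
# Full solutions, including the intermediate node that SELECT does not list.
rows = [
    {"x": "ex:john", "mid": "ex:karen", "y": "ex:alex"},
    {"x": "ex:john", "mid": "ex:kim", "y": "ex:alex"},  # hypothetical second path
]


def project(rows, variables):
    """Keep only the selected variables, in order, as a tuple per row."""
    return [tuple(row.get(v) for v in variables) for row in rows]


def distinct(projected):
    """Drop repeated rows while keeping the first occurrence of each."""
    seen = set()
    unique = []
    for row in projected:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


print(project(rows, ["x", "y"]))            # two identical rows
print(distinct(project(rows, ["x", "y"])))  # one row
```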

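In SPARQL, the ... [ ... ] syntax represents anonymous blank nodes. The full details can be found in the SPARQL reference, but the general idea is that a bracketed list of predicates and objects stands for a fresh blank node that is the subject of each of them. The pattern in the query above is therefore equivalent to ?x foaf:knows _:b . _:b foaf:knows ?y . where _:b is a blank node that behaves like a variable that cannot be selected.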

See the SPARQL reference guide for more details and other useful abbreviations.

If you do not require duplicates to be removed, but do not need the redundant entries either, you can specify REDUCED instead of DISTINCT. (You would need the redundant entries if, for example, you were relying on counts to be correct.) This allows AllegroGraph to discard duplicate values if it's advantageous to do so, without guaranteeing that it will.

ORDER BY

Ordering directives can be appended to a SELECT query. These allow you to impose a sorted order on a results set.

Naturally, the results of a query can be ordered by any combination of variables in the results, in ascending or descending order. Furthermore, multiple sorting criteria can be specified to break ties.

Criteria can be:

This expression will sort the results into two partitions: one containing people whose title begins with "Mr" and are employed, and its counterpart. Each category will then be sorted alphabetically by surname.

These criteria can be optionally annotated with ascending/descending (ascending by default):

ORDER BY DESC(?age) ASC(?lastName) 

The values generated by the criteria (e.g., the bound value of a variable) are compared according to a strict set of rules to yield an ordering. If a criterion does not yield an ordering, the next criterion is applied, and so on until an ordering is achieved. If an ordering is never achieved, then the order is unspecified.
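
The tie-breaking behaviour of multiple criteria can be sketched in plain Python with a sequence of stable sorts, applied from the least significant criterion to the most significant. The people rows and the order_by helper are hypothetical, and Python's own comparison stands in for SPARQL's ordering rules.

```python
people = [
    {"lastName": "Smith", "age": 41},
    {"lastName": "Jones", "age": 41},
    {"lastName": "Adams", "age": 29},
]


def order_by(rows, criteria):
    """Sort rows by (variable, descending) pairs, most significant first."""
    ordered = list(rows)
    # Python's sort is stable, so sorting by the last criterion first lets
    # earlier criteria take precedence while later ones break ties.
    for variable, descending in reversed(criteria):
        ordered.sort(key=lambda row: row[variable], reverse=descending)
    return ordered


# Model of: ORDER BY DESC(?age) ASC(?lastName)
for row in order_by(people, [("age", True), ("lastName", False)]):
    print(row["age"], row["lastName"])
```

Jones and Smith tie on age, so the second criterion decides their relative order.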

LIMIT and OFFSET

Once solutions are ordered, it makes sense to be able to return 'slices' of a sequence of results. LIMIT is an upper bound on the number of results returned. E.g., LIMIT 5 will return no more than 5 results for the query. OFFSET causes results to be discarded up to that offset. For example:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>  
SELECT ?name WHERE {  
  ?person foaf:name ?name .  
}  
ORDER BY ?name  
LIMIT 10  
OFFSET 20 

implements the usual ten-per-page style of results, starting on page 3: results 21–30 inclusive. If the number of results does not exceed OFFSET, no results are returned. If LIMIT is 0, no results are returned.

It is technically possible to apply LIMIT and OFFSET to unordered results, but this is largely pointless, as the values returned in each slice are unpredictable.
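
LIMIT and OFFSET behave like a list slice over the ordered results. The following plain-Python sketch uses 35 hypothetical, already-sorted names; slice_results is a name invented for this example.

```python
def slice_results(ordered_rows, limit=None, offset=0):
    """Discard `offset` rows, then return at most `limit` of what remains."""
    end = None if limit is None else offset + limit
    return ordered_rows[offset:end]


# Hypothetical, already-ordered results: name01 ... name35.
names = [f"name{i:02d}" for i in range(1, 36)]

page3 = slice_results(names, limit=10, offset=20)
print(page3[0], page3[-1])  # name21 name30

print(slice_results(names, limit=10, offset=40))  # [] (offset past the end)
print(slice_results(names, limit=0, offset=20))   # [] (LIMIT 0)
```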

Other verbs

The most common use of SPARQL is to return results bindings from queries — SELECT. As previously mentioned, though, there are three other things it can do.

DESCRIBE

In many situations you simply do not have enough information to properly query a store for information about a resource; you might not know which properties it has, for example. DESCRIBE allows you to provide a list of resources or variables that you wish to be described; the variables can be bound by an implicit SELECT query.

DESCRIBE <http://example.com/fish> ?x WHERE {  
  ?x ?y <http://example.com/fish>  
} 

This query asks for a description of fish, and any resource directly related to fish. AllegroGraph's implementation of DESCRIBE uses Concise Bounded Descriptions as a formalism for descriptions; informally, this is the smallest useful section of outward-facing graph around each resource.

DESCRIBE returns a collection of triples, not a set of bindings, and these are ordinarily serialized in RDF/XML.
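
As a very rough picture of an outward-facing description, the plain-Python sketch below returns every triple whose subject is one of the described resources. This is a deliberate simplification and an assumption of this example: a full Concise Bounded Description also follows blank nodes, which the sketch ignores. It uses the john, karen and alex triples added earlier in this tutorial, written with prefixed names.

```python
GRAPH = [
    ("ex:john", "foaf:knows", "ex:karen"),
    ("ex:karen", "foaf:knows", "ex:alex"),
    ("ex:karen", "foaf:name", '"Karen"'),
    ("ex:alex", "foaf:name", '"Alex"'),
]


def describe(resources, graph=GRAPH):
    """Return the triples that have one of the resources as their subject."""
    return sorted(triple for triple in graph if triple[0] in resources)


# Describe ex:karen together with the friend bound by the query, ex:alex.
for subject, predicate, obj in describe({"ex:karen", "ex:alex"}):
    print(subject, predicate, obj, ".")
```

The triple in which ex:karen appears only as an object (John knows Karen) is not part of the description, because it points inward rather than outward.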

Example:

(run-sparql "  
PREFIX foaf: <http://xmlns.com/foaf/0.1/>  
PREFIX ex: <http://example.com/>  
DESCRIBE ex:karen ?friend {  
  ex:karen foaf:knows ?friend .  
}" :rdf-format :ntriples) 

prints

<http://example.com/alex> <http://xmlns.com/foaf/0.1/name> "Alex" .  
<http://example.com/karen> <http://xmlns.com/foaf/0.1/knows> <http://example.com/alex> .  
<http://example.com/karen> <http://xmlns.com/foaf/0.1/name> "Karen" . 

CONSTRUCT

Much of the time, your purpose for querying an RDF store is to construct a new set of triples. For example, you can do a limited amount of inference this way:

(add-triple !ex:bill !ex:mother !ex:doris)  
(add-triple !ex:doris !ex:brother !ex:billsuncle)  
(add-triple !ex:fred !ex:brother !ex:billsotheruncle)  
(add-triple !ex:bill !ex:father !ex:fred)  
 
(run-sparql "  
PREFIX ex: <http://example.com/>  
SELECT * WHERE {  
  { ?x ex:mother [ ex:brother ?uncle ] }  
  UNION  
  { ?x ex:father [ ex:brother ?uncle ] }  
}")  
 
<?xml version="1.0"?>  
<sparql xmlns="http://www.w3.org/2005/sparql-results#">  
  <head>  
    <variable name="x"/>  
    <variable name="uncle"/>  
  </head>  
  <results ordered="false" distinct="false">  
    <result>  
      <binding name="x">  
        <uri>http://example.com/bill</uri>  
      </binding>  
      <binding name="uncle">  
        <uri>http://example.com/billsuncle</uri>  
      </binding>  
    </result>  
    <result>  
      <binding name="x">  
        <uri>http://example.com/bill</uri>  
      </binding>  
      <binding name="uncle">  
        <uri>http://example.com/billsotheruncle</uri>  
      </binding>  
    </result>  
  </results>  
</sparql>  
t  
:select  
(?x ?uncle) 

… perhaps substituting ?x and ?uncle into an ex:uncle triple, which we add to the store [1]:

(multiple-value-bind (bindings verb column-names)
    (sparql:run-sparql "
        PREFIX ex: <http://example.com/>
        SELECT * WHERE {
          { ?x ex:mother [ ex:brother ?uncle ] }
          UNION
          { ?x ex:father [ ex:brother ?uncle ] }
        }"
       :results-format :arrays)
  ;; The second value is the query verb (:select), which is not needed here.
  (declare (ignore verb))
  ;; Find each variable's column by name instead of assuming an order.
  (let ((x-offset (position '?x column-names))
        (uncle-offset (position '?uncle column-names)))
    ;; Add one ex:uncle triple per result row, in a separate graph.
    (loop for binding in bindings
       for person = (aref binding x-offset)
       and uncle = (aref binding uncle-offset) do
       (add-triple person !ex:uncle uncle :g !<http://example.com/inferred>))))

CONSTRUCT allows you to yield triples directly from the query:

(progn  
    (run-sparql "  
      PREFIX ex: <http://example.com/>  
      CONSTRUCT {  
        ?x ex:uncle ?uncle  
      }  
      WHERE {  
        { ?x ex:mother [ ex:brother ?uncle ] }  
        UNION  
        { ?x ex:father [ ex:brother ?uncle ] }  
      }"  
      :rdf-format :ntriples)  
    nil)  
 
<http://example.com/bill> <http://example.com/uncle> <http://example.com/billsotheruncle> .  
<http://example.com/bill> <http://example.com/uncle> <http://example.com/billsuncle> .  
nil 

though it bears mentioning that these triples have not been added to the store. It is trivial to do so if desired.
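
Conceptually, CONSTRUCT substitutes each solution into the template and collects the resulting triples into a graph. A minimal plain-Python sketch of that substitution, using the two solutions from the uncle query; the function name construct and the tuple representation are assumptions for illustration.

```python
# The two solutions returned by the WHERE clause of the uncle query.
solutions = [
    {"x": "ex:bill", "uncle": "ex:billsuncle"},
    {"x": "ex:bill", "uncle": "ex:billsotheruncle"},
]

# The CONSTRUCT template: { ?x ex:uncle ?uncle }
TEMPLATE = [("?x", "ex:uncle", "?uncle")]


def construct(template, solutions):
    """Instantiate the template once per solution and collect the triples."""
    graph = set()  # a graph is a set, so repeated triples collapse
    for solution in solutions:
        for pattern in template:
            triple = tuple(
                solution.get(term[1:]) if term.startswith("?") else term
                for term in pattern
            )
            if None not in triple:  # skip triples with an unbound variable
                graph.add(triple)
    return graph


for triple in sorted(construct(TEMPLATE, solutions)):
    print(*triple, ".")
```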

What the tutorial hasn't covered

There are a lot of things! SPARQL can also:

Take a look at the reference for more details.


Footnotes

  1. Note that we use the third return value from run-sparql to associate the variable names with their position in returned arrays. This added complexity isn't necessary in this example but the technique ensures that the code will keep working regardless of the order in which the results are returned.