Introduction
This document describes twinql, AllegroGraph's SPARQL implementation. Each of the following functions is exported from the db.agraph.sparql package, which has sparql and twinql as nicknames.
For notes on twinql's conformance to the W3C specification please see this document.
You might also want to look at the reference guide for AllegroGraph and the AllegroGraph tutorial.
Conceptually, twinql has three layers:
- a parser from the textual SPARQL surface syntax to s-expressions;
- a query builder and planner that prepares the query for execution;
- and an executor that runs the plan against a store to produce results.
Currently, input and output from each of these layers is limited (for example, the query plan is not available to user code, but parsed output is). This will change in a future release.
Queries can be executed by an extensible set of query engines (currently the engine first available in AllegroGraph 2, and another based on the SPARQL algebra). The engine is specified by the :engine keyword; acceptable values are returned by get-sparql-query-engines.
The default engine, used if the :engine argument is not supplied, is specified by the sparql:*default-sparql-query-engine* special variable. As of AllegroGraph 3.2, its value defaults to :algebra.
Valid result formats
There are three possible outputs from a SPARQL query:
- a yes/no answer, in response to an ASK query;
- a list of bindings, in response to a SELECT query; or
- a new RDF graph, in response to a CONSTRUCT or DESCRIBE query.
twinql provides a number of different ways to serialize these results to a stream, provided as keyword symbols to the query functions. The results-format argument controls how ASK and SELECT query results are serialized; some possible formats are :sparql-xml, which serializes the result into the SPARQL XML result format, and :sparql-json, which uses the JSON format.
For CONSTRUCT and DESCRIBE, the value of the rdf-format argument applies.
The default formats are :sparql-xml and :rdf/xml respectively. Providing an unrecognized format will signal an error.
You can find out which formats are allowed for a particular verb by using get-allowed-results-formats and get-allowed-rdf-formats.
Exported functions
parse-sparql takes a string and parses it into an s-expression format. A parse error will result in a sparql-parse-error being raised.
This function is useful for three reasons: validation and inspection of queries, manual manipulation of query expressions without text processing, and performing parsing at a more convenient time than during query execution.
You do not need an open triple store in order to parse a query.
default-base and default-prefixes allow you to provide BASE and PREFIX arguments to the parser without inserting them textually into the query.
default-base should be nil or a string, and default-prefixes can either be a hash-table (string prefix to string expansion) or a list similar to db.agraph:*standard-namespaces*.
parse-sparql returns the s-expression representation of the query string.
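The "parse once, run many times" use of parse-sparql can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes the code is evaluated in a package where parse-sparql, run-sparql, and sparql-parse-error are accessible (for example, one that uses db.agraph.sparql), and that a triple store is open when count-foo-rows is called. The helper names and the query text are hypothetical.

```lisp
;; Parse up front so that syntax errors surface before execution time.
;; No open triple store is needed for this step.
(defun parse-or-report (query-string)
  "Return the parsed s-expression, or NIL after reporting a parse error."
  (handler-case (parse-sparql query-string)
    (sparql-parse-error (e)
      (format *error-output* "~&Could not parse query: ~A~%" e)
      nil)))

(defparameter *parsed-query*
  (parse-or-report "SELECT ?s ?o { ?s <http://ex.com/foo> ?o }"))

;; Reuse the parsed form to avoid repeated parser overhead.
(defun count-foo-rows ()
  (when *parsed-query*
    (run-sparql *parsed-query* :results-format :count)))
```

Handling the error at parse time keeps malformed queries away from the execution path entirely.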
run-sparql takes a SPARQL query as input and returns bindings or new triples as output.
Since AllegroGraph 3.0 it is a convenient wrapper for the db-run-sparql methods specialized on particular database classes and query engines. You might consider using those methods directly to gain more control over the execution of your queries.
You should consider specifying an engine argument in your invocations of run-sparql; the choice of default execution engine is not guaranteed to remain the same in future releases.
Allowable values for engine are keyword symbols returned by get-sparql-query-engines.
The precise arguments supplied to run-sparql vary according to the query engine. These are the typical arguments expected by the default engines.
SELECT and ASK query results will be presented according to the value provided for results-format, whilst the RDF output of DESCRIBE and CONSTRUCT will be serialized according to rdf-format. Both of these arguments take keyword values.
If the format is programmatic (that is, it is intended to return values rather than print a representation; :arrays is an example) then any results will be returned as the first value, and nothing will be printed on output-stream.
- The query can be a string, which will be parsed by parse-sparql, or an s-expression as produced by parse-sparql. If you expect to run a query many times, you can avoid some parser overhead by parsing your query once and calling run-sparql with the parsed representation.
- If query is a string, then default-base and default-prefixes are provided to parse-sparql to use when parsing the query. See the documentation for that function for details. Parser errors signaled within parse-sparql will be propagated onwards by run-sparql.
- Results or new triples will be serialized to output-stream. If a programmatic format is chosen for output, the stream is irrelevant. An error will be signaled if output-stream is not a stream, t, or nil.
- If limit, offset, from, or from-named are provided, they override the corresponding values specified in the query string itself. As FROM and FROM NAMED together define a dataset, and the SPARQL Protocol specification states that a dataset specified in the protocol (in this case, the programmatic API) overrides that in the query, if either from or from-named is non-nil then any dataset specifications in the query are ignored. You can specify that the contents of the query are to be partially overridden by providing t as the value of one of these arguments. This is interpreted as 'use the contents of the query'.
- from and from-named should be lists of URIs: future-parts, UPIs, or strings.
- default-dataset-behavior controls how the query engine builds the dataset environment if FROM or FROM NAMED are not provided. Valid options are :all (ignore graphs; include all triples) and :default (include only the store's default graph).
- default-graph-uris allows you to specify a list of resources which, when encountered in the SPARQL dataset specification, are to be treated as the default graph of the store. Each resource can be a resource UPI, resource future-part, or a URI string. For example, specifying '("http://example.com/default") will cause a query featuring FROM <http://example.com/default> FROM <http://example.com/baz> to execute against the union of the contents of the named graph <http://example.com/baz> and the store's default graph, as determined by (default-graph-upi db).
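The overriding arguments described above might be combined as in the following sketch. It assumes that these arguments are passed as keywords of the same names, that a store is open as *db*, and that the :lists format is available for the chosen engine; the query text and graph URI are illustrative only.

```lisp
;; Hypothetical query; the keyword arguments are the point here.
(run-sparql "SELECT ?s ?p ?o { ?s ?p ?o } LIMIT 100"
            :engine :algebra                   ; pin the engine explicitly
            :limit 10                          ; overrides LIMIT 100 in the query text
            :from '("http://example.com/baz")  ; non-nil, so query datasets are ignored...
            :from-named t                      ; ...except where t says 'use the query'
            :results-format :lists)            ; programmatic: nothing is printed
```

Because :lists is a programmatic format, the rows come back as the first return value rather than being written to output-stream.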
with-variables should be an alist of variable names and values. The variable names can be strings (which will be interned in the package in which the query is parsed) or symbols (which should be interned in the package in which the query is to be, or was, parsed). The variable names can include or omit a leading '?'. Note that a query literal in code might be parsed at compile time. Using strings is the most reliable method for naming variables.
Before the query is executed, the variables named after symbols will be bound to the provided values.
This allows you to use variables in your query which are externally imposed, or generated by other queries. The format expected by with-variables is the same as that used for each element of the list returned by the :alists results-format.
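Because each :alists row already has the shape that with-variables expects, the results of one query can drive another. The following is a sketch under the assumptions that a store is open as *db* and that both queries are parsed in the same package; the predicate URI is taken from the example data used elsewhere in this document, and the queries themselves are hypothetical.

```lisp
;; First query: collect one alist of bindings per result row.
(let ((rows (run-sparql "SELECT ?x { ?x <http://ex.com/foo> ?y }"
                        :results-format :alists)))
  ;; Second query: run once per row, with ?x pre-bound from that row.
  (dolist (row rows)
    (run-sparql "SELECT ?p ?o { ?x ?p ?o }"
                :with-variables row
                :results-format :table)))
```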
- db (*db* by default) specifies the triple store against which queries should run.
- destination-db (db by default) specifies the triple store against which Update modifications should take place. This is primarily of use when db is a read-only wrapper around a writable store, such as when reasoning has been applied.
- If verbosep is non-nil, status information is written to *sparql-log-stream* (*standard-output* by default).
Three additional extensions are provided for your use.
- If extendedp is true (or *use-extended-sparql-verbs-p* is true, and the argument omitted) some additional SPARQL verbs become available. SUM, AVERAGE, MEDIAN, STATS, CORRELATION, and COUNT can all be used in place of SELECT. These verbs are still experimental and undocumented, and can only be used with the :allegrograph-2 query engine. extendedp also controls other syntactic extensions in SPARQL queries, such as GEO syntax. Extensions are enabled by default in all versions of AllegroGraph after 3.2.
- If memoizep is true (or *build-filter-memoizes-p* is true, and the argument omitted) calls to SPARQL query functions (such as STR, fn:matches, and extension functions) will be memoized for the duration of the query. For most queries this will yield speed increases when FILTER or ORDER BY are used, at the cost of additional memory consumption (and consequent GC activity). For some queries (those where repetition of function calls is rare) the cost of memoization will outweigh the benefits. In large queries which call SPARQL functions on many values, the size of the memos can grow large.
  Memoization also requires that your extension functions do not depend on side-effects. The standard library is correct in this regard.
- In some circumstances you can achieve substantial speed increases by sharing your memos between queries. Create a normal eql hash-table with (make-hash-table), passing it as the value of the memos argument to run-sparql. This hash-table will gradually fill with memos for each used query function.
If you wish to globally enable memoization, set the variables as follows:
(progn
  (setf *build-filter-memoizes-p* t)
  (setf *sparql-sop-memos* (make-hash-table)))
Be aware that the size of *sparql-sop-memos* could grow very large indeed. You might consider using a weak hash-table, or periodically discarding the contents of the hash-table.
load-function is a function with signature (uri db &optional type) or nil. If it is a function, it is called once for each FROM and FROM NAMED parameter making up the dataset of the query. The execution of the query commences once each parameter has been processed. The type argument is either :from or :from-named, and the uri argument is a part (ordinarily a future-part) naming a URI. The default value is taken from *dataset-load-function*. You can use this hook function to implement loading of RDF before the query is executed.
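A hook with the signature described above could look like the following sketch. It only reports each dataset parameter and loads nothing. The function name, graph URI, and query are hypothetical, and the example assumes a store is open as *db*.

```lisp
(defun log-dataset-parameter (uri db &optional type)
  "Hypothetical hook: report each dataset parameter without loading anything."
  (declare (ignore db))
  ;; TYPE is :from or :from-named; URI is a part naming the graph.
  (format t "~&Dataset parameter (~S): ~A~%" type uri))

(run-sparql "SELECT ?s FROM <http://example.com/baz> WHERE { ?s ?p ?o }"
            :load-function #'log-dataset-parameter
            :results-format :count)
```

A real hook would typically fetch and load the RDF named by uri into db before returning, since query execution only starts after every parameter has been processed.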
The values returned by run-sparql are dependent on the verb used. The first value is typically disregarded in the case of results being written to output-stream. If output-stream is nil, the first value will be the results collected into a string (similar to the way in which cl:format operates).
The second value is the query verb: one of :select, :ask, :construct, or :describe. Other values are possible in extended mode.
The third value, for SELECT queries only, is a list of variables. This list can be used as a key into the values returned by the :arrays and :lists results formats, amongst other things.
Individual results formats are permitted to return additional values.
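The three return values can be captured with multiple-value-bind, as in this sketch. It assumes a store is open as *db*, that each :arrays row is a one-dimensional array, and that the variable list is in the same order as the entries of each row; the query is illustrative.

```lisp
(multiple-value-bind (rows verb variables)
    (run-sparql "SELECT ?s ?o { ?s <http://ex.com/foo> ?o }"
                :results-format :arrays)
  ;; VERB names the query verb; VARIABLES keys the positions in each row.
  (format t "~&~S query over variables ~S~%" verb variables)
  (dolist (row rows)
    (loop for variable in variables
          for value across row
          do (format t "~&  ~A = ~A~%" variable value))))
```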
permitted-verbs is a keyword, either :all or :read-only. This defaults to :all, and will permit any kind of SPARQL or SPARQL/Update query. Use :read-only to allow only SELECT, ASK, DESCRIBE, and CONSTRUCT queries. Note that you must also enable extended mode (using :extendedp :update) to use SPARQL/Update operations.
get-allowed-results-formats returns a list of keyword symbols that are valid when applied as values of results-format to a query with the given verb. If verb is not provided, the intersection of the formats for :ask and :select (the two permitted values) is returned. With AllegroGraph 3.0, an additional engine argument is available. In a similar manner to verb, omitting this restricts the returned values to those that apply to all built-in query engines.
get-allowed-rdf-formats returns a list of keyword symbols that are valid when applied as values of rdf-format to a query with the given verb. If verb is not provided, the intersection of the formats for :construct and :describe (the two permitted values) is returned. With AllegroGraph 3.0, an additional engine argument is available. In a similar manner to verb, omitting this restricts the returned values to those that apply to all built-in query engines.
Examples:
- Get RDF formats for the :allegrograph-2 query engine that apply to both CONSTRUCT and DESCRIBE queries:
  (get-allowed-rdf-formats nil :allegrograph-2)
- Get formats for CONSTRUCT queries executed by the algebra query engine:
  (get-allowed-rdf-formats :construct :algebra)
get-sparql-query-engines returns the keyword symbols that are acceptable values for the engine argument to run-sparql or db-run-sparql.
In AllegroGraph 3.0, to prepare for future extension to different query engines and databases, the db-run-sparql generic function was introduced. You can continue to use run-sparql in your code.
A generic function to dispatch query execution across different SPARQL engines and database types.
N.B., if you request a results-format of :cursor, you should yield bindings from it within a (sparql:with-stable-xquery-environment) form, or avoid the use of filter functions that rely on the implicit environment (such as fn:currentDate).
Serialized results formats are provided with a managed environment; only returned cursors need this.
Extension functions
SPARQL allows for query engines to associate extension functions with URIs, and call them from within queries.
You can define your own URI functions in twinql through defurifun, or associate existing functions with a URI through associate-function-with-uri. defurifun does some manipulation of the arguments, so you should use it whenever possible.
associate-function-with-uri associates uri, which is a string or a valid part, with the provided function, which is a symbol or a function. If cache-now-p is true, and function is a symbol, its function binding is stored instead of the symbol itself.
stream
(*standard-output*
by default).
defurifun defines a function with the given name, and associates it with uri as with associate-function-with-uri. args is not evaluated, exactly as with defun.
Here's an example: a function that will do an HTTP HEAD request against the provided URL, returning the HTTP status code as an integer literal, or 0 if there's a problem.
(The built-in functions are quite robust, so a Lisp integer will be treated as an RDF literal with data type xsd:integer.)
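(defurifun ex-head-request !<http://example.com/fn/head> (uri)
  (or
   (when uri
     ;; Any error (bad URL, network failure) yields NIL here, so that
     ;; the OR falls through to 0.
     (ignore-errors
       (format t "~&Performing HTTP HEAD request on <~A>...~%"
               (upi->value uri))
       ;; do-http-request returns the status code as its second value.
       (second
        (multiple-value-list
         (net.aserve.client:do-http-request (upi->value uri)
           :method :head)))))
   0))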
You can use this function in a query exactly as you would a built-in function.
Using this data as an example:
<http://ex.com/a> <http://ex.com/foo> "200"^^<http://www.w3.org/2001/XMLSchema#integer> .
we can run a query like so:
sparql(54): (run-sparql
"
PREFIX f: <http://example.com/fn/>
SELECT ?x {
?x <http://ex.com/foo> ?y .
FILTER ( ?y = f:head(\"http://franz.com\") )
}"
:results-format :count)
which produces this output:
Performing HTTP HEAD request on <http://franz.com>...
1
:select
(?x)
… we know, then, that franz.com
is returning a 200 status code.
Note that these filter functions can be called an arbitrary number of times during the execution of a query. It's not a good idea to actually perform expensive operations like HTTP requests in your queries.
SELECT bindings and ASK results
run-sparql gives you programmatic access to results in a number of ways.
Any of the following results-formats are suitable as arguments to SELECT or ASK queries:
- :sparql-xml, which serializes the results as XML to output-stream.
- :sparql-json, which does the same in the JSON encoding.
- :sparql-ttl, which does the same in the SPARQL results Turtle encoding.
- :table, a simple debugging format that concisely prints the results in a table. See *sparql-table-width*.
The following results-formats are suitable as arguments to SELECT queries:
- :arrays, which returns a list of arrays, each with one entry for each results variable.
- :lists, which is the same but with lists instead of arrays.
- :hashes, which returns a list of eq hash-tables. Each hash table maps from results variables (interned as symbols; see below) to values.
- :alists, which is the same but with association lists instead of hash tables.
- :count, which returns the number of results rows.
The following results-format is suitable as an argument to ASK queries:
- :boolean, which returns t or nil for true and false respectively.
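As a small illustration of a programmatic format, an ASK query with :boolean can be used directly as a Lisp condition. This sketch assumes a store is open as *db* and reuses the example URIs from earlier in this document.

```lisp
;; :boolean returns t or nil, so the call can sit in a conditional.
(if (run-sparql "ASK { <http://ex.com/a> <http://ex.com/foo> ?y }"
                :results-format :boolean)
    (format t "~&At least one matching triple exists.~%")
    (format t "~&No matching triples.~%"))
```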
Returning triples from CONSTRUCT and DESCRIBE queries
Any of the following rdf-formats are suitable as arguments to CONSTRUCT or DESCRIBE queries:
- :ntriples, which serializes the triples as N-Triples to output-stream.
- :rdf/xml, which does the same in RDF/XML.
- :rdf-n3, which does the same in the Turtle subset of Notation-3.
The following rdf-formats are suitable for DESCRIBE queries:
- :hash, which returns a upi-hash-table from nodes to their Concise Bounded Descriptions. The union of the values in the upi-hash-table makes up the output of the DESCRIBE query. This method is unsupported and internal.
- :triples, which returns a list of AllegroGraph triples.
The following rdf-format is suitable for CONSTRUCT queries:
- :arrays, which returns a list of three-element arrays. The elements of each array are the subject, predicate, and object of a constructed triple, and can be UPIs or future-parts.
Additionally, the :allegrograph-2 engine allows the :triples format. The slots in these triples will be mostly empty: you should use only subject, predicate, and object. More importantly, you should be aware that parts included by a CONSTRUCT query might not be dereferenceable, depending on the current triple store: AllegroGraph triples, by their nature, contain only UPIs, but a CONSTRUCT query can include new URIs and literals that have not been added to the open triple store's string dictionary. The :arrays format is wholeheartedly recommended as a replacement.
Finally, the :algebra engine can return results from CONSTRUCT and DESCRIBE queries as in-memory triple stores, using the :in-memory format.
These stores are an experimental addition, but the ability to treat the results of a query as an independent triple store can be very useful. Such a store can be queried or serialized so long as a reference to it exists.
You can use get-allowed-results-formats and get-allowed-rdf-formats to access these allowed values dynamically at run-time.
Variables
Programmatic results associate values with variables. Variables are parsed into symbols by the query parser.
The mapping from variables to symbols is straightforward, and best illustrated by example:
?x → '|?x|
?X → '?X
$foo → '|?foo|
If you provide variables in a with-variables argument, a leading ? is prepended to the variable name. Your queries will run correctly if you provide them as s-expressions and do not prepend ?, but:
- variables that share a name with a self-evaluating symbol, such as most-positive-fixnum, t, or nil, will cause your query to fail;
- bindings you provide using with-variables will not apply, because they are always preprocessed.
All variables created by the parser are interned in the current package, as if by a call to cl:intern. You should adhere to these rules when processing results or providing bindings using with-variables.
SPARQL and first-class triples
AllegroGraph permits you to make assertions about triple IDs (UPIs of type triple-id). SPARQL offers no support for this: only named graphs are supported. First-class triples are entirely outside the scope of both RDF and SPARQL.
SPARQL queries against stores using first-class triples are not supported. twinql makes only limited provisions for such queries:
- programmatic output is likely to work in most situations. Certain FILTER and ORDER BY operations will fail, however; typically these will result in an internal SPARQL type error, which will cause the FILTER to fail for all triple-id values.
- output in one of the provided results formats will, under normal circumstances, fail when a triple-id is encountered. The SPARQL XML writer exports a variable, sparql.results:*strict-sparql-xml-output*, which toggles triple-id output. If this is set to nil, triple-id values are printed in a <triple> element, analogous to <literal>. A similar variable is exported for the JSON format: *strict-sparql-json-output*. The Turtle results format will always treat the triple-id as an integer.
It bears repeating that SPARQL is not intended to work with first-class triples; any queries that run successfully are little more than accidents, and named graphs are a better choice in all cases.
Datasets
Dataset loading
It is sometimes useful to be able to process the SPARQL dataset (the set of URIs provided as FROM and FROM NAMED parameters) when a query is executed. AllegroGraph provides a dataset load hook for your convenience.
You may bind a function to *dataset-load-function* to specify a default, or pass one as the :load-function argument to run-sparql. Passing nil disables the hook for that query. The argument list of the function is described in *dataset-load-function*.
Default dataset handling
When no dataset (FROM and FROM NAMED clauses) is provided to a query, the actual dataset against which the query is run is not defined by the SPARQL specification. twinql provides you with two options: :default, meaning that the default part of the dataset contains only the default graph of the store; and :all, whereby both the default and named parts of the dataset contain every graph in the store.
You can control the default behavior by setting *sparql-default-graph-behavior*, and set the behavior for specific queries by passing the :default-dataset-behavior argument to run-sparql.
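The two ways of choosing the behavior can be sketched as follows, assuming a store is open as *db* and that *sparql-default-graph-behavior* is accessible in the current package; the query is illustrative.

```lisp
;; Per query: treat every graph in the store as part of the dataset.
(run-sparql "SELECT ?s { ?s ?p ?o }"
            :default-dataset-behavior :all
            :results-format :count)

;; Globally: queries without FROM or FROM NAMED see only the default graph.
(setf *sparql-default-graph-behavior* :default)
```

Passing the argument explicitly keeps a query's meaning independent of whatever global setting happens to be in effect.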
Verbose output
Logging output when queries are run in verbose mode is written to sparql.logging:*sparql-log-stream*. This is *standard-output* by default.
SPARQL and encoded values
AllegroGraph offers the ability to encode a range of literal values (numbers, geospatial values, and more) directly within a UPI, without the overhead of a string representation as an RDF literal. Whenever these encoded values are encountered by AllegroGraph's printing functions, and in many other situations, they are seamlessly treated as RDF literals, but with significant time and space savings.
twinql's implementations of most SPARQL and XQuery operators also handle encoded values transparently.
TODO
Variables
*sparql-default-graph-behavior*: set this to :default to have SPARQL queries with no FROM parts use only the default graph. Set it to :all to have the store run such queries on all triples in the store.
*dataset-load-function*: nil, or a function with signature (uri db &optional type), called to load dataset parameters before a query is executed.