Introduction
This document describes AllegroGraph's SPARQL implementation. Each of the following functions is exported from the db.agraph.sparql package. This package is also nicknamed sparql.
For notes on AllegroGraph's conformance to the W3C specification please see this document. For notes on the SPARQL 1.1 query engine, see the release notes.
As of version 4.4, AllegroGraph supports all of SPARQL 1.1 query, including:
- aggregation
- assignment
- bindings
- expressions in the select clause
- negation
- property-paths
- sub-query
- update
- basic federated query
- expanded operators and functions
AllegroGraph also provides partial support for SPIN.
Conceptually, SPARQL has three layers:
- a parser from the textual SPARQL surface syntax to s-expressions;
- a query builder and planner that prepares the query for execution;
- and an executor that runs the plan against a store to produce results.
Currently, input and output from each of these layers is limited (for example, the query plan is not available to user code, but parsed output is). This may change in a future release.
Using SPARQL versus using Prolog
Prolog is an alternative query mechanism for AllegroGraph. The Prolog tutorial provides an introduction to using Prolog and AllegroGraph together. Prolog is further described here in the Lisp Reference (where further links are provided). This section is a brief note on the differences between our SPARQL query engine and the Prolog select query engine. The main differences are:
- Prolog is a general purpose logic programming language while SPARQL is a query language (extended with Update, expressions, etc. in SPARQL 1.1).
- Prolog makes it easy to write rules and express concepts that aren't "in triples", whereas SPARQL requires using either SPIN or magic properties, which are often more complex and harder to develop. (Also, the magic properties are not currently user definable.)
- SPARQL is fully RDF-compliant, so it "knows" more about datatypes and supports the full SPARQL 1.1 set of operators and functions. In order to do computation with Prolog, you need to call out to Lisp.
- SPARQL has both a depth-first and a breadth-first (the default) query executor whereas Prolog has only depth-first. Depending on the query, SPARQL can be significantly more efficient.
- The Prolog and the SPARQL query planners and query engines are different, so they may execute a BGP differently.
Choosing a Query Execution Engine
AllegroGraph currently has two query execution modes: Single Set and Chunk at a Time (CaaT). The former tends to be more time efficient and more memory hungry whereas CaaT takes more time but uses less memory.
Consider a query like the following:
select * {
  ?x :p1 ?o .
  ?o :p2 ?y .
}
In Single Set mode, the sparql-1.1 engine iterates over all triples whose predicate is :p1 (call this the p1 cursor) and creates pairs of bindings for each ?x and ?o. It will then create a p2 cursor to iterate over all of the triples whose predicate is :p2 and merge the data from these triples with the bindings that have matching values for ?o. In short, the engine proceeds through the query plan one step at a time and accumulates all of the results immediately.
The CaaT mode is similar but it processes large chunks of results rather than trying to work on the query in its entirety. In the example above, the CaaT engine would gather up a group of results from the p1 cursor and then make a p2 cursor to find answers. Once p2 was exhausted, the engine would go back to the p1 cursor and build up the next chunk of results. The size of the chunks is based on the amount of memory allocated to the query using the chunkProcessingMemory query option.
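The difference between the two modes can be modeled in a few lines of Python. This is only an illustrative sketch and not AllegroGraph's executor: the toy data, the function names, and the use of a row count as the chunk size are all assumptions (the real engine sizes chunks by memory through chunkProcessingMemory).

```python
from itertools import islice

# Toy data for the query: ?x :p1 ?o . ?o :p2 ?y .
P1 = [("a", "o1"), ("b", "o2"), ("c", "o1"), ("d", "o3")]  # (?x, ?o) pairs
P2 = [("o1", "y1"), ("o2", "y2"), ("o1", "y3")]            # (?o, ?y) pairs


def join_with_p2(bindings, p2):
    """Merge (?x, ?o) bindings with the p2 pairs that share the same ?o."""
    return [(x, o, y) for (x, o) in bindings for (o2, y) in p2 if o == o2]


def single_set(p1, p2):
    """Accumulate every p1 binding first, then join them all in one step."""
    bindings = list(p1)
    return join_with_p2(bindings, p2)


def chunk_at_a_time(p1, p2, chunk_rows):
    """Join one bounded chunk of p1 bindings at a time, yielding results as found."""
    cursor = iter(p1)
    while True:
        chunk = list(islice(cursor, chunk_rows))
        if not chunk:
            break
        yield from join_with_p2(chunk, p2)
```

Both functions produce the same solutions. The chunked version never holds more than chunk_rows p1 bindings at once, which is the memory-for-time trade described above.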
SPARQL Update
Previously, AllegroGraph supported a version of SPARQL Update based on a draft specification. The sparql-1.1 engine supports the new version.
Valid result formats
There are three possible outputs from a SPARQL query:
- a yes/no answer, in response to an ASK query;
- a list of bindings, in response to a SELECT query; or
- a new RDF graph, in response to a CONSTRUCT or DESCRIBE query.
AllegroGraph provides a number of different ways to serialize these results to a stream, provided as keyword symbols to the query functions. The results-format argument controls how ASK and SELECT query results are serialized; some possible formats are :sparql-xml, which serializes the result into the SPARQL XML result format, and :sparql-json, which uses the JSON format.
For CONSTRUCT and DESCRIBE, the value of the rdf-format argument applies.
The default formats are :sparql-xml and :rdf/xml respectively. Providing an unrecognized format will signal an error.
You can find out which formats are allowed for a particular verb by using get-allowed-results-formats and get-allowed-rdf-formats.
Exported functions
Parse a SPARQL query string into an s-expression.
This function is useful for three reasons: validation and inspection of queries, manual manipulation of query expressions without text processing, and performing parsing at a more convenient time than during query execution.
You do not need an open triple-store in order to parse a query. Any parse errors will signal a sparql-parse-error.
The optional arguments provide BASE and PREFIX arguments to the parser without inserting them textually into the query.
- default-base - A string to use as the BASE for the SPARQL query.
- default-prefixes - A dictionary or hash-table mapping string prefixes to their namespace expansions, or a list of two-element lists where each sublist contains the prefix and its expansion. For example:
(("rdf" "http://www.w3.org/1999/02/22-rdf-syntax-ns#")
 ("rdfs" "http://www.w3.org/2000/01/rdf-schema#")
 ("owl" "http://www.w3.org/2002/07/owl#")
 ("xsd" "http://www.w3.org/2001/XMLSchema#")
 ("xs" "http://www.w3.org/2001/XMLSchema#")
 ("fn" "http://www.w3.org/2005/xpath-functions#")
 ("err" "http://www.w3.org/2005/xqt-errors#"))
This list uses the same format as db.agraph:standard-namespaces.
parse-sparql returns the s-expression representation of the query string.
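To make the role of the optional arguments concrete, the following Python sketch builds the BASE and PREFIX text that default-base and default-prefixes stand in for. It is an illustration only: the parser does not insert this text, and the prologue function and the example base URI are hypothetical.

```python
def prologue(default_base=None, default_prefixes=()):
    """Build the BASE/PREFIX text equivalent to the optional parser arguments."""
    lines = []
    if default_base is not None:
        lines.append(f"BASE <{default_base}>")
    # default_prefixes follows the list-of-two-element-lists form shown above.
    for prefix, expansion in default_prefixes:
        lines.append(f"PREFIX {prefix}: <{expansion}>")
    return "\n".join(lines)


prefixes = [
    ("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#"),
    ("rdfs", "http://www.w3.org/2000/01/rdf-schema#"),
]
text = prologue("http://example.com/", prefixes)
print(text)
```

Passing the values as arguments avoids this kind of string manipulation, which is the point of the optional parameters.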
run-sparql takes a SPARQL query or update command as input and returns bindings or new triples as output.
SELECT, ASK and UPDATE query results will be presented according to the value provided for results-format, whilst the RDF output of DESCRIBE and CONSTRUCT will be serialized according to rdf-format. Both of these arguments take keyword values.
If the format is programmatic (that is, it is intended to return values rather than print a representation; :arrays is an example) then any results will be returned as the first value, and nothing will be printed on output-stream.
- The query can be a string, which will be parsed by parse-sparql, an s-expression as produced by parse-sparql, or a query plan generated by a previous call to run-sparql. The s-expression syntax is described in greater detail in the reference. If you expect to run a query many times, you can avoid some parser overhead by parsing or planning your query once and calling run-sparql with the parsed representation.
- If query is a string, then default-base and default-prefixes are provided to parse-sparql to use when parsing the query. The former specifies a BASE when none is provided and the latter specifies a set of namespace abbreviations to use when parsing. Parser errors signaled within parse-sparql will be propagated onwards by run-sparql.
- default-base: A string to use as the BASE for the SPARQL query (only used when query is a string).
- default-prefixes: A hash-table mapping string prefixes to their expansions or a list of two-element lists where each sublist contains the prefix and its namespace expansion (only used when query is a string; see parse-sparql for details).
- Results or new triples will be serialized to output-stream. If a programmatic format is chosen for output, the stream is irrelevant. An error will be signaled if output-stream is not a stream, t (for output to *standard-output*), nil (for output to a string), or a pathname. If a pathname is specified, then the if-exists and if-does-not-exist parameters will be used when creating the output stream.
- If from or from-named are provided, they override the corresponding values specified in the query string itself. As FROM and FROM NAMED together define a dataset, and the SPARQL Protocol specification states that a dataset specified in the protocol (in this case, the programmatic API) overrides that in the query, if either from or from-named is non-nil then any dataset specifications in the query are ignored. You can specify that the contents of the query are to be partially overridden by providing t as the value of one of these arguments. This is interpreted as 'use the contents of the query'. from and from-named should be lists of URIs: future-parts, UPIs, or strings.
- If the limit parameter is specified and the query string also contains a LIMIT, then the minimum of the two will be used.
- If the offset parameter is specified and the query string also contains an OFFSET, then the sum of the two will be used.
- The using-named-graph-uri and using-graph-uri parameters are used similarly to the from and from-named parameters except that they are relevant only to SPARQL UPDATE commands.
- The remove-graph-uri and insert-graph-uri parameters are also used only in SPARQL update. The first specifies a graph or list of graphs from which each triple to be deleted should be removed whereas the second specifies a graph or list of graphs into which each triple to be inserted should be added.
- default-dataset-behavior controls how the query engine builds the dataset environment if FROM or FROM NAMED are not provided. Valid options are :all (ignore graphs; include all triples) and :default (include only the store's default graph).
- default-graph-uris allows you to specify a list of resources which, when encountered in the SPARQL dataset specification, are to be treated as the default graph of the store. Each resource can be a resource UPI, resource future-part, or a URI string. For example, specifying '("http://example.com/default") will cause a query featuring FROM <http://example.com/default> FROM <http://example.com/baz> to execute against the union of the contents of the named graph <http://example.com/baz> and the store's default graph, as determined by (default-graph-upi db).
- with-variables should be an alist of variable names and values. The variable names can be strings (which will be interned in the package in which the query is parsed) or symbols (which should be interned in the package in which the query is to be, or was, parsed). The variable names can include or omit a leading '?'. Note that a query literal in code might be parsed at compile time. Using strings is the most reliable method for naming variables. Before the query is executed, the variables named after symbols will be bound to the provided values. This allows you to use variables in your query which are externally imposed, or generated by other queries. The format expected by with-variables is the same as that used for each element of the list returned by the :alists results-format.
- in-line-data provides another means to impose external data into a query. The format for the data is a list of two lists. The first list contains the list of variables to use and the second is a list of lists. Each sub-list must be the same length as the list of variables. nil can be used to specify that a binding should be undefined.
- db (*db* by default) specifies the triple store against which queries should run.
- destination-db (db by default) specifies the triple store against which Update modifications should take place. This is primarily of use when db is a read-only wrapper around a writable store, such as when reasoning has been applied.
- If verbosep is non-nil, status information is written to *sparql-log-stream* (*standard-output* by default).
- basic-authorization is used when making SPARQL SERVICE calls. It must be a cons cell whose car is the user name and whose cdr is the password. For example: `("test" . "password").
- load-function must be nil or a function with signature (uri db type). If it is a function, it is called once for each FROM and FROM NAMED parameter making up the dataset of the query. The execution of the query commences once each parameter has been processed. The type argument is either :from or :from-named, and the uri argument is a part (ordinarily a future-part) naming a URI. The default value is taken from *dataset-load-function*. You can use this hook function to implement loading of RDF before the query is executed.
- If cancel-query-on-warnings-p is true, then any warning found during query planning will immediately cancel the query. Setting this to true can help in query debugging.
- host can be used to specify the URL of a SPARQL endpoint. If host is given, then AllegroGraph will send the query to the endpoint, retrieve any results and emit them in the format specified by results-format or rdf-format as appropriate.
- timeout should be the number of seconds after which the query will be canceled. A value of nil means no timeout. Note that a query timeout does not interrupt the query immediately. Rather, the executing query will periodically check for timeouts and cancel itself. This allows for any query resources to be properly reclaimed.
- The engine keyword specifies which query engine to use. Allowable values are the keyword symbols returned by valid-query-engines. Note that depending on the version of AllegroGraph, there may be only a single engine available.
- user-attributes-prefix-permission-p must be given a true value in order to run a SPARQL query including "prefix franzOption_userAttributes"; otherwise an error is signaled.
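The rules for combining the limit and offset parameters of run-sparql with a LIMIT or OFFSET in the query text can be written down as a small Python model. This is a sketch of the stated rules, not AllegroGraph code: the function names are hypothetical, and the behavior when only one of the two values is present (that value is used as-is) is an assumption.

```python
def effective_limit(param_limit, query_limit):
    """Minimum of the two limits; None means 'not specified'."""
    given = [v for v in (param_limit, query_limit) if v is not None]
    return min(given) if given else None


def effective_offset(param_offset, query_offset):
    """Sum of the two offsets; None counts as zero."""
    return (param_offset or 0) + (query_offset or 0)


# A limit argument of 10 with "LIMIT 25 OFFSET 20" in the query and an offset argument of 5:
print(effective_limit(10, 25), effective_offset(5, 20))
```

Taking the minimum means a caller can tighten a query's LIMIT but never widen it, while summing offsets lets a caller page further into an already offset result.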
The values returned by run-sparql are dependent on the verb used. The first value is typically disregarded in the case of results being written to output-stream. If output-stream is nil, the first value will be the results collected into a string (similar to the way in which cl:format operates).
The second value is the query verb: one of :select, :ask, :construct, :describe, or :update.
The third value, for SELECT queries only, is a list of variables. This list can be used as a key into the values returned by the :arrays and :lists results formats, amongst other things.
The fourth value will be a query-information structure which contains additional information AllegroGraph gathered while executing the query.
Individual results formats are permitted to return additional values.
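As an illustration of using the variable list as a key into row-shaped results, here is a short Python model. The variable names and row values are invented for the example, rows_to_records is a hypothetical helper, and real results would contain UPIs or future-parts rather than plain strings.

```python
# Stand-ins for the third and first values returned for a SELECT query.
variables = ["?name", "?type"]
rows = [("Alice", "Person"), ("Acme", "Company")]


def rows_to_records(variables, rows):
    """Pair each row's values with the variable names, position by position."""
    return [dict(zip(variables, row)) for row in rows]


records = rows_to_records(variables, rows)
print(records[0]["?name"])
```

Because each row has one entry per result variable, position i in a row always corresponds to position i in the variable list.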
The following parameters are used internally by AllegroGraph and should not be used: parsed-package, primary?, parent-executor, uuid, and service-db.
Note that the following arguments are deprecated and should no longer be used: permitted-verbs, extendedp and planner.
Returns the query-engine that will be used if no other engine is specified.
If a triple-store is opened, then the QueryEngine parameter of the AllegroGraph configuration file will be used. You can use setf to change the default for the current session.
Returns a list of keyword symbols that are valid when applied as values of results-format to a query with the given verb. If verb is not provided, the intersection of :ask and :select (the two permitted values) is returned. With AllegroGraph 3.0, an additional engine argument is available. In a similar manner to verb, omitting this restricts the returned values to those that apply to all built-in query engines.
Returns a list of keyword symbols that are valid when applied as values of rdf-format to a query with the given verb. If verb is not provided, the intersection of :construct and :describe (the two permitted values) is returned. With AllegroGraph 3.0, an additional engine argument is available. In a similar manner to verb, omitting this restricts the returned values to those that apply to all built-in query engines.
Example:
- Get formats for CONSTRUCT queries executed by the algebra query engine.
(get-allowed-rdf-formats :construct :algebra)
engine argument to run-sparql or db-run-sparql.
run-sparql is a convenient way to call the generic function db-run-sparql. The latter specializes on the triple-store class and query engine. In general, you should continue to use run-sparql in your code.
A generic function to dispatch query execution across different SPARQL engines and database types.
Returns values:
1) The result (depends on :results-format)
2) The SPARQL verb keyword (:select, :update, :describe, :ask)
3) A list of symbols naming variables in the result, e.g. (?name ?type ?friend ....)
4) A query-information struct, or nil.
Extension functions
SPARQL allows for query engines to associate extension functions with URIs, and call them from within queries.
You can define your own URI functions through defurifun, or associate existing functions with a URI through associate-function-with-uri. defurifun does some manipulation of the arguments, so you should use it whenever possible.
Associates uri, which is a string or a valid part, with the provided function, which is a symbol or a function. If cache-now-p is true and function is a symbol, its function binding is stored instead of the symbol itself.
stream (*standard-output* by default).
Defines a function named name and associates it with uri, as with associate-function-with-uri. args is not evaluated, exactly as with defun.
Here's an example: a function that will do an HTTP HEAD request against the provided URL, returning the HTTP status code as an integer literal, or 0 if there's a problem.
(The built-in functions are quite robust, so a Lisp integer will be treated as an RDF literal with data type xsd:integer.)
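;; Perform an HTTP HEAD request on URI; return the status code, or 0 on any failure.
(defurifun ex-head-request !<http://example.com/fn/head> (uri)
  (or
    (when uri
      (ignore-errors
        (format t "~&Performing HTTP HEAD request on <~A>...~%"
                (upi->value uri))
        ;; do-http-request returns the response code as its second value.
        (second
          (multiple-value-list
            (net.aserve.client:do-http-request (upi->value uri)
              :method :head)))))
    0))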
You can use this function in a query exactly as you would a built-in function.
Using this data as an example:
<http://ex.com/a> <http://ex.com/foo> "200"^^<http://www.w3.org/2001/XMLSchema#integer> .
we can run a query like so:
sparql(54): (run-sparql
  "
  PREFIX f: <http://example.com/fn/>
  SELECT ?x {
    ?x <http://ex.com/foo> ?y .
    FILTER ( ?y = f:head(\"http://franz.com\") )
  }"
  :results-format :count)
which produces this output:
Performing HTTP HEAD request on <http://franz.com>...
1
:select
(?x)
… we know, then, that franz.com is returning a 200 status code.
Note that these filter functions can be called an arbitrary number of times during the execution of a query. It's not a good idea to actually perform expensive operations like HTTP requests in your queries.
SELECT bindings and ASK results
run-sparql gives you programmatic access to results in a number of ways.
Any of the following results-formats are suitable as arguments to SELECT or ASK queries:
- :sparql-xml, which serializes the results as XML to output-stream.
- :sparql-json, which does the same in the JSON encoding.
- :sparql-ttl, which does the same in the SPARQL results Turtle encoding.
- :table, a simple debugging format that concisely prints the results in a table. See *sparql-table-width*.
The following results-formats are suitable as arguments to SELECT queries:
- :arrays, which returns a list of arrays, each with one entry for each results variable.
- :lists, which is the same but with lists instead of arrays.
- :alists, which is the same but with association lists instead of hash tables.
- :count, which returns the number of results rows.
The following results-formats are suitable as arguments to ASK queries:
- :boolean, which returns t or nil for true and false respectively.
Returning triples from CONSTRUCT and DESCRIBE queries
Any of the following rdf-formats are suitable as arguments to CONSTRUCT or DESCRIBE queries:
- :ntriples, which serializes the triples as N-Triples to output-stream.
- :rdf/xml, which does the same in RDF/XML.
- :rdf-n3, which does the same in the Turtle subset of Notation-3.
The following rdf-format is suitable for DESCRIBE queries:
- :triples, which returns a list of AllegroGraph triples.
The following rdf-format is suitable for CONSTRUCT queries:
- :arrays, which returns a list of three-element arrays. The elements of the array are the subject, predicate, and object of a constructed triple, and can be UPIs or future-parts.
Finally, SPARQL can return results from CONSTRUCT and DESCRIBE queries as in-memory triple stores, using the :in-memory format. These triple-stores support the full AllegroGraph API and can therefore be queried and serialized just like a regular triple-store. When no references to them remain, they will be garbage collected just like any other Lisp data-structure.
You can use get-allowed-results-formats and get-allowed-rdf-formats to access these allowed values dynamically at run-time.
Variables
Programmatic results associate values with variables. Variables are parsed into symbols by the query parser.
The mapping from variables to symbols is straightforward, and best illustrated by example:
?x → '|?x|
?X → '?X
$foo → '|?foo|
If you provide variables in a with-variables argument, a leading ? is prepended to the variable name. Your queries will run correctly if you provide them as s-expressions and do not prepend ?, but:
- variables that share a name with a self-evaluating symbol, such as most-positive-fixnum, t, or nil, will cause your query to fail;
- bindings you provide using with-variables will not apply, because they are always preprocessed.
All variables created by the parser are interned in the current package, as if by a call to cl:intern. You should adhere to these rules when processing results or providing bindings using with-variables.
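The naming rules can be summarized with a small Python model. It covers only the symbol names, not Lisp package interning or how symbols are printed, and both function names are hypothetical.

```python
def parsed_variable_name(token):
    """Symbol name produced for a query variable: '$foo' and '?foo' both give '?foo'."""
    if token[:1] not in ("?", "$"):
        raise ValueError(f"not a SPARQL variable: {token!r}")
    # Case is preserved; only the leading marker is normalized to '?'.
    return "?" + token[1:]


def binding_name(name):
    """Name used for a with-variables entry: a leading '?' is added when missing."""
    return name if name.startswith("?") else "?" + name


print(parsed_variable_name("$foo"), binding_name("x"))
```

A binding supplied as "x" or "?x" therefore ends up under the same name as the parser's symbol for ?x.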
SPARQL and first-class triples
AllegroGraph permits you to make assertions about triple IDs (UPIs of type triple-id). SPARQL offers no support for this: only named graphs are supported. First-class triples are entirely outside the scope of both RDF and SPARQL.
SPARQL queries against stores using first-class triples are not supported. AllegroGraph's SPARQL engine makes only limited provisions for such queries:
- programmatic output is likely to work in most situations. Certain FILTER and ORDER BY operations will fail, however; typically these will result in an internal SPARQL type error, which will cause the FILTER to fail for all triple-id values.
- output in one of the provided results formats will, under normal circumstances, fail when a triple-id is encountered. The SPARQL XML writer exports a variable, sparql.results:*strict-sparql-xml-output*, which toggles triple-id output. If this is set to nil, triple-id values are printed in a <triple> element, analogous to <literal>. A similar variable is exported for the JSON format: *strict-sparql-json-output*. The Turtle results format will always treat the triple-id as an integer.
It bears repeating that SPARQL is not intended to work with first-class triples; any queries that run successfully are little more than accidents, and named graphs are a better choice in all cases.
Datasets
Dataset loading
It is sometimes useful to be able to process the SPARQL dataset (the set of URIs provided as FROM and FROM NAMED parameters) when a query is executed. AllegroGraph provides a dataset load hook for your convenience.
You may bind a function to *dataset-load-function* to specify a default, or pass one as the :load-function argument to run-sparql. Passing nil disables the hook for that query. The argument list of the function is described in *dataset-load-function*.
Default dataset handling
When no dataset (FROM and FROM NAMED clauses) is provided to a query, the actual dataset against which the query is run is not defined by the SPARQL specification. AllegroGraph provides you with two options: :default, meaning that the default part of the dataset contains only the default graph of the store; and :all, whereby both the default and named parts of the dataset contain every graph in the store.
You can control the default behavior by setting *default-dataset-behavior* (formerly *sparql-default-graph-behavior*), and set the behavior for specific queries by passing the :default-dataset-behavior argument to run-sparql.
Verbose output
Logging output when queries are run in verbose mode is written to db.agraph.query.sparql:*sparql-log-stream*. This is *standard-output* by default.
SPARQL and encoded values
AllegroGraph offers the ability to encode a range of literal values (numbers, geospatial values, and more) directly within a UPI, without the overhead of a string representation as an RDF literal. Whenever these encoded values are encountered by AllegroGraph's printing functions, and in many other situations, they are seamlessly treated as RDF literals, but with significant time and space savings.
AllegroGraph's implementation of most SPARQL and XQuery operators also handles encoded values transparently.
SPARQL Query Options
AllegroGraph provides control over a number of internal settings by extending the SPARQL PREFIX notation. Options are changed by prepending a PREFIX of the form:
PREFIX franzOption_optionName: <franz:optionValue>
where optionName and optionValue are replaced by the name and value of the option being changed.
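Since options are ordinary PREFIX lines, they are easy to generate. The Python sketch below prepends option prefixes to a query string. The helper names are hypothetical and the query is invented for the example; the two option names used (logQuery and chunkProcessingAllowed) are among those listed in this section.

```python
def option_prefix(name, value):
    """Format one query option in the PREFIX franzOption_ notation."""
    return f"PREFIX franzOption_{name}: <franz:{value}>"


def with_options(query, options):
    """Prepend one option PREFIX line per entry to the query text."""
    lines = [option_prefix(name, value) for name, value in options.items()]
    return "\n".join(lines + [query])


query = with_options(
    "SELECT * { ?s ?p ?o } LIMIT 10",
    {"logQuery": "onFailure", "chunkProcessingAllowed": "no"},
)
print(query)
```

Keeping the options in a mapping makes it simple to apply the same settings to many queries without editing each query's text by hand.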
Options can also be specified in the configuration file, which is described in the Server Configuration document. See here in that document for how options are specified.
The available options are subject to change as some of them are experimental. The following is a list of the currently available options:
The username and password to use for basic authorization
This is used when making a SPARQL SERVICE call.
Example Prefix:
PREFIX franzOption_authorizationBasic: <franz:user:password>
The default value is no authorization setting
If true, then warnings found during query parsing, planning and execution will cause a query to fail immediately rather than continuing.
Warnings include things like unknown variables in an ORDER BY clause or FILTER expression, constants in the query that cannot be in the store, and so on. The possible values are:
- yes - turn the option on
- no - turn the option off
Example Prefix:
PREFIX franzOption_cancelQueryOnWarnings: <franz:no>
The default value is no
Controls whether to use Chunk at a Time (CaaT) processing.
It can be:
- possibly - use CaaT for unordered queries with small limits and use the single-set approach otherwise. Note that this works best when solutions are found in the first several chunks processed, which means that the query can finish quickly. If a large portion of the search space must be scanned, then the single-set approach can be faster.
- yes - always use CaaT when possible (some query clauses like EXISTS filters and SPIN magic properties do not yet support CaaT).
- no - always use the single set approach and never use CaaT.
The default value is possibly, which means that AllegroGraph is optimizing for speed rather than space. The no option is focused on speed at the possible cost of higher memory use, whereas the yes option is more constrained in memory use at the cost of slower queries.
Example Prefix:
PREFIX franzOption_chunkProcessingAllowed: <franz:possibly>
Specifies the maximum amount of memory used by a single chunk.
Controls the size (in bytes) of the chunks used by the CaaT executor. This option takes precedence over the deprecated chunkProcessingSize option.
The minimum allowed value is 200M.
See the chunkProcessingAllowed option for additional query control.
Example Prefix:
PREFIX franzOption_chunkProcessingMemory: <franz:4294967296>
The default value is 4,294,967,296
(Deprecated) Specifies the chunk processing size in rows
Deprecated in favor of the chunkProcessingMemory option.
Control the size (in rows of answers) of the chunks used by the CaaT executor. The higher the number, the larger the chunks processed will be which is both more efficient and more memory intensive. A typical value is 400000 or 1000000.
See the chunkProcessingAllowed option for additional control.
Example Prefix:
PREFIX franzOption_chunkProcessingSize: <franz:400000>
The default value is 400,000
The strategy used to reorder triple patterns in a query.
This option controls how the triple patterns in a single Basic Graph Pattern (BGP) are reordered.
The available strategies will depend on the query engine being used but will always include identity
which tells the query planner to not reorder the triple patterns of the BGPs. Another common choice is statistical
which uses the statistics of the triple-store to try to reorder clauses most efficiently.
Note that other query planning algebraic manipulations may cause BGPs in your query to be merged and that reordering does not extend to larger query structures (like UNION or OPTIONAL).
Example Prefix:
PREFIX franzOption_clauseReorderer: <franz:statistical>
The clause reorderer defaults to statistical
Specify the default attributes to assign to any triple created by a SPARQL update command.
The attributes must be specified in URL-encoded JSON format. The example below uses the URL-encoded form of {"rank": "High" }.
Example Prefix:
PREFIX franzOption_defaultAttributes: <franz:%7B%22rank%22%3A%20%22High%22%20%7D>
The default value is no attributes
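The URL-encoded value for defaultAttributes can be produced with Python's standard library. This is a sketch: default_attributes_prefix is a hypothetical helper, and percent-encoding every reserved character (safe='') is a choice made here to reproduce the example prefix for this option.

```python
import json
from urllib.parse import quote, unquote


def default_attributes_prefix(json_text):
    """Percent-encode JSON text and wrap it in the defaultAttributes option PREFIX."""
    return f"PREFIX franzOption_defaultAttributes: <franz:{quote(json_text, safe='')}>"


# The JSON text exactly as written in the example, including the space before '}'.
doc_example = default_attributes_prefix('{"rank": "High" }')

# The same attributes serialized by json.dumps, which omits that trailing space.
compact = default_attributes_prefix(json.dumps({"rank": "High"}))

# Decoding the encoded value recovers the original attributes.
decoded = json.loads(unquote("%7B%22rank%22%3A%20%22High%22%20%7D"))
print(doc_example)
```

Both encodings decode to the same JSON object; they differ only in the whitespace that was encoded.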
Controls the dataset used for SPARQL queries with no FROM or FROM NAMED clauses.
The possible values are:
- all - All triples will be in the FROM portion of the dataset. Triples whose graph is not the default-graph will be in a graph named by their graphs. Note that this means that triples whose graph is not the default-graph will appear in the dataset twice: once in a named graph (named by their graph slot) and again in the default graph of the dataset.
- default - Only triples whose graph is the default graph of the triple-store will be in the default graph of the dataset. No triples will be in the named graph portion of the dataset.
- rdf - Only triples whose graph is the default graph of the triple-store will be in the default graph of the dataset. Triples whose graph is not the default-graph will be in the named graph portion of the dataset.
Example Prefix:
PREFIX franzOption_defaultDatasetBehavior: <franz:all>
The default value is all
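The three defaultDatasetBehavior values can be compared with a small Python model that splits quads into the default and named parts of a dataset. This only illustrates the descriptions above: build_dataset, the quad tuples, and the use of None for the store's default graph are assumptions.

```python
from collections import defaultdict

DEFAULT_GRAPH = None  # stand-in for the store's default graph


def build_dataset(quads, behavior):
    """Return (default_part, named_part) for the 'all', 'default' or 'rdf' behavior."""
    if behavior not in ("all", "default", "rdf"):
        raise ValueError(f"unknown behavior: {behavior!r}")
    default_part = []
    named_part = defaultdict(list)
    for s, p, o, g in quads:
        triple = (s, p, o)
        in_default_graph = g is DEFAULT_GRAPH
        # 'all' puts every triple in the default part; the others only default-graph triples.
        if behavior == "all" or in_default_graph:
            default_part.append(triple)
        # 'all' and 'rdf' also expose other triples under their own graph name.
        if not in_default_graph and behavior in ("all", "rdf"):
            named_part[g].append(triple)
    return default_part, dict(named_part)


quads = [("a", "p", "b", None), ("c", "p", "d", "g1")]
print(build_dataset(quads, "all"))
```

With all, the triple in graph g1 appears twice (once in each part), which matches the note in the description of that value.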
Specifies the number of solutions to keep in memory before writing temporary files.
This should be a number like 500000 or 100m. The larger the value, the more memory AllegroGraph will use during query processing. Smaller values can be more memory efficient but can also perform more slowly because there will be more I/O activity.
Note that this setting controls the memory used to hold completed solutions not the memory used to hold intermediate solutions. See the chunkProcessingMemory option for more details.
Example Prefix:
PREFIX franzOption_diskChunkRowCount: <franz:500000>
The default value is 500,000
The size at which AllegroGraph will log full scan warnings.
AllegroGraph will log a warning if it needs to perform a full scan and the triple-store contains more than fullScanWarningSize triples.
A full triple scan occurs when a query contains a free pattern and AllegroGraph is not able to find enough constraints on the pattern's variables. The warning does not mean that something is necessarily wrong but is an indication that AllegroGraph is being forced to perform significant filesystem I/O if the database does not fit in memory.
Setting the value larger than the triple-store's size will prevent the warning from appearing in the log.
Example Prefix:
PREFIX franzOption_fullScanWarningSize: <franz:1000000>
The default value is 1,000,000
Controls whether or not query execution details are logged.
If logging is on, the query engine prints additional information to the AllegroGraph log file as it plans and executes a query. If logging is onFailure, then query log information is gathered but not emitted unless there is a query failure.
Logging on failure has a small cost, especially when the amount of data logged is high (e.g., when chunkProcessingAllowed is turned on). We recommend setting the value to onFailure during development and then turning it to no for production. The possible values are:
- onFailure - Log only when there is a query failure
- no - Do not log
- yes - Log the entire query
Example Prefix:
PREFIX franzOption_logQuery: <franz:no>
The default value is no
Specifies an upper limit on the number of solutions that are allowed during query processing before a warning is logged.
Queries run best when the solution space is kept small. This warning is an indication that a query is generating many intermediate results. This is a normal part of query processing but can indicate that a query should be optimized.
Example Prefix:
PREFIX franzOption_maximumSolutionsSize: <franz:100k>
The default is to warn when the intermediate solution space is larger than 100,000,000 solutions
Specify how much system memory must be free for a query to continue.
If the query process is using more than this setting's percentage of total physical memory, then the query will be canceled. The default value is 90%.
Example Prefix:
PREFIX franzOption_memoryExhaustionWarningPercentage: <franz:90.0>
The default value is 90.0
Specifies the memory limit per query.
If a query tries to use more than this, it will be canceled.
Example Prefix:
PREFIX franzOption_memoryLimit: <franz:8G>
The default value will be 85% of the physical memory on the server
The timezone in which xsd:dateTimes and xsd:times are serialized.
For example, if presentationTimeZone is "-02:00", then "2013-10-01T15:21:23+03:00" is serialized as "2013-10-01T10:21:23-02:00". Zoneless xsd:dateTimes and xsd:times are always presented without a timezone. This option has no effect on what is stored in the database. The allowed values are strings representing the timezone. The format of these strings is the same as in xsd:dateTimes. The special value "none" means that no conversion will take place.
Example Prefix:
PREFIX franzOption_presentationTimeZone: <franz:-5:00>
The default is set to none.
Specifies the query engine to use when executing queries.
For example, to use the SPARQL 1.1 query engine, set this to :sparql-1.1.
Example Prefix:
PREFIX franzOption_queryEngine: <franz:sparql-1.1>
The default value is sparql-1.1
Specifies a query timeout value in seconds.
Note that the timeout is not an interrupt; AllegroGraph checks for query timeout relatively infrequently so that a query can run for many seconds longer than the specified timeout. This is especially true for operations involving reasoning or non-triple-pattern based queries like free-text indexing or SNA path planning operators.
Setting the timeout to zero is the same as having no timeout.
Example Prefix:
PREFIX franzOption_queryTimeout: <franz:30>
The default is to have no query timeout. I.e., queries will run until complete.
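As a minimal sketch of how this option might sit in a complete query, assuming the option prefix is placed ahead of the query form like any other PREFIX declaration. The triple pattern and the LIMIT are purely illustrative.

```sparql
# Illustrative query: request a timeout of roughly 30 seconds.
# As noted above, the check is infrequent, so the query may run longer.
PREFIX franzOption_queryTimeout: <franz:30>

SELECT ?s ?p ?o {
  ?s ?p ?o .
}
LIMIT 10
```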
Controls whether or not AllegroGraph interleaves query execution and triple-pattern selection.
If no, then AllegroGraph will perform all reordering during query planning. If yes, then AllegroGraph will defer reordering until query execution time. In many cases, the additional information available at execution time can enhance query performance.
Note that interleaving reordering is not always a win because performing all ordering at query planning time allows for the query engine to introduce joins which can sometimes enhance query performance.
See the clauseReorderer option for additional information.
The possible values are:
- yes - turn the option on
- no - turn the option off
Example Prefix:
PREFIX franzOption_reorderDuringExecution: <franz:no>
The default value is no
The number of seconds to wait before a remote query times out.
This will also have an effect on SPARQL Federated query (i.e., using the SERVICE clause).
Example Prefix:
PREFIX franzOption_serviceTimeout: <franz:120>
The default value is 120
Specifies the maximum number of results to return from a given SOLR query.
Example Prefix:
PREFIX franzOption_solrQueryLimit: <franz:100>
The default value is 100
Specifies the maximum amount of temporary file space that may be used by a query.
If a query tries to use more file space than this, it will be canceled.
Queries write intermediate results to the filesystem when they will not fit in memory. With a huge query it is possible for such temporary files to fill the filesystem. In order to prevent this, the temporaryFilesystemSpaceLimit query option may be set.
The minimum allowable value for this setting is 2-gigabytes.
Example Prefix:
PREFIX franzOption_temporaryFilesystemSpaceLimit: <franz:1021175808>
The default value is to use the minimum of 8-gigabytes and one quarter of the available filesystem space at the time the query begins.
If yes, then range queries will not scan typed literal triples.
This means that only encoded triples will be considered. The only reason to set this option to no is if your triple-store contains typed literals that are not encoded (i.e., that are in the string-table), which could happen if you disabled AllegroGraph's datatype mapping.
The possible values are:
- yes - turn the option on
- no - turn the option off
Example Prefix:
PREFIX franzOption_trustEncodedDatatypesForRangeQueries: <franz:yes>
The default value is yes
If yes, then predicate type mappings will be used for range queries.
This means that any triples whose encoded data-type does not match their predicate mapping will be ignored. This could happen only if a predicate mapping was added or changed after triples had been added.
The possible values are:
- yes - turn the option on
- no - turn the option off
Example Prefix:
PREFIX franzOption_trustPredicateTypeMappingsForRangeQueries: <franz:yes>
The default value is yes
Use subject and object UPI type-codes to improve constraint inference.
If yes, then the query engine will gather information about the subjects and objects associated with particular predicates. This can be used in constraint analysis and query transformations. As an example, suppose we have a query like:
?one ex:date ?date1 .
?two ex:date ?date2 .
filter( ?date1 > ?date2 )
If there is no predicate type-mapping, then the query engine cannot make any assumptions about the range comparison. If there is a predicate type-mapping and trustPredicateTypeMappingsForRangeQueries is true, then the engine can know that the filter can be treated as a date comparison. If usePredicateConstrainedUpiTypeInformation is yes, then the query engine will check the triple-store to determine which UPI type-codes the subjects and objects associated with ex:date can take on. If the objects of ex:date only have, e.g., UPI type-code +rdf-date+, then the filter will be handled more efficiently.
The type-code information is cached but if the store is changing rapidly, then the cache will often be invalid and this computation will slightly add to the cost of queries.
The possible values are:
- yes - turn the option on
- no - turn the option off
Example Prefix:
PREFIX franzOption_usePredicateConstrainedUpiTypeInformation: <franz:yes>
The default value is yes
Use typed-literal XSD types to improve constraint inference.
Similar to usePredicateConstrainedUpiTypeInformation but involves a scan of all typed-literals (which can be expensive). This is currently not cached!
The possible values are:
- yes - turn the option on
- no - turn the option off
Example Prefix:
PREFIX franzOption_usePredicateConstrainedXsdTypeInformation: <franz:no>
The default value is no
Specify the user attributes to use while evaluating the query.
The attributes must be specified in URL encoded JSON format. So the example below is using the URL encoded form of {"access-level": "medium", "department": "hr"}.
Example Prefix:
PREFIX franzOption_userAttributes: <franz:%7B%22access-level%22%3A%20%22medium%22%2C%20%22department%22%3A%20%22hr%22%7D>
empty
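The encoded form can be produced with any URL-encoding tool. As one possible sketch, the standard SPARQL function ENCODE_FOR_URI (listed under Functions on Strings below) should yield the same string as in the example prefix when given the JSON text. The empty group pattern is only there to make the query self-contained.

```sparql
# Sketch: URL-encode the JSON attribute text with a standard SPARQL function.
# ENCODE_FOR_URI escapes everything except letters, digits, "-", "_", "." and "~".
SELECT (ENCODE_FOR_URI('{"access-level": "medium", "department": "hr"}') AS ?encoded) {
}
```

The value bound to ?encoded is what would follow franz: inside the angle brackets of the option prefix.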
Query Warnings
SPARQL is relatively lax when it comes to accepting and evaluating queries. For example, this query is valid SPARQL but it is probably not what was intended because ?variableIsNotBound is not actually bound by the triple patterns in the rest of the query:
select ?variableIsNotBound {
?s ?p ?o .
}
A more typical example could be caused by a typo. For example, changing the case of a variable:
select ?subjectOne ?class {
?subjectone rdf:type ?Class .
}
Neither ?subjectOne nor ?class will be bound since the query uses ?subjectone and ?Class. When AllegroGraph plans and executes a query, it will detect problems like the above and generate query warnings. The cancelQueryOnWarnings query option can be used to stop execution immediately when any warnings are found.
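For comparison, here is a sketch of the typo example with the variable names made consistent. SPARQL variable names are case-sensitive, so the projection and the pattern must use exactly the same spelling. The rdf prefix declaration is added only so that the query is self-contained.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# ?subjectOne and ?class are now spelled identically in both places,
# so both projected variables are bound by the triple pattern.
SELECT ?subjectOne ?class {
  ?subjectOne rdf:type ?class .
}
```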
The following is a list of the currently defined warnings:
This warning indicates that one or more basic graph patterns (BGPs) in the query create a cross product. I.e., there are patterns in the query that have disjoint sets of variables which will cause the SPARQL engine to find all possible matches between the sets which can lead to very large solution sets. For example:
SELECT * {
?a ?b ?c .
?d ?e ?f .
}
Since the first triple-pattern and the second triple-pattern share no variables, the above query will find all possible combinations of each pair of triples in the underlying repository.
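One way to avoid the cross product is to make the patterns share a variable, so the second pattern is joined to the first instead of being matched independently. The sketch below reuses ?c as the subject of the second pattern. Whether that particular join is the intended one depends on the data, so treat it as an illustration only.

```sparql
# The two patterns now share ?c, so only triples whose subject is the
# object of some other triple are combined.
SELECT * {
  ?a ?b ?c .
  ?c ?e ?f .
}
```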
This warning is signaled when a query specifies a DATASET and one or more graphs in its FROM or FROM NAMED portions are not in the repository.
For example, if 'http://example#22' is not in the repository, then these queries will signal the warning:
SELECT *
FROM <http://example#1>
FROM <http://example#22> {
?s ?p ?o .
}
SELECT *
FROM NAMED <http://example#22> {
?s ?p ?o .
}
run-sparql
or in the HTTP request), and these two values do not match. When this happens, the smaller of the two values is used.
run-sparql
or in the HTTP request). In this case, the sum of the two offsets will be used.
This warning is signaled when a query uses a DATASET and the graph of a GRAPH clause is not in its FROM NAMED portion. For example:
SELECT *
FROM NAMED <http://graph1>
FROM NAMED <http://graph2> {
GRAPH <http://example#22> {
?s ?p ?o
}
}
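A sketch of the same query with the graph added to the dataset, which is presumably what the warning is prompting for. The graph names are the illustrative ones used above.

```sparql
# <http://example#22> is now part of the FROM NAMED portion of the dataset,
# so the GRAPH clause refers to a graph the query is allowed to see.
SELECT *
FROM NAMED <http://graph1>
FROM NAMED <http://graph2>
FROM NAMED <http://example#22> {
  GRAPH <http://example#22> {
    ?s ?p ?o
  }
}
```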
FILTER ( false )
can never succeed.
FILTER( ?x IN ( ) )
.
This warning indicates that a FILTER expression puts invalid constraints on the language of a variable. For example:
SELECT * { ?s ?p ?o . FILTER( LANG(?o) = 'es' && LANG(?o) = 'en') }
cannot succeed because the LANG(?o) cannot be both 'es' and 'en'.
... ?s ?p ?o . FILTER( ?o = 'tinker, tailer, soldier, spy' )
This warning is signaled when a FILTER expression cannot succeed because the repository does not contain any of the values that would need to be matched. For example, assume that the repository does not contain any triples with either 'one two three' or 'hard fruitcake'. In this case, these queries will signal the warning:
SELECT * {
?s ?p ?o .
FILTER( ?o IN ( 'one two three', 'hard fruitcake' ) )
}
SELECT * {
VALUES ?o { 'one two three' 'hard fruitcake' }
?s ?p ?o .
}
FILTER( ?o > 3 && ?o < 0 )
.
This warning indicates that a FILTER expression cannot succeed because it is comparing a string with a non-string. E.g.,
SELECT * { ?s ?p ?o . FILTER( STR(?o) = 45 ) }
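A sketch of one possible repair: since STR returns a string, compare it with a string literal. If a numeric comparison is what was intended, an alternative would be to cast with one of the XPath constructor functions listed later in this document, for example xsd:integer(STR(?o)) = 45.

```sparql
# Both sides of the comparison are now strings.
SELECT * {
  ?s ?p ?o .
  FILTER( STR(?o) = '45' )
}
```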
This warning is signaled at query time when a CONSTRUCT or UPDATE template generates invalid RDF triples. For example, this query will emit no triples because ?s takes on only literal bindings and these are not valid in the subject position:
CONSTRUCT {
?s a example:Car .
} WHERE {
VALUES ?s { 'mazda 3' 'ford pinto' 'bmw 300i' }
}
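A sketch of how such a template could be made to produce valid triples, by minting an IRI from each literal before using it as a subject. The http://example.org/ namespaces and the ?car variable are hypothetical and chosen only for illustration.

```sparql
PREFIX example: <http://example.org/>

# Build an IRI from each literal so the subject position holds a resource.
CONSTRUCT {
  ?car a example:Car .
} WHERE {
  VALUES ?s { 'mazda 3' 'ford pinto' 'bmw 300i' }
  BIND( IRI(CONCAT('http://example.org/car/', ENCODE_FOR_URI(?s))) AS ?car )
}
```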
This warning indicates that the query algebra contains one or more cross products. I.e., there are portions of the algebra that have disjoint sets of variables which will cause the SPARQL engine to find all possible matches between the sets which can lead to very large solution sets. For example:
SELECT * {
?a a ?type .
VALUES ?FOO { <ex://a> <ex://b> }
}
Since the triple-pattern and the VALUES clause share no variables, the above query will find all possible combinations from the two sets.
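If the VALUES clause was meant to restrict the types, a sketch of the likely intended query simply uses the same variable in both places, so the VALUES clause constrains the triple pattern rather than multiplying its results.

```sparql
# VALUES now binds ?type, the variable used by the triple pattern.
SELECT * {
  ?a a ?type .
  VALUES ?type { <ex://a> <ex://b> }
}
```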
This warning is signaled if a GRAPH clause specifies a literal for the GRAPH. For example:
SELECT * {
GRAPH ?g {
?s ?p ?o .
}
BIND( 'literal' as ?g )
}
?s ?p ?o . FILTER( ?s = 34 )
must fail because ?s
is bound to the subject of a triple and subjects cannot be literals.
This warning is signaled when a predicate has a type mapping which does not match the datatype of the values used in a range FILTER. For example, if <http://example#age> has a mapping to xsd:byte, then this query will signal the warning:
SELECT * {
?s <http://example#age> ?age .
FILTER( ?age > '2001-10-15'^^xsd:date )
}
because the FILTER cannot succeed.
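A sketch of a filter whose constant uses the mapped datatype instead, assuming the same xsd:byte mapping for <http://example#age>. The value 21 is arbitrary, and whether other numeric datatypes would also be accepted without a warning is not stated here.

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# The constant has the same datatype as the predicate's type mapping.
SELECT * {
  ?s <http://example#age> ?age .
  FILTER( ?age > '21'^^xsd:byte )
}
```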
This warning is signaled when a query uses constants that are not in the repository. For example, if 'tinker, tailer, soldier, spy' is not in the repository, then this query cannot succeed:
SELECT * { ?s ?p 'tinker, tailer, soldier, spy' . }
Similarly for this query:
SELECT * { ?s ?p ?o . FILTER( ?o = 'tinker, tailer, soldier, spy' ) }
This warning is signaled when it can be determined that a variable used in an expression is not bound anywhere in the query. Examples include:
SELECT * { ?s ?p ?o . } ORDER BY ?missing
SELECT * { ?s ?p ?o . FILTER( ?missing > 5 ) }
CONSTRUCT { ?missing ?p ?o } WHERE { ?s ?p ?o }
and so on.
SPARQL functions
AllegroGraph has support for the standard set of functions specified in the W3C SPARQL reference. It also supports several XPath functions and a number of custom functions designed to help with using AllegroGraph's extensions such as Geospatial and Social Network Analysis.
Supported SPARQL 1.1 functions
AllegroGraph supports all of the SPARQL 1.1 functions except for timezone and tz.
XPath Constructor Functions
AllegroGraph supports the standard SPARQL casting operations. For details, refer to the SPARQL reference.
- <http://www.w3.org/2001/XMLSchema#boolean> ( a )
- <http://www.w3.org/2001/XMLSchema#byte> ( a )
- <http://www.w3.org/2001/XMLSchema#date> ( a )
- <http://www.w3.org/2001/XMLSchema#dateTime> ( a )
- <http://www.w3.org/2001/XMLSchema#decimal> ( a )
- <http://www.w3.org/2001/XMLSchema#double> ( a )
- <http://www.w3.org/2001/XMLSchema#float> ( a )
- <http://www.w3.org/2001/XMLSchema#int> ( a )
- <http://www.w3.org/2001/XMLSchema#integer> ( a )
- <http://www.w3.org/2001/XMLSchema#long> ( a )
- <http://www.w3.org/2001/XMLSchema#short> ( a )
- <http://www.w3.org/2001/XMLSchema#string> ( a )
- <http://www.w3.org/2001/XMLSchema#time> ( a )
- <http://www.w3.org/2001/XMLSchema#unsignedByte> ( a )
- <http://www.w3.org/2001/XMLSchema#unsignedInt> ( a )
- <http://www.w3.org/2001/XMLSchema#unsignedLong> ( a )
- <http://www.w3.org/2001/XMLSchema#unsignedShort> ( a )
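A brief sketch of how these constructor functions are typically called, using the conventional xsd prefix so the full IRIs need not be written out. The input strings are made up for illustration.

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Cast plain string literals to typed numeric values.
SELECT ?raw
       (xsd:integer(?raw) AS ?asInteger)
       (xsd:double(?raw)  AS ?asDouble) {
  VALUES ?raw { '42' '7' }
}
```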
Functions on Dates and Times
- day ( dateTime )
- hours ( dateTime )
- minutes ( dateTime )
- month ( dateTime )
- now ( )
- seconds ( dateTime )
- year ( dateTime )
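As an illustration, the accessors can be applied to a single xsd:dateTime value. Under the W3C definitions each function returns the corresponding component of the value as written (2013, 10, 1, 15, 21 and 23 here), without adjusting for the timezone.

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Pull the individual components out of one dateTime literal.
SELECT (YEAR(?t)    AS ?year)
       (MONTH(?t)   AS ?month)
       (DAY(?t)     AS ?day)
       (HOURS(?t)   AS ?hours)
       (MINUTES(?t) AS ?minutes)
       (SECONDS(?t) AS ?seconds) {
  VALUES ?t { '2013-10-01T15:21:23+03:00'^^xsd:dateTime }
}
```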
Hash Functions
Functions on Numerics
SPARQL Operators
These functions are described in detail in the Operator Mappings section of the W3C SPARQL reference.
- != ( a, b )
- * ( a, b )
- + ( a )
- + ( a, b )
- - ( a, b )
- - ( a )
- / ( a, b )
- / ( a )
- < ( a, b )
- <= ( a, b )
- = ( a, b )
- > ( a, b )
- >= ( a, b )
Functions on RDF terms
- bnode ( identifier )
- bnode ( )
- datatype ( literal )
- iri ( iri )
- isBlank ( rdf-term )
- isIRI ( rdf-term )
- isLiteral ( rdf-term )
- isNumeric ( rdf-term )
- isURI ( rdf-term )
- lang ( literal )
- str ( literal )
- strdt ( literal, datatype )
- strlang ( literal, language )
- struuid ( )
- uri ( iri )
- uuid ( )
Functional Forms
- RDFterm-equal ( a, b )
- bound ( uri )
- coalesce ( ... )
- exists ( pattern )
- if ( expression, then )
- if ( expression, then, else )
- in ( expression, ... )
- logical-and ( a, b )
- logical-or ( a, b )
- not in ( expression, ... )
- not-exists ( pattern )
- sameTerm ( a, b )
Functions on Strings
- concat ( ... )
- contains ( string-literal1, string-literal2 )
- encode_for_uri ( string-literal )
- langMatches ( language-tag, language-range )
- lcase ( string-literal )
- regex ( target, regex )
- regex ( target, regex, flags )
- replace ( literal, pattern, replacement, flags )
- replace ( literal, pattern, replacement )
- strafter ( string-literal1, string-literal2 )
- strbefore ( string-literal1, string-literal2 )
- strends ( string-literal1, string-literal2 )
- strlen ( string-literal )
- strstarts ( string-literal1, string-literal2 )
- substr ( string-literal, start, length )
- substr ( string-literal, start )
- ucase ( string-literal )
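A small sketch exercising a few of these functions on one literal. Note that SUBSTR positions start at 1 in SPARQL, so the SUBSTR call below selects the first six characters ('tinker'), which is also what STRBEFORE returns for the first comma.

```sparql
# Apply several string functions to the same input literal.
SELECT (UCASE(?s)                        AS ?upper)
       (STRLEN(?s)                       AS ?length)
       (SUBSTR(?s, 1, 6)                 AS ?firstSix)
       (STRBEFORE(?s, ',')               AS ?beforeComma)
       (REPLACE(?s, 'tailer', 'tailor')  AS ?corrected)
       (CONTAINS(?s, 'spy')              AS ?mentionsSpy) {
  VALUES ?s { 'tinker, tailer, soldier, spy' }
}
```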
Supported XPath functions
AllegroGraph also supports several XPath functions:
Functions on Dates and Times
- <http://www.w3.org/2005/xpath-functions#current-date> ( )
- <http://www.w3.org/2005/xpath-functions#current-dateTime> ( )
- <http://www.w3.org/2005/xpath-functions#current-time> ( )
XPath Mathematical Functions
For additional details on the XPath mathematical functions, see https://www.w3.org/TR/xpath-functions-3/.
- <http://www.w3.org/2005/xpath-functions/math#acos> ( x )
- <http://www.w3.org/2005/xpath-functions/math#asin> ( x )
- <http://www.w3.org/2005/xpath-functions/math#atan> ( x )
- <http://www.w3.org/2005/xpath-functions/math#atan2> ( x, y )
- <http://www.w3.org/2005/xpath-functions/math#cos> ( x )
- <http://www.w3.org/2005/xpath-functions/math#exp> ( x )
- <http://www.w3.org/2005/xpath-functions/math#exp10> ( x )
- <http://www.w3.org/2005/xpath-functions/math#log> ( x )
- <http://www.w3.org/2005/xpath-functions/math#log10> ( x )
- <http://www.w3.org/2005/xpath-functions/math#pi> ( )
- <http://www.w3.org/2005/xpath-functions/math#pow> ( x, y )
- <http://www.w3.org/2005/xpath-functions/math#sin> ( x )
- <http://www.w3.org/2005/xpath-functions/math#sqrt> ( x )
- <http://www.w3.org/2005/xpath-functions/math#tan> ( x )
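Because these functions are named by IRI, declaring a prefix for the math namespace keeps expressions readable. The sketch below uses the conventional prefix name math, which is just a local choice, and arbitrary input values.

```sparql
PREFIX math: <http://www.w3.org/2005/xpath-functions/math#>

# Call a few of the XPath mathematical functions through a prefix.
SELECT ?x
       (math:sqrt(?x)   AS ?root)
       (math:pow(?x, 2) AS ?square)
       (math:pi()       AS ?pi) {
  VALUES ?x { 2.0e0 9.0e0 }
}
```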
Miscellaneous functions
- <http://www.w3.org/2005/xpath-functions#compare> ( a, b )
- <http://www.w3.org/2005/xpath-functions#error> ( ... )
- <http://www.w3.org/2005/xpath-functions#false> ( )
- <http://www.w3.org/2005/xpath-functions#not> ( a )
- <http://www.w3.org/2005/xpath-functions#true> ( )
Functions on Numerics
- <http://www.w3.org/2005/xpath-functions#abs> ( a )
- <http://www.w3.org/2005/xpath-functions#ceil> ( a )
- <http://www.w3.org/2005/xpath-functions#floor> ( a )
- <http://www.w3.org/2005/xpath-functions#round> ( a )
- <http://www.w3.org/2005/xpath-functions#round-half-to-even> ( a )
- <http://www.w3.org/2005/xpath-functions#round-half-to-even> ( a, b )
Functions on Strings
- <http://www.w3.org/2005/xpath-functions#concat> ( a, b, ... )
- <http://www.w3.org/2005/xpath-functions#contains> ( string-literal1, string-literal2 )
- <http://www.w3.org/2005/xpath-functions#ends-with> ( string-literal1, string-literal2 )
- <http://www.w3.org/2005/xpath-functions#lower-case> ( string-literal )
- <http://www.w3.org/2005/xpath-functions#normalize-space> ( )
- <http://www.w3.org/2005/xpath-functions#normalize-space> ( string-literal )
- <http://www.w3.org/2005/xpath-functions#starts-with> ( string-literal1, string-literal2 )
- <http://www.w3.org/2005/xpath-functions#string-length> ( )
- <http://www.w3.org/2005/xpath-functions#string-length> ( string-literal )
- <http://www.w3.org/2005/xpath-functions#substring> ( string-literal, start, length )
- <http://www.w3.org/2005/xpath-functions#substring> ( string-literal, start )
- <http://www.w3.org/2005/xpath-functions#translate> ( string-literal, from, to )
- <http://www.w3.org/2005/xpath-functions#upper-case> ( string-literal )
AllegroGraph extensions
AllegroGraph supports several functions above and beyond those defined by the SPARQL standard. The additional functions are named by URI (or prefixed name) and can appear anywhere in a SPARQL expression (e.g., in a BIND, a FILTER, an ORDER BY, etc.).
nD Geospatial
These functions are useful in working with AllegroGraph nD-geospatial facilities:
- <http://franz.com/ns/allegrograph/5.0/geo/nd/fn#haversineLatLonLatLon> ( lat1, lon1, lat2, lon2, [ units ] )
- <http://franz.com/ns/allegrograph/5.0/geo/nd/fn#haversineLatLonLoc> ( lat, lon, loc, [ units ] )
- <http://franz.com/ns/allegrograph/5.0/geo/nd/fn#haversineLocLoc> ( loc1, loc2, [ units ] )
- <http://franz.com/ns/allegrograph/5.0/geo/nd/fn#ordinateValue> ( value, name )
- <http://franz.com/ns/allegrograph/5.0/geo/nd/fn#ordinatesToValue> ( value, ... )
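A hedged sketch of calling one of these functions from a BIND, based only on the signature listed above. The prefix name ndfn and the coordinates are hypothetical, plain decimal arguments are assumed to be acceptable, and the optional units argument is omitted because its allowed values are not described in this section.

```sparql
PREFIX ndfn: <http://franz.com/ns/allegrograph/5.0/geo/nd/fn#>

# Distance between two latitude/longitude pairs, in whatever the default
# units are (not specified here).
SELECT ?distance {
  BIND( ndfn:haversineLatLonLatLon(37.8044, -122.2712, 37.7749, -122.4194) AS ?distance )
}
```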
2D Geospatial
These functions are useful in working with the older AllegroGraph 2D Geospatial facilities:
- <http://franz.com/ns/allegrograph/3.0/geospatial/fn/cartesianDistance> ( p1, p2 )
- <http://franz.com/ns/allegrograph/3.0/geospatial/fn/cartesianDistanceSquared> ( p1, p2 )
- <http://franz.com/ns/allegrograph/3.0/geospatial/fn/cartesianX> ( point )
- <http://franz.com/ns/allegrograph/3.0/geospatial/fn/cartesianY> ( point )
- <http://franz.com/ns/allegrograph/3.0/geospatial/fn/haversineKilometers> ( p1, p2 )
- <http://franz.com/ns/allegrograph/3.0/geospatial/fn/haversineMiles> ( p1, p2 )
- <http://franz.com/ns/allegrograph/3.0/geospatial/fn/haversineRadians> ( p1, p2 )
- <http://franz.com/ns/allegrograph/3.0/geospatial/fn/latitude> ( point )
- <http://franz.com/ns/allegrograph/3.0/geospatial/fn/longitude> ( point )
- <http://franz.com/ns/allegrograph/3.0/geospatial/fn/toPointLonLat> ( predicate, longitude, latitude )
- <http://franz.com/ns/allegrograph/3.0/geospatial/fn/toPointXY> ( predicate, x, y )
2D Geospatial (deprecated)
These deprecated 2D Geospatial functions are still supported but you should change your queries to use the above functions instead.
- <http://franz.com/ns/allegrograph/3.0/geospatial/fn/cartesian-distance> ( p1, p2 )
- <http://franz.com/ns/allegrograph/3.0/geospatial/fn/cartesian-distance-squared> ( p1, p2 )
- <http://franz.com/ns/allegrograph/3.0/geospatial/fn/cartesian-x> ( p )
- <http://franz.com/ns/allegrograph/3.0/geospatial/fn/cartesian-y> ( p )
- <http://franz.com/ns/allegrograph/3.0/geospatial/fn/haversine-km> ( p1, p2 )
- <http://franz.com/ns/allegrograph/3.0/geospatial/fn/haversine-miles> ( p1, p2 )
- <http://franz.com/ns/allegrograph/3.0/geospatial/fn/haversine-r> ( p1, p2 )
Social Networking (SNA) Related functions
These functions can help when using AllegroGraph's SNA extensions.
- <http://franz.com/ns/allegrograph/4.11/sna/egoGroup> ( generator, node, depth )
- <http://franz.com/ns/allegrograph/4.11/sna/neighborCache> ( generator, node, depth )
Miscellaneous other functions
These functions help connect SPARQL to various other AllegroGraph features.
- <http://franz.com/ns/allegrograph/fn#encodedIdId> ( resource )
Variables
Controls how SPARQL behaves when no dataset is specified.
If nil, then the default value of the defaultDatasetBehavior query property will be used. Otherwise, this value will be used.
See the defaultDatasetBehavior query option for details
(uri db &optional type)
, to load dataset parameters before a query is executed.
Function index
Footnotes
- Note that SPARQL 1.1 is only supported by the sparql-1.1 query engine.