SPARQL Magic Properties | AllegroGraph 7.0.0

Introduction

See the Defining Magic Properties Tutorial for more information on how to define your own magic properties using Lisp.

A Magic Property is a predicate in a SPARQL query that produces bindings using something other than simple subgraph matching. These extensions provide a much richer query environment at the cost of non-portability. AllegroGraph has long supported a Magic Property to enable freetext queries and to interface to Solr and MongoDB. For example, when a query contains a pattern like

?subject fti:match 'baseball' .

AllegroGraph does not look at the triples in the triple-store to find matching patterns; rather, it uses the freetext index to find the triples that have objects with baseball in their text.

AllegroGraph includes both enhanced Magic Properties for freetext queries and new properties that give SPARQL queries access to its Geospatial, Temporal, and Social Network Analysis facilities.

Note that Magic Properties can use patterns with multiple inputs and outputs. SPARQL's list notation provides syntactic sugar that makes this quite readable. Here is an example that looks for text matching willows in the freetext index named titles and then binds ?book to the matches it finds:

select * {  
  ?book fti:match ('willows' 'titles' ) .  
}

This parenthetical notation uses SPARQL's (and Turtle's) syntactic sugar for the longer (and harder to read!) but equivalent query:

SELECT * {  
  ?book fti:match _:b0 .  
  _:b0 rdf:first "willows" .  
  _:b0 rdf:rest _:b1 .  
  _:b1 rdf:first "titles" .  
  _:b1 rdf:rest rdf:nil  
}
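The expansion from list notation to rdf:first/rdf:rest triples is mechanical. As a rough illustration in plain Python (not part of AllegroGraph; the helper name expand_rdf_list and the blank node labels _:b0, _:b1, ... are made up for this sketch), the longer form can be generated from a list of items like this:

```python
def expand_rdf_list(subject, predicate, items):
    """Expand `subject predicate (item1 item2 ...)` into explicit triples.

    Hypothetical helper for illustration only. Blank nodes are labeled
    _:b0, _:b1, ... in order of appearance.
    """
    if not items:
        # An empty list is simply rdf:nil.
        return [(subject, predicate, "rdf:nil")]
    triples = [(subject, predicate, "_:b0")]
    for i, item in enumerate(items):
        node = f"_:b{i}"
        # The last cell points at rdf:nil, every other cell at the next one.
        rest = f"_:b{i + 1}" if i + 1 < len(items) else "rdf:nil"
        triples.append((node, "rdf:first", item))
        triples.append((node, "rdf:rest", rest))
    return triples


triples = expand_rdf_list("?book", "fti:match", ['"willows"', '"titles"'])
for s, p, o in triples:
    print(s, p, o, ".")
```

The printed patterns correspond line for line to the longer query above.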

Freetext

AllegroGraph supports freetext queries with enhancements that allow the selection of the index to use and the easy retrieval of the object of any matching triples. Both fti:match and fti:matchExpression provide the same four pattern forms:

?subject fti:match 'text to query' .
(?subject ?object) fti:match 'text to query' .
?subject fti:match ('text to query' 'index name') .
(?subject ?object) fti:match ('text to query' 'index name') .

The second and fourth forms bind the second variable on the subject side to the object of any matching triples.

Note that both the query text and the index name must be constants. You can, however, specify a particular subject or object to have the Magic Property act as a filter. For example,

<ex:wind_in_the_williows> fti:match ('toad' 'characters') .

would succeed if and only if the freetext index named characters indexed a triple with subject <ex:wind_in_the_williows> whose object contained toad.
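To make the pattern forms concrete, here is a toy model in plain Python. It is not how AllegroGraph's freetext indexing works internally: the TRIPLES data, the INDEXES table, and the fti_match function are all invented for this sketch, and matching is reduced to a case-insensitive substring test.

```python
# Toy triples (subject, predicate, object); every name here is invented.
TRIPLES = [
    ("ex:book1", "ex:title", "The Wind in the Willows"),
    ("ex:book1", "ex:character", "Mr. Toad"),
    ("ex:book2", "ex:title", "Moby-Dick"),
]

# Each toy "index" covers the objects of triples with certain predicates.
INDEXES = {"titles": {"ex:title"}, "characters": {"ex:character"}}


def fti_match(text, index_name=None, subject=None):
    """Return (subject, object) pairs whose object text contains `text`.

    index_name=None searches every toy index. Passing `subject` turns the
    match into a filter, like a constant on the subject side of the pattern.
    """
    if index_name is None:
        predicates = set().union(*INDEXES.values())
    else:
        predicates = INDEXES[index_name]
    needle = text.lower()
    return [
        (s, o)
        for s, p, o in TRIPLES
        if p in predicates
        and needle in o.lower()
        and (subject is None or s == subject)
    ]


# (?subject ?object) fti:match ('willows' 'titles')
print(fti_match("willows", "titles"))
# <ex:book1> fti:match ('toad' 'characters'): non-empty means "succeeds"
print(bool(fti_match("toad", "characters", subject="ex:book1")))
```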

n-Dimensional (nD) Geospatial

The nD geospatial facility is described generally in nD Geospatial Overview. A tutorial is in the nD Geospatial Usage Guide. The magic properties for making nD geospatial queries are listed below in the Magic properties list.

The Lisp interface to nD geospatial is described here in the Lisp Reference.

The nD definitions that follow use these prefix definitions:

PREFIX geofn: <http://franz.com/ns/allegrograph/3.0/geospatial/fn/>  
PREFIX geo:   <http://franz.com/ns/allegrograph/3.0/geospatial/>  
PREFIX nd:    <http://franz.com/ns/allegrograph/5.0/geo/nd#>   
PREFIX ndfn:  <http://franz.com/ns/allegrograph/5.0/geo/nd/fn#>   
PREFIX :      <http://franz.com/ns/keyword#>

The several SPARQL Magic Predicates in both systems (nD and 2D) find triples in the store based on their encoded data. To use a geospatial Magic Property in either system you must ensure that the query engine can determine the geospatial subtype based on the triple predicate. This can be done by creating a predicate type mapping between the predicate and the subtype. The mechanics of this vary with the client. For example, in the Python client we could create a predicate mapping between <http://example.com/pointLatLong> and the spherical geospatial subtype with a strip width of 1 kilometer using code like:

geoSubtype = conn.createURI("http://franz.com/ns/allegrograph/3.0/geospatial/spherical/km/-180.0/180.0/-90.0/90.0/1")  
latlon = conn.createURI("http://example.com/pointLatLong")  
conn.registerDatatypeMapping(datatype=geoSubtype, predicate=latlon, nativeType="int")

These links document establishing a predicate mapping in HTTP, Lisp, Java, and Python.

2D Geospatial

2D SPARQL Magic Properties are no longer supported.

SNA

AllegroGraph now provides Magic Properties that work with its Social Network Analysis (SNA) Library. Recall that the SNA functions use abstract generators to specify which nodes in the graph are neighbors. You can define generators using the existing client APIs or via SPARQL (see below). To use a generator with the Magic Properties, you must name it with a URI.

In the following, the namespace prefix sna is short for http://franz.com/ns/allegrograph/4.11/sna/.

Generators

A triple-store is a graph of triples where the subjects and objects are vertexes in the graph and the triples define labeled edges between these nodes. Often, however, it makes more sense for a given problem to define an abstract graph on top of the triple-store by specifying which nodes are neighbors of other nodes. In this case, the vertexes are still subjects and objects but the edges are specified via a function that computes the neighbors of a node. We call a function like this a generator. For example, a triple-store of publications will have triples like:

:b1 foaf:name "Sam Smith" .  
:b2 foaf:name "Betty Bintur" .  
:a1 rdfs:label "Book about Cats" .  
:a1 dc:creator :b1 ;  
    dc:creator :b2 .

We might be interested in the graph of co-authors. In this graph, two authors are linked if they both created the same article. In SPARQL, this would look like:

SELECT ?coCreator {  
  ?article dc:creator ?input .  
  ?article dc:creator ?coCreator .  
  FILTER( ?input != ?coCreator )  
}

(the FILTER makes sure that a person isn't a co-author with themselves).
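Conceptually, a generator is just a function from a node to its set of neighbors. The following plain-Python sketch mimics the co-author query over an in-memory list of triples. The function name co_creators is hypothetical, and the second article (:a2) and third author (:b3) are added here only to make the result less trivial.

```python
from collections import defaultdict

# Toy (subject, predicate, object) triples. :a1 mirrors the example data;
# :a2 and :b3 are invented for this sketch.
TRIPLES = [
    (":a1", "dc:creator", ":b1"),
    (":a1", "dc:creator", ":b2"),
    (":a2", "dc:creator", ":b2"),
    (":a2", "dc:creator", ":b3"),
]


def co_creators(node, triples=TRIPLES):
    """Generator function: the neighbors of `node` in the co-author graph."""
    creators_by_article = defaultdict(set)
    for s, p, o in triples:
        if p == "dc:creator":
            creators_by_article[s].add(o)
    neighbors = set()
    for creators in creators_by_article.values():
        if node in creators:
            neighbors |= creators
    neighbors.discard(node)  # plays the role of the FILTER
    return neighbors


print(sorted(co_creators(":b2")))
```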

Defining Generators

You can define a generator using one of the client APIs or by including triples of the correct form in the triple-store itself. When a Magic Property specifies a generator named <generator>, AllegroGraph will look for an existing definition. If it is not found, then AllegroGraph will look for the triple

?node sna:hasName <generator> sna:sna .

I.e., the triple with predicate <http://franz.com/ns/allegrograph/4.11/sna/hasName>, object matching the generator you are looking for, and graph <http://franz.com/ns/allegrograph/4.11/sna/sna>. If this triple is found, then the triples associated with that subject will be used to construct the generator on the fly. As an example, the SPARQL generator above could be added to the store using this SPARQL update statement:

prefix ex: <http://www.franz.com/sna#>  
prefix sna: <http://franz.com/ns/allegrograph/4.11/sna/>  
#  
# First, delete any existing definition (just in case!)  
#  
delete { graph sna:sna { ?id ?p ?o }}  
where {  
  graph sna:sna {  
    ?id a sna:Generator ;  
        sna:hasName ex:coCreators ;  
        ?p ?o  .  
  }  
} ;  
 
#  
# then add the definition  
#  
insert data {  
  graph sna:sna {  
   [ a sna:Generator ;  
       sna:hasName ex:coCreators ;  
       sna:hasSPARQL '''  
prefix dc: <http://purl.org/dc/elements/1.1/>  
select distinct ?output {  
  ?article dc:creator ?input .  
  ?article dc:creator ?output .  
  FILTER( ?input != ?output )  
}''' ]  
  }  
}

In this definition, the generator has sna:hasName ex:coCreators and is defined by the SPARQL query sna:hasSPARQL using the text of the query directly. Note that if there is more than one generator defined with the same name, AllegroGraph will signal an error.

The following defining forms are allowed:

sna:hasSPARQL query

Specify a SPARQL query (as a literal) to use to find neighbors. The query must project a single variable binding. To specify which graph vertex is being examined, the query can either use a variable named ?input or use a different variable and specify its name using sna:hasInput.

sna:objectsOf predicate(s)

Starting from a graph vertex as a subject, define its neighbors as the objects of the triples with the given predicate(s). Two forms are possible:

sna:objectsOf example:onePredicate .

sna:objectsOf (example:predicate1 example:predicate2 ...)

sna:subjectsOf predicate(s)

Like sna:objectsOf, but start from an object and define its neighbors as the subjects of triples with the given predicate(s). For example:

[ a sna:Generator ;  
    sna:hasName ex:knowsOrHeardOfS ;  
    sna:subjectsOf (<http://www.franz.com/sna#knows> <http://www.franz.com/sna#heardOf>) ] .

sna:undirected predicate(s)

This combines sna:subjectsOf and sna:objectsOf in that it will define neighbors as the union of the subjects and objects of the given predicate(s).

sna:hasSelect

Define neighbors using a Prolog Select query. The query must return a single variable binding and should use the (?? node) syntax to specify the starting graph vertex. For example,

(select ?person2  
  (q ?article !dc:creator (?? node))  
  (q ?article !dc:creator ?person2)  
  (lispp (not (upi= node ?person2))))

The query will be read into the current environment so using namespace abbreviations is not recommended.
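The three predicate-based forms differ only in the direction in which triples are followed. A minimal plain-Python sketch of that difference, using invented data and hypothetical function names (objects_of, subjects_of, undirected) that merely echo the property names:

```python
# Toy triples; all names are invented for illustration.
TRIPLES = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:carol", "ex:knows", "ex:alice"),
    ("ex:alice", "ex:heardOf", "ex:dave"),
]


def objects_of(node, predicates, triples=TRIPLES):
    """Neighbors are objects of triples whose subject is `node`."""
    return {o for s, p, o in triples if s == node and p in predicates}


def subjects_of(node, predicates, triples=TRIPLES):
    """Neighbors are subjects of triples whose object is `node`."""
    return {s for s, p, o in triples if o == node and p in predicates}


def undirected(node, predicates, triples=TRIPLES):
    """Union of both directions."""
    return objects_of(node, predicates, triples) | subjects_of(node, predicates, triples)


print(sorted(objects_of("ex:alice", {"ex:knows", "ex:heardOf"})))
print(sorted(subjects_of("ex:alice", {"ex:knows"})))
print(sorted(undirected("ex:alice", {"ex:knows"})))
```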

Neighbors

Use sna:nodalNeighbors to iterate over the neighbors of a node (as determined by a generator). For example:

?neighbor sna:nodalNeighbors (sna:coCreators ?start) .

would bind ?neighbor to each vertex that is adjacent to ?start.

Groups and Centrality Measures

Many of the SNA functions are defined in terms of nodes and groups. You can specify a group in a SPARQL query using either the BIND form or the Magic Property form. Both forms require a generator, a starting node and a depth. These next two queries are equivalent.

# Find the size of Erdoes's social network out to a depth of 2  
# using the BIND form.  
prefix sna: <http://franz.com/ns/allegrograph/4.11/sna/>  
prefix foaf: <http://xmlns.com/foaf/0.1/>  
select ?size {  
  ?s foaf:name 'Paul Erdoes'^^<http://www.w3.org/2001/XMLSchema#string> .  
  BIND( sna:egoGroup( sna:coCreators, ?s, 2 ) as ?group )   
  ?group sna:size ?size .  
}  
 
# Find the size of Erdoes's social network out to a depth of 2  
# using the magic property form.  
prefix sna: <http://franz.com/ns/allegrograph/4.11/sna/>  
prefix foaf: <http://xmlns.com/foaf/0.1/>  
select ?size {  
  ?s foaf:name 'Paul Erdoes'^^<http://www.w3.org/2001/XMLSchema#string> .  
  ?group sna:egoGroup (sna:coCreators ?s 2) .  
  ?group sna:size ?size .  
}

These groups act like blank nodes and have no meaning outside of a given query. ¹ Within a query, however, we can use other Magic Properties to examine the group. ² For example, we can get actor degree centrality for each actor in an ego group by building a group with the sna:egoGroup function and then using sna:actorDegreeCentrality.

prefix sna: <http://franz.com/ns/allegrograph/4.11/sna/>  
prefix foaf: <http://xmlns.com/foaf/0.1/>  
select ?actor ?centrality {  
  ?s foaf:name 'Paul Erdoes'^^<http://www.w3.org/2001/XMLSchema#string> .  
  ?group sna:egoGroup (sna:coCreators ?s 3) .  
  (?actor ?centrality) sna:actorDegreeCentrality (sna:coCreators ?group) .  
}

This will compute the actor degree centrality for each member in Erdoes's ego group. If we only wanted the centrality for a single actor, we could have used something like this:

prefix sna: <http://franz.com/ns/allegrograph/4.11/sna/>   
prefix foaf: <http://xmlns.com/foaf/0.1/>   
select ?actor ?centrality {   
  ?s foaf:name 'Paul Erdoes'^^<http://www.w3.org/2001/XMLSchema#string> .   
  ?group sna:egoGroup (sna:coCreators ?s 3) .   
  ?actor foaf:name 'Paul Thinkle'^^<http://www.w3.org/2001/XMLSchema#string> .   
  (?actor ?centrality) sna:actorDegreeCentrality (sna:coCreators ?group) .  
}

I.e., we first find the blank node corresponding to Erdoes; then we find the group that surrounds that blank node; then we find the blank node that corresponds to Paul Thinkle; and finally, we find centrality. The binding on ?actor means that we only find a single centrality measure.
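For readers who want to see the two steps spelled out, here is a small plain-Python sketch of an ego group followed by a degree centrality computation. It rests on assumptions that the text above does not confirm: that an ego group contains the starting node plus everything reachable within the given depth, and that degree centrality is the textbook normalized form (neighbors inside the group divided by group size minus one). AllegroGraph's exact definitions may differ. The graph and all names are invented.

```python
from collections import deque

# Toy undirected co-author graph; names invented for illustration.
GRAPH = {
    "erdoes": {"a", "b"},
    "a": {"erdoes", "b", "c"},
    "b": {"erdoes", "a"},
    "c": {"a", "d"},
    "d": {"c"},
}


def neighbors(node):
    """Stand-in for a generator such as sna:coCreators."""
    return GRAPH.get(node, set())


def ego_group(generator, start, depth):
    """Breadth-first collection of nodes within `depth` steps of `start`."""
    group = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == depth:
            continue  # do not expand beyond the requested depth
        for n in generator(node):
            if n not in group:
                group.add(n)
                frontier.append((n, dist + 1))
    return group


def actor_degree_centrality(generator, group):
    """Assumed definition: in-group degree divided by (group size - 1)."""
    if len(group) < 2:
        return {actor: 0.0 for actor in group}
    return {
        actor: len(set(generator(actor)) & group) / (len(group) - 1)
        for actor in group
    }


group = ego_group(neighbors, "erdoes", 2)
centrality = actor_degree_centrality(neighbors, group)
for actor in sorted(group):
    print(actor, round(centrality[actor], 3))
```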

Use sna:members to iterate over the members of a group:

?actor sna:members ?group .

and get the graph density of a group with

?density sna:groupDensity (<generator> <group>)

The group centrality measures are similar: given a group, we'd get the group degree centrality with

?centrality sna:groupDegreeCentrality (<generator> <group>)

The following centrality measures are defined:

sna:actorDegreeCentrality
sna:actorClosenessCentrality
sna:actorBetweennessCentrality
sna:groupDegreeCentrality
sna:groupClosenessCentrality
sna:groupBetweennessCentrality

You can find the size of a group by using sna:size as in

?group sna:size ?size

Neighbor Caches

Because computing some measures can be quite expensive, the SNA library provides a caching mechanism to save information about nodal neighbors. The SPARQL SNA extensions call these neighbor caches. As with ego groups, you can create a cache using either the bind form or the Magic Property form:

BIND( sna:neighborCache( <generator>, <starting points>, <depth> ) as ?cache )  
 
?cache sna:neighborCache (<generator> <starting points> <depth>) .

Here <starting points> can be a node or a group.

Once we have the cache, we can use it wherever we'd use a generator (or a group). For example, here is a query that computes closeness centrality for each actor using the generator:

prefix sna: <http://franz.com/ns/allegrograph/4.11/sna/>  
prefix foaf: <http://xmlns.com/foaf/0.1/>  
select ?actor ?c {  
  ?s foaf:name 'Paul Erdoes'^^<http://www.w3.org/2001/XMLSchema#string> .  
  ?group sna:egoGroup (sna:coCreators ?s 1) .  
  (?actor ?c) sna:actorClosenessCentrality (sna:coCreators ?group) .  
}

and here is the same query using the neighbor cache:

prefix sna: <http://franz.com/ns/allegrograph/4.11/sna/>  
prefix foaf: <http://xmlns.com/foaf/0.1/>  
select ?actor ?c {  
  ?s foaf:name 'Paul Erdoes'^^<http://www.w3.org/2001/XMLSchema#string> .  
  ?cache sna:neighborCache (sna:coCreators ?s 1) .  
  (?actor ?c) sna:actorClosenessCentrality (?cache ?cache) .  
}

Since the centrality measure needs both a generator and a group, we use the cache twice. This second form can be significantly faster. Note that neighbor caches are themselves cached between queries.
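The benefit of caching can be illustrated with a plain-Python sketch. This is only an analogy: sna:neighborCache takes starting points and a depth, whereas the hypothetical neighbor_cache wrapper below simply memoizes a generator lazily. The closeness formula used (reachable actors divided by the sum of their distances) is the common textbook definition and is an assumption here, as are the graph and all names.

```python
from collections import deque

# Toy path graph x - y - z; names invented for illustration.
GRAPH = {"x": {"y"}, "y": {"x", "z"}, "z": {"y"}}
CALLS = {"count": 0}


def slow_generator(node):
    """Stand-in for an expensive generator; counts how often it runs."""
    CALLS["count"] += 1
    return GRAPH.get(node, set())


def neighbor_cache(generator):
    """Wrap a generator so each node's neighbors are computed only once."""
    cache = {}

    def cached(node):
        if node not in cache:
            cache[node] = frozenset(generator(node))
        return cache[node]

    return cached


def closeness_centrality(generator, group, actor):
    """Assumed definition: (reachable actors) / (sum of their distances)."""
    distance = {actor: 0}
    queue = deque([actor])
    while queue:
        node = queue.popleft()
        for neighbor in generator(node):
            if neighbor in group and neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    total = sum(distance.values())
    return (len(distance) - 1) / total if total else 0.0


group = {"x", "y", "z"}
cached = neighbor_cache(slow_generator)
scores = {a: closeness_centrality(cached, group, a) for a in sorted(group)}
print(scores)
print("generator calls:", CALLS["count"])
```

Without the wrapper, each of the three centrality computations would call the generator once per node it visits; with it, the underlying generator runs once per node in total.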

Paths

A path is an ordered sequence of nodes starting at node1 and ending at node2 such that each node is the neighbor (as defined by a generator) of its predecessor. AllegroGraph provides three primitive path finding operations: breadth-first, depth-first and bidirectional. Each of these has a corresponding Magic Property. For example, this pattern will succeed if a path exists between ex:llama and ex:caribou:

ex:llama sna:depthFirstSearch (ex:animalNeighborGenerator ex:caribou) .

It will use the depth first search strategy.

Examining the paths between two nodes is more complicated because SPARQL can only bind variables to single values (literals or resources) and a path consists of multiple ordered values. To accommodate this, we introduce path identifiers and node indexes. The first represents a single path and the second is a typed literal that represents the index of the node in its path.

As an example, suppose we start with this graph

  /--- b ---\  
a            c  
  \--- d ---/

We can ask for all of the paths between <a> and <c> using

(<a> ?vertex ?linkNumber ?path) sna:depthFirstSearch (ex:generator <c>) .

AllegroGraph will find two paths: (a b c) and (a d c). It will represent these as:

?vertex    ?linkNumber ?path  
=================================  
  a        0           0  
  b        1           0  
  c        2           0  
  a        0           1  
  d        1           1  
  c        2           1

That is, it will return bindings for the three variables such that ?path will have one value for (a b c) and another value for (a d c). Within each path, ?linkNumber will index the vertexes and ?vertex will actually be bound to each node along the way.
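The shape of this result can be reproduced with a short plain-Python sketch. It enumerates simple paths depth-first over the example graph (treated here as undirected, which is an assumption) and flattens them into (vertex, linkNumber, path) rows. The function names are hypothetical, and integer path numbers stand in for whatever identifiers AllegroGraph actually uses.

```python
# The a/b/c/d graph from above, treated as undirected.
GRAPH = {
    "a": ["b", "d"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["a", "c"],
}


def generator(node):
    return GRAPH.get(node, [])


def depth_first_paths(generator, start, end, path=None):
    """Yield every simple path from `start` to `end` as a list of vertexes."""
    path = (path or []) + [start]
    if start == end:
        yield path
        return
    for neighbor in generator(start):
        if neighbor not in path:  # never revisit a vertex within one path
            yield from depth_first_paths(generator, neighbor, end, path)


def path_rows(generator, start, end):
    """Flatten paths into (vertex, linkNumber, path) rows."""
    return [
        (vertex, index, path_id)
        for path_id, path in enumerate(depth_first_paths(generator, start, end))
        for index, vertex in enumerate(path)
    ]


rows = path_rows(generator, "a", "c")
for row in rows:
    print(*row)
```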

If the ?path is left off, then AllegroGraph will return only the first path that it finds. If you do not need to know the order of the vertexes in the path, then ?linkNumber can also be left off. So for example, these two queries will find a single path and return some information about it:

(ex:llama ?vertex) sna:depthFirstSearch (ex:animalNeighborGenerator ex:caribou) .  
 
(ex:llama ?vertex ?order) sna:depthFirstSearch (ex:animalNeighborGenerator ex:caribou) .

and this will return all paths (as described in the table above):

(ex:llama ?vertex ?order ?path) sna:depthFirstSearch (ex:animalNeighborGenerator ex:caribou) .

Sometimes, it is useful to be able to examine each path in turn. The Magic Properties sna:depthFirstSearchPaths, sna:breadthFirstSearchPaths, and sna:bidirectionalSearchPaths iterate over paths between two nodes. For example, this pattern will bind ?path to a different identifier for each path between ex:llama and ex:caribou:

(ex:llama ?path) sna:depthFirstSearchPaths (ex:animalNeighborGenerator ex:caribou) .

The path identifiers can then be used with other Magic Properties. For example, sna:members iterates over the vertexes of a path:

# first form  
?vertex sna:members ?path .  
 
# second form  
(?vertex ?order) sna:members ?path .

and sna:size returns the length of a path:

?path sna:size ?length .

Note that path identifiers have meaning only within the query execution and should not be projected.

Using the same (a,b,c,d) graph from above, this query

(<a> ?path) sna:depthFirstSearchPaths (ex:generator <c>) .  
(?vertex ?order) sna:vertexOf ?path .

would return something very much like the sna:depthFirstSearch query did above:

?vertex       ?order            ?path  
===================================  
      a           0            _:g0  
      b           1            _:g0  
      c           2            _:g0  
      a           0            _:g1  
      d           1            _:g1  
      c           2            _:g1

But this second form also allows us to compute things like the average path length:

select (avg(?length) as ?avgLength) {  
  (<a> ?path) sna:depthFirstSearchPaths (ex:generator <c>) .  
  ?path sna:size ?length .  
}

Cliques

We can tell if a group is a clique with

?isClique sna:isClique (<generator> <group>) .

We can find the cliques with

?clique sna:cliquesOf (<generator> <actor> ) .  
?clique sna:cliquesOf (<generator> <actor> <minimum-size> ) .

?clique will be bound to a group identifier for each clique found. As mentioned above, this identifier makes sense only within the query and it should not be projected. It can, of course, be used by other SNA functions and properties.

?clique sna:cliquesOf (<generator> <actor> <minimum-size> ) .  
?member sna:members ?clique .
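As a rough illustration of what these properties compute, the following plain-Python sketch checks whether a group is a clique and lists the cliques that contain a given actor. It assumes an undirected graph and assumes that "cliques" means maximal cliques, found here with a basic Bron-Kerbosch search seeded with the actor. Neither assumption is confirmed by the text above, and all names and data are invented.

```python
# Toy undirected graph; names invented for illustration.
GRAPH = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}


def neighbors(node):
    return GRAPH.get(node, set())


def is_clique(generator, group):
    """True when every pair of distinct members is adjacent."""
    members = list(group)
    return all(
        v in generator(u)
        for i, u in enumerate(members)
        for v in members[i + 1:]
    )


def cliques_of(generator, actor, minimum_size=2):
    """Maximal cliques containing `actor` (basic Bron-Kerbosch, no pivoting)."""
    found = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            if len(current) >= minimum_size:
                found.append(frozenset(current))
            return
        for v in list(candidates):
            adjacent = set(generator(v))
            expand(current | {v}, candidates & adjacent, excluded & adjacent)
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand({actor}, set(generator(actor)), set())
    return found


print(is_clique(neighbors, {"a", "b", "c"}))
print(sorted(sorted(c) for c in cliques_of(neighbors, "a")))
```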

Temporal

There is a tutorial using an older interface here. The older interface is also described here.

AllegroGraph supports efficient storage and retrieval of temporal data including:

dateTimes in ISO 8601 format: "2008-02-01T00:00:00-08:00"
time points: ex:point1, ex:h-hour, ex:when-the-meeting-began, etc
time intervals: ex:delay-interval (e.g., from point ex:point1 to ex:h-hour)

In the following, the namespace prefix t is short for http://franz.com/ns/allegrograph/3.0/temporal/. AllegroGraph also requires that time points are defined using the t:time predicate and that intervals are defined using either t:starttime and t:endtime or t:startpoint and t:endpoint. Starting in version 4.11 of AllegroGraph, the t:time, t:starttime, and t:endtime predicates are automatically mapped to xsd:dateTimes (see predicate type mapping for more details).

Once data has been encoded, you can query for:

relations between two points
relations between two intervals
relations between points and dateTimes
relations between intervals and dateTimes
relations between points and intervals

The temporal reasoning tutorial describes all of these capabilities in detail and also functions as a general reference guide. Below, we will quickly outline the various SPARQL Magic Properties. To illustrate them, we'll use a triple-store with intervals defined for days and months of 2013 like:

:day1Start t:time "2013-01-01T00:00:00"^^xsd:dateTime .  
:day1End t:time "2013-01-01T12:59:59"^^xsd:dateTime .  
:day1 t:startpoint :day1Start ;  
   t:endpoint :day1End ;  
   rdfs:label "January 1st" .  
:day2Start t:time "2013-01-02T00:00:00"^^xsd:dateTime .  
:day2End t:time "2013-01-02T12:59:59"^^xsd:dateTime .  
:day2 t:startpoint :day2Start ;  
   t:endpoint :day2End ;  
   rdfs:label "January 2nd" .  
...  
:month1 t:startpoint :day1Start ;  
  t:endpoint :day31End ;  
  rdfs:label "January" .  
...

We will also include an interesting date in the store:

:earthDay t:startpoint :day110Start .

relations between points

t:pointBefore
t:pointAfter
t:pointSimultaneous

We can ask for all the points before the month of January ends using

select * {  
   ?month rdfs:label 'January' .  
   ?month t:endpoint ?monthEnds .  
   ?point t:pointBefore ?monthEnds .  
}

This will return the start and end of each day in January (though it will not return the end of January 31st because that point is simultaneous with the end of the month and not before it).
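The three point relations amount to comparing timestamps. A minimal plain-Python sketch, using a handful of points whose timestamps follow the sample data above (the name :day32Start for February 1st and the function point_relation are assumptions made for this example):

```python
from datetime import datetime

# A few time points; timestamps follow the sample data's conventions.
POINTS = {
    ":day1Start": datetime(2013, 1, 1, 0, 0, 0),
    ":day2Start": datetime(2013, 1, 2, 0, 0, 0),
    ":day31End": datetime(2013, 1, 31, 12, 59, 59),
    ":day32Start": datetime(2013, 2, 1, 0, 0, 0),  # hypothetical name
}


def point_relation(a, b):
    """Name the relation that holds between time points a and b."""
    if a < b:
        return "pointBefore"
    if a > b:
        return "pointAfter"
    return "pointSimultaneous"


month_ends = POINTS[":day31End"]
before = sorted(
    name
    for name, time in POINTS.items()
    if point_relation(time, month_ends) == "pointBefore"
)
print(before)
```

The end of January 31st is left out of the result because it is simultaneous with the end of the month.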

relations between intervals

t:intervalBefore
t:intervalAfter
t:intervalMeets
t:intervalMetBy
t:intervalOverlaps
t:intervalOverlappedBy
t:intervalStarts
t:intervalStartedBy
t:intervalDuring
t:intervalContains
t:intervalFinishes
t:intervalFinishedBy
t:intervalCotemporal

For example, we can find the number of days ³ in January by querying:

select (count(?day) as ?days) {  
   ?month rdfs:label 'January' .  
   ?day t:intervalDuring ?month .  
} group by ?month
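The thirteen interval relations correspond to Allen's interval algebra. The plain-Python sketch below classifies one interval against another using the conventional definitions, which are assumed (not confirmed by the text above) to match the t: properties. Times are simplified to plain numbers: day d is modeled as the interval (d, d + 0.5), an arbitrary choice, and January as (1, 31.5). The function name interval_relation is hypothetical.

```python
def interval_relation(a, b):
    """Classify interval a against interval b; each is (start, end), start < end."""
    a0, a1 = a
    b0, b1 = b
    if a1 < b0:
        return "intervalBefore"
    if b1 < a0:
        return "intervalAfter"
    if a1 == b0:
        return "intervalMeets"
    if b1 == a0:
        return "intervalMetBy"
    if a0 == b0 and a1 == b1:
        return "intervalCotemporal"
    if a0 == b0:
        return "intervalStarts" if a1 < b1 else "intervalStartedBy"
    if a1 == b1:
        return "intervalFinishes" if a0 > b0 else "intervalFinishedBy"
    if b0 < a0 and a1 < b1:
        return "intervalDuring"
    if a0 < b0 and b1 < a1:
        return "intervalContains"
    # Only partial overlaps remain at this point.
    return "intervalOverlaps" if a0 < b0 else "intervalOverlappedBy"


# Simplified numeric model of January: day d spans (d, d + 0.5).
january = (1, 31.5)
days = {d: (d, d + 0.5) for d in range(1, 32)}
during = [d for d, span in days.items()
          if interval_relation(span, january) == "intervalDuring"]

print(len(during))
print(interval_relation(days[1], january), interval_relation(days[31], january))
```

Under this model the first day starts the month and the last day finishes it, so neither is counted as being during the month.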

relations between points and intervals

t:pointBeforeInterval
t:pointStartsInterval
t:pointDuringInterval
t:pointEndsInterval
t:pointAfterInterval

For example, we can ask for the month during which Earth Day falls using:

select ?month ?label {  
  :earthDay t:startpoint ?start .  
  ?start t:pointDuringInterval ?month .  
  ?month rdfs:label ?label .  
}
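A point can stand in one of five positions relative to an interval. The sketch below spells that out in plain Python using a deliberately simplified timeline in which every time is a day-of-year number for 2013, so :day110Start becomes 110. The MONTHS table and the function point_interval_relation are constructs of this example, not AllegroGraph API.

```python
import calendar


def point_interval_relation(point, interval):
    """Name the relation between a point and an interval (start, end)."""
    start, end = interval
    if point < start:
        return "pointBeforeInterval"
    if point == start:
        return "pointStartsInterval"
    if point < end:
        return "pointDuringInterval"
    if point == end:
        return "pointEndsInterval"
    return "pointAfterInterval"


# Month intervals for 2013 as (first day-of-year, last day-of-year).
MONTHS = {}
first = 1
for month in range(1, 13):
    length = calendar.monthrange(2013, month)[1]
    MONTHS[calendar.month_name[month]] = (first, first + length - 1)
    first += length

earth_day_start = 110  # stands in for :day110Start
containing = [
    label
    for label, interval in MONTHS.items()
    if point_interval_relation(earth_day_start, interval) == "pointDuringInterval"
]
print(containing)
```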

relations between points and datetimes

t:pointBeforeDatetime
t:pointAfterDatetime

relations between intervals and datetimes

t:intervalBeforeDatetime
t:intervalAfterDatetime

Magic properties list

nD Geospatial

Attributes

Reification

<http://franz.com/ns/allegrograph/4.0/tripleId>

Social Network Analysis

Temporal

Text Indexing

Validation

2D Geospatial

Footnotes

¹ SNA groups, caches, and path identifiers will all serialize as if they were blank nodes with the same blank node identifier.
² Note that group members are cached between queries when possible to make the SNA functions operate more efficiently.
³ Actually, this will be two less than the number of days in January because January 1st starts the month and January 31st finishes it; i.e., they are not during the month.