Table of Contents

Introduction

Freetext

Geospatial

Magic properties

Helper Functions

SNA

Generators

Neighbors

Groups and Centrality Measures

Neighbor Caches

Paths

Cliques

Temporal

relations between points

relation between intervals

relations between points and intervals

relations between points and datetimes

relations between intervals and datetimes

Implementation Notes

Property places that must be bound

Introduction

A Magic Property is a predicate in a SPARQL query that produces bindings using something other than simple subgraph matching. These extensions provide a much richer query environment at the cost of non-portability. AllegroGraph has long supported a Magic Property to enable freetext queries and to interface to Solr and MongoDB. For example, when a query contains a pattern like

?subject fti:match 'baseball' . 

AllegroGraph does not look at the triples in the triple-store to find matching patterns; rather, it uses the freetext index to find the triples that have objects with baseball in their text.

AllegroGraph includes both enhanced Magic Properties for freetext queries and new properties to enable SPARQL queries to access Geospatial, Temporal and Social Network Analysis.

Note that Magic Properties can use patterns with multiple inputs and outputs. SPARQL's list notation provides syntactic sugar that makes this quite readable. Here is an example that looks for text matching willows in the freetext index named titles and then binds ?book to the matches it finds:

select * {  
  ?book fti:match ('willows' 'titles' ) .  
} 

This parenthetical notation uses SPARQL's (and Turtle's) syntactic sugar for the longer (and harder to read!) but equivalent query:

SELECT * {  
  ?book fti:match _:b0 .  
  _:b0 rdf:first "willows" .  
  _:b0 rdf:rest _:b1 .  
  _:b1 rdf:first "titles" .  
  _:b1 rdf:rest rdf:nil  
} 
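To make the equivalence concrete, here is a minimal Python sketch of how a parenthesized list expands into rdf:first / rdf:rest triples. The function name expand_list and the blank node labels are illustrative assumptions; this models the notation only, not how AllegroGraph parses queries.

```python
def expand_list(items):
    """Return (head, triples) for an RDF collection holding `items`.

    Blank node labels (_:b0, _:b1, ...) are arbitrary; any fresh labels would do.
    """
    if not items:
        return "rdf:nil", []
    nodes = [f"_:b{i}" for i in range(len(items))]
    triples = []
    for i, (node, item) in enumerate(zip(nodes, items)):
        # The last cell points at rdf:nil, every other cell at the next cell.
        rest = nodes[i + 1] if i + 1 < len(nodes) else "rdf:nil"
        triples.append((node, "rdf:first", item))
        triples.append((node, "rdf:rest", rest))
    return nodes[0], triples


head, triples = expand_list(["willows", "titles"])
for triple in triples:
    print(triple)
```

The returned head is the node that takes the place of the list in the original pattern, so ?book fti:match ('willows' 'titles') becomes ?book fti:match _:b0 plus the four printed triples.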

Freetext

AllegroGraph has long offered support for simple freetext queries. It now offers enhancements that let you select the index to use and easily retrieve the objects of any matching triples. Both fti:match and fti:matchExpression provide the same four pattern forms:

  1. ?subject fti:match 'text to query' .
  2. (?subject ?object) fti:match 'text to query' .
  3. ?subject fti:match ('text to query' 'index name') .
  4. (?subject ?object) fti:match ('text to query' 'index name') .

The second and fourth forms bind the second variable on the subject side to the object of any matching triples.

Note that both the query text and the index name must be constants. You can, however, specify a particular subject or object to have the Magic Property act as a filter. For example,

<ex:wind_in_the_willows> fti:match ('toad' 'characters') . 

would succeed if and only if the freetext index named characters indexed a triple with subject <ex:wind_in_the_willows> whose object contained toad.

Geospatial

See also the Geospatial Tutorial for general geospatial examples which use the older interface described here in the Lisp Reference.

Magic properties

AllegroGraph's existing geospatial extensions to SPARQL are now augmented with support via Magic Properties. Geospatial Magic Properties are defined for determining points in a bounding box and points within a circle. Each comes in several variants to accommodate different usages.

The definitions that follow use these prefix definitions:

PREFIX geofn: <http://franz.com/ns/allegrograph/3.0/geospatial/fn/>  
PREFIX geo: <http://franz.com/ns/allegrograph/3.0/geospatial/> 

To use a geospatial Magic Property, you must ensure that the query engine can determine the geospatial subtype based on the predicate. This can be done by creating a predicate type mapping between the predicate and the subtype. The mechanics of this vary with the client. For example, we could create a predicate mapping between <http://example.com/pointLatLong> and the spherical geospatial subtype with a strip width of 1 kilometer in the Python client using code like:

# Build URIs for the subtype and the predicate, then map the predicate to the subtype.  
geoSubtype = conn.createURI("http://franz.com/ns/allegrograph/3.0/geospatial/spherical/km/-180.0/180.0/-90.0/90.0/1")  
latLon = conn.createURI("http://example.com/pointLatLong")  
conn.registerDatatypeMapping(datatype=geoSubtype, predicate=latLon, nativeType="int") 

In the following examples, we assume that the predicate ex:location is registered and mapped to a geospatial subtype. Given this, we can ask for subjects in the bounding box defined by the two points ?lowerLeftPoint and ?upperRightPoint using:

?subject geo:inBoundingBox (ex:location ?lowerLeftPoint ?upperRightPoint) . 

The points are assumed to be AllegroGraph geospatial UPIs. (A UPI is a Unique Part Identifier, described here.) They can come from other variables in the query or be constructed using BIND. For example (see footnote 1):

BIND( geofn:toCartesianPoint( ex:location, 4.3, 12.6 ) AS ?lowerLeft )  
BIND( geofn:toCartesianPoint( ex:location, 10, 20.1 ) AS ?upperRight )  
(?subject ?where) geo:inBoundingBox (ex:location ?lowerLeft ?upperRight) 

This second example shows that you can bind the object of the triples found by using a list of variables on the left hand side of the geospatial Magic Property.

If you have X,Y coordinates rather than points, you can either use the BIND form shown above or use the X,Y version of the Magic Property. For example, this query returns the same results as the one above:

(?subject ?where) geo:inBoundingBoxXY (ex:location 4.3 12.6 10 20.1) 
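As a rough mental model of the bounding-box test, the following Python sketch filters a dictionary of cartesian points. The function name, the sample subjects, and the choice to treat the box edges as inclusive are assumptions of this sketch; AllegroGraph's actual lookup goes through its geospatial index rather than a linear scan.

```python
def in_bounding_box(points, x_min, y_min, x_max, y_max):
    """Return (subject, point) pairs whose point lies inside the box.

    `points` maps a subject to an (x, y) tuple. Edges are treated as
    inclusive here, which is an assumption of this sketch.
    """
    return [
        (subject, (x, y))
        for subject, (x, y) in points.items()
        if x_min <= x <= x_max and y_min <= y <= y_max
    ]


# Hypothetical data: each subject has one ex:location value.
locations = {"ex:a": (5.0, 15.0), "ex:b": (11.0, 15.0), "ex:c": (4.3, 12.6)}

# Same argument order as (ex:location 4.3 12.6 10 20.1) in the pattern above.
matches = in_bounding_box(locations, 4.3, 12.6, 10, 20.1)
print(matches)
```

Each returned pair corresponds to one (?subject ?where) binding in the query form.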

The other Magic Properties are:

# triples in a circle using cartesian coordinates  
?subject geo:inCircle (ex:location centerPoint radius)  
?subject geo:inCircleXY (ex:location x y radius)  
 
# triples in a circle using spherical coordinates (with different units)  
?subject geo:inCircleMiles (:p centerPoint radius)  
?subject geo:inCircleMilesXY (:p x y radius)  
?subject geo:inCircleKilometers (:p centerPoint radius)  
?subject geo:inCircleKilometersXY (:p x y radius)  
?subject geo:inCircleRadians (:p centerPoint radius)     
?subject geo:inCircleRadiansXY (:p x y radius)    
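For intuition about the spherical circle tests, here is a small Python sketch that measures great-circle distance with the haversine formula and keeps the points within a radius given in kilometers. The Earth radius constant, the (lon, lat) argument order, and the function names are assumptions of this sketch; the source does not say which distance formula AllegroGraph uses.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_circle_km(points, lon, lat, radius_km):
    """Return the subjects whose (lon, lat) point is within radius_km of the center."""
    return [
        subject
        for subject, (p_lon, p_lat) in points.items()
        if haversine_km(lon, lat, p_lon, p_lat) <= radius_km
    ]


# Hypothetical data: two points on the equator.
points = {"ex:near": (0.5, 0.0), "ex:far": (2.0, 0.0)}
print(in_circle_km(points, 0.0, 0.0, 100.0))
```

The miles and radians variants differ only in the unit of the radius, so the same sketch applies after converting the radius.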

The predicates above all search in the object of the triple. You can search in the graph instead by appending "Graph" to any of the predicates above (see footnote 2). For example:

 ?subject geo:inCircleGraph (ex:location centerPoint radius) 

Note that all of the properties require that their arguments be bound before they can be evaluated. AllegroGraph will attempt to re-order patterns in BGPs such that this holds.

Helper Functions

The following functions are defined to help evaluate geospatial queries:

The functions toPointXY and toPointLonLat require the predicate argument so that AllegroGraph can determine which geospatial subtype should be used to construct the point (see footnote 3).

SNA

AllegroGraph now provides Magic Properties that work with its Social Networking Analysis (SNA) Library. Recall that the SNA functions use abstract generators to specify which nodes in the graph are neighbors. You can define generators using the existing client APIs or via SPARQL (see below). To use a generator with the Magic Properties, you must name it with a URI.

In the following, the namespace prefix sna is short for http://franz.com/ns/allegrograph/4.11/sna/.

Generators

A triple-store is a graph of triples where the subjects and objects are vertexes in the graph and the triples define labeled edges between these nodes. Often, however, it makes more sense for a given problem to define an abstract graph on top of the triple-store by specifying which nodes are neighbors of other nodes. In this case, the vertexes are still subjects and objects but the edges are specified via a function that computes the neighbors of a node. We call a function like this a generator. For example, a triple-store of publications will have triples like:

:b1 foaf:name "Sam Smith" .  
:b2 foaf:name "Betty Bintur" .  
:a1 rdfs:label "Book about Cats" .  
:a1 dc:creator :b1 ;  
  dc:creator :b2 . 

We might be interested in the graph of co-authors. In this graph, two authors are linked if they both created the same article. In SPARQL, this would look like:

SELECT ?coCreator {  
  ?article dc:creator ?input .  
  ?article dc:creator ?coCreator .  
  FILTER( ?input != ?coCreator )  
} 

(the FILTER makes sure that a person isn't a co-author with themselves).
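The same idea can be written as an ordinary function from a node to its neighbors, which is all a generator is. The Python sketch below mirrors the query over an in-memory list of triples; the second article :a2 and the author :b3 are hypothetical additions to the sample data, and the function name is illustrative.

```python
# Sample triples; :a2 and :b3 are hypothetical additions for illustration.
TRIPLES = [
    (":a1", "dc:creator", ":b1"),
    (":a1", "dc:creator", ":b2"),
    (":a2", "dc:creator", ":b2"),
    (":a2", "dc:creator", ":b3"),
]


def co_creators(node, triples=TRIPLES):
    """Generator: return the set of people who share an article with `node`."""
    # ?article dc:creator ?input
    articles = {s for s, p, o in triples if p == "dc:creator" and o == node}
    # ?article dc:creator ?coCreator, with FILTER(?input != ?coCreator)
    return {
        o
        for s, p, o in triples
        if p == "dc:creator" and s in articles and o != node
    }


print(sorted(co_creators(":b2")))
```

Because the generator only answers "who are the neighbors of this node?", the SNA operations never need to know how the underlying triples are shaped.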

Defining Generators

You can define a generator using one of the client APIs or by including triples of the correct form in the triple-store itself. When a Magic Property specifies a generator named <generator>, AllegroGraph will look for an existing definition. If it is not found, then AllegroGraph will look in the triple-store for a triple like

graph sna:sna { _:b sna:hasName <generator> } 

If found, then the triples associated with that blank node will be used to construct the generator on the fly. As an example, the SPARQL generator above could be added to the store using this SPARQL update statement:

prefix ex: <http://www.franz.com/sna#>  
prefix sna: <http://franz.com/ns/allegrograph/4.11/sna/>  
#  
# First, delete any existing definition (just in case!)  
#  
delete { graph sna:sna { ?id ?p ?o }}  
where {  
  graph sna:sna {  
    ?id a sna:Generator ;  
    sna:hasName ex:coCreators ;  
    ?p ?o  .  
  }  
} ;  
 
#  
# then add the definition  
#  
insert data {  
  graph sna:sna {  
   [ a sna:Generator ;  
      sna:hasName ex:coCreators ;  
      sna:hasSPARQL '''  
prefix dc: <http://purl.org/dc/elements/1.1/>  
select distinct ?output {  
  ?article dc:creator ?input .  
  ?article dc:creator ?output .  
  FILTER( ?input != ?output )  
}''' ]  
  }  
} 

In this definition, the generator has the sna:hasName ex:coCreators and is defined by the SPARQL query given directly as the literal value of sna:hasSPARQL.

The following defining forms are allowed:

sna:hasSPARQL query
specify a SPARQL query (as a literal) to use to find neighbors. The query must project a single variable binding. To specify which graph vertex is being examined, the query can either use a variable named ?input or use a different variable and specify its name using sna:hasInput.
sna:objectsOf predicate(s)
Starting from a graph vertex as a subject, define its neighbors as the objects of the triples with the given predicate(s). Two forms are possible:
sna:objectsOf example:onePredicate . 

or

sna:objectsOf (example:predicate1 example:predicate2 ...) 
sna:subjectsOf predicate(s)
Like sna:objectsOf, but starts from a graph vertex as an object and defines its neighbors as the subjects of the triples with the given predicate(s). For example:
[ a :Generator ;  
    :hasName ex:knowsOrHeardOfS ;  
    :subjectsOf (<http://www.franz.com/sna#knows> <http://www.franz.com/sna#heardOf>) ] . 
sna:undirected predicate(s)
This combines sna:subjectsOf and sna:objectsOf in that it will define neighbors as the union of the subjects and objects of the given predicate(s).
sna:hasSelect
Define neighbors using a Prolog Select query. The query must return a single variable binding and should use the (?? node) syntax to specify the starting graph vertex. For example,
(select ?person2  
  (q ?article !dc:creator (?? node))  
  (q ?article !dc:creator ?person2)  
  (lispp (not (upi= node ?person2)))) 

The query will be read into the current environment, so using namespace abbreviations is not recommended.
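The three predicate-based forms can be modeled as small generator factories. This Python sketch is only an illustration of the neighbor semantics described above; the factory names, the in-memory triple list, and the sample data are assumptions, not AllegroGraph API.

```python
def objects_of(triples, predicates):
    """Neighbors are the objects of triples whose subject is the node."""
    preds = set(predicates)
    return lambda node: {o for s, p, o in triples if s == node and p in preds}


def subjects_of(triples, predicates):
    """Neighbors are the subjects of triples whose object is the node."""
    preds = set(predicates)
    return lambda node: {s for s, p, o in triples if o == node and p in preds}


def undirected(triples, predicates):
    """Neighbors are the union of the two directed forms."""
    outgoing = objects_of(triples, predicates)
    incoming = subjects_of(triples, predicates)
    return lambda node: outgoing(node) | incoming(node)


# Hypothetical data using the two predicates from the example above.
triples = [
    ("ex:ann", "ex:knows", "ex:bob"),
    ("ex:cat", "ex:heardOf", "ex:ann"),
]
predicates = ["ex:knows", "ex:heardOf"]

print(objects_of(triples, predicates)("ex:ann"))
print(subjects_of(triples, predicates)("ex:ann"))
print(sorted(undirected(triples, predicates)("ex:ann")))
```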

Neighbors

Use sna:nodalNeighbors to iterate over the neighbors of a node (as determined by a generator). For example:

?neighbor sna:nodalNeighbors (sna:coCreators ?start) . 

would bind ?neighbor to each vertex that is adjacent to ?start.

Groups and Centrality Measures

Many of the SNA functions are defined in terms of nodes and groups. You can specify a group in a SPARQL query using either the BIND form or the Magic Property form. Both forms require a generator, a starting node and a depth:

# BIND form (not working yet because of bound variable analysis)  
# Find ex:Erdoes's ego group out to depth 2 using the coCreators generator.  
BIND( sna:egoGroup( ex:coCreators, ex:Erdoes, 2 ) as ?group )  
 
# Magic Property form  
?group sna:egoGroup (ex:coCreators ex:Erdoes 2) 

These groups act like blank nodes and have no meaning outside of a given query (see footnote 4). Within a query, however, we can use other Magic Properties to examine the group (see footnote 5). For example, we can get actor degree centrality for each actor in an ego group by building a group with the sna:egoGroup function and then using sna:actorDegreeCentrality.

prefix sna: <http://franz.com/ns/allegrograph/4.11/sna/>  
prefix foaf: <http://xmlns.com/foaf/0.1/>  
select ?actor ?centrality {  
  ?s foaf:name 'Paul Erdoes'^^<http://www.w3.org/2001/XMLSchema#string> .  
  ?group sna:egoGroup (sna:coCreators ?s 3) .  
  (?actor ?centrality) sna:actorDegreeCentrality (sna:coCreators ?group) .  
} 

This will compute the actor degree centrality for each member in Erdoes's ego group. If we only wanted the centrality for a single actor, we could have used something like this:

prefix sna: <http://franz.com/ns/allegrograph/4.11/sna/>   
prefix foaf: <http://xmlns.com/foaf/0.1/>   
select ?actor ?centrality {   
  ?s foaf:name 'Paul Erdoes'^^<http://www.w3.org/2001/XMLSchema#string> .   
  ?group sna:egoGroup (sna:coCreators ?s 3) .   
  ?actor foaf:name 'Paul Thinkle'^^<http://www.w3.org/2001/XMLSchema#string> .   
  (?actor ?centrality) sna:actorDegreeCentrality (sna:coCreators ?group) .  
} 

I.e., we first find the blank node corresponding to Erdoes; then we find the group that surrounds that blank node; then we find the blank node that corresponds to Paul Thinkle; and finally, we find centrality. The binding on ?actor means that we only find a single centrality measure.
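To illustrate what these two steps compute, here is a Python sketch that builds an ego group by breadth-first expansion and then scores each member. It assumes that the ego group includes the starting node and that actor degree centrality is the normalized degree (neighbors inside the group divided by the number of other members); the source does not spell out either definition, and the sample graph is hypothetical.

```python
from collections import deque


def ego_group(generator, start, depth):
    """Return the set of nodes reachable from `start` in at most `depth` steps."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, distance = frontier.popleft()
        if distance == depth:
            continue  # do not expand beyond the requested depth
        for neighbor in generator(node):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, distance + 1))
    return seen


def actor_degree_centrality(generator, group):
    """Map each actor to (neighbors inside the group) / (group size - 1)."""
    others = len(group) - 1
    return {
        actor: (len(set(generator(actor)) & group) / others if others else 0.0)
        for actor in group
    }


# Hypothetical undirected co-author graph.
NEIGHBORS = {
    "erdoes": {"a", "b"},
    "a": {"erdoes", "c"},
    "b": {"erdoes"},
    "c": {"a", "d"},
    "d": {"c"},
}


def co_creators(node):
    return NEIGHBORS.get(node, set())


group = ego_group(co_creators, "erdoes", 2)
centrality = actor_degree_centrality(co_creators, group)
for actor in sorted(centrality):
    print(actor, round(centrality[actor], 3))
```

Binding ?actor before the centrality pattern, as in the second query, corresponds to looking up a single key in the resulting dictionary.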

Use sna:members to iterate over the members of a group:

?actor sna:members ?group . 

and find the graph density of a group with

?density sna:groupDensity (<generator> <group>) 

The group centrality measures are similar: given a group, we'd get the group degree centrality with

?centrality sna:groupDegreeCentrality (<generator> <group>) 

The following centrality measures are defined:

You can find the size of a group by using sna:size as in

?group sna:size ?size 

Neighbor Caches

Because computing some measures can be quite expensive, the SNA library provides a caching mechanism to save information about nodal neighbors. The SPARQL SNA extensions call these neighbor caches. As with ego groups, you can create a cache using either the bind form or the Magic Property form:

BIND( sna:neighborCache( <generator>, <starting points>, <depth> ) as ?cache )  
 
?cache sna:neighborCache( <generator> <starting points> <depth> ) . 

The starting points can be a node or a group.

Once we have the cache, we can use it wherever we'd use a generator (or a group). For example, here is a query that computes closeness centrality for each actor using the generator:

prefix sna: <http://franz.com/ns/allegrograph/4.11/sna/>  
prefix foaf: <http://xmlns.com/foaf/0.1/>  
select ?actor ?c {  
  ?s foaf:name 'Paul Erdoes'^^<http://www.w3.org/2001/XMLSchema#string> .  
  ?group sna:egoGroup (sna:coCreators ?s 1) .  
  (?actor ?c) sna:actorClosenessCentrality (sna:coCreators ?group) .  
} 

and here is the same query using the neighbor cache:

prefix sna: <http://franz.com/ns/allegrograph/4.11/sna/>  
prefix foaf: <http://xmlns.com/foaf/0.1/>  
select ?actor ?c {  
  ?s foaf:name 'Paul Erdoes'^^<http://www.w3.org/2001/XMLSchema#string> .  
  ?cache sna:neighborCache (sna:coCreators ?s 1) .  
  (?actor ?c) sna:actorClosenessCentrality (?cache ?cache) .  
} 

Since the centrality measure needs both a generator and a group, we use the cache twice. This second form can be significantly faster. Note that neighbor caches are themselves cached between queries.
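The following Python sketch shows one way to picture a neighbor cache: a table of precomputed neighbor sets for every node within the given depth. The table can then stand in for the generator (by lookup) and for the group (by its keys). The function name and the dictionary representation are assumptions of this sketch, not a description of AllegroGraph's internal data structure.

```python
def neighbor_cache(generator, start, depth):
    """Precompute neighbor sets for every node within `depth` steps of `start`."""
    cache = {}
    frontier = [start]
    for _ in range(depth + 1):
        next_frontier = []
        for node in frontier:
            if node in cache:
                continue  # already expanded, do not call the generator again
            cache[node] = frozenset(generator(node))
            next_frontier.extend(cache[node])
        frontier = next_frontier
    return cache


# Hypothetical graph plus a counter to show how often the generator runs.
NEIGHBORS = {"s": {"a", "b"}, "a": {"s", "c"}, "b": {"s"}, "c": {"a"}}
calls = {"count": 0}


def co_creators(node):
    calls["count"] += 1
    return NEIGHBORS.get(node, set())


cache = neighbor_cache(co_creators, "s", 1)
group = set(cache)            # the cache used as a group
neighbors_of_a = cache["a"]   # the cache used as a generator, with no new call
print(sorted(group), sorted(neighbors_of_a), calls["count"])
```

After the cache is built, repeated neighbor lookups cost a dictionary access instead of a query, which is why the cached form of an expensive measure can be faster.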

Paths

A path is an ordered sequence of nodes starting at node1 and ending at node2 such that each node is the neighbor (as defined by a generator) of its predecessor. AllegroGraph provides three primitive path finding operations: breadth-first, depth-first and bidirectional. Each of these has a corresponding Magic Property. For example, this pattern will succeed if a path exists between ex:llama and ex:caribou:

ex:llama sna:depthFirstSearch (ex:animalNeighborGenerator ex:caribou) . 

It will use the depth first search strategy.

Examining the paths between two nodes is more complicated because SPARQL can only bind variables to single values (literals or resources) and a path consists of multiple ordered values. To accommodate this, we introduce path identifiers and node indexes. The first represents a single path and the second is a typed literal that represents the index of the node in its path.

As an example, suppose we start with this graph

  /--- b ---\  
a            c  
  \--- d ---/ 

We can ask for all of the paths between <a> and <c> using

(<a> ?vertex ?linkNumber ?path) sna:depthFirstSearch (ex:generator <c>) . 

AllegroGraph will find two paths: (a b c) and (a d c). It will represent these as:

?vertex    ?linkNumber ?path  
=================================  
  a        0           0  
  b        1           0  
  c        2           0  
  a        0           1  
  d        1           1  
  c        2           1 

That is, it will return bindings for the three variables such that ?path will have one value for (a b c) and another value for (a d c). Within each path, ?linkNumber will index the vertexes and ?vertex will actually be bound to each node along the way.
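The row layout can be reproduced with a short Python sketch that enumerates simple paths depth-first and flattens them into (vertex, link number, path) rows. The function name and the integer path identifiers are illustrative assumptions; the neighbor order in the sample graph is chosen so that the rows come out in the same order as the table.

```python
def depth_first_paths(generator, start, goal):
    """Yield every simple path from `start` to `goal`, depth-first."""

    def walk(node, path):
        if node == goal:
            yield list(path)
            return
        for neighbor in generator(node):
            if neighbor not in path:  # keep paths simple (no repeated vertex)
                path.append(neighbor)
                yield from walk(neighbor, path)
                path.pop()

    yield from walk(start, [start])


# The four-node graph from the diagram, as ordered neighbor lists.
GRAPH = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["a", "c"]}

rows = [
    (vertex, link_number, path_id)
    for path_id, path in enumerate(depth_first_paths(GRAPH.__getitem__, "a", "c"))
    for link_number, vertex in enumerate(path)
]
for row in rows:
    print(row)
```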

If the ?path is left off, then AllegroGraph will return only the first path that it finds. If you do not need to know the order of the vertexes in the path, then ?linkNumber can also be left off. So for example, these two queries will find a single path and return some information about it:

(ex:llama ?vertex) sna:depthFirstSearch (ex:animalNeighborGenerator ex:caribou) .  
 
(ex:llama ?vertex ?order) sna:depthFirstSearch (ex:animalNeighborGenerator ex:caribou) . 

and this will return all paths (as described in the table above):

(ex:llama ?vertex ?order ?path) sna:depthFirstSearch (ex:animalNeighborGenerator ex:caribou) . 

Sometimes, it is useful to be able to examine each path in turn. The Magic Properties sna:depthFirstSearchPaths, sna:breadthFirstSearchPaths, and sna:bidirectionalSearchPaths iterate over paths between two nodes. For example, this pattern will bind ?path to a different identifier for each path between ex:llama and ex:caribou:

(ex:llama ?path) sna:depthFirstSearchPaths (ex:animalNeighborGenerator ex:caribou) . 

The path identifiers can then be used with other Magic Properties. For example, sna:members iterates over the vertexes of a path:

# first form  
?vertex sna:members ?path .  
 
# second form  
(?vertex ?order) sna:members ?path . 

and sna:size returns the length of a path:

?path sna:size ?length . 

Note that path identifiers have meaning only within the query execution and should not be projected.

Using the same (a, b, c, d) graph from above, this query

(<a> ?path) sna:depthFirstSearchPaths (ex:generator <c>) .  
(?vertex ?order) sna:members ?path . 

would return something very much like the sna:depthFirstSearch query did above:

?vertex       ?order            ?path  
===================================  
      a           0            _:g0  
      b           1            _:g0  
      c           2            _:g0  
      a           0            _:g1  
      d           1            _:g1  
      c           2            _:g1 

But this second form also allows us to compute things like the average path length:

select (avg(?length) as ?avgLength) {  
  (<a> ?path) sna:depthFirstSearchPaths (ex:generator <c>) .  
  ?path sna:size ?length .  
}  

Cliques

We can tell if a group is a clique with

?isClique sna:isClique (<generator> <group>) . 

We can find the cliques with

?clique sna:cliquesOf (<generator> <actor> ) .  
?clique sna:cliquesOf (<generator> <actor> <minimum-size> ) . 

?clique will be bound to a group identifier for each clique found. As mentioned above, this identifier makes sense only within the query and it should not be projected. It can, of course, be used by other SNA functions and properties. For example, this lists the members of each clique:

?clique sna:cliquesOf (<generator> <actor> <minimum-size> ) .  
?member sna:members ?clique . 
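For readers who want a concrete picture of these operations, here is a Python sketch that checks whether a group is a clique and enumerates the maximal cliques containing an actor using the Bron-Kerbosch algorithm (without pivoting). It assumes a symmetric generator with no self-loops and assumes that only maximal cliques are reported; the source does not say which algorithm or which notion of clique AllegroGraph uses.

```python
def is_clique(generator, group):
    """True when every pair of distinct members are neighbors of each other."""
    return all(v in generator(u) for u in group for v in group if u != v)


def cliques_of(generator, actor, minimum_size=1):
    """Return the maximal cliques (as frozensets) that contain `actor`."""
    found = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            if len(current) >= minimum_size:
                found.append(frozenset(current))
            return
        for vertex in list(candidates):
            adjacent = set(generator(vertex))
            expand(current | {vertex}, candidates & adjacent, excluded & adjacent)
            candidates = candidates - {vertex}
            excluded = excluded | {vertex}

    # Seeding with the actor restricts the search to cliques that contain it.
    expand({actor}, set(generator(actor)), set())
    return found


# Hypothetical undirected graph: a triangle a-b-c plus a pendant edge a-d.
NEIGHBORS = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}


def generator(node):
    return NEIGHBORS.get(node, set())


print(sorted(sorted(clique) for clique in cliques_of(generator, "a")))
```

Passing a minimum size of 3 in this sample graph leaves only the triangle, which mirrors the three-argument form of the Magic Property.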

Temporal

There is a tutorial using an older interface here. The older interface is also described here.

AllegroGraph supports efficient storage and retrieval of temporal data including:

In the following, the namespace prefix t is short for http://franz.com/ns/allegrograph/3.0/temporal/. AllegroGraph also requires that time points are defined using the t:time predicate and that intervals are defined using either t:starttime and t:endtime or t:startpoint and t:endpoint. Starting in version 4.11 of AllegroGraph, the t:time, t:starttime, and t:endtime predicates are automatically mapped to xsd:dateTimes (see predicate type mapping for more details).

Once data has been encoded, you can query for:

The temporal reasoning tutorial describes all of these capabilities in detail and also functions as a general reference guide. Below, we will quickly outline the various SPARQL Magic Properties. To illustrate them, we'll use a triple-store with intervals defined for days and months of 2013 like:

:day1Start t:time "2013-01-01T00:00:00"^^xsd:dateTime .  
:day1End t:time "2013-01-01T23:59:59"^^xsd:dateTime .  
:day1 t:startpoint :day1Start ;  
   t:endpoint :day1End ;  
   rdfs:label "January 1st" .  
:day2Start t:time "2013-01-02T00:00:00"^^xsd:dateTime .  
:day2End t:time "2013-01-02T23:59:59"^^xsd:dateTime .  
:day2 t:startpoint :day2Start ;  
   t:endpoint :day2End ;  
   rdfs:label "January 2nd" .  
...  
:month1 t:startpoint :day1Start ;  
  t:endpoint :day31End ;  
  rdfs:label "January" .  
... 

We will also include an interesting date in the store:

:earthDay t:startpoint :day110Start .  

relations between points

We can ask for all the points before the month of January ends using

select * {  
   ?month rdfs:label 'January' .  
   ?month t:endpoint ?monthEnds .  
   ?point t:pointBefore ?monthEnds .  
} 

This will return the start and end of each day in January (though it will not return the end of January 31st because that point is simultaneous with the end of the month and not before it).

relation between intervals

For example, we can find the number of days in January (see footnote 6) by querying:

select (count(?day) as ?days) {  
   ?month rdfs:label 'January' .  
   ?day t:intervalDuring ?month .  
} group by ?month 
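A short Python sketch makes the strictness of the "during" relation visible. It assumes that an interval is during another only when it starts strictly after the outer start and ends strictly before the outer end, and that each day ends at 23:59:59; both are assumptions of this sketch, and the helper names are illustrative.

```python
from datetime import datetime, timedelta


def day_interval(n):
    """Interval (start, end) for the n-th day of 2013, ending at 23:59:59."""
    start = datetime(2013, 1, 1) + timedelta(days=n - 1)
    return (start, start + timedelta(hours=23, minutes=59, seconds=59))


def interval_during(inner, outer):
    """True when `inner` lies strictly inside `outer` (no shared endpoints)."""
    return outer[0] < inner[0] and inner[1] < outer[1]


# January runs from the start of day 1 to the end of day 31.
january = (day_interval(1)[0], day_interval(31)[1])
days = [day_interval(n) for n in range(1, 366)]

count = sum(interval_during(day, january) for day in days)
print(count)
```

Under these assumptions January 1st and January 31st are excluded because each shares an endpoint with the month, so the count is two less than the number of days in January.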

relations between points and intervals

For example, we can ask for the month during which Earth Day falls using:

select ?month ?label {  
  :earthDay t:startpoint ?start .  
  ?start t:pointDuringInterval ?month .  
  ?month rdfs:label ?label .  
} 
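The point-in-interval relation can be sketched the same way in Python. The sketch assumes a point is during an interval only when it falls strictly between the interval's endpoints, and it takes the start of day 110 as the point, following the sample data; the month table and function names are illustrative.

```python
import calendar
from datetime import datetime, timedelta

MONTH_LABELS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def month_interval(month, year=2013):
    """Interval from the first second to the last second of a month."""
    last_day = calendar.monthrange(year, month)[1]
    return (datetime(year, month, 1), datetime(year, month, last_day, 23, 59, 59))


def point_during_interval(point, interval):
    """True when `point` falls strictly between the interval's endpoints."""
    return interval[0] < point < interval[1]


# :day110Start, the point that :earthDay starts at in the sample data.
earth_day_start = datetime(2013, 1, 1) + timedelta(days=109)

labels = [
    MONTH_LABELS[month - 1]
    for month in range(1, 13)
    if point_during_interval(earth_day_start, month_interval(month))
]
print(labels)
```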

relations between points and datetimes

relations between intervals and datetimes

Implementation Notes

Property places that must be bound

A Magic Property has a collection of input places and output places (where a place can be a variable binding or a constant). For example, the fti:match Magic Property has four possible places: the subject, the object, the query text and the index name.

Each Magic Property puts constraints on the contents of its places. Some places can be optional (e.g., the index name place of the fti:match property). Some places must be bound before the property is evaluated (for example, the fti:match property cannot execute a free-text query if the search expression is not bound).

When AllegroGraph plans a query, it will try to re-organize the patterns so that any places that must be bound are, in fact, bound when the query executor gets to them. If AllegroGraph is unable to find such an arrangement, then it will signal an error. We are still enhancing the algorithm AllegroGraph uses for query ordering, so there are some queries that are logically possible but cannot currently be executed.
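The reordering idea can be illustrated with a simple greedy Python sketch: repeatedly pick any pattern whose required places are already bound, record what it binds, and fail if none qualifies. This is not AllegroGraph's planner; the pattern representation, the function name, and the greedy strategy are assumptions made for illustration.

```python
def order_patterns(patterns, bound=()):
    """Order patterns so each one's required variables are bound before it runs.

    `patterns` is a list of (name, requires, binds) tuples, where `requires`
    and `binds` are sets of variable names.
    """
    bound = set(bound)
    remaining = list(patterns)
    ordered = []
    while remaining:
        for pattern in remaining:
            name, requires, binds = pattern
            if set(requires) <= bound:
                ordered.append(name)
                bound |= set(binds)
                remaining.remove(pattern)
                break
        else:
            # No remaining pattern can run: signal an error, as the text describes.
            raise ValueError("no executable ordering for: " + ", ".join(p[0] for p in remaining))
    return ordered


# The centrality query from the SNA section, deliberately written out of order.
patterns = [
    ("actorDegreeCentrality", {"?group"}, {"?actor", "?centrality"}),
    ("egoGroup", {"?s"}, {"?group"}),
    ("foaf:name", set(), {"?s"}),
]
print(order_patterns(patterns))
```

A greedy pass like this can reject orderings that a smarter search would find acceptable in richer settings, which is one way to read the remark about queries that are logically possible but cannot currently be executed.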

Footnotes

  1. This BIND form does not work yet.
  2. The Graph forms are not defined yet.
  3. Note that many of the functions were defined previously with slightly different names. For example, cartesian-distance-squared instead of cartesianDistanceSquared. The new names are added for consistency; the old names will eventually be deprecated and removed.
  4. SNA Groups, caches and path identifiers will all serialize as if they were blank nodes with the same blank node identifier
  5. Note that group members are cached between queries when possible to make the SNA functions operate more efficiently.
  6. Actually this will be two less than the number of days in January because January 1st starts the month and January 31st finishes it. I.e. they are not during the month.