SPARQL SNA Magic Properties | AllegroGraph 8.4.1

Introduction to SNA magic properties

The list of supported magic properties, including all SNA magic properties, is in the SPARQL Magic Properties document. This document describes the various types of SNA magic properties.

AllegroGraph provides Magic Properties that work with its Social Networking Analysis (SNA) Library. Recall that the SNA functions use abstract generators to specify which nodes in the graph are neighbors. You can define generators using the existing client APIs or via SPARQL (see below). To use a generator with the Magic Properties, you must name it with a URI.

In the following, the namespace prefix sna: is short for http://franz.com/ns/allegrograph/4.11/sna/.

Generators

A triple-store is a graph of triples where the subjects and objects are vertexes in the graph and the triples define labeled edges between these nodes. Often, however, it makes more sense for a given problem to define an abstract graph on top of the triple-store by specifying which nodes are neighbors of other nodes. In this case, the vertexes are still subjects and objects but the edges are specified via a function that computes the neighbors of a node. We call a function like this a generator. For example, a triple-store of publications will have triples like:

:b1 foaf:name "Sam Smith" .  
:b2 foaf:name "Betty Bintur" .  
:a1 rdfs:label "Book about Cats" .  
:a1 dc:creator :b1 ;  
    dc:creator :b2 .

We might be interested in the list of co-authors. Two authors are linked if they both created the same article. In SPARQL, this would look like:

SELECT ?coCreator {  
  ?article dc:creator ?input .  
  ?article dc:creator ?coCreator .  
  FILTER( ?input != ?coCreator )  
}

(the FILTER makes sure that a person isn't a co-author with themselves).

Defining Generators

You can define a generator using one of the client APIs or by including triples of the correct form in the triple-store itself. When a Magic Property specifies a generator named <generator>, AllegroGraph will look for an existing definition. If it is not found, then AllegroGraph will look for the triple

graph sna:sna { ?node sna:hasName <generator> }

I.e., the triple with predicate <http://franz.com/ns/allegrograph/4.11/sna/hasName>, object matching the generator you looking for and graph <http://franz.com/ns/allegrograph/4.11/sna/sna>. If this triple is found, then the triples associated with that subject will be used to construct the generator on the fly. As an example, the SPARQL generator above could be added to the store using this SPARQL update statement:

prefix ex: <http://www.franz.com/sna#>  
prefix sna: <http://franz.com/ns/allegrograph/4.11/sna/>  
#  
# First, delete any existing definition (just in case!)  
#  
delete { graph sna:sna { ?id ?p ?o }}  
where {  
  graph sna:sna {  
    ?id a sna:Generator ;  
        sna:hasName ex:coCreators ;  
        ?p ?o  .  
  }  
} ;  
 
#  
# then add the definition  
#  
insert data {  
  graph sna:sna {  
   [ a sna:Generator ;  
       sna:hasName ex:coCreators ;  
       sna:hasSPARQL '''  
prefix dc: <http://purl.org/dc/elements/1.1/>  
select distinct ?output {  
  ?article dc:creator ?input .  
  ?article dc:creator ?output .  
  FILTER( ?input != ?output )  
}''' ]  
  }  
}

In this definition, the generator has sna:hasName ex:coCreators and is defined by the SPARQL query sna:hasSPARQL using the text of the query directly. Note that if there is more than one generator defined with the same name, AllegroGraph will signal an error.

The following defining forms are allowed:

sna:hasSPARQL query

specify a SPARQL query (as a literal) to use to find neighbors. The query must project a single variable binding. To specify which graph vertex is being examined, the query can either use a variable named ?input or use a different variable and specify its name using sna:hasInput.

sna:objectsOf predicate(s)

Starting from a graph vertex as a subject, define its neighbors as the objects of the triples with the given predicate(s). Two forms are possible:

sna:objectsOf example:onePredicate .

sna:objectsOf (example:predicate1 example:predicate2 ...)

sna:subjectsOf predicate(s)

like sna:objectsOf only start from an object and define its neighbors as the subjects of triples with the given predicate(s). For example:

[ a sna:Generator ;  
    sna:hasName ex:knowsOrHeardOfS ;  
    sna:subjectsOf (<http://www.franz.com/sna#knows> <http://www.franz.com/sna#heardOf>) ] .

sna:undirected predicate(s)

This combines sna:subjectsOf and sna:objectsOf in that it will define neighbors as the union of the subjects and objects of the given predicate(s).

sna:hasSelect

Define neighbors using a Prolog Select query. The query must return a single variable binding and should use the (?? node) syntax to specify the starting graph vertex. For example,

(select ?person2  
  (q ?article !dc:creator (?? node))  
  (q ?article !dc:creator ?person2)  
  (lispp (not (upi= node ?person2))))

The query will be read into the current environment so using namespace abbreviations is not recommended.

Neighbors

Use sna:nodalNeighbors to iterate over the neighbors of a node (as determined by a generator). For example:

?neighbor sna:nodalNeighbors (sna:coCreators ?start) .

would bind ?neighbor to each vertex that is adjacent to ?start.

Groups and Centrality Measures

Many of the SNA functions are defined in terms of nodes and groups. You can specify a group in a SPARQL query using either the BIND form or the Magic Property form. Both forms require a generator, a starting node and a depth. These next two queries are equivalent.

# Find the size of Erdoes's social network out to a depth of 2  
# using the BIND form.  
prefix sna: <http://franz.com/ns/allegrograph/4.11/sna/>  
prefix foaf: <http://xmlns.com/foaf/0.1/>  
select ?size {  
  ?s foaf:name 'Paul Erdoes'^^<http://www.w3.org/2001/XMLSchema#string> .  
  BIND( sna:egoGroup( sna:coCreators, ?s, 2 ) as ?group )   
  ?group sna:size ?size .  
}  
 
# Find the size of Erdoes's social network out to a depth of 2  
# using the magic property form.  
prefix sna: <http://franz.com/ns/allegrograph/4.11/sna/>  
prefix foaf: <http://xmlns.com/foaf/0.1/>  
select ?size {  
  ?s foaf:name 'Paul Erdoes'^^<http://www.w3.org/2001/XMLSchema#string> .  
  Bgroup sna:egoGroup (ex:coCreators ?s 2)  
  ?group sna:size ?size .  
}

These groups act like blank nodes and have no meaning outside of a given query ¹ . Within a query, however, we can use other Magic Properties to examine the group ² . For example, we can get actor degree centrality for each actor in an ego group by building a group with the sna:egoGroup function and then using sna:actorDegreeCentrality.

prefix sna: <http://franz.com/ns/allegrograph/4.11/sna/>  
prefix foaf: <http://xmlns.com/foaf/0.1/>  
select ?actor ?centrality {  
  ?s foaf:name 'Paul Erdoes'^^<http://www.w3.org/2001/XMLSchema#string> .  
  ?group sna:egoGroup (sna:coCreators ?s 3) .  
  (?actor ?centrality) sna:actorDegreeCentrality (sna:coCreators ?group) .  
}

This will compute the actor degree centrality for each member in Erdoes's ego group. If we only wanted the centrality for a single actor, we could have used something like this:

prefix sna: <http://franz.com/ns/allegrograph/4.11/sna/>   
prefix foaf: <http://xmlns.com/foaf/0.1/>   
select ?actor ?centrality {   
  ?s foaf:name 'Paul Erdoes'^^<http://www.w3.org/2001/XMLSchema#string> .   
  ?group sna:egoGroup (sna:coCreators ?s 3) .   
  ?actor foaf:name 'Paul Thinkle'^^xsd:string .   
  (?actor ?centrality) sna:actorDegreeCentrality (sna:coCreators ?group) .  
}

I.e., we first find the blank node corresponding to Erdoes; then we find the group that surrounds that blank node; then we find the blank node that corresponds to Paul Thinkle; and finally, we find centrality. The binding on ?actor means that we only find a single centrality measure.

Use sna:members to iterate over the members of a group:

?actor sna:members ?group .

and the group graph density with

?density sna:groupDensity (<generator> <group>)

The group centrality measures are similar: given a group, we'd get the group degree centrality with

?centrality sna:groupDegreeCentrality (<generator> <group>)

The following centrality measures are defined:

sna:actorDegreeCentrality
sna:actorClosenessCentrality
sna:actorBetweennessCentrality
sna:groupDegreeCentrality
sna:groupClosenessCentrality
sna:groupBetweennessCentrality

You can find the size of a group by using sna:size as in

?group sna:size ?size

Neighbor Caches

Because computing some measures can be quite expensive, the SNA library provides a caching mechanism to save information about nodal neighbors. The SPARQL SNA extensions call these neighbor caches. As with ego groups, you can create a cache using either the bind form or the Magic Property form:

BIND( sna:neighborCache( <generator> <starting points> <depth> ) as ?cache )  
 
?cache sna:neighborCache( <generator> <starting points> <depth> ) .

starting points can be a node or a group.

Once we have the cache, we can use it wherever we'd use a generator (or a group). For example, here is a query that computes closeness centrality for each actor using the generator:

prefix sna: <http://franz.com/ns/allegrograph/4.11/sna/>  
prefix foaf: <http://xmlns.com/foaf/0.1/>  
select ?actor ?c {  
  ?s foaf:name 'Paul Erdoes'^^<http://www.w3.org/2001/XMLSchema#string> .  
  ?group sna:egoGroup (sna:coCreators ?s 1) .  
  (?actor ?c) sna:actorClosenessCentrality (sna:coCreators ?group) .  
}

and here is the same query using the neighbor cache:

prefix sna: <http://franz.com/ns/allegrograph/4.11/sna/>  
prefix foaf: <http://xmlns.com/foaf/0.1/>  
select ?actor ?c {  
  ?s foaf:name 'Paul Erdoes'^^<http://www.w3.org/2001/XMLSchema#string> .  
  ?cache sna:neighborCache (sna:coCreators ?s 1) .  
  (?actor ?c) sna:actorClosenessCentrality (?cache ?cache) .  
}

Since the centrality measure needs both a generator and a group, we use the cache twice. This second form can be significantly faster. Note that neighbor caches are themselves cached between queries.

Paths

A path is an ordered sequence of nodes starting at node1 and ending at node2 such that each node is the neighbor (as defined by a generator) of its predecessor. AllegroGraph provides three primitive path finding operations: breadth-first, depth-first and bidirectional. Each of these has a corresponding Magic Property. For example, this pattern will succeed if a path exists between ex:llama and ex:caribou:

ex:llama sna:depthFirstSearch (ex:animalNeighborGenerator ex:caribou) .

It will use the depth first search strategy.

Examining the paths between two nodes is more complicated because SPARQL can only bind variables to single values (literals or resources) and a path consists of multiple ordered values. To accommodate this, we introduce path identifiers and node indexes. The first represents a single path and the second is a typed literal that represents the index of the node in its path.

As an example, suppose we start with this graph

  /--- b ---\  
a            c  
  \--- d ---/

We can ask for all of the paths between <a> and <c> using

(<a> ?vertex ?linkNumber ?path) sna:depthFirstSearch (ex:generator <c>) .

AllegroGraph will find two paths: (a b c) and (a d c). It will represent these as:

?vertex    ?linkNumber ?path  
=================================  
  a        0           0  
  b        1           0  
  c        2           0  
  a        0           1  
  d        1           1  
  c        2           1

That is, it will return bindings for the three variables such that ?path will have one value for (a b c) and another value for the (a d c). Within each path, ?linkNumber will index the vertexes and ?vertex will actually be bound to each node along the way.

If the ?path is left off, then AllegroGraph will return only the first path that it finds. If you do not need to know the order of the vertexes in the path, then ?linkNumber can also be left off. So for example, these two queries will find a single path and return some information about it:

(ex:llama ?vertex) sna:depthFirstSearch (ex:animalNeighborGenerator ex:caribou) .  
 
(ex:llama ?vertex ?order) sna:depthFirstSearch (ex:animalNeighborGenerator ex:caribou) .

and this will return all paths (as described in the table above):

(ex:llama ?vertex ?order ?path) sna:depthFirstSearch (ex:animalNeighborGenerator ex:caribou) .

Sometimes, it is useful to be able to examine each path in turn. The Magic Properties sna:depthFirstSearchPaths, sna:breadthFirstSearchPaths, and sna:bidirectionalSearchPaths iterate over paths between two nodes. For example, this pattern will bind ?path to an different identifier for each path between ex:llama and ex:caribou:

(ex:llama ?path) sna:depthFirstSearchPaths (ex:animalNeighborGenerator ex:caribou) .

The path identifiers can then be used with other Magic Properties. For example, sna:members iterates over the vertexes of a path:

# first form  
?vertex sna:members ?path .  
 
# second form  
(?vertex ?order) sna:members ?path .

and sna:size returns the length of a path:

?path sna:size ?length .

Note that path identifiers have meaning only within the query execution and should not be projected.

Using the same graph (a,b,c,d) graph from above, this query

(<a> ?path) sna:depthFirstSearchPaths (ex:generator <c>) .  
(?vertex ?order) sna:vertexOf ?path .

would return something very much like the sna:depthFirstSearch query did above:

?vertex       ?order            ?path  
===================================  
      a           0            _:g0  
      b           1            _:g0  
      c           2            _:g0  
      a           0            _:g1  
      d           1            _:g1  
      c           2            _:g1

But this second form also allows us to compute things like the average path length:

select (avg(?length) as ?avgLength {  
  (<a> ?path) sna:depthFirstSearchPaths (ex:generator <c>) .  
  ?path sna:size ?length .  
}

Cliques

We can tell if a group is a clique with

?isClique sna:isClique (<generator> <group>) .

We can find the cliques with

?clique sna:cliquesOf (<generator> <actor> ) .  
?clique sna:cliquesOf (<generator> <actor> <minimum-size> ) .

?clique will be bound to a group identifier for each clique found. As mentioned above, this identifier makes sense only within the query and it should not be projected. It can, of course, be used by other SNA function and properties.

?clique sna:cliquesOf (<generator> <actor> <minimum-size> ) .  
?member sna:member ?clique .

The SNA Selector Interface

An SNA selector selects one or more RDF predicates as the edges of a conventional graph. For each selected predicate, the subject and the object will be the corresponding connected vertices. This term directly maps to a conventional graph very well.

To define a selector, it’s very similar to how we define a generator (see above. For example, consider the Zachary’s karate club dataset (the link is to the Wikipedia descripion of that dataset and it can be downloaded from the Franz website here https://franz.com/ftp/pub/agraph/datasets/karate_club.nt), where there’s only one type of edge ex:knows:

PREFIX sna: <http://franz.com/ns/allegrograph/4.11/sna/>  
PREFIX ex: <ex://karate#>  
 
DELETE { GRAPH sna:sna { ?id ?p ?o } }  
WHERE {  
  GRAPH sna:sna {  
?id a sna:Selector ;  
sna:hasName ex:karateSelector ;  
?p ?o .  
  }  
} ;  
 
INSERT DATA {  
  GRAPH sna:sna {  
[  
  a sna:Selector ;  
  sna:hasName ex:karateSelector ;  
  sna:undirected ex:knows ;  
]  
  }  
}

Note that sna:undirected has been used to select the types of edges we want. In the above example, any subject and object matched in the triples pattern e.g. ex:foo ex:knows ex:bar will be considered as vertices in an undirected graph.

If more edge types are needed, sna:undirected can take a list of them e.g.

sna:undirected (ex:edge1 ex:edge2 ex:edge3)

Correspondingly, there is also sna:directed, the only difference is the selector will consider the graph directed instead of undirected.

At the moment, mixing both sna:directed and sna:undirected in the same selector definition is not supported.

Leiden Community Detection

This magic property was added in release 8.3.0. Usually the name of the magic property contains the release in which they were added but because most SNA magic properties were adding in release 4.11, the name for this one is

http://franz.com/ns/allegrograph/4.11/sna/communityLeiden

so the same PREFIX can be used for all SNA magic properties.

The new magic property sna:communityLeiden that implements leiden community detection, takes a list of keyword arguments:

Name	Type	Default	Optional	Description
`generator`	Resource	`nil`	yes	A defined SNA generator
`actor`	Resource	`nil`	yes	A start node (actor) for traversing the RDF graph by the generator
`depth`	Numeric or nil	`nil`	yes	The depth for making an ego group. Must not be `nil` if generator is non- `nil`. See note 1 following the table.
`selector`	Resource	`nil`	yes	A defined SNA selector
`resolution`	Numeric	`1.0`	yes	A parameter that is used when computing the modularity
`beta`	Numeric	`0.01`	yes	A parameter that controls the randomness while breaking a community into smaller ones
`iterations`	Numeric	`-1`	yes	A parameter that controls the total iterations. -1 means iterate until modularity cannot be improved further.
`seed`	Numeric or nil	`nil`	yes	A parameter that sets the random generator’s seed to have re-producible communities

Note 1 The depth argument can only be nil when the generator argument is not specified or nil. Otherwise, depth must be a non-negative integer and will be used along with the generator and actor for constructing an SNA ego group.

For the Karate Club dataset (introduced above and downloadable from https://franz.com/ftp/pub/agraph/datasets/karate_club.nt), if the selector has been defined as given in the selector section, one can quickly perform the leiden community detection like this:

PREFIX sna: <http://franz.com/ns/allegrograph/4.11/sna/>  
PREFIX : <http://franz.com/ns/keyword#>  
 
PREFIX ex: <ex://karate#>  
 
SELECT ?actor ?group WHERE {  
  (?group ?groupId) sna:communityLeiden (:selector ex:karateSelector) .  
  ?actor sna:members ?group .  
} ORDER BY ?group

sna:communityLeiden returns a list of communities (as ?group, which is actually a blank node) as well as each of their ID (as ?groupId, which is a serial integer number). The detected communities are compatible with other SNA Ego Group operations, including centrality measures. For example:

PREFIX sna: <http://franz.com/ns/allegrograph/4.11/sna/>  
PREFIX : <http://franz.com/ns/keyword#>  
PREFIX ex: <ex://karate#>  
SELECT ?group ?actor ?centralityD ?size WHERE {  
  {SELECT ?group ?size WHERE{  
(?group ?groupId) sna:communityLeiden (:selector ex:karateSelector :seed 1000) .  
?group sna:size ?size .  
   } ORDER BY DESC (?size) LIMIT 1  
  }  
  ?actor sna:members ?group .  
  (?actor ?centralityD) sna:actorDegreeCentrality (ex:karateSelector ?group) .  
} ORDER BY DESC(?centralityD)

Note that if a selector is applied, sna:communityLeiden currenly only supports undirected graphs.

Footnotes

SNA Groups, caches and path identifiers will all serialize as if they were blank nodes with the same blank node identifier ↩
Note that group members are cached between queries when possible to make the SNA functions operate more efficiently. ↩

SPARQL Social Network Analysis (SNA) Magic Properties