AllegroGraph Social Network Analysis (SNA) Tutorial
=====================================================

AllegroGraph provides SNA magic properties in the sna: namespace
(http://franz.com/ns/allegrograph/4.11/sna/) for graph analytics directly
in SPARQL. All SNA operations work through magic properties — no special
APIs needed. You already have sparql_query and add_triples tools.

Standard prefixes used throughout:
  PREFIX sna: <http://franz.com/ns/allegrograph/4.11/sna/>
  PREFIX kw: <http://franz.com/ns/keyword#>

COPY-PASTE SYNTAX REFERENCE (use these EXACTLY — do NOT invent property names):

Neighbors:
  ?neighbor sna:nodalNeighbors (<generator> <actor>)

Ego group (required for centrality and cliques):
  ?group sna:egoGroup (<generator> <actor> <depth>)
  ?group sna:size ?size
  ?member sna:members ?group

Centrality (requires ego group first):
  (?actor ?centrality) sna:actorDegreeCentrality (<generator> ?group)
  (?actor ?centrality) sna:actorBetweennessCentrality (<generator> ?group)
  (?actor ?centrality) sna:actorClosenessCentrality (<generator> ?group)
  (?actor ?centrality) sna:pageRankCentrality (<generator> ?group)
  Group-level: sna:groupDegreeCentrality, sna:groupBetweennessCentrality, sna:groupClosenessCentrality

Cliques:
  ?clique sna:cliquesOf (<generator> <actor>)
  ?result sna:isClique (<generator> <node1> <node2> ...)

COMMUNITY DETECTION — DIFFERENT SYNTAX! Uses kw: prefix, builds its own ego group:
  PREFIX kw: <http://franz.com/ns/keyword#>
  (?community ?communityId) sna:communityLeiden
    (kw:generator <name> kw:actor <actor> kw:depth <depth>)
  ?member sna:members ?community

  WRONG: sna:communityLeiden (:coAuthors ?group) — NO positional args!
  WRONG: sna:communityLeiden (kw:generator :coAuthors kw:group ?group) — "group" is NOT a valid keyword!
  RIGHT: sna:communityLeiden (kw:generator :coAuthors kw:actor :author1 kw:depth 10)

Path finding — two styles:

  Style 1: "SearchPaths" — returns path OBJECTS, use sna:members/sna:size to inspect:
    (<start> ?path) sna:bidirectionalSearchPaths (<generator> <end>)
    (<start> ?path) sna:breadthFirstSearchPaths (<generator> <end>)
    (<start> ?path) sna:depthFirstSearchPaths (<generator> <end>)
    ?path sna:size ?pathLen
    ?vertex sna:members ?path

  Style 2: "Search" — returns nodes DIRECTLY in the subject tuple:
    ?start sna:bidirectionalSearch (<generator> <end>)          — test existence
    (?start ?node) sna:bidirectionalSearch (<generator> <end>)  — path nodes
    (?start ?node ?nodeId) sna:bidirectionalSearch (...)        — nodes + position
    (?start ?node ?nodeId ?pathId) sna:bidirectionalSearch (...) — + path ID
    Same forms for: sna:breadthFirstSearch, sna:depthFirstSearch

Neighbor cache (performance — replaces both generator AND group):
  ?cache sna:neighborCache (<generator> <actor> <depth>)
  Use ?cache in place of both <generator> and ?group, e.g.:
  (?actor ?centrality) sna:actorClosenessCentrality (?cache ?cache)

Group utilities:
  ?member sna:groupMember ?group   — alternative to sna:members

COMPLETE list of valid sna: property names (do NOT invent others):
  nodalNeighbors, egoGroup, members, size, groupDensity, groupMember,
  neighborCache, actorDegreeCentrality, actorBetweennessCentrality,
  actorClosenessCentrality, pageRankCentrality, groupDegreeCentrality,
  groupBetweennessCentrality, groupClosenessCentrality, cliquesOf, isClique,
  communityLeiden, bidirectionalSearch, bidirectionalSearchPaths,
  breadthFirstSearch, breadthFirstSearchPaths, depthFirstSearch,
  depthFirstSearchPaths


1. SETUP: GENERATORS
====================

A generator defines how nodes connect (neighbor relationships). It must be
stored as RDF triples in the special named graph <http://franz.com/ns/allegrograph/4.11/sna/sna>.

Use the add_triples tool with:
  format: "turtle"
  context: "<http://franz.com/ns/allegrograph/4.11/sna/sna>"

PREFER predicate-based generators (compiled to efficient Lisp — no SPARQL parsing).
Only use SPARQL generators when you need complex join patterns.

1a. Predicate-based generators (PREFERRED)
------------------------------------------

AllegroGraph compiles predicate-based generators into efficient Lisp functions,
while SPARQL generators must be parsed and executed each time. Use predicate-based
generators whenever the relationship is a direct predicate or set of predicates.

  sna:undirected — neighbors are both subjects and objects (undirected):
  [ a sna:Generator ;
    sna:hasName :colleagues ;
    sna:undirected foaf:knows ] .

  Multiple predicates — use a list (common for multi-relation datasets):
  [ a sna:Generator ;
    sna:hasName :discussedTogether ;
    sna:undirected ( <http://www.franz.com/discusses-diseases>
                     <http://www.franz.com/discusses-drug>
                     <http://www.franz.com/discusses-side-effect>
                     <http://www.franz.com/discusses-target> ) ] .

  sna:objectsOf — neighbors are objects of the predicate (directed, forward):
  [ a sna:Generator ;
    sna:hasName :managedBy ;
    sna:objectsOf :reportsTo ] .

  sna:subjectsOf — neighbors are subjects of the predicate (directed, reverse):
  [ a sna:Generator ;
    sna:hasName :manages ;
    sna:subjectsOf :reportsTo ] .
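To make the three predicate-based generator kinds concrete, here is a small pure-Python model of which nodes each kind treats as neighbors, given a list of (subject, predicate, object) triples. It is a sketch of the semantics described above, not AllegroGraph's compiled implementation; the `neighbors` function and the node names are hypothetical.

```python
from typing import Iterable

Triple = tuple[str, str, str]


def neighbors(triples: Iterable[Triple], kind: str, predicates: set[str], node: str) -> set[str]:
    """Return the neighbors of `node` under a predicate-based generator.

    kind is "objectsOf", "subjectsOf" or "undirected", mirroring
    sna:objectsOf, sna:subjectsOf and sna:undirected.
    """
    if kind not in {"objectsOf", "subjectsOf", "undirected"}:
        raise ValueError(f"unknown generator kind: {kind}")
    result: set[str] = set()
    for s, p, o in triples:
        if p not in predicates:
            continue
        # Forward direction: node is the subject, the neighbor is the object.
        if kind in ("objectsOf", "undirected") and s == node:
            result.add(o)
        # Reverse direction: node is the object, the neighbor is the subject.
        if kind in ("subjectsOf", "undirected") and o == node:
            result.add(s)
    return result


if __name__ == "__main__":
    org = [
        ("alice", "reportsTo", "bob"),
        ("carol", "reportsTo", "bob"),
        ("bob", "reportsTo", "dana"),
    ]
    print(neighbors(org, "objectsOf", {"reportsTo"}, "bob"))   # bob's manager
    print(neighbors(org, "subjectsOf", {"reportsTo"}, "bob"))  # bob's direct reports
    print(neighbors(org, "undirected", {"reportsTo"}, "bob"))  # both directions
```

Passing several predicates in `predicates` corresponds to the list form of sna:undirected shown above.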

1b. SPARQL generators (fallback for complex patterns)
-----------------------------------------------------

Use SPARQL generators ONLY when the relationship requires joining through
intermediate nodes (e.g., co-authorship where two authors share a book,
but you want to skip the book node and connect authors directly).

CRITICAL: SPARQL generators use ?input and ?output variables (NOT ??).
  - ?input = the node whose neighbors we want
  - ?output = a neighbor of ?input

Example — co-authorship (joins through intermediate book node):

  @prefix sna: <http://franz.com/ns/allegrograph/4.11/sna/> .
  @prefix dc: <http://purl.org/dc/elements/1.1/> .
  @prefix : <http://example.org/sna#> .

  [ a sna:Generator ;
    sna:hasName :coAuthors ;
    sna:hasSPARQL """PREFIX dc: <http://purl.org/dc/elements/1.1/>
  SELECT DISTINCT ?output {
    ?book dc:creator ?input .
    ?book dc:creator ?output .
    FILTER(?input != ?output)
  }""" ] .
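The co-authorship generator is a self-join on dc:creator. The pure-Python model below mirrors that query clause by clause, so you can check what ?output should be for a given ?input on a toy dataset. It is an illustration only; the `coauthors` function and the book/author identifiers are hypothetical.

```python
def coauthors(creators: list[tuple[str, str]], author: str) -> set[str]:
    """Model of the :coAuthors SPARQL generator.

    `creators` holds (book, author) pairs, i.e. the matches of ?book dc:creator ?x.
    """
    # ?book dc:creator ?input : books written by the input author
    books = {book for book, a in creators if a == author}
    # ?book dc:creator ?output, FILTER(?input != ?output); the set plays DISTINCT
    return {a for book, a in creators if book in books and a != author}


if __name__ == "__main__":
    creators = [("b1", "a1"), ("b1", "a2"), ("b2", "a1"), ("b2", "a3"), ("b3", "a4")]
    print(sorted(coauthors(creators, "a1")))
```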

DECISION GUIDE: If the relationship is a direct predicate or set of predicates
between the nodes you care about → use predicate-based. If you need joins,
filters, or to skip intermediate nodes → use SPARQL.

To check if a generator exists:
  PREFIX sna: <http://franz.com/ns/allegrograph/4.11/sna/>
  SELECT * WHERE {
    GRAPH <http://franz.com/ns/allegrograph/4.11/sna/sna> {
      ?gen a sna:Generator ;
           sna:hasName ?name .
    }
  }

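To delete ALL generators (this clears the entire sna:sna graph):
  DELETE WHERE {
    GRAPH <http://franz.com/ns/allegrograph/4.11/sna/sna> { ?s ?p ?o }
  }

To delete a single generator by name (here :coAuthors):
  PREFIX sna: <http://franz.com/ns/allegrograph/4.11/sna/>
  PREFIX : <http://example.org/sna#>
  DELETE WHERE {
    GRAPH <http://franz.com/ns/allegrograph/4.11/sna/sna> {
      ?gen sna:hasName :coAuthors ;
           ?p ?o .
    }
  }
  This removes every triple whose subject is that generator node; the list
  cells of a multi-predicate sna:undirected list stay in the graph.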


2. NODAL NEIGHBORS
==================

Find direct neighbors of a node via the generator.

Syntax:
  ?neighbor sna:nodalNeighbors (<generatorName> <startNode>)

Example — find co-authors of :author1:

  PREFIX sna: <http://franz.com/ns/allegrograph/4.11/sna/>
  PREFIX : <http://example.org/sna#>
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>

  SELECT ?name WHERE {
    ?coauthor sna:nodalNeighbors (:coAuthors :author1) .
    ?coauthor foaf:name ?name .
  }


3. EGO GROUPS
=============

An ego group expands the neighborhood of a node to a given depth.
Depth 1 = direct neighbors, depth 2 = neighbors of neighbors, etc.

Syntax:
  ?group sna:egoGroup (<generatorName> <actor> <depth>)

Group properties:
  ?group sna:size ?size          — number of members
  ?member sna:members ?group     — iterate members
  ?group sna:groupDensity ?d     — density (0.0 to 1.0)

Example — ego group of :author1 at depth 2:

  PREFIX sna: <http://franz.com/ns/allegrograph/4.11/sna/>
  PREFIX : <http://example.org/sna#>
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>

  SELECT ?size WHERE {
    ?group sna:egoGroup (:coAuthors :author1 2) .
    ?group sna:size ?size .
  }

Example — list members of ego group:

  SELECT ?name WHERE {
    ?group sna:egoGroup (:coAuthors :author1 2) .
    ?member sna:members ?group .
    ?member foaf:name ?name .
  }

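TIP: Use a large depth (e.g. 10) to capture the connected component around the
actor. Depth is a hop limit, so depth 10 covers the whole component only if
every member is within 10 hops of the actor; raise it for long chains.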
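The effect of the depth limit can be seen in a small pure-Python model: a breadth-first search that stops expanding at `depth` hops. On a 15-node chain, depth 10 from one end reaches only 11 nodes. The sketch assumes the ego group includes the actor itself and uses the common undirected density formula (edges present divided by n(n-1)/2); neither is confirmed as AllegroGraph's exact definition, and all function names are hypothetical.

```python
from collections import deque


def ego_group(adj: dict[str, set[str]], actor: str, depth: int) -> set[str]:
    """Nodes within `depth` hops of `actor` (the actor itself included)."""
    hops = {actor: 0}
    queue = deque([actor])
    while queue:
        v = queue.popleft()
        if hops[v] == depth:
            continue  # do not expand past the depth limit
        for w in adj.get(v, ()):
            if w not in hops:
                hops[w] = hops[v] + 1
                queue.append(w)
    return set(hops)


def group_density(adj: dict[str, set[str]], group: set[str]) -> float:
    """Undirected edges inside the group divided by the n*(n-1)/2 possible ones."""
    n = len(group)
    if n < 2:
        return 0.0
    edges = {frozenset((u, v)) for u in group for v in adj.get(u, ()) if v in group and v != u}
    return len(edges) / (n * (n - 1) / 2)


def chain(length: int) -> dict[str, set[str]]:
    """Undirected path n0 - n1 - ... - n(length-1)."""
    adj: dict[str, set[str]] = {f"n{i}": set() for i in range(length)}
    for i in range(length - 1):
        adj[f"n{i}"].add(f"n{i + 1}")
        adj[f"n{i + 1}"].add(f"n{i}")
    return adj


if __name__ == "__main__":
    adj = chain(15)
    print(len(ego_group(adj, "n0", 10)))  # 11: n11..n14 lie beyond 10 hops
```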


3a. NEIGHBOR CACHES (PERFORMANCE OPTIMIZATION)
===============================================

Computing centrality, cliques, and community detection can be expensive.
Neighbor caches pre-compute and cache the neighbor information, and can
replace BOTH the generator and the group in subsequent operations.

Syntax (same arguments as egoGroup):
  ?cache sna:neighborCache (<generatorName> <actor> <depth>)

The cache can be used wherever you'd use a generator OR a group. Since
centrality needs both a generator and a group, pass the cache for BOTH:

  (?actor ?centrality) sna:actorClosenessCentrality (?cache ?cache)

Example — closeness centrality WITHOUT cache (standard approach):

  PREFIX sna: <http://franz.com/ns/allegrograph/4.11/sna/>
  PREFIX : <http://example.org/sna#>
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>

  SELECT ?name ?centrality WHERE {
    ?group sna:egoGroup (:coAuthors :author1 10) .
    (?actor ?centrality) sna:actorClosenessCentrality (:coAuthors ?group) .
    ?actor foaf:name ?name .
  } ORDER BY DESC(?centrality) LIMIT 10

Example — same query WITH cache (significantly faster):

  SELECT ?name ?centrality WHERE {
    ?cache sna:neighborCache (:coAuthors :author1 10) .
    (?actor ?centrality) sna:actorClosenessCentrality (?cache ?cache) .
    ?actor foaf:name ?name .
  } ORDER BY DESC(?centrality) LIMIT 10

Neighbor caches are themselves cached between queries, so repeated
analysis on the same network is fast. Use caches whenever running
multiple SNA operations on the same network.
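The idea behind a neighbor cache can be shown by counting generator calls. This sketch is not AllegroGraph's cache structure: it wraps a hypothetical generator function with a call counter and runs an all-pairs breadth-first search (the core of closeness-style measures) once through the generator and once through a precomputed dict.

```python
from collections import deque
from typing import Callable, Iterable


def counting_generator(adj: dict[str, set[str]]):
    """Wrap an adjacency dict as a generator function that counts its calls."""
    calls = {"n": 0}

    def generator(node: str) -> set[str]:
        calls["n"] += 1  # stands in for one (possibly expensive) generator evaluation
        return adj.get(node, set())

    return generator, calls


def build_cache(generator: Callable[[str], Iterable[str]], group: set[str]) -> dict[str, frozenset[str]]:
    """Evaluate the generator once per group member, keeping only in-group neighbors."""
    return {v: frozenset(w for w in generator(v) if w in group) for v in group}


def bfs_distances(neighbors: Callable[[str], Iterable[str]], source: str) -> dict[str, int]:
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in neighbors(v):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


if __name__ == "__main__":
    adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
    group = set(adj)

    gen, calls = counting_generator(adj)
    for s in group:  # all-pairs BFS straight through the generator
        bfs_distances(gen, s)
    print("without cache:", calls["n"])  # 16 = 4 sources x 4 expansions

    gen, calls = counting_generator(adj)
    cache = build_cache(gen, group)
    for s in group:  # all-pairs BFS reading the cache
        bfs_distances(cache.__getitem__, s)
    print("with cache:", calls["n"])  # 4 = one call per member
```

The saving grows with group size: without a cache the generator runs roughly once per node per source, with a cache once per node.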


4. CENTRALITY
=============

Centrality measures require a group (typically an ego group) or a neighbor
cache. All centrality magic properties use the same pattern:

  (?actor ?centrality) sna:<centralityType> (<generatorName> ?group)
  (?actor ?centrality) sna:<centralityType> (?cache ?cache)    — with neighbor cache

Create the group first, then compute centrality over it.

4a. Degree Centrality — who has the most connections
----------------------------------------------------

  PREFIX sna: <http://franz.com/ns/allegrograph/4.11/sna/>
  PREFIX : <http://example.org/sna#>
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>

  SELECT ?name ?centrality WHERE {
    ?group sna:egoGroup (:coAuthors :author1 10) .
    (?actor ?centrality) sna:actorDegreeCentrality (:coAuthors ?group) .
    ?actor foaf:name ?name .
  } ORDER BY DESC(?centrality) LIMIT 10
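Degree centrality counts each member's neighbors inside the group. A common normalized form divides by n - 1, so the hub of a star scores 1.0. Whether sna:actorDegreeCentrality normalizes this way is an assumption; the sketch below is a plain-Python model with a hypothetical function name.

```python
def degree_centrality(adj: dict[str, set[str]], group: set[str]) -> dict[str, float]:
    """In-group degree of each member, divided by the n - 1 possible neighbors."""
    n = len(group)
    if n < 2:
        return {v: 0.0 for v in group}
    return {
        v: sum(1 for w in adj.get(v, ()) if w in group and w != v) / (n - 1)
        for v in group
    }


if __name__ == "__main__":
    star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
    ranking = sorted(degree_centrality(star, set(star)).items(), key=lambda kv: -kv[1])
    print(ranking[0])  # ('hub', 1.0)
```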

4b. Betweenness Centrality — who bridges communities
-----------------------------------------------------

  SELECT ?name ?centrality WHERE {
    ?group sna:egoGroup (:coAuthors :author1 10) .
    (?actor ?centrality) sna:actorBetweennessCentrality (:coAuthors ?group) .
    ?actor foaf:name ?name .
  } ORDER BY DESC(?centrality) LIMIT 10
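Betweenness counts how many shortest paths between other members pass through a node, which is why bridge nodes score high. The sketch below uses Brandes' algorithm, a standard way to compute it, on an unweighted undirected graph. AllegroGraph's own algorithm and normalization are not documented here, so the raw pair counts it returns are an assumption.

```python
from collections import deque


def betweenness_centrality(adj: dict[str, set[str]], group: set[str]) -> dict[str, float]:
    """Brandes' algorithm for an unweighted, undirected graph restricted to `group`.

    Returns raw (unnormalized) pair counts: how many shortest paths between
    other members pass through each node, with ties split evenly.
    """
    nbrs = {v: [w for w in adj.get(v, ()) if w in group] for v in group}
    score = dict.fromkeys(group, 0.0)
    for s in group:
        order = []                       # nodes in non-decreasing distance from s
        preds = {v: [] for v in group}   # predecessors on shortest paths from s
        sigma = dict.fromkeys(group, 0)  # number of shortest s-v paths
        sigma[s] = 1
        dist = dict.fromkeys(group, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in nbrs[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in reverse BFS order.
        delta = dict.fromkeys(group, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


if __name__ == "__main__":
    g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
         "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
    bc = betweenness_centrality(g, set(g))
    print(bc["c"], bc["d"], bc["a"])  # 6.0 6.0 0.0: the bridge endpoints carry every cross-triangle path
```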

4c. Closeness Centrality — who can reach others fastest
--------------------------------------------------------

  SELECT ?name ?centrality WHERE {
    ?group sna:egoGroup (:coAuthors :author1 10) .
    (?actor ?centrality) sna:actorClosenessCentrality (:coAuthors ?group) .
    ?actor foaf:name ?name .
  } ORDER BY DESC(?centrality) LIMIT 10
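Closeness rewards members with short hop distances to everyone else. A common definition, used in this sketch, is the number of other reachable members divided by the sum of shortest-path distances to them, computed by breadth-first search inside the group. AllegroGraph's exact formula, in particular for disconnected groups, is an assumption here.

```python
from collections import deque


def closeness_centrality(adj: dict[str, set[str]], group: set[str]) -> dict[str, float]:
    """(reachable members - 1) / (sum of hop distances to them), inside `group`."""
    result = {}
    for s in group:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj.get(v, ()):
                if w in group and w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


if __name__ == "__main__":
    path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
    print(closeness_centrality(path, set(path)))
```

In the three-node path, the middle node b is one hop from both ends and scores 1.0, while each end scores 2/3.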

Group-level centrality (single value for the whole group):
  ?centrality sna:groupDegreeCentrality (<generatorName> ?group)
  ?centrality sna:groupBetweennessCentrality (<generatorName> ?group)
  ?centrality sna:groupClosenessCentrality (<generatorName> ?group)
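Group-level centrality collapses the per-actor scores into one number describing how centralized the whole group is. One standard formulation is Freeman's centralization, sketched here for degree; it is only an assumption that sna:groupDegreeCentrality follows this formula, and the function name is hypothetical.

```python
def group_degree_centralization(adj: dict[str, set[str]], group: set[str]) -> float:
    """Freeman degree centralization: 1.0 for a star, 0.0 when all degrees are equal."""
    n = len(group)
    if n < 3:
        return 0.0
    degree = {v: sum(1 for w in adj.get(v, ()) if w in group and w != v) for v in group}
    top = max(degree.values())
    # Largest possible sum of (top - degree), reached by a star: (n - 1) * (n - 2).
    return sum(top - d for d in degree.values()) / ((n - 1) * (n - 2))


if __name__ == "__main__":
    star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
    triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
    print(group_degree_centralization(star, set(star)))          # 1.0
    print(group_degree_centralization(triangle, set(triangle)))  # 0.0
```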


5. CLIQUES
==========

A clique is a fully-connected subgroup (every member is connected to every
other member).

5a. Find cliques containing a node:

  ?clique sna:cliquesOf (<generatorName> <actor>)

Example:

  PREFIX sna: <http://franz.com/ns/allegrograph/4.11/sna/>
  PREFIX : <http://example.org/sna#>
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>

  SELECT ?cliqueSize (GROUP_CONCAT(?name; separator=', ') AS ?members) WHERE {
    ?clique sna:cliquesOf (:coAuthors :author1) .
    ?clique sna:size ?cliqueSize .
    ?member sna:members ?clique .
    ?member foaf:name ?name .
  } GROUP BY ?clique ?cliqueSize ORDER BY DESC(?cliqueSize)

5b. Test if a set of nodes forms a clique:

  ?result sna:isClique (<generatorName> <node1> <node2> <node3> ...)

Returns true/false.
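Both clique operations can be modeled in a few lines. The sketch assumes sna:cliquesOf reports maximal cliques (cliques that no further member can extend), which is the usual meaning but is not stated above. It enumerates them with the basic Bron-Kerbosch algorithm and keeps those containing the actor; `is_clique` is the pairwise check described in 5b. All names are hypothetical, and the adjacency is assumed symmetric.

```python
def maximal_cliques(adj: dict[str, set[str]]) -> list[set[str]]:
    """Basic Bron-Kerbosch enumeration of maximal cliques (no pivoting)."""
    found: list[set[str]] = []

    def expand(r: set[str], p: set[str], x: set[str]) -> None:
        if not p and not x:
            found.append(r)  # r cannot be extended: it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def cliques_of(adj: dict[str, set[str]], actor: str) -> list[set[str]]:
    """Maximal cliques that contain `actor`."""
    return [c for c in maximal_cliques(adj) if actor in c]


def is_clique(adj: dict[str, set[str]], nodes: list[str]) -> bool:
    """True when every pair of listed nodes is directly connected."""
    return all(b in adj.get(a, set()) for i, a in enumerate(nodes) for b in nodes[i + 1:])


if __name__ == "__main__":
    # Triangle a-b-c with a pendant node d attached to c.
    adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
    print(sorted(sorted(c) for c in maximal_cliques(adj)))
    print(is_clique(adj, ["a", "b", "c"]), is_clique(adj, ["a", "b", "d"]))
```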


6. COMMUNITY DETECTION (LEIDEN ALGORITHM)
=========================================

CRITICAL: communityLeiden is COMPLETELY DIFFERENT from other SNA properties!
  - It uses keyword arguments with the kw: prefix (NOT positional arguments)
  - It builds its OWN ego group internally — do NOT create a separate ego group
  - ALL arguments must use the kw: prefix
  - The ONLY valid keyword names are: generator, actor, depth, selector,
    resolution, beta, iterations, seed

  PREFIX kw: <http://franz.com/ns/keyword#>

Syntax (copy this EXACTLY, only changing the generator name, actor, and depth):
  (?community ?communityId) sna:communityLeiden (kw:generator <name> kw:actor <node> kw:depth <n>)

WRONG — do NOT create a separate ego group and pass it:
  ?group sna:egoGroup (:coAuthors :author1 10) .
  (?community ?id) sna:communityLeiden (kw:generator :coAuthors kw:group ?group) .
  -- "group" is NOT a valid keyword! Neither are "members", "actors", "ego-group", etc.

WRONG — do NOT use positional arguments:
  (?community ?id) sna:communityLeiden (:coAuthors ?group) .
  -- Positional arguments cause "expects at most 0 positional object arguments" error

RIGHT — pass generator, actor, and depth directly (Leiden builds the ego group itself):
  (?community ?communityId) sna:communityLeiden
    (kw:generator :coAuthors kw:actor :author1 kw:depth 10) .

Optional keyword arguments:
  kw:resolution — controls community granularity (number, default 1.0)
                   (higher values yield more, smaller communities;
                   lower values yield fewer, larger ones)
  kw:seed       — random seed for reproducibility (number)
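Leiden searches for a partition that scores well on a quality function, and kw:resolution is a parameter inside that function. The sketch below computes modularity with a resolution parameter, one quality function Leiden implementations commonly optimize; whether sna:communityLeiden uses modularity or another function (such as CPM) is an assumption. It does not run Leiden, it only scores fixed partitions of a toy graph: at resolution 1.0, splitting two bridged triangles scores best, while at 3.0 all-singleton communities score best, which illustrates why higher values yield smaller communities.

```python
def modularity(edges: list[tuple[str, str]], communities: list[set[str]], resolution: float = 1.0) -> float:
    """Modularity with a resolution parameter, for an undirected simple graph.

    Q = sum over communities c of  L_c / m  -  resolution * (d_c / (2m))**2
    where L_c = edges inside c, d_c = total degree of c's members, m = edge count.
    """
    m = len(edges)
    degree: dict[str, int] = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for comm in communities:
        internal = sum(1 for u, v in edges if u in comm and v in comm)
        total_degree = sum(degree.get(n, 0) for n in comm)
        q += internal / m - resolution * (total_degree / (2 * m)) ** 2
    return q


if __name__ == "__main__":
    # Two triangles joined by the bridge c-d.
    edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
    whole = [{"a", "b", "c", "d", "e", "f"}]
    split = [{"a", "b", "c"}, {"d", "e", "f"}]
    singles = [{n} for n in "abcdef"]
    for gamma in (1.0, 3.0):
        scores = {name: round(modularity(edges, p, gamma), 3)
                  for name, p in [("whole", whole), ("split", split), ("singles", singles)]}
        print(gamma, scores)
```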

Example — detect communities in the co-authorship network:

  PREFIX sna: <http://franz.com/ns/allegrograph/4.11/sna/>
  PREFIX : <http://example.org/sna#>
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  PREFIX kw: <http://franz.com/ns/keyword#>

  SELECT ?communityId
         (GROUP_CONCAT(?name; separator=', ') AS ?members)
         (COUNT(?actor) AS ?size)
  WHERE {
    (?community ?communityId) sna:communityLeiden
      (kw:generator :coAuthors kw:actor :author1 kw:depth 10) .
    ?actor sna:members ?community .
    ?actor foaf:name ?name .
  } GROUP BY ?communityId ORDER BY DESC(?size)


7. PATH FINDING
===============

Find paths between two nodes in the graph. Two styles available:

  "SearchPaths" variants return path OBJECTS — use sna:members and sna:size to inspect.
  "Search" variants return nodes DIRECTLY in the subject tuple.

Available algorithms: bidirectional (recommended), breadthFirst, depthFirst.

7a. SearchPaths style (RECOMMENDED — clean, works well with GROUP_CONCAT):

  (<startNode> ?path) sna:bidirectionalSearchPaths (<generatorName> <endNode>)

  Path properties:
    ?path sna:size ?pathLen      — number of hops
    ?vertex sna:members ?path    — iterate path vertices (unordered)

  Example — shortest paths between two authors:

  PREFIX sna: <http://franz.com/ns/allegrograph/4.11/sna/>
  PREFIX : <http://example.org/sna#>
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>

  SELECT ?pathLen
         (GROUP_CONCAT(?name; separator=' -> ') AS ?route)
  WHERE {
    (:author1 ?path) sna:bidirectionalSearchPaths (:coAuthors :author20) .
    ?path sna:size ?pathLen .
    ?vertex sna:members ?path .
    ?vertex foaf:name ?name .
  } GROUP BY ?path ?pathLen ORDER BY ?pathLen

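  Note: sna:members is unordered, so the names in ?route are not guaranteed
  to follow the path order. Use the Search style (7b, Form 3 or Form 4)
  when the order of hops matters.

  Also available: sna:breadthFirstSearchPaths, sna:depthFirstSearchPaths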
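To see what a path result contains, here is a pure-Python model that returns every shortest path between two nodes, using breadth-first search with predecessor lists. It is an assumption that bidirectionalSearchPaths returns all shortest paths (the 7a example only implies shortest paths); the function name is hypothetical, and a bidirectional search would find the same paths while typically exploring fewer nodes.

```python
from collections import deque


def all_shortest_paths(adj: dict[str, set[str]], start: str, end: str) -> list[list[str]]:
    """Every shortest path from start to end, each as a list of nodes."""
    dist = {start: 0}
    preds: dict[str, list[str]] = {start: []}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                preds[w] = [v]
                queue.append(w)
            elif dist[w] == dist[v] + 1:
                preds[w].append(v)  # another shortest way into w
    if end not in dist:
        return []  # no path: the endpoints are in different components
    paths: list[list[str]] = []

    def walk_back(node: str, suffix: list[str]) -> None:
        if node == start:
            paths.append([start] + suffix)
            return
        for p in preds[node]:
            walk_back(p, [node] + suffix)

    walk_back(end, [])
    return paths


if __name__ == "__main__":
    diamond = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a", "d"}, "d": {"b", "c"}}
    for path in sorted(all_shortest_paths(diamond, "a", "d")):
        print(" -> ".join(path), f"({len(path) - 1} hops)")
```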

7b. Search style (nodes returned directly — good for ordered traversal):

  Form 1 — test existence:
    ?start sna:bidirectionalSearch (<generatorName> <endNode>)

  Form 2 — path nodes (unordered):
    (?start ?node) sna:bidirectionalSearch (<generatorName> <endNode>)

  Form 3 — nodes with sequential position:
    (?start ?node ?nodeId) sna:bidirectionalSearch (<generatorName> <endNode>)

  Form 4 — nodes with position AND path ID (for multiple paths):
    (?start ?node ?nodeId ?pathId) sna:bidirectionalSearch (<generatorName> <endNode>)

  Same forms available for: sna:breadthFirstSearch, sna:depthFirstSearch
  WARNING: DFS can return hundreds of paths. Use LIMIT or prefer bidirectional.

  Example: Search style with ordered nodes, using Form 4 so that multiple
  shortest paths stay separate:

  SELECT ?pathId ?nodeId ?name WHERE {
    (:author1 ?node ?nodeId ?pathId) sna:bidirectionalSearch
      (:coAuthors :author20) .
    ?node foaf:name ?name .
  } ORDER BY ?pathId ?nodeId
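When two endpoints are joined by more than one shortest path, Form 3 rows from different paths share the same ?nodeId values, so ordering by ?nodeId alone interleaves them. The pure-Python sketch below models Form 4 rows as (node, nodeId, pathId) tuples; the 0-based numbering and the `search_rows` name are assumptions, not something the tutorial specifies.

```python
def search_rows(paths: list[list[str]]) -> list[tuple[str, int, int]]:
    """Flatten paths into (node, nodeId, pathId) rows, like Search-style Form 4."""
    return [
        (node, node_id, path_id)
        for path_id, path in enumerate(paths)
        for node_id, node in enumerate(path)
    ]


if __name__ == "__main__":
    paths = [["a", "b", "d"], ["a", "c", "d"]]
    rows = search_rows(paths)
    # ORDER BY ?nodeId alone interleaves the two paths ...
    print([r[0] for r in sorted(rows, key=lambda r: r[1])])       # ['a', 'a', 'b', 'c', 'd', 'd']
    # ... while ORDER BY ?pathId ?nodeId keeps each path intact.
    print([r[0] for r in sorted(rows, key=lambda r: (r[2], r[1]))])  # ['a', 'b', 'd', 'a', 'c', 'd']
```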


8. GOTCHAS AND COMMON MISTAKES
===============================

1. GENERATOR VARIABLES: Use ?input/?output, NOT ??
   WRONG:  SELECT ?neighbor { ?book dc:creator ?? . ?book dc:creator ?neighbor }
   RIGHT:  SELECT DISTINCT ?output { ?book dc:creator ?input . ?book dc:creator ?output . FILTER(?input != ?output) }

2. GENERATOR GRAPH: Generators MUST be in the sna:sna named graph.
   Use add_triples with context="http://franz.com/ns/allegrograph/4.11/sna/sna"

3. LEIDEN IS COMPLETELY DIFFERENT FROM OTHER SNA PROPERTIES:
   - ALL arguments must use kw: prefix (not positional)
   - It builds its OWN ego group — do NOT pass a pre-built ?group variable
   - Valid keywords ONLY: generator, actor, depth, selector, resolution, beta,
     iterations, seed
   - Do NOT invent keywords like "group", "members", "actors", "ego-group"
   WRONG:  sna:communityLeiden (:coAuthors ?group)
   WRONG:  sna:communityLeiden (kw:generator :coAuthors kw:group ?group)
   RIGHT:  sna:communityLeiden (kw:generator :coAuthors kw:actor :author1 kw:depth 10)

4. PATH MEMBER ORDERING: (?vertex ?order) sna:members ?path does NOT work
   for ordered traversal. Use unordered: ?vertex sna:members ?path
   For ordered vertices, use a Search-style form with a position variable,
   e.g. (?start ?node ?nodeId ?pathId) sna:bidirectionalSearch (...), and
   ORDER BY ?pathId ?nodeId.

5. DFS VERBOSITY: depthFirstSearch returns ALL paths, not just shortest.
   Prefer bidirectionalSearchPaths for shortest path queries.

6. CENTRALITY NEEDS A GROUP: You cannot compute centrality on the whole
   repository. First create an ego group or a neighbor cache, then pass it
   to centrality. A large depth such as 10 covers the full connected
   component only if no member is more than 10 hops from the actor.

7. INCLUDE PREFIXES IN GENERATOR SPARQL: The SPARQL inside sna:hasSPARQL
   needs its own PREFIX declarations — it does not inherit from the outer
   query.


9. RECOMMENDED WORKFLOW
========================

When a user asks for social network analysis:

Step 1: Check if a generator exists
  PREFIX sna: <http://franz.com/ns/allegrograph/4.11/sna/>
  SELECT * WHERE {
    GRAPH <http://franz.com/ns/allegrograph/4.11/sna/sna> {
      ?gen sna:hasName ?name .
    }
  }

Step 2: If no generator, examine the data schema (get_shacl) and create one
  - Identify the relationship predicates in the schema
  - FIRST consider predicate-based generators (preferred, compiled to Lisp):
    * Direct predicate between nodes → sna:undirected, sna:objectsOf, or sna:subjectsOf
    * Multiple relationship predicates → sna:undirected with a list
  - ONLY use SPARQL generators (sna:hasSPARQL) when the connection requires
    joining through intermediate nodes (e.g., co-authorship via shared book)
  - Load via add_triples into the sna:sna graph
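For the predicate-based case, the Turtle loaded in Step 2 always has the same shape, so it can be generated instead of hand-written. The helper below is hypothetical: it takes full IRIs to avoid prefix bookkeeping, emits the blank-node form shown in section 1a, and does not check that the predicates actually occur in your data. Its output is meant for add_triples with the sna:sna context.

```python
SNA = "http://franz.com/ns/allegrograph/4.11/sna/"


def predicate_generator_turtle(name_iri: str, predicate_iris: list[str], kind: str = "undirected") -> str:
    """Build Turtle for a predicate-based generator, in the shape shown in section 1a.

    name_iri and predicate_iris are full IRIs without angle brackets;
    kind is "undirected", "objectsOf" or "subjectsOf".
    """
    if kind not in {"undirected", "objectsOf", "subjectsOf"}:
        raise ValueError(f"unsupported generator kind: {kind}")
    if not predicate_iris:
        raise ValueError("at least one predicate is required")
    preds = [f"<{iri}>" for iri in predicate_iris]
    # A single predicate is written directly; several become an RDF list,
    # the form section 1a shows for sna:undirected.
    obj = preds[0] if len(preds) == 1 else "( " + " ".join(preds) + " )"
    return (
        f"@prefix sna: <{SNA}> .\n"
        "[ a sna:Generator ;\n"
        f"  sna:hasName <{name_iri}> ;\n"
        f"  sna:{kind} {obj} ] .\n"
    )


if __name__ == "__main__":
    print(predicate_generator_turtle("http://example.org/sna#colleagues",
                                     ["http://xmlns.com/foaf/0.1/knows"]))
```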

Step 3: Run the analysis
  - Start with nodalNeighbors to verify the generator works
  - For centrality or cliques: use a neighbor cache (sna:neighborCache) instead
    of a separate ego group — it's significantly faster and replaces both the
    generator and group arguments: (?cache ?cache)
  - For simple neighbor listing, ego-group membership, or path finding: no
    cache needed (path finding takes the generator directly)
  - Community detection (Leiden): builds its own ego group, no cache needed

Step 4: Present results clearly
  - Use GROUP_CONCAT for readable output
  - ORDER BY DESC for rankings
  - LIMIT for manageable result sets
