AllegroGraph Python Sesame API Tutorial

This is an introduction to the Python client API to the AllegroGraph RDFStore™ from Franz Inc.

The Python Sesame API offers convenient and efficient access to an AllegroGraph server from a Python-based application. This API provides methods for creating, querying and maintaining RDF data, and for managing the stored triples.

The Python Sesame API deliberately emulates the Aduna Sesame API to make it easier to migrate from Sesame to AllegroGraph.  The Python Sesame API has also been extended in ways that make it easier and more intuitive than the Sesame API. 

Contents

 

 

Overview   Return to Top

The Python client tutorial rests on a simple architecture consisting of the following components:

Each lesson in tutorial_examples.py is encapsulated in a Python function, named exampleN(), where N ranges from 0 to 24. The function names are referenced in the title of each section to make it easier to compare the tutorial text and the code of the examples file.

Prerequisites   Return to Top

The tutorial examples can be run on any system which supports Python 2.6 or 2.7. The tutorial assumes that AllegroGraph and Python 2.6 or 2.7 have been installed and configured using the procedure posted on this page.

Setting the environment for the tutorial   Return to Top

You should set the following environment variables when running the Python tutorial.

The environment variables are:

Here is an example of passing environment variables to the tutorial:

# connect to agraph.my-domain.com:10443
env AGRAPH_HOST=agraph.my-domain.com AGRAPH_PORT=10443 python tutorial/tutorial_examples.py all

The command window will fill with output from the example functions. The examples run for about 2.5 minutes and touch most of AllegroGraph's features. To run a specific example, enter its number after the file name, that is (for example 13):

env AGRAPH_HOST=agraph.my-domain.com AGRAPH_PORT=10443 python tutorial/tutorial_examples.py 13

The agraph.cfg (see the Server Installation document) specifies values like the port used by the server and the catalogs defined. Refer to that file if necessary (or ask your system administrator if you do not have access to the file).

Troubleshooting

If the tutorial examples throw a connection error in example6(), see the discussion of Session Port Setup in the Server Installation document.

If you see an error similar to the following

ImportError: pycurl: libcurl link-time ssl backend (nss) is
different from compile-time ssl backend (none/other)

Perform this procedure (replacing {agraph-version} with the actual version):

# Uninstall pycurl from the virtual environment:
pip uninstall pycurl

# Set the required compile-time option for pycurl
export PYCURL_SSL_LIBRARY=nss

# Reinstall, but ignore cached packages (force recompile)
pip install --no-cache-dir -e /yourPathway/agraph-{agraph-version}-client-python

Terminology   Return to Top

We need to clarify some terminology before proceeding.

In the context of AllegroGraph Server:

  • A "catalog" is a list of repositories owned by an AllegroGraph server.
  • A "repository" is a collection of triples within a Catalog, stored and indexed on a hard disk.
  • A "context" is a subgraph of the triples in a repository.
  • If contexts are not in use, the triples are stored in the background (default) graph.

Creating Users with WebView    Return to Top

Each connection to an AllegroGraph server runs under the credentials of a registered AllegroGraph user account.

Initial Superuser Account

The installation instructions for AllegroGraph advise you to create a superuser called "test", with password "xyzzy". This is the user (and password) which are used by the Python tutorial if the environment variables AGRAPH_USER and AGRAPH_PASSWORD are not set.

Users, Permissions, Access Rules, and Roles

AllegroGraph user accounts may be given any combination of the following three permissions:

In addition, a user account may be given read, write or read/write access to individual repositories.

It is easiest to run the Python tutorial as a Superuser. That way you do not have to worry about permissions.

If you run as a non-Superuser, you need Start Session permission for example 6 and other examples that depend on 6. You need Evaluate Arbritray Code permission for example17. You also need read/write access on the appropriate catalogs and repositories.

You can also define a role (such as "librarian") and give the role a set of permissions and access rules. Then you can assign several users to a shared role. This lets you manage their permissions and access by editing the role instead of the individual user accounts.

A superuser automatically has all possible permissions and unlimited access. A superuser can also create, manage and delete other user accounts. Non-superusers cannot view or edit account settings.

A user with the Start Sessions permission can use the AllegroGraph features that require spawning a dedicated session, such as Transactions and Social Network Analysis. If you try to use these features without the appropriate permission, you'll encounter authentication errors.

A user with permission to Evaluate Arbitrary Code can run Prolog Rule Queries. This user can also do anything else that allows executing Lisp code, such as defining select-style generators, or doing eval-in-server, as well as loading server-side files.

WebView

WebView is AllegroGraph's HTTP-based graphical user interface for user and repository management. It provides a SPARQL endpoint for querying your triple stores as well as various tools that let you create and maintain triple stores interactively.

To connect to WebView, simply direct your Web browser to the AllegroGraph port of your server. If you have installed AllegroGraph locally (and used the default port number), use:

http://localhost:10035

You will be asked to log in. Use the superuser credentials described in the previous section.

The first page of WebView is a summary of your catalogs, repositories, and federations. Click the user account link in the lower left corner of the page. This exposes the Users and Roles page.

This is the environment for creating and managing user accounts.

To create a new user, click the [add a user] link. This exposes a small form where you can enter the username (one symbol) and password. Click OK to save the new account.

The new user will appear in the list of users. Click the [view permissions] link to open a control panel for the new user account:

Use the checkboxes to apply permissions to this account (superuser, start session, evaluate arbitrary code).

It is imporant that you set up access permissions for the new user. Use the form to create an access rule by selecting read, write or read/write access, naming a catalog (or * for all), and naming a repository within that catalog (or * for all). Click the [add] link. This creates an access rule for your new user. The access rule will appear in the permissions display:

This new user can log in and perform transactions on any repository in the system.

To repeat, the "test" superuser is all you need to run all of the tutorial examples. This section is for the day when you want to issue more modest credentials to some of your operators.

Running Python Tutorial Examples    Return to Top

The AllegroGraph Python Tutorial examples are in the tutorial subdirectory where you unpacked the Python client tar.gz file. Navigate to that directory and follow one of these examples:

     $ python tutorial_examples.py            runs all tests.
     $ python tutorial_examples.py all        runs all tests.
     $ python tutorial_examples.py 10         runs example10()
     $ python tutorial_examples.py 1 5 22     runs examples 1, 5, and 22

Creating a Repository and Triple Indices (example1)    Return to Top

The first task is to attach to our AllegroGraph Server and open a repository. This task is implemented in example1() from tutorial_examples.py.

In example1() we build a chain of Python objects, ending in a"connection" object that lets us manipulate triples in a specific repository. The overall process of generating the connection object follows this diagram:

The example1() function opens (or creates) a repository by building a series of client-side objects, culminating in a "connection" object. The connection object will be passed to other functions in tutorial_examples.py.

The connection object contains the methods that let us manipulate triples in a specific repository.

The example first connects to an AllegroGraph Server by providing the endpoint (host IP address and port number) of an already-launched AllegroGraph server. This creates a client-side server object, which can access the AllegroGraph server's list of available catalogs through the listCatalogs() method:.

def example1(accessMode=Repository.RENEW):
    print "Defining connnection to AllegroGraph server -- host:'%s' port:%s" % (AG_HOST, AG_PORT)
    server = AllegroGraphServer(AG_HOST, AG_PORT, 'user', 'password')
    print "Available catalogs", server.listCatalogs()

This is the output so far:

>>> example1()
Defining connnection to AllegroGraph server -- host:'localhost' port:10035
Available catalogs ['/', 'scratch']

This output says that the server has two catalogs, the default rootCatalog, '/', and a named catalog 'scratch' that someone has created for some experimentation.

In the next line of example1(), we use the server.openCatalog() method to create a client-side catalog object. In this example we will connect to the default rootCatalog by calling openCatalog() with no arguments. When we look inside the root catalog, we can see which repositories are available:

    catalog = server.openCatalog() 
    print "Available repositories in catalog '%s': %s" % (catalog.getName(), catalog.listRepositories()) 

The corresponding output lists the available repositories. (When you run the examples, you may see a different list of repositories.)

Available repositories in catalog 'None': ['tutorial']  

The next step is to create a client-side repository object representing the respository we wish to open, by calling the getRepository() method of the catalog object. We have to provide the name of the desired repository (AG_REPOSITORY, bound to python-tutorial in this case), and select one of four access modes:

Repository.RENEW is the default setting for the example1() function of tutorial_examples.py. It can be overridden by calling example1() with the appropriate argument, such as example1(Repository.OPEN).

    myRepository = catalog.getRepository(AG_REPOSITORY, accessMode)
    myRepository.initialize()

A new or renewed repository must be initialized, using the initialize() method of the repository object. If you try to initialize a respository twice you get a warning message in the Python window but no exception.

The goal of all this object-building has been to create a client-side connection object, whose methods let us manipulate the triples of the repository. The repository object's getConnection() method returns this connection object.

    connection = myRepository.getConnection()
    print "Repository %s is up! It contains %i statements." % (
    myRepository.getDatabaseName(), connection.size())

The size() method of the connection object returns how many triples are present. In the example1() function, this number should always be zero because we "renewed" the repository. This is the output in the Python window:

   Repository pythontutorial is up! It contains 0 statements.
   <franz.openrdf.repository.repositoryconnection.RepositoryConnection object at 0x0127D710>
 >>> 

Whenever you create a new repository, you should stop to consider which kinds of triple indices you will need.  This is an important efficiency decision.  AllegroGraph uses a set of sorted indices to quickly identify a contiguous block of triples that are likely to match a specific query pattern.

These indices are identified by names that describe their organization. The default set of indices are called spogi, posgi, ospgi, gspoi, gposi, gospi, and i , where:

The order of the letters denotes how the index has been organized. For instance, the spogi index contains all of the triples in the store, sorted first by subject, then by predicate, then by object, and finally by graph. The triple id number is present as a fifth column in the index. If you know the URI of a desired resource (the subject value of the query pattern), then the spogi index lets you retrieve all triples with that subject as a single block. 

The idea is to provide your respository with the indices that your queries will need, and to avoid maintaining indices that you will never need. 

We can use the connection object's listValidIndices() method to examine the list of all possible AllegroGraph triple indices:

    indices = conn.listValidIndices()
print "All valid triple indices: %s" % (indices)

This is the list of all possible valid indices:

All valid triple indices: ['spogi', 'spgoi', 'sopgi', 'sogpi', 'sgpoi', 'sgopi', 
'psogi', 'psgoi', 'posgi', 'pogsi', 'pgsoi', 'pgosi', 'ospgi', 'osgpi', 'opsgi', 
'opgsi', 'ogspi', 'ogpsi', 'gspoi', 'gsopi', 'gpsoi', 'gposi', 'gospi', 'gopsi', 'i']

AllegroGraph can generate any of these indices if you need them, but it creates only seven indices by default.  We can see the current indices by using the connection object's listIndices() method:

    indices = conn.listIndices()
print "Current triple indices: %s" % (indices)

There are currently seven indices:

Current triple indices: ['i', 'gospi', 'gposi', 'gspoi', 'ospgi', 'posgi', 'spogi']

The indices that begin with "g" are sorted primarily by subgraph (or "context").  If you application does not use subgraphs, you should consider removing these indices from the repository.  You don't want to build and maintain triple indices that your application will never use.  This wastes CPU time and disk space.  The connection object has a convenient dropIndex() method:

    print "Removing graph indices..."
conn.dropIndex("gospi")
conn.dropIndex("gposi")
conn.dropIndex("gspoi")
indices = conn.listIndices()
print "Current triple indices: %s" % (indices)

Having dropped three of the triple indices, there are now four remaining:

Removing graph indices...
Current triple indices: ['i', 'ospgi', 'posgi', 'spogi']

The i index is for deleting triples by using the triple id number.  The ospgi index is sorted primarily by object value, which makes it possible to grab a range of object values as a single block of triples from the index.  Similarly, the posgi index lets us reach for a block of triples that all share the same predicate.  We mentioned previously that the spogi index lets us retrieve blocks of triples that all have the same subject URI. 

As it happens, we may have been overly hasty in eliminating all of the graph indices.  AllegroGraph can find the right matches as long as there is any one index present, but using the "right" index is much faster.  Let's put one of the graph indices back, just in case we need it. We'll use the connection object's addIndex() method:

    print "Adding one graph index back in..."
conn.addIndex("gspoi")
indices = conn.listIndices()
print "Current triple indices: %s" % (indices) return conn
Adding one graph index back in...
Current triple indices: ['i', 'gspoi', 'ospgi', 'posgi', 'spogi']

The last line (return conn) is the pointer to the new connection object. This is the value returned by example1() when it is called by other functions in tutorial_examples.py. The other functions then use the connection object to access the repository.

Asserting and Retracting Triples (example2)    Return to Top

In example example2(), we show how to create resources describing two people, Bob and Alice, by asserting individual triples into the respository. The example also retracts and replaces a triple. Assertions and retractions to the triple store are executed by 'add' and 'remove' methods belonging to the connection object, which we obtain by calling the example1() function (described above).

Before asserting a triple, we have to generate the URI values for the subject, predicate and object fields. The Python Sesame API to AllegroGraph Server predefines a number of classes and predicates for the RDF, RDFS, XSD, and OWL ontologies. RDF.TYPE is one of the predefined predicates we will use.

The 'add' and 'remove' methods take an optional 'contexts' argument that specifies one or more subgraphs that are the target of triple assertions and retractions. When the context is omitted, triples are asserted/retracted to/from the background graph. In the example below, facts about Alice and Bob reside in the background graph.

The example2() function begins by calling example1() to create the appropriate connection object, which is bound to the variable conn.

def example2():
    conn = example1()

The next step is to begin assembling the URIs we will need for the new triples. The createURI() method generates a URI from a string. These are the subject URIs identifying the resources "Bob" and "Alice":

    alice = conn.createURI("http://example.org/people/alice")
    bob = conn.createURI("http://example.org/people/bob")

Bob and Alice will be members of the "person" class (RDF:TYPE person).

    person = conn.createURI("http://example.org/ontology/Person")

Both Bob and Alice will have a "name" attribute.

    name = conn.createURI("http://example.org/ontology/name")

The name attributes will contain literal values. We have to generate the Literal objects from strings:

    bobsName = conn.createLiteral("Bob")
    alicesName = conn.createLiteral("Alice")

The next line prints out the number of triples currently in the repository.

    print "Triple count before inserts: ", conn.size()
Triple count before inserts:  0

Now we assert four triples, two for Bob and two more for Alice, using the connection object's add() method. After the assertions, we count triples again (there should be four) and print out the triples for inspection.

    ## alice is a person
    conn.add(alice, RDF.TYPE, person)
    ## alice's name is "Alice"
    conn.add(alice, name, alicesName)
    ## bob is a person
    conn.add(bob, RDF.TYPE, person)
    ## bob's name is "Bob":
    conn.add(bob, name, bobsName)

    print "Triple count: ", conn.size()
    for s in conn.getStatements(None, None, None, None, False): print s 

The "None" arguments to the getStatements() method say that we don't want to restrict what values may be present in the subject, predicate, object or context positions. Just print out all the triples.

This is the output at this point. We see four triples, two about Alice and two about Bob:

Triple count:  4
(<http://example.org/people/alice>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/ontology/Person>)
(<http://example.org/people/alice>, <http://example.org/ontology/name>, "Alice")
(<http://example.org/people/bob>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/ontology/Person>)
(<http://example.org/people/bob>, <http://example.org/ontology/name>, "Bob")

We see two resources of type "person," each with a literal name.

The next step is to demonstrate how to remove a triple. Use the remove() method of the connection object, and supply a triple pattern that matches the target triple. In this case we want to remove Bob's name triple from the repository. Then we'll count the triples again to verify that there are only three remaining. Finally, we re-assert Bob's name so we can use it in subsequent examples, and we'll return the connection object..

    conn.remove(bob, name, bobsName)
    print "Triple count: ", conn.size()
    conn.add(bob, name, bobsName)    
    return conn
Triple count:  3
<franz.openrdf.repository.repositoryconnection.RepositoryConnection object at 0x01466830>

A SPARQL Query (example3)    Return to Top

SPARQL stands for the "SPARQL Protocol and RDF Query Language," a recommendation of the World Wide Web Consortium (W3C). SPARQL is a query language for retrieving RDF triples.

Our next example illustrates how to evaluate a SPARQL query. This is the simplest query, the one that returns all triples. Note that example3() continues with the four triples created in example2().

def example3():    
    conn = example2()
    try:
        queryString = "SELECT ?s ?p ?o  WHERE {?s ?p ?o .}"

The SELECT clause returns the variables ?s, ?p and ?o in the bindingSet. The variables are bound to the subject, predicate and objects values of each triple that satisfies the WHERE clause. In this case the WHERE clause is unconstrained. The dot (.) in the fourth position signifies the end of the pattern.

The connection object's prepareTupleQuery() method creates a query object that can be evaluated one or more times. (A "tuple" is an ordered sequence of data elements in Python.) The results are returned in an iterator that yields a sequence of bindingSets.

        tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
        result = tupleQuery.evaluate();

Below we illustrate one (rather heavyweight) method for extracting the values from a binding set, indexed by the name of the corresponding column variable in the SELECT clause.

        try:
            for bindingSet in result:
                s = bindingSet.getValue("s")
                p = bindingSet.getValue("p")
                o = bindingSet.getValue("o")              
                print "%s %s %s" % (s, p, o)
http://example.org/people/alice http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://example.org/ontology/Person
http://example.org/people/alice http://example.org/ontology/name "Alice"
http://example.org/people/bob http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://example.org/ontology/Person
http://example.org/people/bob http://example.org/ontology/name "Bob"

The Connection class is designed to be created for the duration of a sequence of updates and queries, and then closed. In practice, many AllegroGraph applications keep a connection open indefinitely. However, best practice dictates that the connection should be closed, as illustrated below. The same hygiene applies to the iterators that generate binding sets.

        finally:
result.close();
finally:
conn.close();
myRepository = conn.repository
myRepository.shutDown()

Statement Matching (example4)    Return to Top

The getStatements() method of the connection object provides a simple way to perform unsophisticated queries. This method lets you enter a mix of required values and wildcards, and retrieve all matching triples. (If you need to perform sophisticated tests and comparisons you should use the SPARQL query instead.)

Below, we illustrate two kinds of 'getStatement' calls. The first mimics traditional Sesame syntax, and returns a Statement object at each iteration. This is the example4() function of tutorial_examples.py. It begins by calling example2() to create a connection object and populate the pythontutorial repository with four triples describing Bob and Alice. We're going to search for triples that mention Alice, so we have to create an "Alice" URI to use in the search pattern:

def example4():
conn = example2()
alice = conn.createURI("http://example.org/people/alice")

Now we search for triples with Alice's URI in the subject position. The "None" values are wildcards for the predicate and object positions of the triple.

    print "Searching for Alice using getStatements():"
    statements = conn.getStatements(alice, None, None)

The getStatements() method returns a repositoryResult object (bound to the variable "statements" in this case). This object can be iterated over, exposing one result statement at a time. It is sometimes desirable to screen the results for duplicates, using the enableDuplicateFilter() method. Note, however, that duplicate filtering can be expensive. Our example does not contain any duplicates, but it is possible for them to occur.

    statements.enableDuplicateFilter() 
for s in statements:
print s

This prints out the two matching triples for "Alice."

Searching for Alice using getStatements():
(<http://example.org/people/alice>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/ontology/Person>) (<http://example.org/people/alice>, <http://example.org/ontology/name>, "Alice")

At this point it is good form to close the respositoryResponse object because they occupy memory and are rarely reused in most programs.

	statements.close()

Literal Values (example5)   Return to Top

The next example, example5(), illustrates some variations on what we have seen so far. The example creates and asserts typed and plain literal values, including language-specific plain literals, and then conducts searches for them in three ways:

The getStatements() and SPARQL direct searches return exactly the datatype you ask for. The SPARQL filter queries can sometimes return multiple datatypes. This behavior will be one focus of this section.

If you are not explicit about the datatype of a value, either when asserting the triple or when writing a search pattern, AllegroGraph will deduce an appropriate datatype and use it. This is another focus of this section. This helpful behavior can sometimes surprise you with unanticipated results.

Setup

Example5() begins by obtaining a connection object from example1(), and then clears the repository of all existing triples.

def example5():
conn = example1()
conn.clear()

For sake of coding efficiency, it is good practice to create variables for namespace strings. We'll use this namespace again and again in the following lines. We have made the URIs in this example very short to keep the result displays compact.

    exns = "http://people/"

The example creates new resources describing seven people, named alphabetically from Alice to Greg. These are URIs to use in the subject field of the triples. The example shows how to enter a full URI string, or alternately how to combine a namespace with a local resource name.

    alice = conn.createURI("http://example.org/people/alice")
bob = conn.createURI("http://example.org/people/bob")
carol = conn.createURI("http://example.org/people/carol")
dave = conn.createURI(namespace=exns, localname="dave")
eric = conn.createURI(namespace=exns, localname="eric")
fred = conn.createURI(namespace=exns, localname="fred") greg = conn.createURI(namesapce=exns, localname="greg")

Numeric Literal Values

This section explores the behavior of numeric literals.

Asserting Numeric Data

The first section assigns ages to the participants, using a variety of numeric types. First we need a URI for the "age" predicate.

    age = conn.createURI(namespace=exns, localname="age")

The next step is to create a variety of values representing ages. Coincidentally, these people are all 42 years old, but we're going to record that information in multiple ways:

    fortyTwo = conn.createLiteral(42)             # creates long
    fortyTwoDouble = conn.createLiteral(42.0)    # creates double
    fortyTwoInt = conn.createLiteral('42', datatype=XMLSchema.INT)
    fortyTwoLong = conn.createLiteral('42', datatype=XMLSchema.LONG) 
    fortyTwoFloat = conn.createLiteral('42', datatype=XMLSchema.FLOAT) 
    fortyTwoString = conn.createLiteral('42', datatype=XMLSchema.STRING)
    fortyTwoPlain = conn.createLiteral('42')      # creates plain literal

In four of these statements, we explicitly identified the datatype of the value in order to create an INT, a LONG, a FLOAT and a STRING. This is the best practice.

In three other statements, we just handed AllegroGraph numeric-looking values to see what it would do with them. As we will see in a moment, 42 creates a LONG, 42.0 becomes into a DOUBLE, and '42' becomes a "plain" (untyped) literal value. (Note that plain literals are not quite the same thing as typed literal strings. A search for a plain literal will not always match a typed string, and vice versa.)

Now we need to assemble the URIs and values into statements (which are client-side triples):

    stmt1 = conn.createStatement(alice, age, fortyTwo)
    stmt2 = conn.createStatement(bob, age, fortyTwoDouble)    
    stmt3 = conn.createStatement(carol, age, fortyTwoInt)    
    stmt4 = conn.createStatement(dave, age, fortyTwoLong)    
    stmt5 = conn.createStatement(eric, age, fortyTwoFloat)
    stmt6 = conn.createStatement(fred, age, fortyTwoString)  
    stmt7 = conn.createStatement(greg, age, fortyTwoPlain)

And then add the statements to the triple store on the AllegroGraph server. We can use either add() or addStatement() for this purpose.

    conn.add(stmt1)
    conn.add(stmt2)
    conn.add(stmt3)
    conn.addStatement(stmt4)
    conn.addStatement(stmt5)
    conn.addStatement(stmt6)
    conn.addStatement(stmt7)

Now we'll complete the round trip to see what triples we get back from these assertions. This is how we use getStatements() in this example to retrieve and display triples for us:

    print "\nShowing all age triples using getStatements(). Seven matches."
    statements = conn.getStatements(None, age, None)
    for s in statements:
        print s

This loop prints all age triples to the interaction window. Note that the retrieved triples are of six types: two ints, a long, a float, a double, a string, and a "plain literal." All of them say that their person's age is 42. Note that the triple for Greg has the plain literal value "42", while the triple for Fred uses "42" as a typed string.

Showing all age triples using getStatements(). Seven matches.
(<http://people/greg>, <http://people/age>, "42")
(<http://people/fred>, <http://people/age>, "42"^^<http://www.w3.org/2001/XMLSchema#string>)
(<http://people/eric>, <http://people/age>, "4.2E1"^^<http://www.w3.org/2001/XMLSchema#float>)
(<http://people/dave>, <http://people/age>, "42"^^<http://www.w3.org/2001/XMLSchema#long>)
(<http://people/carol>, <http://people/age>, "42"^^<http://www.w3.org/2001/XMLSchema#int>)
(<http://people/bob>, <http://people/age>, "4.2E1"^^<http://www.w3.org/2001/XMLSchema#double>)
(<http://people/alice>, <http://people/age>, "42"^^<http://www.w3.org/2001/XMLSchema#long>)

If you ask for a specific datatype, you will get it. If you leave the decision up to AllegroGraph, you might get something unexpected such as an plain literal value.

Matching Numeric Data

This section explores getStatements() and SPARQL matches against numeric triples.

In the first example, we asked AllegroGraph to find an untyped number, 42.

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, 42) "42"^^<http://www.w3.org/2001/XMLSchema#long>
SPARQL direct match SELECT ?s ?p WHERE {?s ?p 42 .} No direct matches.
SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = 42)} "42"^^<http://www.w3.org/2001/XMLSchema#int>
"4.2E1"^^<http://www.w3.org/2001/XMLSchema#float>
"42"^^<http://www.w3.org/2001/XMLSchema#long>
"4.2E1"^^<http://www.w3.org/2001/XMLSchema#double>

The getStatements() query returned triples containing longs only. The SPARQL direct match didn't know how to interpret the untyped value and found zero matches. The SPARQL filter match, however, opened the doors to matches of multiple numeric types, and returned ints, floats, longs and doubles.

"Match 42.0" without explicitly declaring the number's type.

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, 42.0)

"42"^^<http://www.w3.org/2001/XMLSchema#double>
Matches a double but not a float.

SPARQL direct match SELECT ?s ?p WHERE {?s ?p 42.0 .} No direct matches.
SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = 42.0)} "42"^^<http://www.w3.org/2001/XMLSchema#int>
"4.2E1"^^<http://www.w3.org/2001/XMLSchema#float>
"42"^^<http://www.w3.org/2001/XMLSchema#long>
"4.2E1"^^<http://www.w3.org/2001/XMLSchema#double>

The getStatements() search returned a double but not the similar float. The filter match returned all numeric types that were equal to 42.0.

"Match '42'^^<http://www.w3.org/2001/XMLSchema#int>." Note that we have to use a variable (fortyTwoInt) bound to a Literal value in order to offer this int to getStatements(). We can't just type the value into the getStatements() method directly.

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, fortyTwoInt)

"42"^^<http://www.w3.org/2001/XMLSchema#int>

SPARQL direct match SELECT ?s ?p WHERE {?s ?p "42"^^<http://www.w3.org/2001/XMLSchema#int>} "42"^^<http://www.w3.org/2001/XMLSchema#int>
SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "42"^^<http://www.w3.org/2001/XMLSchema#int>)} "42"^^<http://www.w3.org/2001/XMLSchema#int>
"4.2E1"^^<http://www.w3.org/2001/XMLSchema#float>
"42"^^<http://www.w3.org/2001/XMLSchema#long>
"4.2E1"^^<http://www.w3.org/2001/XMLSchema#double>

Both the getStatements() query and the SPARQL direct query returned exactly what we asked for: ints. The filter match returned all numeric types that matches in value.

"Match '42'^^<http://www.w3.org/2001/XMLSchema#long>." Again we need a bound variable to offer a Literal value to getStatements().

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, fortyTwoLong)

"42"^^<http://www.w3.org/2001/XMLSchema#long>

SPARQL direct match SELECT ?s ?p WHERE {?s ?p "42"^^<http://www.w3.org/2001/XMLSchema#long>} "42"^^<http://www.w3.org/2001/XMLSchema#long>
SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "42"^^<http://www.w3.org/2001/XMLSchema#long>)} "42"^^<http://www.w3.org/2001/XMLSchema#int>
"4.2E1"^^<http://www.w3.org/2001/XMLSchema#float>
"42"^^<http://www.w3.org/2001/XMLSchema#long>
"4.2E1"^^<http://www.w3.org/2001/XMLSchema#double>

Both the getStatements() query and the SPARQL direct query returned longs. The filter match returned all numeric types.

"Match '42'^^<http://www.w3.org/2001/XMLSchema#double>."

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, fortyTwoDouble)

"42"^^<http://www.w3.org/2001/XMLSchema#double>

SPARQL direct match SELECT ?s ?p WHERE {?s ?p "42"^^<http://www.w3.org/2001/XMLSchema#double>} "42"^^<http://www.w3.org/2001/XMLSchema#double>
SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "42"^^<http://www.w3.org/2001/XMLSchema#double>)} "42"^^<http://www.w3.org/2001/XMLSchema#int>
"4.2E1"^^<http://www.w3.org/2001/XMLSchema#float>
"42"^^<http://www.w3.org/2001/XMLSchema#long>
"4.2E1"^^<http://www.w3.org/2001/XMLSchema#double>

Both the getStatements() query and the SPARQL direct query returned doubles. The filter match returned all numeric types.

Matching Numeric Strings and Plain Literals

At this point we are transitioning from tests of numeric matches to tests of string matches, but there is a gray zone to be explored first. What do we find if we search for strings that contain numbers? In particular, what about "plain literal" values that are almost, but not quite, strings?

"Match '42'^^<http://www.w3.org/2001/XMLSchema#string>." This example asks for a typed string to see if we get any numeric matches back.

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, fortyTwoString)

"42"^^<http://www.w3.org/2001/XMLSchema#string>
It did not match the plain literal.

SPARQL direct match SELECT ?s ?p WHERE {?s ?p "42"^^<http://www.w3.org/2001/XMLSchema#string>}

"42"^^<http://www.w3.org/2001/XMLSchema#string>
"42" This is the plain literal value.

SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "42"^^<http://www.w3.org/2001/XMLSchema#string>)} "42"^^<http://www.w3.org/2001/XMLSchema#string>
"42" This is the plain literal value.

The getStatements() query matched a literal string only. The SPARQL queries returned matches that were both typed strings and plain literals. There were no numeric matches.

"Match plain literal '42'." This example asks for a plain literal to see if we get any numeric matches back.

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, fortyTwoPlain)

"42" This is the plain literal. It did not match the string.

SPARQL direct match SELECT ?s ?p WHERE {?s ?p "42"}

"42"^^<http://www.w3.org/2001/XMLSchema#string>
"42" This is the plain literal value.

SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "42")} "42"^^<http://www.w3.org/2001/XMLSchema#string>
"42" This is the plain literal value.

The getStatements() query matched the plain literal only, and did not match the string. The SPARQL queries returned matches that were both typed strings and plain literals. There were no numeric matches.

The interesting lesson here is that AllegroGraph distinguishes between strings and plain literals when you use getStatements(), but it lumps them together when you use SPARQL.

Matching Strings

In this section we'll set up a variety of string triples and experiment with matching them using getStatements() and SPARQL. Note that Free Text Search is a different topic. In this section we're doing simple matches of whole strings.

Asserting String Values

We're going to add a "favorite color" attribute to five of the person resources we have used so far. First we need a predicate.

    favoriteColor = conn.createURI(namespace=exns, localname="favoriteColor")

Now we'll create a variety of string values, and a single "plain literal" value.

    UCred = conn.createLiteral('Red', datatype=XMLSchema.STRING)
LCred = conn.createLiteral('red', datatype=XMLSchema.STRING)
RedPlain = conn.createLiteral('Red') #plain literal
rouge = conn.createLiteral('rouge', datatype=XMLSchema.STRING)
Rouge = conn.createLiteral('Rouge', datatype=XMLSchema.STRING)
RougePlain = conn.createLiteral('Rouge') #plain literal
FrRouge = conn.createLiteral('Rouge', language="fr") #plain literal with language tag

Note that in the last line we created a plain literal and assigned it a French language tag. You cannot assign a language tag to strings, only to plain literals. See typed and plain literal values for the specification.

Next we'll add these values to new triples in the triple store. For variety, we bypassed creating client-side statements and just asserted the triples in one step.

    conn.addTriples([(alice, favoriteColor, UCred),
(bob, favoriteColor, LCred),
(carol, favoriteColor, RedPlain),
(dave, favoriteColor, rouge),
(eric, favoriteColor, Rouge),
(fred, favoriteColor, RougePlain),
(greg, favoriteColor, FrRouge)])

If we run a getStatements() query for all favoriteColor triples, we get these values returned:

(<http://people/alice>, <http://people/favoriteColor>, "Red"^^<http://www.w3.org/2001/XMLSchema#string>)
(<http://people/bob>, <http://people/favoriteColor>, "red"^^<http://www.w3.org/2001/XMLSchema#string>)
(<http://people/carol>, <http://people/favoriteColor>, "Red")
(<http://people/dave>, <http://people/favoriteColor>, "rouge"^^<http://www.w3.org/2001/XMLSchema#string>)
(<http://people/eric>, <http://people/favoriteColor>, "Rouge"^^<http://www.w3.org/2001/XMLSchema#string>)
(<http://people/fred>, <http://people/favoriteColor>, "Rouge")
(<http://people/greg>, <http://people/favoriteColor>, "Rouge"@fr)

That's four typed strings, capitalized and lower case, plus three plain literals, one with a language tag.

Matching String Data

First let's search for "Red" without specifying a datatype.

"Match 'Red'." What happens if we search for "Red" without specifying a string datatype?

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, "Red")

"Red" This is the plain literal. It did not match the string.

SPARQL direct match SELECT ?s ?p WHERE {?s ?p "Red"}

"Red"^^<http://www.w3.org/2001/XMLSchema#string>
"Red" This is the plain literal value.

SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "Red")} "Red"^^<http://www.w3.org/2001/XMLSchema#string>
"Red" This is the plain literal value.

The getStatements() query matched the plain literal only, and did not match the similar string. The SPARQL queries matched both "Red" typed strings and "Red" plain literals, but they did not return the lower case "red" triple. The match was liberal regarding datatype but strict about case.

Let's try "Rouge".

"Match 'Rouge'." What happens if we search for "Rouge" without specifying a string datatype or language? Will it match the triple with the French tag?

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, "Rouge")

"Rouge" This is the plain literal. It did not match the string.

SPARQL direct match SELECT ?s ?p WHERE {?s ?p "Rouge"}

"Rouge"^^<http://www.w3.org/2001/XMLSchema#string>
"Rouge" This is the plain literal value.
Did not match the "Rouge"@fr triple.

SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "Rouge")} "Rouge"^^<http://www.w3.org/2001/XMLSchema#string>
"Rouge" This is the plain literal value.
Did not match the"Rouge"@fr triple.

The getStatements() query matched the plain literal only, and did not match the similar string. The SPARQL queries matched both "Rouge" typed strings and "Rouge" plain literals, but they did not return the "Rouge"@fr triple. The match was liberal regarding datatype but strict about language. We didn't ask for French, so we didn't get French.

"Match 'Rouge'@fr." What happens if we search for "Rouge"@fr? We'll have to bind the value to a variable (FrRouge) to use getStatements(). We can type the value directly into the SPARQL queries.

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, FrRouge)

"Rouge"@fr

SPARQL direct match SELECT ?s ?p WHERE {?s ?p "Rouge"@fr}

"Rouge"@fr

SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "Rouge"@fr)} "Rouge"@fr

If you ask for a specific language, that's exactly what you are going to get, in all three types of queries.

You may be wondering how to perform a string match where language and capitalization don't matter. You can do that with a SPARQL filter query using the str() function, which strips out the string portion of a literal, leaving behind the datatype or language tag. Then the lowercase() function eliminates case issues:

PREFIX fn: <http://www.w3.org/2005/xpath-functions#> 
SELECT ?s ?p ?o 
WHERE {?s ?p ?o . filter (fn:lower-case(str(?o)) = "rouge")}

This query returns a variety of "Rouge" triples:

<http://people/greg> <http://people/favoriteColor> "Rouge"@fr
<http://people/fred> <http://people/favoriteColor> "Rouge"
<http://people/eric> <http://people/favoriteColor> "Rouge"^^<http://www.w3.org/2001/XMLSchema#string>
<http://people/dave> <http://people/favoriteColor> "rouge"^^<http://www.w3.org/2001/XMLSchema#string>

This query matched all triples containing the string "rouge" regardless of datatype or language tag. Remember that the SPARQL "filter" queries are powerful, but they are also the slowest queries. SPARQL direct queries and getStatements() queries are faster.

Matching Booleans

In this section we'll assert and then search for Boolean values.

Asserting Boolean Values

The Python language includes predefined True and False symbols for use in your decision-making logic. These Boolean constants are fine for programming, but they are not appropriate for asserting Boolean values into triples. They do not result in legal RDF Boolean values, and this creates subsequent matching issues.

We'll be adding a new attribute to the person resources in our example. Are they, or are they not, seniors?

    senior = conn.createURI(namespace=exns, localname="seniorp")    

The correct way to create Boolean values for use in triples is to create literal values of type Boolean:

    trueValue = conn.createLiteral("true", datatype=XMLSchema.BOOLEAN)
falseValue = conn.createLiteral("false", datatype=XMLSchema.BOOLEAN)

Note that "true" and "false" must be lower case.

We'll only need two triples:

    conn.addTriple(alice, senior, trueValue)
conn.addTriple(bob, senior, falseValue)

When we retrieve the triples (using getStatements()) we see:

(<http://people/bob>, <http://people/seniorp>, "false"^^<http://www.w3.org/2001/XMLSchema#boolean>)
(<http://people/alice>, <http://people/seniorp>, "true"^^<http://www.w3.org/2001/XMLSchema#boolean>)

These are RDF-legal Boolean values that work with the AllegroGraph query engine. Note that "true" and "false" are lower case.

As an object lesson, this getStatements() query uses the Python "True" Boolean, which does not match the values in the RDF store:

conn.getStatements(None, senior, True)     # no matches

"Match 'true'." There are three correct ways to perform a Boolean search. One is to use the varible trueValue (defined above) to pass a Boolean literal value to getStatements(). SPARQL queries will recognize true and false, and of course the fully-typed "true"^^<http://www.w3.org/2001/XMLSchema#boolean> format is also respected by SPARQL:

Query Type Query Matches which types?
getStatements() conn.getStatements(None, senior, trueValue)

"true"^^<http://www.w3.org/2001/XMLSchema#boolean>

SPARQL direct match SELECT ?s ?p WHERE {?s ?p true}

"true"^^<http://www.w3.org/2001/XMLSchema#boolean>

SPARQL direct match SELECT ?s ?p WHERE {?s ?p "true"^^<http://www.w3.org/2001/XMLSchema#boolean> "true"^^<http://www.w3.org/2001/XMLSchema#boolean>
SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = true)} "true"^^<http://www.w3.org/2001/XMLSchema#boolean>
SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "true"^^<http://www.w3.org/2001/XMLSchema#boolean>} "true"^^<http://www.w3.org/2001/XMLSchema#boolean>

All of these queries correctly match Boolean values.

Dates, Times and Datetimes

In this final section of example5(), we'll assert and retrieve dates, times and datetimes.

In this context, you might be surprised by the way that AllegroGraph handles time zone data. If you assert (or search for) a timestamp that includes a time-zone offset, AllegroGraph will "normalize" the expression to Greenwich (zulu) time before proceeding. This normalization greatly speeds up searching and happens transparently to you, but you'll notice that the matched values are all zulu times.

Asserting Date, Time and Datetime Values

We're going to add birthdates to our personnel records. We'll need a birthdate predicate:

    birthdate = conn.createURI(namespace=exns, localname="birthdate")  

We'll also need four types of literal values: a date, a time, a datetime, and a datetime with a time-zone offset.

    date = conn.createLiteral('1984-12-06', datatype=XMLSchema.DATE)     
datetime = conn.createLiteral('1984-12-06T09:00:00', datatype=XMLSchema.DATETIME)
time = conn.createLiteral('09:00:00Z', datatype=XMLSchema.TIME)
datetimeOffset = conn.createLiteral('1984-12-06T09:00:00+01:00', datatype=XMLSchema.DATETIME)

It is interesting to notice that these literal values print out exactly as we defined them.

Printing out Literals for date, datetime, time, and datetime with Zulu offset.
"1984-12-06"^^<http://www.w3.org/2001/XMLSchema#date>
"1984-12-06T09:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>
"09:00:00Z"^^<http://www.w3.org/2001/XMLSchema#time>
"1984-12-06T09:00:00+01:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>

Now we'll add them to the triple store:

    conn.addTriples([(alice, birthdate, date),
(bob, birthdate, datetime),
(carol, birthdate, time),
(dave, birthdate, datetimeOffset)])

And then retrieve them using getStatements():

Showing all birthday triples using getStatements(). Should be four.
(<http://people/alice>, <http://people/birthdate>, "1984-12-06"^^<http://www.w3.org/2001/XMLSchema#date>)
(<http://people/bob>, <http://people/birthdate>, "1984-12-06T09:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
(<http://people/carol>, <http://people/birthdate>, "09:00:00Z"^^<http://www.w3.org/2001/XMLSchema#time>)
(<http://people/dave>, <http://people/birthdate>, "1984-12-06T08:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>)

If you look sharply, you'll notice that the zulu offset has been normalized:

Was:"1984-12-06T09:00:00+01:00"
Now:"1984-12-06T08:00:00Z"

Note that the one-hour zulu offset has been applied to the timestamp. "9:00" turned into "8:00."

Matching Date, Time, and Datetime Literals

"Match date." What happens if we search for the date literal we defined? We'll use the "date" variable with getStatements(), but just type the expected value into the SPARQL queries.

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, date)

"1984-12-06"
^^<http://www.w3.org/2001/XMLSchema#date>

SPARQL direct match SELECT ?s ?p WHERE {?s ?p '1984-12-06'^^<http://www.w3.org/2001/XMLSchema#date>

"1984-12-06"
^^<http://www.w3.org/2001/XMLSchema#date>

SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o =
'1984-12-06'
^^<http://www.w3.org/2001/XMLSchema#date>)}
"1984-12-06"
^^<http://www.w3.org/2001/XMLSchema#date>

All three queries match narrowly, meaning the exact date and datatype we asked for is returned.

"Match datetime." What happens if we search for the datetime literal? We'll use the "datetime" variable with getStatements(), but just type the expected value into the SPARQL queries.

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, datetime)

"1984-12-06T09:00:00Z"
^^<http://www.w3.org/2001/XMLSchema#dateTime>

SPARQL direct match SELECT ?s ?p WHERE {?s ?p '1984-12-06T09:00:00Z'
^^<http://www.w3.org/2001/XMLSchema#dateTime> .}

"1984-12-06T09:00:00Z"
^^<http://www.w3.org/2001/XMLSchema#dateTime>

SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = '1984-12-06T09:00:00Z'^^<http://www.w3.org/2001/XMLSchema#dateTime>)} "1984-12-06T09:00:00Z"
^^<http://www.w3.org/2001/XMLSchema#dateTime>

The matches are specific for the exact date, time and type.

"Match time." What happens if we search for the time literal? We'll use the "time" variable with getStatements(), but just type the expected value into the SPARQL queries.

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, time)

"09:00:00Z"
^^<http://www.w3.org/2001/XMLSchema#time>

SPARQL direct match SELECT ?s ?p WHERE {?s ?p "09:00:00Z"
^^<http://www.w3.org/2001/XMLSchema#time> .}

"09:00:00Z"
^^<http://www.w3.org/2001/XMLSchema#time>

SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "09:00:00Z"^^<http://www.w3.org/2001/XMLSchema#time>)} "09:00:00Z"
^^<http://www.w3.org/2001/XMLSchema#time>

The matches are specific for the exact time and type.

"Match datetime with offset." What happens if we search for a datetime with zulu offset?

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, datetimeOffset)

"1984-12-06T08:00:00Z"
^^<http://www.w3.org/2001/XMLSchema#dateTime>

SPARQL direct match SELECT ?s ?p WHERE {?s ?p "1984-12-06T09:00:00+01:00"
^^<http://www.w3.org/2001/XMLSchema#dateTime> .}

"1984-12-06T08:00:00Z"
^^<http://www.w3.org/2001/XMLSchema#dateTime>

SPARQL filter match

SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "1984-12-06T09:00:00+01:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>)}

"1984-12-06T08:00:00Z"
^^<http://www.w3.org/2001/XMLSchema#dateTime>

Note that we searched for "1984-12-06T09:00:00+01:00" but found "1984-12-06T08:00:00Z". It is the same moment in time.

Importing Triples (example6 and example7)    Return to Top

The Python Sesame API client can load triples in either RDF/XML format or NTriples format. The example below calls the connection object's add() method to load an NTriples file, and addFile() to load an RDF/XML file. Both methods work, but the best practice is to use addFile().

Note: If you get a "file not found" error while running this example, it means that Python is looking in the wrong directory for the data files to load. The usual explanation is that you have moved the tutorial_examples.py file to an unexpected directory. You can clear the issue by putting the data files in the same directory as tutorial_examples.py, or by setting the Python current working directory to the location of the data files using os.setcwd().

The RDF/XML file contains a short list of v-cards (virtual business cards), like this one:

  <rdf:Description rdf:about="http://somewhere/JohnSmith/">
<vCard:FN>John Smith</vCard:FN>
<vCard:N rdf:parseType="Resource">
<vCard:Family>Smith</vCard:Family>
<vCard:Given>John</vCard:Given>
</vCard:N>
</rdf:Description>

The NTriples file contains a graph of resources describing the Kennedy family, the places where they were each born, their colleges, and their professions. A typical entry from that file looks like this:

<http://www.franz.com/simple#person1> <http://www.franz.com/simple#first-name> "Joseph" . 
<http://www.franz.com/simple#person1> <http://www.franz.com/simple#middle-initial> "Patrick" . 
<http://www.franz.com/simple#person1> <http://www.franz.com/simple#last-name> "Kennedy" . 
<http://www.franz.com/simple#person1> <http://www.franz.com/simple#suffix> "none" . 
<http://www.franz.com/simple#person1> <http://www.franz.com/simple#alma-mater> <http://www.franz.com/simple#Harvard> . 
<http://www.franz.com/simple#person1> <http://www.franz.com/simple#birth-year> "1888" . 
<http://www.franz.com/simple#person1> <http://www.franz.com/simple#death-year> "1969" . 
<http://www.franz.com/simple#person1> <http://www.franz.com/simple#sex> <http://www.franz.com/simple#male> . 
<http://www.franz.com/simple#person1> <http://www.franz.com/simple#spouse> <http://www.franz.com/simple#person2> . 
<http://www.franz.com/simple#person1> <http://www.franz.com/simple#has-child> <http://www.franz.com/simple#person3> . 
<http://www.franz.com/simple#person1> <http://www.franz.com/simple#profession> <http://www.franz.com/simple#banker> . 
<http://www.franz.com/simple#person1> <http://www.franz.com/simple#birth-place> <http://www.franz.com/simple#place5> . 
<http://www.franz.com/simple#person1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.franz.com/simple#person> . 

Note that AllegroGraph can segregate triples into contexts (subgraphs) by treating them as quads, but the NTriples and RDF/XML formats can not include context information. They deal with triples only, so there is no place to store a fourth field in those formats. In the case of the add() call, we have omitted the context argument so the triples are loaded the default background graph (sometimes called the "null context.") The addFile() call includes an explicit context setting, so the fourth argument of each vcard triple will be the context named "/tutorial/vc_db_1_rdf". The connection size() method takes an optional context argument. With no argument, it returns the total number of triples in the repository. Below, it returns the number '16' for the 'context' context argument, and the number '28' for the null context (None) argument.

The example6() function of tutorial_examples.py creates a dedicated session connection to AllegroGraph, using methods you have seen before, plus the Connection object's openSession() method:

def example6(close=True):
print "Starting example6()."
server = AllegroGraphServer(AG_HOST, AG_PORT, AG_USER, AG_PASSWORD)
catalog = server.openCatalog(AG_CATALOG)
myRepository = catalog.getRepository(AG_REPOSITORY, Repository.RENEW)
myRepository.initialize()
conn = myRepository.getConnection()
conn.clear()
conn.openSession()

The dedicated session is not immediately pertinent to the examples in this section, but will become important in later examples that reuse this connection to demonstrate Prolog Rules and Social Network Analysis.

The variables path1 and path2 are bound to the RDF/XML and NTriples files, respectively.

    path1 = "./vcards.rdf"    
path2 = "./kennedy.ntriples"

The NTriples about the vcards will be added to a specific context, so naturally we need a URI to identify that context.

    context = conn.createURI("http://example.org#vcards")

In the next step we use addFile() to load the vcard triples into the #vcards context:

    conn.addFile(path1, None, format=RDFFormat.RDFXML, context=context)

Then we use add() to load the Kennedy family tree into the null context:

    conn.add(path2, base=None, format=RDFFormat.NTRIPLES, contexts=None)

Now we'll ask AllegroGraph to report on how many triples it sees in the null context and in the #vcards context:

    print "After loading, repository contains %i vcard triples in context '%s'\n    
           and   %i kennedy triples in context '%s'." % 
(conn.size(context), context, conn.size('null'), 'null')
return conn

The output of this report was:

After loading, repository contains 16 vcard triples in context 'http://example.org#vcards'
and 1214 kennedy triples in context 'null'.

The SPARQL query below is found in example7() of tutorial_examples.py. It borrows the same triples we loaded in example6(), above, and runs two unconstrained retrievals. The first uses getStatement, and prints out the subject URI and context of each triple.

def example7():    
conn = example6()
print "Match all and print subjects and contexts"
result = conn.getStatements(None, None, None, None, limit=25)
for row in result: print row.getSubject(), row.getContext()

This loop prints out a mix of triples from the null context and from the #vcards context. We set a limit of 25 triples because the Kennedy dataset contains over a thousand triples.

The following loop, however, does not produce the same results. This is a SPARQL query that should match all available triples, printing out the subject and context of each triple. We limited this query by using the DISTINCT keyword. Otherwise there would be many duplicate results.

    print "\nSame thing with SPARQL query (can't retrieve triples in the null context)"
queryString = "SELECT DISTINCT ?s ?c WHERE {graph ?c {?s ?p ?o .} }"
tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
result = tupleQuery.evaluate();
for i, bindingSet in enumerate(result):
print bindingSet[0], bindingSet[1]
conn.close()

In this case, the loop prints out only v-card results. The SPARQL query is not able to access the null context when a non-null context is also present.

Exporting Triples (example8 and example9)    Return to Top

The next examples show how to write triples out to a file in either NTriples format or RDF/XML format. The output of either format may be optionally redirected to standard output (the Python command window) for inspection.

Example example8() begins by obtaining a connection object from example6(). This means the repository contains v-card triples in the #vcards context, and Kennedy family tree triples in the null context.

def example8():
conn = example6()

In this example, we'll export the triples in the #vcards context.

    context = conn.createURI("http://example.org#vcards")

To write triples in NTriples format, call NTriplesWriter(). You have to tell it the path and file name of the exported file. If the output file argument is 'None', the writers write to standard output. You can uncomment that line if you'd like to see it work. This code exports the vcards triples in ntriples format:

    outputFile = "/tmp/temp.nt"
#outputFile = None
if outputFile == None:
print "Writing RDF to Standard Out instead of to a file"
ntriplesWriter = NTriplesWriter(outputFile)
conn.export(ntriplesWriter, context);

To write triples in RDF/XML format, call RDFXMLWriter(). This code exports the Kennedy triples in RDF/XML format.

    outputFile2 = "/tmp/temp.rdf"
#outputFile2 = None
if outputFile2 == None:
print "Writing NTriples to Standard Out instead of to a file"
rdfxmlfWriter = RDFXMLWriter(outputFile2)
conn.export(rdfxmlfWriter, 'null')

The export() method writes out all triples in one or more contexts. This provides a convenient means for making local backups of sections of your RDF store. The 'null' argument targets the triples of the default graph. If two or more contexts are specified, then triples from all of those contexts will be written to the same file. Since the triples are "mixed together" in the file, the context information is not recoverable. If the context argument is omitted, all triples in the store are written out, and again all context information is lost.

Finally, if the objective is to write out a filtered set of triples, the exportStatements() method can be used. The example below (from example9()) writes out all familyName triples from the vcards context to standard output.

    familyName = conn.createURI("http://www.w3.org/2001/vcard-rdf/3.0#FN")
    conn.exportStatements(None, familyName, None, False, RDFXMLWriter(None), context)

Searching Multiple Graphs (example10)    Return to Top

We have already seen contexts (subgraphs) at work when loading and saving files. In example10() we provide more realistic examples of contexts, and we explore the FROM, FROM DEFAULT, and FROM NAMED clauses of a SPARQL query to see how they interact with multiple subgraphs in the triple store. Finally, we will introduce the dataset object. A dataset is a list of contexts that should all be searched simultaneously. It is an object for use with SPARQL queries.

To set up the example, we create six statements, and add two of each to three different contexts: context1, context2, and the null context. The process of setting up the six statements follows the same pattern as we used in the previous examples:

    ## Create URIs for resources, predicates and classes.
exns = "http://example.org/people/"
alice = conn.createURI(namespace=exns, localname="alice")
bob = conn.createURI(namespace=exns, localname="bob")
ted = conn.createURI(namespace=exns, localname="ted")
person = conn.createURI(namespace=exns, localname="Person")
name = conn.createURI(namespace=exns, localname="name")
## Create literal name values.
alicesName = conn.createLiteral("Alice")
bobsName = conn.createLiteral("Bob")
tedsName = conn.createLiteral("Ted")
## Create URIs to identify the named contexts.
context1 = conn.createURI(namespace=exns, localname="context1")
context2 = conn.createURI(namespace=exns, localname="context2")

The next step is to assert two triples into each of three contexts:

    ## Assemble new statements and add them to the contexts.        
conn.add(alice, RDF.TYPE, person, context1)
conn.add(alice, name, alicesName, context1)
conn.add(bob, RDF.TYPE, person, context2)
conn.add(bob, name, bobsName, context2)
conn.add(ted, RDF.TYPE, person) // Added to null context
conn.add(ted, name, tedsName) // Added to null context

Note that the final two statements (about Ted) were added to the null context (the unnamed default graph).

GetStatements

The first test uses getStatements() to return all triples in all contexts (context1, context2, and null). This is default search behavior, so there is no need to specify the contexts in the conn.getStatements() method. Note that conn.size() also reports on all contexts by default.

    statements = conn.getStatements(None, None, None)
print "All triples in all contexts: %s" % (conn.size())
for s in statements:
print s

The output of this loop is shown below. The context URIs are in the fourth position. Triples from the null context have no context value.

All triples in all contexts: 6
(<http://example.org/people/alice>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Person>, <http://example.org/people/context1>)
(<http://example.org/people/alice>, <http://example.org/people/name>, "Alice", <http://example.org/people/context1>)
(<http://example.org/people/bob>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Person>, <http://example.org/people/context2>)
(<http://example.org/people/bob>, <http://example.org/people/name>, "Bob", <http://example.org/people/context2>)
(<http://example.org/people/ted>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Person>)
(<http://example.org/people/ted>, <http://example.org/people/name>, "Ted")

The next match explicitly lists 'context1' and 'context2' as the only contexts to participate in the match. It returns four statements. The conn.size() method can also address individual contexts.

    statements = conn.getStatements(None, None, None, [context1, context2])
print "Triples in contexts 1 or 2: %s" % (conn.size([context1, context2]))
for s in statements:
print s

The output of this loop shows that the triples in the null context have been excluded.

Triples in contexts 1 or 2: 4
(<http://example.org/people/bob>, <http://example.org/people/name>, "Bob", <http://example.org/people/context2>)
(<http://example.org/people/bob>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Person>, <http://example.org/people/context2>)
(<http://example.org/people/alice>, <http://example.org/people/name>, "Alice", <http://example.org/people/context1>)
(<http://example.org/people/alice>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Person>, <http://example.org/people/context1>)

This time we use getStatements() to search explicitly for triples in the null context and in context 2.

    statements = conn.getStatements(None, None, None, ['null', context2])
print "Triples in contexts null or 2: %s" % (conn.size(['null', context2]))
for s in statements:
print s

The output of this loop is:

Triples in contexts null or 2: 4
(<http://example.org/people/bob>, <http://example.org/people/name>, "Bob", <http://example.org/people/context2>)
(<http://example.org/people/bob>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Person>, <http://example.org/people/context2>)
(<http://example.org/people/ted>, <http://example.org/people/name>, "Ted")
(<http://example.org/people/ted>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Person>)

The lesson is that getStatements() can freely mix triples from the null context and named contexts. It is all you need as long as the query is a very simple one.

SPARQL using FROM, FROM DEFAULT, and FROM NAMED

In many of our examples we have used a simple SPARQL query to retrieve triples from AllegroGraph's default graph. This has been very convenient but it is also misleading. As soon as we tell SPARQL to search a specific graph, we lose the ability to search AllegroGraph's default graph! Triples from the null graph vanish from the search results. Why is that?

Standard SPARQL was designed for named graphs only, and has no syntax to indentify a truly unnamed graph. AllegroGraph's SPARQL, however, has been extended to allow the unnamed graph to participate in multi-graph queries.

We can use AllegroGraph's SPARQL to search specific subgraphs in three ways. We can create a temporary "default graph" using the FROM operator; we can put AllegroGraph's unnamed graph into SPARQL's default graph using FROM DEFAULT; or we can target specific named graphs using the FROM NAMED operator.

We can also combine these operators in a single query, to search the SPARQL default graph and one or more named graphs at the same time.

The first example is a SPARQL query that used FROM DEFAULT to place AllegroGraph's unnamed graph into SPARQL's default graph. 

SELECT ?s ?p ?o FROM DEFAULT 
WHERE {?s ?p ?o . }

This query finds triples from the unnamed graph only (which are triples about Ted).  Note the simple query pattern.

['<http://example.org/people/ted>', '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>', '<http://example.org/people/Person>']
['<http://example.org/people/ted>', '<http://example.org/people/name>', '"Ted"']

Here's an example of a query that uses FROM. It instructs SPARQL to regard context1 as the default graph for the purposes of this query.

SELECT ?s ?p ?o 
FROM <http://example.org/people/context1> 
WHERE {?s ?p ?o . }

SPARQL uses the simple pattern {?s ?p ?o . } to match triples in context1, which is the temporary default graph:

['<http://example.org/people/alice>', '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>', '<http://example.org/people/Person>']
['<http://example.org/people/alice>', '<http://example.org/people/name>', '"Alice"']

Notice that these query results do not have the fourth value we have come to expect.  That was stripped off when context1 became the (temporary) default context. 

The next example changes FROM to FROM NAMED in the same query:

SELECT ?s ?p ?o 
FROM NAMED <http://example.org/people/context1> 
WHERE {?s ?p ?o . }

This time there are no matches! The pattern {?s ?p ?o . } only matches the SPARQL default graph. We declared context1 to be a "named" graph, so it is no longer the default graph. To match triples in named graphs, SPARQL requires a GRAPH pattern:

SELECT ?s ?p ?o ?g 
FROM NAMED <http://example.org/people/context1> 
WHERE {GRAPH ?g {?s ?p ?o . }}";

When we combine GRAPH with FROM NAMED, we get the expected matches:

['<http://example.org/people/alice>', '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>', '<http://example.org/people/Person>', '<http://example.org/people/context1>']
['<http://example.org/people/alice>', '<http://example.org/people/name>', '"Alice"', '<http://example.org/people/context1>']

What about a combination query? The graph commands can be mixed in a single query.

SELECT ?s ?p ?o ?g 
FROM DEFAULT
FROM <http://example.org/people/context1>
FROM NAMED <http://example.org/people/context2>
WHERE {{GRAPH ?g {?s ?p ?o . }} UNION {?s ?p ?o .}}

This query puts AllegroGraph's unnamed graph and the context1 graph into SPARQL's default graph, where the triples can be found by using a simple {?s ?p ?o . } query.  Then it identifies context2 as a named graph, which can be searched using a GRAPH pattern.  In the final line, we used a UNION operator to combine the matches of the simple and GRAPH patterns.

This query should find all six triples, and here they are:

['<http://example.org/people/bob>', '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>', '<http://example.org/people/Person>', '<http://example.org/people/context2>']
['<http://example.org/people/bob>', '<http://example.org/people/name>', '"Bob"', '<http://example.org/people/context2>']
['<http://example.org/people/alice>', '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>', '<http://example.org/people/Person>']
['<http://example.org/people/alice>', '<http://example.org/people/name>', '"Alice"']
['<http://example.org/people/ted>', '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>', '<http://example.org/people/Person>']
['<http://example.org/people/ted>', '<http://example.org/people/name>', '"Ted"']

SPARQL with Dataset Object

Next, we switch to SPARQL queries where the subgraphs are constrained by Sesame dataset objects. First we'll run the wide-open SPARQL query to see what it finds. In the next two SPARQL examples, we will control the scope of the search by using datasets. A dataset contains lists of contexts to search, and is applied to the tupleQuery object to control the scope of the search.

Here's the wide-open search, which contains no information about which graph we want to search:

    queryString = """
SELECT ?s ?p ?o WHERE {?s ?p ?o . }
"""
tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
result = tupleQuery.evaluate();
print "No dataset restrictions."
for bindingSet in result:
print bindingSet.getRow()

In this case the query returns triples from AllegroGraph's default graph and from both named graphs. This accommodates the person who just wants to "search everything."

No dataset restrictions.
['<http://example.org/people/alice>', '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>', '<http://example.org/people/Person>']
['<http://example.org/people/alice>', '<http://example.org/people/name>', '"Alice"']
['<http://example.org/people/bob>', '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>', '<http://example.org/people/Person>']
['<http://example.org/people/bob>', '<http://example.org/people/name>', '"Bob"']
['<http://example.org/people/ted>', '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>', '<http://example.org/people/Person>']
['<http://example.org/people/ted>', '<http://example.org/people/name>', '"Ted"']

A dataset object is a Sesame construct that contains two lists of named graphs. There is one list of graphs that will become the SPARQL default graph, just like using FROM in the query. There is a second list of graphs that will be "named" graphs in the query, just like using FROM NAMED. To use the dataset, we put the graph URIs into the dataset object, and then add the dataset to the tupleQuery object. When we evaluate the tupleQuery, the results will be confined to the graphs listed in the dataset.

The next example shows how to use an AllegroGraph dataset object in an exceptional way, to restrict the SPARQL query to the triples in AllegroGraph's default graph. The dataset.addDefaultGarph() method accepts "null" as the name of AllegroGraph's default graph. This lets us direct a SPARQL query at AllegoGraph's default graph.

    queryString = """
SELECT ?s ?p ?o WHERE {?s ?p ?o . }
"""
ds = Dataset()
ds.addDefaultGraph('null')
tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
tupleQuery.setDataset(ds)
result = tupleQuery.evaluate();
print "SPARQL query over the null context."
for bindingSet in result:
print bindingSet.getRow()

The output of this loop is the two triples that are in the default graph:

SPARQL query over the null context.
['<http://example.org/people/ted>', '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>', '<http://example.org/people/Person>']
['<http://example.org/people/ted>', '<http://example.org/people/name>', '"Ted"']

In the next example we'll add a graph to the dataset using the addNamedGraph() method. This time the wide-open query is restricted to only those statements in context1, which will be treated as a "named graph" in the query:

    queryString = """
SELECT ?s ?p ?o WHERE {?s ?p ?o . }
"""
ds = Dataset()
ds.addNamedGraph(context1)
tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
tupleQuery.setDataset(ds)
result = tupleQuery.evaluate();
print "SPARQL query over context1, no GRAPH pattern."
for bindingSet in result:
print bindingSet.getRow()

The output of this query is somewhat unexpected. The query returns no results!

SPARQL query over context1, no GRAPH pattern.

Why did this happen? Once we explicitly identify a subgraph as a "named graph" in the query, SPARQL insists that we use a GRAPH pattern. The following example uses a dataset to target context1, and adds a GRAPH element to the query. This small change lets us focus on one subgraph only.

    queryString = """
SELECT ?s ?p ?o ?c
WHERE { GRAPH ?c {?s ?p ?o . } }

"""
ds = Dataset()
ds.addNamedGraph(context1)
tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
tupleQuery.setDataset(ds)
result = tupleQuery.evaluate();
print "SPARQL query over context1, with GRAPH pattern."
for bindingSet in result:
print bindingSet.getRow()

The output of this loop contains two triples, as expected. These are the triples from context1.

SPARQL query over context1, with GRAPH pattern.
['<http://example.org/people/alice>', '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>', '<http://example.org/people/Person>', '<http://example.org/people/context1>']
['<http://example.org/people/alice>', '<http://example.org/people/name>', '"Alice"', '<http://example.org/people/context1>']

One naturally wonders what the SPARQL GRAPH query would find if we got out of its way and ran it without any dataset restrictions. Here it is:

    queryString = """
SELECT ?s ?p ?o ?c
WHERE { GRAPH ?c {?s ?p ?o . } }
"""
tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
result = tupleQuery.evaluate();
print "SPARQL query with GRAPH pattern, no context constraints."
for bindingSet in result:
print bindingSet.getRow()

The output of this loop contains four triples, two from each of the named subgraphs in the store (context1 and context2). The query was not able to find the triples that were in the AllegroGraph default graph.

SPARQL query with GRAPH pattern, no context constraints.
['<http://example.org/people/alice>', '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>', '<http://example.org/people/Person>', '<http://example.org/people/context1>']
['<http://example.org/people/alice>', '<http://example.org/people/name>', '"Alice"', '<http://example.org/people/context1>']
['<http://example.org/people/bob>', '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>', '<http://example.org/people/Person>', '<http://example.org/people/context2>']
['<http://example.org/people/bob>', '<http://example.org/people/name>', '"Bob"', '<http://example.org/people/context2>']

Namespaces (example11)    Return to Top

A namespace is that portion of a URI that preceeds the last '#', '/', or ':' character, inclusive. The remainder of a URI is called the localname. For example, with respect to the URI "http://example.org/people/alice", the namespace is "http://example.org/people/" and the localname is "alice". When writing SPARQL queries, it is convenient to define prefixes or nicknames for the namespaces, so that abbreviated URIs can be specified. For example, if we define "ex" to be a nickname for "http://example.org/people/", then the string "ex:alice" is a recognized abbreviation for "http://example.org/people/alice". This abbreviation is called a qname.

In the SPARQL query in the example below, we see two qnames, "rdf:type" and "ex:alice". Ordinarily, we would expect to see "PREFIX" declarations in SPARQL that define namespaces for the "rdf" and "ex" nicknames. However, the Connection and Query machinery can do that job for you. The mapping of prefixes to namespaces includes the built-in prefixes RDF, RDFS, XSD, and OWL. Hence, we can write "rdf:type" in a SPARQL query, and the system already knows its meaning. In the case of the 'ex' prefix, we need to instruct it. The setNamespace() method of the connection object registers a new namespace. In the example below, we first register the 'ex' prefix, and then submit the SPARQL query. It is legal, although not recommended, to redefine the built-in prefixes RDF, etc..

The example example11() begins by borrowing a connection object from example1().

def example11():
conn = example1()

We need a namespace string (bound to the variable exns) to use when generating the alice and person URIs.

    exns = "http://example.org/people/"
alice = conn.createURI(namespace=exns, localname="alice")
person = conn.createURI(namespace=exns, localname="Person")

Now we can assert Alice's RDF:TYPE triple.

    conn.add(alice, RDF.TYPE, person)

Now we register the exns namespace with the connection object, so we can use it in a SPARQL query. The query looks for triples that have "rdf:type" in the predicate position, and "ex:Person" in the object position.

    conn.setNamespace('ex', exns)
queryString = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o . FILTER ((?p = rdf:type) && (?o = ex:Person) ) }
"""
tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
result = tupleQuery.evaluate();
print
for bindingSet in result:
print bindingSet[0], bindingSet[1], bindingSet[2]

The output shows the single triple with its fully-expanded URIs. This demonstrates that the qnames in the SPARQL query successfully matched the fully-expanded URIs in the triple.

http://example.org/people/alice http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://example.org/people/Person

It is worthwhile to briefly discuss performance here. In the current AllegroGraph system, queries run more efficiently if constants appear inside of the "where" portion of a query, rather than in the "filter" portion. For example, the SPARQL query below will evaluate more efficiently than the one in the above example. However, in this case, you have lost the ability to output the constants "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" and "http://example.org/people/alice". Occasionally you may find it useful to output constants in the output of a 'select' clause; in general though, the above code snippet illustrates a query syntax that is discouraged.

SELECT ?s  
WHERE { ?s rdf:type ex:person } 

It is common for users to build RDF applications that combine some form of "keyword search" with their queries. For example, a user might want to retrieve all triples for which the string "Alice" appears as a word within the third (object) argument to the triple. AllegroGraph provides a capability for including free text matching within a SPARQL query, and also by using the evalFreeTextSearch() method of the connection object. It requires, however, that you create and configure indexes appropriate to the searches you want to pursue.

Example12() begins by borrowing the connection object from example1(). Then it creates a namespace string and registers the namespace with the connection object, as in the previous example.

def example12():
conn = example1()
conn.clear()
exns = "http://example.org/people/"
conn.setNamespace('ex', exns)

We have to create an index. AllegroGraph lets you create any number of text indexes, each for a specific purpose. In this case we are indexing the literal values we find in the "fullname" predicate, which we will use in resources that describe people. The createFreeTextIndex() method has many configurable parameters. Their default settings are appropriate to this situation. All we have to provide is a name for the index and the URI of the predicate (or predicates) that contain the text to be indexed.

    conn.createFreeTextIndex("index1", predicates=[URI(namespace=exns, localname='fullname')])

We can view the index configuration using the getFreeTextIndexConfiguration() method:

    config = conn.getFreeTextIndexConfiguration("index1")
print(config)
for item in config["predicates"]:
print(item)

The configuration object is a simple Python dictionary. Most of it is easy to interpret, but the predicate list has to be extracted before we see the actual predicates:

{'indexLiterals': True, 'minimumWordSize': 3, 'indexFields': ['object'], 
'stopWords': ['and', 'are', 'but', 'for', 'into', 'not', 'such', 'that', 'the', 
'their', 'then', 'there', 'these', 'they', 'this', 'was', 'will', 'with'], 
'predicates': [<franz.openrdf.model.value.URI object at 0x025831F0>], 
'wordFilters': [], 'indexResources': False}

<http://example.org/people/fullname>

This configuration says that "index1" will operate on the literal values it finds in the object position of the <http://example.org/people/fullname> predicate. It ignores words smaller than three characters in length. It will ignore the worlds in its "stopWords" list. If it encounters a resource URI in the object position, it will ignore it. This index doesn't use any wordFilters, which are sometimes used to remove accented letters and to perform stemming on indexed text and search strings.

The next step is to create three new resources. One is a person, "Alice," whose full name is "Alice B. Toklas." There will also be a book, "Alice in Wonderland," which is linked to an author resource, "Lewis Carroll." The first step is to create some URIs and literal values:

    alice = conn.createURI(namespace=exns, localname="alice")
carroll = conn.createURI(namespace=exns, localname="carroll")
persontype = conn.createURI(namespace=exns, localname="Person")
fullname = conn.createURI(namespace=exns, localname="fullname")
alicename = conn.createLiteral('Alice B. Toklas')
book = conn.createURI(namespace=exns, localname="book1")
booktype = conn.createURI(namespace=exns, localname="Book")
booktitle = conn.createURI(namespace=exns, localname="title")
author = conn.createURI(namespace=exns, localname="author")
wonderland = conn.createLiteral('Alice in Wonderland')
lewisCarroll = conn.createLiteral('Lewis Carroll')

Add the resource for the new person, Alice B. Toklas:

    # Creating Alice B. Toklas resource 
conn.add(alice, RDF.TYPE, persontype)
conn.add(alice, fullname, alicename)

Add the new book, Alice in Wonderland.

    # Creating Alice in Wonderland book resource
conn.add(book, RDF.TYPE, booktype)
conn.add(book, booktitle, wonderland)
conn.add(book, author, carroll)

Note that the book's author predicate links to the resource URI of the Lewis Carroll author resource. The author is a Person with a fullname:

    # Creating Lewis Carrol resource
conn.add(carroll, fullname, lewisCarroll)
conn.add(carroll, RDF.TYPE, persontype)

Let's use getStatements() to retrieve the new triples and inspect them:

Current content of triple store:
(<http://example.org/people/alice>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Person>)
(<http://example.org/people/alice>, <http://example.org/people/fullname>, "Alice B. Toklas")
(<http://example.org/people/book1>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Book>)
(<http://example.org/people/book1>, <http://example.org/people/title>, "Alice in Wonderland")
(<http://example.org/people/book1>, <http://example.org/people/author>, <http://example.org/people/carroll>)
(<http://example.org/people/carroll>, <http://example.org/people/fullname>, "Lewis Carroll")
(<http://example.org/people/carroll>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Person>)

Now we set up the SPARQL query that looks for triples containing "Alice" in the object position.

The text match occurs through a "magic" predicate called fti:match. This is not an RDF "predicate" but a LISP "predicate," meaning that it behaves as a true/false test. This predicate has two arguments. One is the subject URI of the resources to search. The other is the string pattern to search for, such as "Alice". Only full-word matches will be found.

    queryString = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o . ?s fti:match 'Alice' . }
"""
tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
result = tupleQuery.evaluate();

There is no need to include a prefix declaration for the 'fti' nickname. That is because 'fti' is included among the built-in namespace/nickname mappings in AllegroGraph.

When we execute our SPARQL query, it matches the "Alice" within the literal "Alice B. Toklas" because that literal occurs in a triple having the fullname predicate, but it does not match the "Alice" in the literal "Alice in Wonderland" because the booktitle predicate was not included in our index. The SPARQL query returns all of the triples of the matching resource, not just the one that matched the text search.

    print "Found %i query results" % len(result)    
count = 0
for bindingSet in result:
print bindingSet
count += 1
if count > 5: break

The output of this loop is:

Found 2 query results
{'p': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 's': 'http://example.org/people/alice1', 'o': 'http://example.org/people/Person'} {'p': 'http://example.org/people/fullname', 's': 'http://example.org/people/alice1', 'o': '"Alice B. Toklas"'}

Note that the result set included both the matching triple and the rdf:type triple from the same resource. SPARQL returns the entire resource, not just the matching triple.

As it happens, the SPARQL interface to text search does not let you specify which index(es) to use, so SPARQL queries use all of the text indexes in the system. If you want to limit the search to a specific index, use the evalFreeTextSearch() method of the connection object:

    print("\nEvalFreeTextSearch() match 'Alice' in index1.")
for triple in conn.evalFreeTextSearch("Alice", index="index1"):
print " " + str(triple)

Unlike SPARQL, evalFreeTextSearch() returns only the matching triple:

EvalFreeTextSearch() match 'Alice' in index1.
['<http://example.org/people/alice>', '<http://example.org/people/fullname>', '"Alice B. Toklas"']

The text index supports simple wildcard queries. The asterisk (*) may be appended to the end of the pattern to indicate "any number of additional characters." For instance, this query looks for whole words that begin with "Ali":

    queryString = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o . ?s fti:match 'Ali*' . }
"""

It finds the same two triples as before.

There is also a single-character wildcard, the questionmark. You can add as many question marks as you need to the string pattern. This query looks for a five-letter word that has "l" in the second position, and "c" in the fourth position:

    queryString = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o . ?s fti:match '?l?c?' . }
"""

This query finds the same two triples as before.

This time we'll do something a little different. The free text indexing matches whole words only, even when using wildcards. What if you really need to match a substring in a word of unknown length. You can write a SPARQL query that performs a regex match against object values. This can be inefficient compared to indexed search, and the match is not confined to the registered free-text predicates. The following query looks for the substring "lic" in literal object values:

    queryString = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o . FILTER regex(?o, "lic") }
"""

This query returns two triples, but they are not quite the same as before:

Substring match for 'lic'
Found 2 query results
{'p': 'http://example.org/people/fullname', 's': 'http://example.org/people/alice1', 'o': '"Alice B. Toklas"'}
{'p': 'http://example.org/people/title', 's': 'http://example.org/people/book1', 'o': '"Alice in Wonderland"'}

As you can see, the regex match found "lic" in "Alice in Wonderland," which was not a registered free-text predicate. It made this match by doing a string comparison against every object value in the triple store. Even though you can streamline the SPARQL query considerably by writing more restrictive patterns, this is still inherently less efficient than using the indexed approach.

Note that evalFreeTextSearch() does not offer any way to do a regex substring search.

In addition to indexing literal values, AllegroGraph can also index resource URIs. "Index2" is an index that looks for URIs in the object position of the "author" predicate, and then indexes only the "short" name of the resource (the characters following the rightmost / or # in the URI). This lets us avoid indexing highly-repetitive namespace strings, which would fill the index with data that would not be very useful.

    conn.createFreeTextIndex("index2", predicates=[URI(namespace=exns, localname='author')],
indexResources="short", indexFields=["object"])

Now we'll search for the string "Carroll" in "index2:"

    print("\nMatch 'Carroll' in index2.")
for triple in conn.evalFreeTextSearch("Carroll", index="index2"):
print " " + str(triple)
Match 'Carroll' in index2.
['<http://example.org/people/book1>', '<http://example.org/people/author>', '<http://example.org/people/carroll>']

The text search located the triple that has "Carroll" in the URI in the object position.

Select, Ask, Describe, and Construct Queries (example13)   Return to Top

SPARQL provides alternatives to the standard SELECT query. Example example13() exercises these alternatives to show how AllegroGraph Server handles them.

The example begins by borrowing a connection object from example6(). This connects to a repository that contains vcard and Kennedy data. We'll need to register a Kennedy namespace to make the queries easier to read.

def example13():
conn = example6(False) # kennedy and vcards data conn.setNamespace("kdy", "http://www.franz.com/simple#")

As it happens, we don't need the vcard data this time, so we'll remove it. This is an example of how to delete an entire subgraph (the vcards "context"):

    context = conn.createURI("http://example.org#vcards")
conn.remove(None, None, None, context)

The example begins with a SELECT query so we can see some of the Kennedy resources.

    queryString = """select ?s where { ?s rdf:type kdy:person} limit 5"""
tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
result = tupleQuery.evaluate();
print "\nSELECT some persons"
for r in result: print r
Note that SELECT returns variable bindings. In this case it returns subject URIs of five people:
SELECT some persons
{'s': u'<http://www.franz.com/simple#person1>'}
{'s': u'<http://www.franz.com/simple#person2>'}
{'s': u'<http://www.franz.com/simple#person3>'}
{'s': u'<http://www.franz.com/simple#person4>'}
{'s': u'<http://www.franz.com/simple#person5>'}

The ASK query returns a Boolean, depending on whether the triple pattern matched any triples. In this case we ran two tests; one seeking "John" and the other looking for "Alice." Note that the ASK query uses a different construction method than the SELECT query: prepareBooleanQuery().

    queryString = """ask { ?s kdy:first-name "John" } """
booleanQuery = conn.prepareBooleanQuery(QueryLanguage.SPARQL, queryString)
result = booleanQuery.evaluate();
print "\nASK: Is there anyone named John?", result
queryString = """ask { ?s kdy:first-name "Alice" } """
booleanQuery = conn.prepareBooleanQuery(QueryLanguage.SPARQL, queryString)
result = booleanQuery.evaluate();
print "\nASK: Is there an Alice?", result

The output of this loop is:

ASK: Is there anyone named John? True
ASK: Is there an Alice? False

The CONSTRUCT query contructs a statement object out of the matching values in the triple pattern. A "statement" is a client-side triple. Construction queries use prepareGraphQuery(). The point is that the query can bind variables from existing triples and then "construct" a new triple by recombining the values. This query constructs new triples using a kdy:has-grandchild predicate.

    queryString = """
construct {?a kdy:has-grandchild ?c}
where { ?a kdy:has-child ?b .
?b kdy:has-child ?c . }
"""
constructQuery = conn.prepareGraphQuery(QueryLanguage.SPARQL, queryString)
result = constructQuery.evaluate();

The CONSTRUCT query does not actually add the new triples to the store. You have to iterate through the results and add them yourself:

    print "/nConstruct result, creating new has-grandchild triples:"
for st in result:
conn.add(st.getSubject(), st.getPredicate(), st.getObject())

The DESCRIBE query returns a "graph," meaning all triples of the matching resources. It uses prepareGraphQuery(). In this case we asked SPARQL to describe one grandparent and one grandchild. (This confirms that the kdy:has-grandchild triples successfully entered the triple store.)

    queryString = """describe ?s ?o where { ?s kdy:has-grandchild ?o . } limit 1"""
describeQuery = conn.prepareGraphQuery(QueryLanguage.SPARQL, queryString)
result = describeQuery.evaluate();
print "Describe one grandparent and one grandchild:"
for st in result: print st

The output of this loop is lengthy, because the Kennedy resources have many triples. One block of triples looked like this, showing the new has-grandchild triples:

(<http://www.franz.com/simple#person1>, <http://www.franz.com/simple#has-grandchild>, <http://www.franz.com/simple#person20>)
(<http://www.franz.com/simple#person1>, <http://www.franz.com/simple#has-grandchild>, <http://www.franz.com/simple#person22>)
(<http://www.franz.com/simple#person1>, <http://www.franz.com/simple#has-grandchild>, <http://www.franz.com/simple#person24>)
(<http://www.franz.com/simple#person1>, <http://www.franz.com/simple#has-grandchild>, <http://www.franz.com/simple#person25>)
(<http://www.franz.com/simple#person1>, <http://www.franz.com/simple#has-grandchild>, <http://www.franz.com/simple#person26>)

Parametric Queries (example14)   Return to Top

The Python Sesame API to AllegroGraph Server lets you set up a SPARQL query and then fix the value of one of the query variables prior to matching the triples. This is more efficient than testing for the same value in the body of the query.

In example14() we set up two-triple resources for Bob and Alice, and then use an unconstrained SPARQL query to retrieve the triples. Normally this query would find all four triples, but by binding the subject value ahead of time, we can retrieve the "Bob" triples separately from the "Alice" triples.

The example begins by borrowing a connection object from example2(). This means there are already Bob and Alice resources in the repository. We do need to recreate the URIs for the two resources, however.

def example14():
conn = example2()
alice = conn.createURI("http://example.org/people/alice")
bob = conn.createURI("http://example.org/people/bob")

The SPARQL query is the simple, unconstrained query that returns all triples. We use prepareTupleQuery() to create the query object.

    queryString = """select ?s ?p ?o where { ?s ?p ?o} """
tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)

Before evaluating the query, however, we'll use the query object's setBinding() method to assign Alice's URI to the "s" variable in the query. This means that all matching triples are required to have Alice's URI in the subject position of the triple.

    tupleQuery.setBinding("s", alice)
result = tupleQuery.evaluate()
print "Facts about Alice:"
for r in result: print r

The output of this loop consists of all triples that describe Alice:

Facts about Alice:
{'p': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 's': 'http://example.org/people/alice', 'o': 'http://example.org/ontology/Person'} {'p': 'http://example.org/ontology/name', 's': 'http://example.org/people/alice', 'o': '"Alice"'}

Now we'll run the same query again, but this time we'll constrain "s" to be Bob's URI. The query will return all triples that describe Bob.

    tupleQuery.setBinding("s", bob)
print "Facts about Bob:"
result = tupleQuery.evaluate()
for r in result: print r
The output of this loop is:
Facts about Bob:
{'p': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 's': 'http://example.org/people/bob', 'o': 'http://example.org/ontology/Person'} {'p': 'http://example.org/ontology/name', 's': 'http://example.org/people/bob', 'o': '"Bob"'}

Range Matches (example15)   Return to Top

Example15() demonstrates how to set up a query that matches a range of values. In this case, we'll retrieve all people between 30 and 50 years old (inclusive). We can accomplish this using the connection object's createRange() method.

This example begins by getting a connection object from example1(), and then clearing the repository of the existing triples.

def example15():
conn = example1()
conn.clear()

Then we register a namespace to use in the query.

    exns = "http://example.org/people/"
conn.setNamespace('ex', exns)

Next we need to set up the URIs for Alice, Bob, Carol and the predicate "age".

    alice = conn.createURI(namespace=exns, localname="alice")
bob = conn.createURI(namespace=exns, localname="bob")
carol = conn.createURI(namespace=exns, localname="carol")
age = conn.createURI(namespace=exns, localname="age")

In this step, we use the connection's createRange() method to generate a range object with limits 30 and 50:

    range = conn.createRange(30, 50)

The next two lines are essential to the experiment, but you can take your pick of which one to use. The range comparison requires that all of the matching values must have the same datatype. In this case, the values must all be ints. The connection object lets us force this uniformity on the data through the registerDatatypeMapping() method. You can force all values of a specific predicate to be internally represented as one datatype, as we do here:

    conn.registerDatatypeMapping(predicate=age, nativeType="int")

This line declares that all values of the age predicate will be represented in triples as if they were "int" values in Python.

    conn.registerDatatypeMapping(datatype=XMLSchema.INT, nativeType="int")    

However, this doesn't mean quite what you think it does. As it happens, due to implementation details that differ between Python and AllegroGraph, the closest AllegroGraph datatype to Python's "int" is XMLSchema.LONG. When we create the following triples, we find that they contain XMLSchema.LONG values.

    conn.add(alice, age, 42)
    conn.add(bob, age, 24) 
conn.add(carol, age, "39")

The next step is to use getStatements() to retrieve all triples where the age value is between 30 and 50.

    statements = conn.getStatements(None, age, range)
for s in statements:
print s

The output of this loop is:

(<http://example.org/people/alice>, <http://example.org/people/age>, "42"^^<http://www.w3.org/2001/XMLSchema#long>)
(<http://example.org/people/carol>, <http://example.org/people/age>, "39"^^<http://www.w3.org/2001/XMLSchema#long>)

The range query has matched 42 and "39", but not 24.

Federated Repositories (example16)   Return to Top

AllegroGraph lets you split up your triples among repositories on multiple servers and then search them all in parallel. To do this we query a single "federated" repository that automatically distributes the queries to the secondary repositories and combines the results. From the point of view of your Python code, it looks like you are working with a single repository.

Example16() begins by defining a small output function that we'll use at the end of the lesson. It prints out responses from different repositories. This example is about red apples and green apples, so the output function talks about apples.

def example16():
def pt(kind, rows, expected):
print "\n%s Apples:\t" % kind.capitalize(),
for r in rows: print r[0].getLocalName(),

In the next block of code, we open connections to a redRepository and a greenRepository on the local server. In a typical federation scenario, these respositories would be distributed across multiple servers.

    server = AllegroGraphServer(AG_HOST, AG_PORT, 'test', 'xyzzy')
catalog = server.openCatalog(AG_CATALOG)
## create two ordinary stores, and one federated store:
redConn = catalog.getRepository("redthings", Repository.RENEW).initialize().getConnection()
greenConn = catalog.getRepository("greenthings", Repository.RENEW).initialize().getConnection()

Now we create a "federated" respository, which is connected to the distributed repositories at the back end. When we open a federated repository, AllegroGraph creates a session and returns a connection:

    rainbowThings = server.openFederated([redConn, greenConn], True)

It is necessary to register the "ex" namespace in all three repositories so we can use it in the upcoming query.

    ex = "http://www.demo.com/example#"
redConn.setNamespace('ex', ex)
greenConn.setNamespace('ex', ex)
rainbowThings.setNamespace('ex', ex)

The next step is to populate the Red and Green repositories with a few triples. Note that one of the green objects is not an apple:

    redConn.add(redConn.createURI(ex+"mcintosh"), RDF.TYPE, redConn.createURI(ex+"Apple"))
redConn.add(redConn.createURI(ex+"reddelicious"), RDF.TYPE, redConn.createURI(ex+"Apple"))
greenConn.add(greenConn.createURI(ex+"pippin"), RDF.TYPE, greenConn.createURI(ex+"Apple"))
greenConn.add(greenConn.createURI(ex+"kermitthefrog"), RDF.TYPE, greenConn.createURI(ex+"Frog"))

Let's check to see if the federated session can "see" the new triples:

        print "\nFederated size: " + str(rainbowThings.size())
Federated size: 4

This tells us that the federated store can see the triples in the component stores. Now we write a query that retrieves Apples from the Red repository, the Green repository, and the federated repository, and prints out the results.

    queryString = "select ?s where { ?s rdf:type ex:Apple }"
## query each of the stores; observe that the federated one is the union of the other two:
pt("red", redConn.prepareTupleQuery(QueryLanguage.SPARQL, queryString).evaluate())
pt("green", greenConn.prepareTupleQuery(QueryLanguage.SPARQL, queryString).evaluate())
pt("federated", rainbowConn.prepareTupleQuery(QueryLanguage.SPARQL, queryString).evaluate())

The output is shown below. The federated response combines the individual responses.

Red Apples:   reddelicious mcintosh 
Green Apples: pippin
Federated Apples: reddelicious mcintosh pippin

It should go without saying that a search for apples does not turn up a frog.

Prolog Rule Queries (example17)   Return to Top

AllegroGraph Server lets us load Prolog backward-chaining rules to make query-writing simpler.  The Prolog rules let us write the queries in terms of higher-level concepts.  When a query refers to one of these concepts, Prolog rules become active in the background to determine if the concept is valid in the current context. 

For instance, in this example the query says that the matching resource must be a "man".  A Prolog rule examines the matching resources to see which of them are persons who are male.  The query can proceed for those resources. The rules provide a level of abstraction that makes the queries simpler to express.

The example17() example begins by borrowing a connection object from example example6(), which contains the Kennedy family tree.  Note that example6() creates a dedicated session for the rules to operate in, using the Connection object's openSession() method. Python rules cannot be loaded into the AllegroGraph common back end.

conn.openSession(); # in example6()

This converts the connection to a "dedicated" connection. After that step, all of the code is exactly the same as when using the common back end.

def example17():
conn = example6() #obtain dedicated connection from example6()

We will need the same namespace as we used in the Kennedy example.

    conn.setNamespace("kdy", "http://www.franz.com/simple#")

These are the "man" and "woman" rules.  A resource represents a "woman" if the resource contains a sex=female triple and an rdf:type = person triple.  A similar deduction identifies a "man".  The "q" at the beginning of each pattern simply stands for "query" and introduces a triple pattern.

    rules1 = """
(<-- (woman ?person) ;; IF
(q ?person !kdy:sex !kdy:female)
(q ?person !rdf:type !kdy:person))
(<-- (man ?person) ;; IF
(q ?person !kdy:sex !kdy:male)
(q ?person !rdf:type !kdy:person))
"""

The rules must be explicitly added to the dedicated session.

    conn.addRules(rules1)

This is the query.  This query locates all the "man" resources, and retrieves their first and last names.

    queryString = """
(select (?first ?last)
(man ?person)
(q ?person !kdy:first-name ?first)
(q ?person !kdy:last-name ?last)
)
"""

Here we perform the query and retrieve the result object. 

    tupleQuery = conn.prepareTupleQuery(QueryLanguage.PROLOG, queryString)
result = tupleQuery.evaluate();

The result object contains multiple bindingSets.  We can iterate over them to print out the values.

    for bindingSet in result:
f = bindingSet.getValue("first")
l = bindingSet.getValue("last")
print "%s %s" % (f.toPython(), l.toPython())

The output contains many names; there are just a few of them.

Robert Kennedy
Alfred Tucker
Arnold Schwarzenegger
Paul Hill
John Kennedy

It is good form to close the dedicated session when you are finished with it.

    conn.closeDedicated()

Loading Prolog Rules (example18)   Return to Top

Example example18() demonstrates how to load a file of Prolog rules into the Python Sesame API of AllegroGraph Server.  It also demonstrates how robust a rule-augmented system can become.  The domain is the Kennedy family tree again, borrowed from example6().  After loading a file of rules (python-rules.txt), we'll pose a simple query.  The query asks AllegroGraph to list all the uncles in the family tree, along with each of their nieces or nephews.  This is the query:

(select (?ufirst ?ulast ?cfirst ?clast)
(uncle ?uncle ?child)
(name ?uncle ?ufirst ?ulast)
(name ?child ?cfirst ?clast))

The problem is that the triple store contains no information about uncles.  The rules will have to deduce this relationship by finding paths across the RDF graph. 

What's an "uncle," then?  Here's a rule that can recognize uncles:

(<-- (uncle ?uncle ?child) 
(man ?uncle)
(parent ?grandparent ?uncle)
(parent ?grandparent ?siblingOfUncle)
(not (= ?uncle ?siblingOfUncle))
(parent ?siblingOfUncle ?child))

The rule says that an "uncle" is a "man" who has a sibling who is the "parent" of a child.  (Rules like this always check to be sure that the two nominated siblings are not the same resource.) Note that none of these relationships are triple patterns. They all deal in higher-order concepts. We'll need additional rules to determine what a "man" is, and what a "parent" is.

What is a "parent?" It turns out that there are two ways to be classified as a parent:

(<-- (parent ?father ?child)
(father ?father ?child)) (<-- (parent ?mother ?child)
(mother ?mother ?child))

A person is a "parent" if a person is a "father."  Similarly, a person is a "parent" if a person is a "mother."

What's a "father?"

(<-- (father ?parent ?child)
(man ?parent)
(q ?parent !rltv:has-child ?child))

A person is a "father" if the person is "man" and has a child.   The final pattern (beginning with "q") is a triple match from the Kennedy family tree.

What's a "man?"

(<-- (man ?person)
(q ?person !rltv:sex !rltv:male)
(q ?person !rdf:type !rltv:person))

A "man" is a person who is male.  These patterns both match triples in the repository. 

The python_rules.txt file contains many more Prolog rules describing relationships, including transitive relationships like "ancestor" and "descendant." Please examine this file for more ideas about how to use rules with AllegroGraph.

The example18() example begins by borrowing a connection object from example6(), which means the Kennedy family tree is already loaded into the repository, and we are dealing with a dedicated session.  

def example18():
conn = example6()

We need these two namespaces because they are used in the query and in the file of rules.

    conn.setNamespace("kdy", "http://www.franz.com/simple#")
conn.setNamespace("rltv", "http://www.franz.com/simple#")

The next step is to load the rule file:

    path = "./python_rules.txt"
conn.loadRules(path)

The query asks for the first and last names of each uncle and each niece/nephew.  The name strings are provided by another Prolog rule:

    queryString = """(select (?ufirst ?ulast ?cfirst ?clast)
(uncle ?uncle ?child)
(name ?uncle ?ufirst ?ulast)
(name ?child ?cfirst ?clast))"""

Here we execute the query and display the results. The code is a little more complicated than usual because of the string concatenations that build the names.

    tupleQuery = conn.prepareTupleQuery(QueryLanguage.PROLOG, queryString)
result = tupleQuery.evaluate();
for bindingSet in result:
u1 = bindingSet.getValue("ufirst")
u2 = bindingSet.getValue("ulast")
ufull = u1.toPython() + " " + u2.toPython()
c1 = bindingSet.getValue("cfirst")
c2 = bindingSet.getValue("clast")
cfull = c1.toPython() + " " + c2.toPython()
print "%s is the uncle of %s." % (ufull, cfull)

The output of this loop (in part) looks like this:

Robert Kennedy is the uncle of Amanda Smith.
Robert Kennedy is the uncle of Kym Smith.
Edward Kennedy is the uncle of Robert Shriver.
Edward Kennedy is the uncle of Maria Shriver.
Edward Kennedy is the uncle of Timothy Shriver.

As before, it is good form to close the dedicated session when you are finished with it.

    conn.closeSession()
conn.close();
myRepository = conn.repository
myRepository.shutDown()

RDFS++ Inference (example19)   Return to Top

The great promise of the semantic web is that we can use RDF metadata to combine information from multiple sources into a single, common model.  The great problem of the semantic web is that it is so difficult to recognize when two resource descriptions from different sources actually represent the same thing.  This problem arises because there is no uniform or universal way to generate URIs identifying resources.  As a result, we may create two resources, Bob and Robert, that actually represent the same person.

This problem has generated much creativity in the field. One way to approach the problem is through inference.  There are certain relationships and circumstances where an inference engine can deduce that two resource descriptions actually represent one thing, and then automatically merge the descriptions.  AllegroGraph's inference engine can be turned on or off each time you run a query against the triple store.  (Note that inference is turned off by default, which is the opposite of standard Sesame behavior.)

In example example19(), we will create four resources: Bob, with son Bobby, and Robert with daughter Roberta. 

First we have to set up the data. We begin by generating four URIs for the new resources.

    ## Create URIs for Bob and Robert (and kids) 
    robert = conn.createURI("http://example.org/people/robert")
	roberta = conn.createURI("http://example.org/people/roberta")
	bob = conn.createURI("http://example.org/people/bob")
	bobby = conn.createURI("http://example.org/people/bobby")
The next step is to create URIs for the predicates we'll need (name and fatherOf), plus one for the Person class.
    ## create name and child predicates, and Person class.
name = conn.createURI("http://example.org/ontology/name")
fatherOf = conn.createURI("http://example.org/ontology/fatherOf")
person = conn.createURI("http://example.org/ontology/Person")
The names of the four people will be literal values.
    ## create literal values for names    
bobsName = conn.createLiteral("Bob")
bobbysName = conn.createLiteral("Bobby")
robertsName = conn.createLiteral("Robert")
robertasName = conn.createLiteral("Roberta")

Robert, Bob and the children are all instances of class Person.  It is good practice to identify all resources by an rdf:type link to a class.

    ## Robert, Bob, and children are people
conn.add(robert, RDF.TYPE, person)
conn.add(roberta, RDF.TYPE, person)
conn.add(bob, RDF.TYPE, person)
conn.add(bobby, RDF.TYPE, person)
The four people all have literal names.
    ## They all have names.
conn.add(robert, name, robertsName)
conn.add(roberta, name, robertasName)
conn.add(bob, name, bobsName)
conn.add(bobby, name, bobbysName)
Robert and Bob have links to the child resources:
    ## robert has a child
conn.add(robert, fatherOf, roberta)
## bob has a child
conn.add(bob, fatherOf, bobby)

SameAs, using a SPARQL Query

Now that the basic resources and relations are in place, we'll seed the triple store with a statement that "Robert is the same as Bob," using the owl:sameAs predicate.  The AllegroGraph inference engine recognizes the semantics of owl:sameAs, and automatically infers that Bob and Robert share the same attributes.  Each of them originally had one child.  When inference is turned on, however, they each have two children. 

Note that SameAs does not combine the two resources.  Instead it links each of the two resources to all of the combined children.  The red links in the image are "inferred" triples.  They have been deduced to be true, but are not actually present in the triple store.

This is the critical link that tells the inference engine to regard Bob and Robert as the same resource.

    ## Bob is the same person as Robert
conn.add(bob, OWL.SAMEAS, robert)

This is a simple SPARQL query asking for the children of Robert, with inference turned OFF. Note the use of tupleQuery.setIncludeInferred(), which controls whether or not inferred triples may be included in the query results. Inference is turned off by default, but for teaching purposes we have turned it of explicitly. We also took the liberty of setting variable bindings for ?robert and ?fatherOf, simply to make the code easier to read. Otherwise we would have had to put full-length URIs in the query string.

    queryString = "SELECT ?child WHERE {?robert ?fatherOf ?child .}"
tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
tupleQuery.setIncludeInferred(False) # Turn off inference
tupleQuery.setBinding("robert", robert)
tupleQuery.setBinding("fatherOf", fatherOf)
result = tupleQuery.evaluate()
print "Children of Robert, inference OFF:"
for r in result: print r

The search returns one triple, which is the link from Robert to his direct child, Roberta.

Children of Robert, inference OFF
{'child': u'<http://example.org/people/roberta>'}

Now we'll perform the same query (the same tupleQuery, in fact), with inference turned ON. 

    tupleQuery.setIncludeInferred(True)
result = tupleQuery.evaluate()
print "Children of Robert, inference ON:"
for r in result: print r
Children of Robert, inference ON
{'child': u'<http://example.org/people/roberta>'}
{'child': u'<http://example.org/people/bobby>'}

Note that with inference ON, Robert suddenly has two children because Bob's child has been included.  Also note that the final triple (robert fatherOf bobby) has been inferred.  The inference engine has determined that this triple logically must be true, even though it does not appear in the repository.

InverseOf

We can reuse the Robert family tree to see how the inference engine can deduce the presence of inverse relationships.

Up to this point in this tutorial, we have created new predicates simply by creating a URI and using it in the predicate position of a triple.  This time we need to create a predicate resource so we can set an attribute of that resource.  We're going to declare that the hasFather predicate is the owl:inverseOf the existing fatherOf predicate.

The first step is to remove the owl:sameAs link, because we are done with it.

    conn.remove(bob, OWL.SAMEAS, robert)

We'll need a URI for the new hasFather predicate:

    hasFather = conn.createURI("http://example.org/ontology/hasFather")

This is the line where we create a predicate resource.  It is just a triple that describes a property of the predicate.  The hasFather predicate is the inverse of the fatherOf predicate:

     conn.add(hasFather, OWL.INVERSEOF, fatherOf)

For this example, we're going to use getStatements() instead of SPARQL. The simple getStatements() format is sufficient to our needs, and simplifies the code. First, we'll search for hasFather triples, leaving inference OFF to show that there are no such triples in the repository. The fifth parameter to getStatements(), shown here as False, controls inferencing.

    print "People with fathers, inference OFF"
    for s in conn.getStatements(None, hasFather, None, None, False): print s    
People with fathers, inference OFF

Now we'll turn inference ON by changing the fifth parameter to True. This time, the AllegroGraph inference engine discovers two "new" hasFather triples.

    print "People with fathers, inference ON"
    for s in conn.getStatements(None, hasFather, None, None, True): print s  
People with fathers, inference ON
(<http://example.org/people/bobby>, <http://example.org/ontology/hasFather>, <http://example.org/people/bob>) (<http://example.org/people/roberta>, <http://example.org/ontology/hasFather>, <http://example.org/people/robert>)

Both of these triples are inferred by the inference engine.

SubPropertyOf

Invoking inference using the rdfs:subPropertyOf predicate lets us "combine" two predicates so they can be searched as one.  For instance, in our Robert/Bob example, we have explicit fatherOf relations.  Suppose there were other resources that used a parentOf relation instead of fatherOf.  By making fatherOf a subproperty of parentOf, we can search for parentOf triples and automatically find the fatherOf triples at the same time.

First we should remove the owl:inverseOf relation from the previous example. We don't have to, but it keeps things simple.

    ## Remove owl:inverseOf property.
conn.remove(hasFather, OWL.INVERSEOF, fatherOf)

We'll need a parentOf URI to use as the new predicate.  Then we'll add a triple saying that fatherOf is an rdfs:subPropertyOf the new predicate, parentOf:

    parentOf = conn.createURI("http://example.org/ontology/parentOf")
conn.add(fatherOf, RDFS.SUBPROPERTYOF, parentOf)

If we now search for parentOf triples with inference OFF, we won't find any. No such triples exist in the repository.

    print "People with parents, inference OFF"
for s in conn.getStatements(None, parentOf, None, None, False): print s
People with parents, inference OFF

With inference ON, however, AllegroGraph infers two new triples:

    print "People with parents, inference ON"
for s in conn.getStatements(None, parentOf, None, None, True): print s
People with parents, inference ON
(<http://example.org/people/bob>, <http://example.org/ontology/parentOf>, <http://example.org/people/bobby>)
(<http://example.org/people/robert>, <http://example.org/ontology/parentOf>, <http://example.org/people/roberta>)

The fact that two fatherOf triples exist means that two correponding parentOf triples must be valid.  There they are.

Before setting up the next example, we should clean up:

conn.remove(fatherOf, RDFS.SUBPROPERTYOF, parentOf) 

Domain and Range

When you declare the domain and range of a predicate, the AllegroGraph inference engine can infer the rdf:type of resources found in the subject and object positions of the triple.  For instance, in the triple <subject, fatherOf, object> we know that the subject is always an instance of class Parent, and the object is always an instance of class Child. 

In RDF-speak, we would say that the domain of the fatherOf predicate is rdf:type Parent.  The range of fatherOf is rdf:type Child. 

This lets the inference engine determine the rdf:type of every resource that participates in a fatherOf relationship. 

We'll need two new classes, Parent and Child.  Note that RDF classes are always capitalized, just as predicates are always lowercase.

    parent = conn.createURI("http://example.org/ontology/Parent")
child = conn.createURI("http://exmaple.org/ontology/Child")

Now we add two triples defining the domain and rage of the fatherOf predicate:

    conn.add(fatherOf, RDFS.DOMAIN, parent)
conn.add(fatherOf, RDFS.RANGE, child)

Now we'll search for resources of rdf:type Parent.  The inference engine supplies the appropriate triples:

    print "Who are the parents?  Inference ON."
for s in conn.getStatements(None, RDF.TYPE, parent, None, True): print s
Who are the parents?  Inference ON.
(<http://example.org/people/bob>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/ontology/Parent>
(<http://example.org/people/robert>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/ontology/Parent>)

Bob and Robert are parents.  Who are the children?

    print "Who are the children?  Inference ON."
for s in conn.getStatements(None, RDF.TYPE, child, None, True): print s
Who are the children?  Inference ON.
(<http://example.org/people/bobby>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://exmaple.org/ontology/Child>)
(<http://example.org/people/roberta>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://exmaple.org/ontology/Child>)

Bobby and Roberta are the children. 

AllegroGraph provides the ability to locate resources within a geospatial coordinate system. You can set up either a flat (X,Y Cartesian) or spherical (latitude, longitude) system. The systems are two-dimensional only. (There is no Z or altitude dimension available).

The purpose of the geospatial representation is to efficiently find all entities that are located within a specific circular, rectangular or polygonal area.

Cartesian System

A Cartesian system is a flat (X,Y) plane. Locations are designated by (X,Y) pairs. At this time, AllegroGraph does not support real-world measurement units (km, miles, latitude, etc.,) in the Cartesian system.

The first example uses a Cartesian (X,Y) system that is 100 units square, and contains three people located at various points along the X = Y diagonal.

The example is in the function example20(). After establishing a connection, it begins by creating URIs for the three people.

    exns = "http://example.org/people/"
conn.setNamespace('ex', exns)
alice = conn.createURI(exns, "alice")
bob = conn.createURI(exns, "bob")
carol = conn.createURI(exns, "carol")

Then we have the connection object generate a rectangular coordinate system for us to use. A rectangular (Cartesian) system can be used to represent anything that can be plotted using (X,Y) coordinates, such as the location of transistors on a silicon chip.

    conn.createRectangularSystem(scale=10, xMax=100, yMax=100) 

The size of the coordinate system is determined by the xMin, xMax, yMin and yMax parameters. The minimum values default to zero, so this system is 0 to 100 in the X dimension, and 0 to 100 in the Y dimension.

The scale parameter influences how the coordinate data is stored and retrieved, and impacts search performance. The task is to locate the people who are within a specific region. As a rule of thumb, set the scale parameter to approximately the same value as the height (Y-axis) of your typical search region. You can be off by a factor of ten without impacting performance too badly, but if your application will search regions that are orders of magnitude different in size, you'll want to create multiple coordinate systems that are scaled for different sized search regions. In this case, our search region is about 20 units high (Y), and we have set the scale parameter to 10 units. That's close enough.

The next step is to create a "location" predicate and enter the locations of the three people.

    location = conn.createURI(exns, "location")
conn.add(alice, location, conn.createCoordinate(30,30))
conn.add(bob, location, conn.createCoordinate(40, 40))
conn.add(carol, location, conn.createCoordinate(50, 50))

Note that the coordinate pairs need to be encapsulated in a GeoCoordinate object to facilitate indexing and retrieval, using the connection object's createCoordinate() method.

At this point we have a Cartesian coordinate system containing three entities at specific (X,Y) locations. The next step is to define a search region. The first example is a "box" that is twenty units square, with the upper left corner at position (20, 20). The createBox() method requires parameters for xMin, xMax, yMin, and yMax.

    box1 = conn.createBox(20, 40, 20, 40)  

The problem is to find the people whose locations lie within this box:

Locating the matching entities is remarkably easy to do:

    for r in conn.getStatements(None, location, box1) : print r 

This retrieves all the location triples whose coordinates fall within the box1 region. Here are the resulting triples:

(<http://example.org/people/alice>, <http://example.org/people/location>, 
"+30.000000004656613+30.000000004656613"^^<http://franz.com/ns/allegrograph/3.0/geospatial/cartesian/0.0/100.0/0.0/100.0/1.0>)
(<http://example.org/people/bob>, <http://example.org/people/location>, "+39.999999990686774+39.999999990686774"^^<http://franz.com/ns/allegrograph/3.0/geospatial/cartesian/0.0/100.0/0.0/100.0/1.0>)

AllegroGraph has located Alice and Bob, as expected. Note that Bob was exactly on the corner of the search area, showing that the boundaries are inclusive.

We can also find all objects within a circle with a known center and radius. Circle1 is centered at (35, 35) and has a radius of 10 units.

    circle1 = conn.createCircle(35, 35, radius=10)   

A search within circle1 finds Alice and Bob again:

    for r in conn.getStatements(None, location, circle1) : print r 
(<http://example.org/people/alice>, <http://example.org/people/location>, 
"+30.000000004656613+30.000000004656613"^^<http://franz.com/ns/allegrograph/3.0/geospatial/cartesian/0.0/100.0/0.0/100.0/1.0>)
(<http://example.org/people/bob>, <http://example.org/people/location>, "+39.999999990686774+39.999999990686774"^^<http://franz.com/ns/allegrograph/3.0/geospatial/cartesian/0.0/100.0/0.0/100.0/1.0>)

AllegroGraph can also locate points that lie within an irregular polygon. Just tell AllegroGraph the vertices of the polygon:

    polygon1 = conn.createPolygon([(10,40), (50,10), (35,40), (50,70)]) 

When we ask which people are within polygon1, AllegroGraph finds Alice.

    for r in conn.getStatements(None, location, polygon1) : print r 
(<http://example.org/people/alice>, <http://example.org/people/location>, 
"+30.000000004656613+30.000000004656613"^^<http://franz.com/ns/allegrograph/3.0/geospatial/cartesian/0.0/100.0/0.0/100.0/1.0>)

Spherical System

A spherical coordinate system projects (X,Y) locations on a spherical surface, simulating locations on the surface of the earth. AllegroGraph supports the usual units of latitude and longitude in the spherical system. The default unit of distance is the kilometer (km). (These functions presume that the sphere is the size of the planet earth. For spherical coordinate systems of other sizes, you will have to work with the Lisp radian functions that underlie this interface.)

To establish a global coordinate system, use the connection object's createLatLongSystem() method.

    latLongGeoType = conn.createLatLongSystem(scale=5, unit='degree')

Once again, the scale parameter is an estimate of the size of a typical search area, in the longitudinal direction this time. The default unit is the degree. For this system, we expect a typical search to cover about five degrees in the east-west direction. Actual search regions may be as much as ten times larger or smaller without significantly impacting performance. If the application will use search regions that are significantly larger or smaller, then you will want to create multiple coordinate systems that have been optimized for different scales.

First we set up the resources for the entities within the spherical system. We'll need these subject URIs:

    amsterdam = conn.createURI(exns, "amsterdam")
london = conn.createURI(exns, "london")
sanfrancisto = conn.createURI(exns, "sanfrancisco")
salvador = conn.createURI(exns, "salvador")

Then we'll need a geolocation predicate to describe the lat/long coordinates of each entity.

    location = conn.createURI(exns, "geolocation")

Now we can create the entities by asserting a geolocation for each one. Note that the coordinates have to be encapsulated in coordinate objects:

    conn.add(amsterdam, location, conn.createCoordinate(52.366665, 4.883333))
conn.add(london, location, conn.createCoordinate(51.533333, -0.08333333))
conn.add(sanfrancisto, location, conn.createCoordinate(37.783333, -122.433334))
conn.add(salvador, location, conn.createCoordinate(13.783333, -88.45))

The coordinates are decimal degrees. Northern latitudes and eastern longitudes are positive.

The next step is to create a box-shaped region, so we can see what entities lie within it.

    box2 = conn.createBox( 25.0, 50.0, -130.0, -70.0) 

This region corresponds roughly to the contiguous United States.

Now we retrieve all the triples located within the search region:

    for r in conn.getStatements(None, location, box2) : print r     

AllegroGraph has located San Francisco:

(<http://example.org/people/sanfrancisco>, <http://example.org/people/geolocation>,
"+374659.49909-1222600.00212"^^<http://franz.com/ns/allegrograph/3.0/geospatial/
spherical/degrees/-180.0/180.0/-90.0/90.0/5.0>) 

This time let's search for entities within 2000 kilometers of Mexico City, which is located at 19.3994 degrees north latitude, -99.08 degrees west longitude.

    circle2 = conn.createCircle(19.3994, -99.08, 2000, unit='km')

    for r in conn.getStatements(None, location, circle2) : print r
(<http://example.org/people/salvador>, <http://example.org/people/geolocation>, 
"+134659.49939-0882700"^^<http://franz.com/ns/allegrograph/3.0/geospatial/spherical/degrees/-180.0/180.0/-90.0/90.0/5.0>)

And AllegroGraph returns the triple representing El Salvador.

In the next example, the search area is a triangle roughly enclosing the United Kingdom:

    polygon2 = conn.createPolygon([(51.0, 2.00),(60.0, -5.0),(48.0,-12.5)])

We ask AllegroGraph to find all entities within this triangle:

    for r in conn.getStatements(None, location, polygon2) : print r 
(<http://example.org/people/london>, <http://example.org/people/geolocation>, 
"+513159.49909-0000459.99970"^^<http://franz.com/ns/allegrograph/3.0/geospatial/spherical/degrees/-180.0/180.0/-90.0/90.0/5.0>)

AllegroGraph returns the location of London, but not the nearby Amsterdam.

Social Network Analysis (example21)   Return to Top

AllegroGraph includes sophisticated algorithms for social-network analysis (SNA). It can examine an RDF graph of relationships among people (or similar entities, such as businesses) and discover:

This section has multiple subsections:

Most (but not all) of AllegroGraph's SNA features can be accessed from Python. We access them in multiple ways:

Example Network

The example file for this exercise is lesmis.rdf. It contains resources representing 80 characters from Victor Hugo's Les Miserables, a novel about Jean Valjean's search for redemption in 17th-century Paris.

The raw data behind the model measured the strength of relationships by counting the number of book chapters where two characters were both present. The five-volume novel has 365 chapters, so it was possible to create a relationship network that had some interesting features. This is a partial display of the graph in Franz's Gruff graphical browser.

There are four possible relationships between any two characters.

(The Gruff illustrations were made from a parallel repository in which the resources were altered to display the character's name in the graph node rather than his URI. That file is called lemisNames.rdf.)

Setting Up the Example

The SNA examples are in function example21() in tutorial_examples.py. This exercise begins by creating a dedicated session for the SNA Prolog queries to operation within. These are the same initializing steps we have used in previous examples, plus conn.openSession() to create a dedicated connection.

    server = AllegroGraphServer(AG_HOST, AG_PORT, AG_USER, AG_PASSWORD)
catalog = server.openCatalog(AG_CATALOG)
myRepository = catalog.getRepository(AG_REPOSITORY, Repository.RENEW)
myRepository.initialize()
conn = myRepository.getConnection()
conn.openSession() # SNA requires dedicated session.

The next step is to load the lesmis.rdf file.

    path1 = "./lesmis.rdf"
    print "Load Les Miserables triples."
    conn.addFile(path1, None, format=RDFFormat.RDFXML);

There are three predicates of interest in the Les Miserables repository. We need to create their URIs and bind them for later use. These are the knows, barely_knows, and knows_well predicates.

    # Create URIs for relationship predicates.
    lmns = "http://www.franz.com/lesmis#"
    conn.setNamespace('lm', lmns)
    knows = conn.createURI(lmns, "knows")
    barely_knows = conn.createURI(lmns, "barely_knows")
    knows_well = conn.createURI(lmns, "knows_well")

We need to bind URIs for two characters: Valjean and Bossuet. Any analysis of Les Miserables will involve Valjean. Bossuet is someone who "barely knows" Valjean, but the two characters are linked through multiple characters who are more strongly connected. We will ask AllegroGraph to find paths from Valjean to Bossuet.

    # Create URIs for some characters.
    valjean = conn.createURI(lmns, "character11")
    bossuet = conn.createURI(lmns, "character64")

Creating SNA Generators

The SNA functions use "generators" to describe the relationships we want to analyze. A generator encapsulates a list of predicates to use in social network analysis. It also describes the directions in which each predicate is interesting.

In an RDF graph, two resources are linked by a single triple, sometimes called a "resource-valued predicate." This triple has a resource URI in the subject position, and a different one in the object position. For instance:

    (<Cosette>, knows_well, <Valjean>)

This triple is a one-way link unless we tell the generator to treat it as bidirectional. This is frequently necessary in RDF data, where inverse relations are often implied but not explicitly declared as triples.

For this exercise, we will declare three generators:

"Intimates" takes a narrow view of persons who know one another quite well. "Associates" follows both strong and medium relationships. "Everyone" follows all relationships, even the weak ones. This provides three levels of resolution for our analysis.

The connection object's registerSNAGenerator() method asks for a generator name (any label), and then for one or more predicates of interest. Each predicate should be assigned to the "subjectOf" direction, the "objectOf" direction, or the "undirected" direction (both ways at once). In addition, you may specify a "generator query," which is a Prolog "select" query that lets you be more specific about the links you want to analyze.

"Intimates" follows "knows_well" links only, but it treats them as bidirectional. If Cosette knows Valjean, then we'll assume that Valjean knows Cosette.

    conn.registerSNAGenerator("intimates", subjectOf=None, objectOf=None, 
undirected=knows_well, generator_query=None)

"Associates" follows "knows" and "knows_well" links.

    conn.registerSNAGenerator("associates", subjectOf=None, objectOf=None, 
undirected=[knows, knows_well], generator_query=None)

"Everyone" follows all three relationship links.

    conn.registerSNAGenerator("everyone", subjectOf=None, objectOf=None, 
undirected=[knows, knows_well, barely_knows], generator_query=None)

 

Creating Neighbor Matrices

A generator provides a powerful and flexible tool for examining a graph, but it performs repeated queries against the repository in order to extract the subgraph appropriate to your query. If your data is static, the generator will extract the same subgraph each time you use it. It is better to run the generator once and store the results for quick retrieval.

That is the purpose of a "neighbor matrix." This is a persistent, in-memory cache of a generator's output. You can substitute the matrix for the generator in AllegroGraph's SNA functions.

The advantage of using a matrix instead of a generator is a many-fold increase in speed. This benefit is especially visible if you are searching for paths between two nodes in your graph. The exact difference in speed is difficult to estimate because there can be complex trade-offs and scaling issues to consider, but it is easy to try the experiment and observe the effect.

To create a matrix, use the connection object's registerNeighborMatrix() method. You must supply a matrix name (any symbol), the name of the generator, the URI of a resource to serve as the starting point, and a maximum depth. The idea is to place limits on the subgraph so that the search algorithms can operate in a restricted space rather than forcing them to analyze the entire repository.

In the following excerpt, we are creating three matrices to match the three generators we created. In this example, "matrix1" is the matrix for generator "intimates," and so forth.

    conn.registerNeighborMatrix("matrix1", "intimates", valjean, max_depth=2)
conn.registerNeighborMatrix("matrix2", "associates", valjean, max_depth=2)
conn.registerNeighborMatrix("matrix3", "everyone", valjean, max_depth=2)

A matrix is a static snapshot of the output of a generator. If your data is dynamic, you should regenerate the matrix at intervals.

Deleting Generators and Matrices

There is no direct way to delete individual matrices and generators, but closing the dedicated session frees all of the resources formerly used by all of the objects and structures that were created there.

Our first search will enumerate Valjean's "ego group members." This is the set of nodes (characters) that can be found by following the interesting predicates out from Valjean's node of the graph to some specified depth. We'll use the "associates" generator ("knows" and "knows_well") to specify the predicates, and we'll impose a depth limit of one link. This is the group we expect to find:

The following Python code sends a Prolog query to AllegroGraph and returns the result to Python.

    print "Valjean's ego group (using associates)."
    queryString = """
    (select (?member ?name)
      (ego-group-member !lm:character11 1 associates ?member)
      (q ?member !dc:title ?name))
      """
    tupleQuery = conn.prepareTupleQuery(QueryLanguage.PROLOG, queryString)
    result = tupleQuery.evaluate();
    print "Found %i query results" % len(result)      
for bindingSet in result: p = bindingSet.getValue("member") n = bindingSet.getValue("name") print "%s %s" % (p, n)

This is the iconic block of code that is repeated in the SNA examples, below, with minor variations in the display of bindingSet values. To save virtual trees, we'll focus more tightly on the Prolog query from this point on:

    (select (?member ?name)
      (ego-group-member !lm:character11 1 associates ?member)
      (q ?member !dc:title ?name))

In this example, ego-group-member is an AllegroGraph SNA function that has been adapted for use in Prolog queries. There is a list of such functions on the AllegroGraph documentation reference page.

The query will execute ego-group-member, using Valjean (character11) as the starting point, following the predicates described in "associates," to a depth of 1 link. It binds each matching node to ?member. Then, for each binding of ?member, the query looks for the member's dc:title triple, and binds the member's ?name. The query returns multiple results, where each result is a (?member ?name) pair. The result object is passed back to Python, where we can iterate over the results and print out their values.

This is the output of the example:

Valjean's ego group (using associates).
Found 8 query results
<http://www.franz.com/lesmis#character27> "Javert"
<http://www.franz.com/lesmis#character24> "MmeThenardier"
<http://www.franz.com/lesmis#character25> "Thenardier"
<http://www.franz.com/lesmis#character28> "Fauchelevent"
<http://www.franz.com/lesmis#character11> "Valjean"
<http://www.franz.com/lesmis#character26> "Cosette"
<http://www.franz.com/lesmis#character23> "Fantine"
<http://www.franz.com/lesmis#character55> "Marius"

If you compare this list with the Gruff-generated image of Valjean's ego group, you'll see that AllegroGraph has found all eight expected nodes. You might be surprised that Valjean is regarded as a member of his own ego group, but that is a logical result of the definition of "ego group." The ego group is the set of all nodes within a certain depth of the starting point, and certainly the starting point must be is a member of that set.

We can perform the same search using a neighbor matrix, simply by substituting "matrix2" for "associates" in the query:

    (select (?member ?name)
      (ego-group-member !lm:character11 1 matrix2 ?member)
      (q ?member !dc:title ?name))

This produces the same set of result nodes, but under the right circumstances the matrix would run a lot faster than the generator.

This variation returns Valjean's ego group as a single list.

    (select (?group)
      (ego-group !lm:character11 1 associates ?group))

The result in the Python interaction window is: 

Valjean's ego group in one list depth 1 (using associates).
Found 1 query results
[ <http://www.franz.com/lesmis#character27> <http://www.franz.com/lesmis#character25>
<http://www.franz.com/lesmis#character28> <http://www.franz.com/lesmis#character23>
<http://www.franz.com/lesmis#character26> <http://www.franz.com/lesmis#character55>
<http://www.franz.com/lesmis#character11> <http://www.franz.com/lesmis#character24> ]

In the following examples, we explore the graph for the shortest path from Valjean to Bossuet, using the three generators to place restrictions on the quality of the path. These are the relevant paths between these two characters:

 

Our first query asks AllegroGraph to use intimates to find the shortest possible path between Valjean and Bossuet that is composed entirely of "knows_well" links. Those would be the green arrows in the diagram above. The breadth-first-search-path function asks for a start node and an end node, a generator, an optional maximum path length, and a variable to bind to the resulting path(s). Valjean is character11, and Bossuet is character64.

    (select (?path)
      (breadth-first-search-path !lm:character11 !lm:character64 intimates 10 ?path))

It is easy to examine the diagram and see that there is no such path. Valjean and Bossuet are not well-acquainted, and do not have any chain of well-acquainted mutual friends. AllegroGraph lets us know that.

Shortest breadth-first path connecting Valjean to Bossuet using intimates.
Found 1 query results
[ ]

It says there is one result, but that is actually the empty list.

This time we'll broaden the criteria. What is the shortest path from Valjean to Bossuet, using associates? We can follow either "knows_well" or "knows" links across the graph. Those are the green and the blue links in the diagram.

(select (?path)
      (breadth-first-search-path !lm:character11 !lm:character64 associates 10 ?path))

Although there are multiple such paths, there are only two that are "shortest" paths.

Shortest breadth-first path connecting Valjean to Bossuet using associates.
Found 2 query results
[ <http://www.franz.com/lesmis#character11> <http://www.franz.com/lesmis#character55>
<http://www.franz.com/lesmis#character58> <http://www.franz.com/lesmis#character64> ]
[ <http://www.franz.com/lesmis#character11> <http://www.franz.com/lesmis#character55>
<http://www.franz.com/lesmis#character62> <http://www.franz.com/lesmis#character64> ]

These are the paths "Valjean > Marius > Enjolras > Bossuet" and "Valjean > Marius > Courfeyrac > Bossuet." AllegroGraph returns two paths because they are of equal length. If one of the paths had been shorter, it would have returned only the short path.

Our third query asks for the shortest path from Valjean to Bossuet using everyone, which means that "barely-knows" links are permitted in addition to "knows" and "knows_well" links.

    (select (?path)
      (breadth-first-search-path !lm:character11 !lm:character64 everyone 10 ?path))

This time AllegroGraph returns a single, short path:

Shortest breadth-first path connecting Valjean to Bossuet using everyone.
Found 1 query results
[ <http://www.franz.com/lesmis#character11> <http://www.franz.com/lesmis#character64> ]

This is the "barely-knows" link directly from from Valjean to Bossuet.

The Prolog select query can also use depth-first-search-path() and bidirectional-search-path(). Their syntax is essentially identical to that shown above. These algorithms offer different efficiencies:

In addition, the depth-first algorithm uses less memory than the others, so a depth-first search may succeed when a breadth-first search would run out of memory.

Graph Measures

AllegroGraph provides several utility functions that measure the characteristics of a node, such as the number of connections it has to other nodes, and its importance as a communication path in a clique.

For instance, we can use the nodal-degree function to ask how many nodal neighbors Valjean has, using everyone to catalog all the nodes connected to Valjean by "knows," "barely_knows", and "knows_well" predicates. There are quite a few of them:

The nodal-degree function requires the URI of the target node (Valjean is character11), the generator, and a variable to bind the returned value to.

    print "\nHow many neighbors are around Valjean? (should be 36)."
queryString = """
(select ?neighbors
(nodal-degree !lm:character11 everyone ?neighbors))
"""
tupleQuery = conn.prepareTupleQuery(QueryLanguage.PROLOG, queryString)
result = tupleQuery.evaluate();
for bindingSet in result:
p = bindingSet.getValue("neighbors")
print "%s" % (p)
print "%s" % p.toPython()

Note that this function returns a string that describes an integer, which in its raw form is difficult for Python to use. We convert the raw value to a Python integer using the .toPython() method that is available to all literal values in the Python Sesame API to AllegroGraph. This example prints out both the string value and the converted number.

How many neighbors are around Valjean? (should be 36).
"36"^^<http://www.w3.org/2001/XMLSchema#integer>
36

If you want to see the names of these neighbors, you can use either the ego-group-member function described earlier on this page, or the nodal-neighbors function shown below:

    print "\nWho are Valjean's neighbors? (using everyone)."
queryString = """
(select (?name)
(nodal-neighbors !lm:character11 everyone ?member)
(q ?member !dc:title ?name))
"""
tupleQuery = conn.prepareTupleQuery(QueryLanguage.PROLOG, queryString)
result = tupleQuery.evaluate();
count = 0
for bindingSet in result:
count = count + 1
n = bindingSet.getValue("name")
print "%s. %s " % (count,n.toPython())

This example enumerates all immediate neighbors of Valjean and returns their names. This example prints a numbered list with 36 entries:

Who are Valjean's neighbors? (using everyone).
1. Isabeau
2. Fantine
3. Labarre
4. Bossuet
5. Brevet ...

Another descriptive statistic is graph-density, which measures the density of connections within a subgraph.

For instance, this is Valjean's ego group with all associates included.

Only 9 of 28 possible links are in place in this subgraph, so the graph density is 0.32. The following query asks AllegroGraph to calculate this figure for Valjean's ego group:

    (select ?density
      (ego-group !lm:character11 1 associates ?group)
      (graph-density ?group associates ?density))

We used the ego-group function to return a list of Valjean's ego-group members, bound to the variable ?group, and then we used ?group to feed that subgraph to the graph-density function. The return value, ?density, came back as a string describing a float, and had to be converted to a Python float using .toPython().

Graph density of Valjean's ego group? (using associates).
"3.2142857e-1"^^<http://www.w3.org/2001/XMLSchema#double>
3.2142857e-1

Cliques

A "clique" is a subgraph where every node is connected to every other node by predicates specified in some generator. AllegroGraph, using everyone ("knows," "knows_well," and "barely_knows"), found that Valjean participates in 239 cliques!

It is counterintuitive that a "clique" should be composed mainly of people who "barely_know" each other, so let's try the same experiment using "associates," which restricts the cliques to people Valjean "knows" or "knows_well." In this case, AllegroGraph returns two cliques. One contains Valjean, Cosette, and Marius. The other contains Valjean and the Thenardiers.

This is the query that finds Valjean's "associates" cliques:

    (select (?clique)
      (clique !lm:character11 associates ?clique))

AllegroGraph returns two cliques:

Valjean's cliques? Should be two (using associates).
[ <http://www.franz.com/lesmis#character11> <http://www.franz.com/lesmis#character25>
<http://www.franz.com/lesmis#character24> ]
[ <http://www.franz.com/lesmis#character11> <http://www.franz.com/lesmis#character55>
<http://www.franz.com/lesmis#character26> ]

The first list is the clique with Marius and Cosette. The second one represents the Thernardier clique.

Actor Centrality

AllegroGraph lets us measure the relative importance of a node in a subgraph using the actor-degree-centrality() function. For instance, it should be obvious that Valjean is very "central" to his own ego group (depth of one link), because he is linked directly to all other links in the subgraph. In that case he is linked to 7 of 7 possible nodes, and his actor-degree-centrality value is 7/7 = 1.

However, we can regenerate Valjean's ego group using a depth of 2. This adds three nodes that are not directly connected to Valjean. How "central" is he then?

In this subgraph, Valjean's actor-degree-centrality is 0.70, meaning that he is connected to 70% of the nodes in the subgraph.

This example asks AllegroGraph to generate the expanded ego group, and then to measure Valjean's actor-degree-centrality:

    (select (?centrality)
      (ego-group !lm:character11 2 associates ?group)
      (actor-degree-centrality !lm:character11 ?group associates ?centrality))

Note that we asked ego-group() to explore to a depth of two links, and then fed its result (?group) to actor-degree-centrality(). This is the output:

Valjean's centrality to his ego group at depth 2 (using associates).
"7.0e-1"^^<http://www.w3.org/2001/XMLSchema#double>
0.7

This confirms our expectation that Valjean's actor-degree-centrality should be 0.70 in this circumstance.

We can also measure actor centrality by calculating the average path length from a given node to the other nodes of the subgraph. This is called actor-closeness-centrality. For instance, we can calculate the average path length from Valjean to the ten nodes of his ego group (using associates and depth 2). Then we take the inverse of the average, so that bigger values will be "more central."

The actor-closeness-centrality for Valjean is 0.769. For Marius it is 0.60, showing that Valjean is more central and important to the group than is Marius.

This example calculates Valjean's actor-closeness-centrality for the associates ego group of depth 2.

    (select (?centrality)
      (ego-group !lm:character11 2 associates ?group)
      (actor-closeness-centrality !lm:character11 ?group associates ?centrality))
Valjean's actor-closeness-centrality to his ego group at depth 2 (using associates).
"7.692308e-1"^^<http://www.w3.org/2001/XMLSchema#double>
0.7692308

That is the expected value of 0.769.

Another approach to centrality is to count the number of information paths that are "controlled" by a specific node. This is called actor-betweenness-centrality. For instance, there are 45 possible "shortest paths" between pairs of nodes in Valjean's associates depth-2 ego group. Valjean can act as an information valve, potentially cutting off communication on 34 of these 45 paths. Therefore, he controls 75% of the communication in the group.

This example calculates Valjean's actor-betweenness-centrality:

    (select (?centrality)
      (ego-group !lm:character11 2 associates ?group)
      (actor-betweenness-centrality !lm:character11 ?group associates ?centrality))
Valjean's actor-betweenness-centrality to his ego group at depth 2 (using associates).
"7.5555557e-1"^^<http://www.w3.org/2001/XMLSchema#double>
7.5555557

That's the expected result of 0.755.

Group Centrality

Group-centrality measures express the "cohesiveness" of a group. There are three group-centrality measures in AllegroGraph: group-degree-centrality(), group-closeness-centrality(), and group-betweenness-centrality().

To demonstrate these measures, we'll use Valjean's ego group, first at radius 1 and then at radius 2. As you recall, the smaller ego group is radially symmetrical, but the larger one is quite lop-sided. That makes the smaller group "more cohesive" than the larger one.

Group-degree-centrality() measures group cohesion by finding the maximum actor centrality in the group, summing the difference between this and each other actor's degree centrality, and then normalizing. It ranges from 0 (when all actors have equal degree) to 1 (when one actor is connected to every other and no other actors have connections.

The prolog query takes this form:

 (select (?centrality)
   (ego-group !lm:character11 1 associates ?group)
   (group-degree-centrality ?group associates ?centrality))

The query first generates Valjean's (character11) ego group at radius 1, and binds that list of characters to ?group. Then it calls group-degree-centrality() on the group and returns the answer as ?centrality.

The group-degree-centrality for Valjean's radius-1 ego group is 0.129. When we expand to radius 2, the group-degree-centrality drops to 0.056. The larger group is less cohesive than the smaller one.

The following examples were all generated from queries that strongly resemble the one above.

Group-closeness-centrality() is measured by first finding the actor whose `closeness-centrality`
is maximized and then summing the difference between this maximum value and the actor-closeness-centrality of all other actors. This value is then normalized so that it ranges between 0 and 1.

The group-closeness-centrality of Valjean's smaller ego group is 0.073. The expanded ego group has a group-closeness-centrality of 0.032. Again, the smaller group is more cohesive.

Group-betweenness-centrality() is measured by first finding the actor whose actor-betweenness-centrality
is maximized and then summing the difference between this maximum value and the actor-betweenness-centrality of all other actors. This value is then normalized so that it ranges between 0 and 1.

Valjean's smaller ego group has a group-betweenness-centrality of 0.904. The value for the larger ego group is 0.704. Even by this measure, the larger group is less cohesive.

Transaction Control (example22)   Return to Top

Most of the time, multiple AllegroGraph users pool triples in a common session. Their data may be segregated into multiple contexts (subgraphs) for convenience, but in fact any part of this data may be retrieved by any of the users.

There are situations where a little more security is required. You may open a "dedicated session" and populate it with triples that the other users cannot view. These triples may be searched and manipulated in the usual ways, but do not become visible to other users until you "commit" them to the common session.

In order to use transaction semantics, the user account must have "start sessions" privileges with AllegroGraph Server. This is an elevated level of privilege. AllegroGraph users are profiled through the WebView interface.

To experiment with transaction semantics in AllegroGraph, we will need two connections to the triple store. In the "transaction connection" we will load, rollback, reload and commit incoming triples.  In the "autocommit connection" we will run queries against the resulting triple store, where the resources are always in a known and complete state.

In practice, transactions require only one connection.  We create a special connection for transaction behavior, use it, and close it.

"Commit" means to make the triples in the dedicated session visible in the common session. At the time of the commit, triples in the common session are also made visible to queries in the dedicated session. The two sessions are "synched up" by the commit.

"Rollback" means to discard the contents of the dedicated session. It also makes all common-session triples visible in the dedicated session.

"Closing" the dedicated session deletes all uncommitted triples, and all rules, generators and matrices that were created in the dedicated session. Rules, generators and matrices cannot be committed. They persist as long as the dedicated session persists, and are deleted when the dedicated session is closed.

Example example22() performs some simple data manipulations on a dedicated session to demonstrate the rollback and commit features. It begins by creating a connection to a repository (the common session), and a second connection to a dedicated session. To do that, all you need is a second connection object, and invoke its openSession() method.

    server = AllegroGraphServer(AG_HOST, AG_PORT, 'test', 'xyzzy')
catalog = server.openCatalog('scratch')
myRepository = catalog.getRepository(AG_REPOSITORY, Repository.RENEW)
myRepository.initialize()
common = myRepository.getConnection()
dedicated = myRepository.getConnection()
dedicated.openSession() # open dedicated session

We'll reuse the Kennedy and Les Miserables data. The Les Miserables data goes in the common session, and the Kennedy data goes in the dedicated session.

    path1 = "./kennedy.ntriples"
path2 = "./lesmis.rdf"
baseURI = "http://example.org/example/local"
## read kennedy triples into the dedicated session:
print "Load 1214 kennedy.ntriples into dedicated session."
dedicated.add(path1, base=baseURI, format=RDFFormat.NTRIPLES, contexts=None)
## read lesmis triples into the common session:
print "Load 916 lesmis triples into the common session."
common.addFile(path2, baseURI, format=RDFFormat.RDFXML, context=None);

The two sessions should now have independent content. When we look in the common session we should see only Les Miserables triples. The dedicated session could contain only Kennedy triples. We set up a series of simple tests similar to this one:

    print "\nUsing getStatements() on common session; should find Valjean:"
    Valjean = common.createLiteral("Valjean")
    statements = common.getStatements(None, None, Valjean, 'null', limit=1)
    print "Number of results: %s" % len(statements)
    for s in statements:
    print s

This test looks for our friend Valjean in the common session. He should be there. This is the output:

Using getStatements() on common session; should find Valjean:
   Number of results: 1
   (<http://www.franz.com/lesmis#character11>, <http://purl.org/dc/elements/1.1/title>, "Valjean")

However, there should not be anyone in the common session named "Kennedy." The code of the test is almost identical to that shown above, so we'll skip straight to the output.

Using getStatements() on common session; should not find Kennedy:
   Number of results: 0

There should be a Kennedy (at least one) visible in the dedicated session:

Using getStatements() on dedicated session; should find Kennedys:
   Number of results: 1
 (<http://www.franz.com/simple#person1>, <http://www.franz.com/simple#last-name>, "Kennedy") 

And finally, we should not see Valjean in the dedicated session:

Using getStatements() on dedicated session; should not find Valjean:
   Number of results: 0  

The next step in the demonstration is to roll back the data in the dedicated session. This will make the Kennedy data disappear. It will also make the Les Miserables data visible in both sessions. We'll perform the same four tests, with slightly different expectations.

First we roll back the transaction:

    dedicated.rollback() 

Valjean is still visible in the common session:

Using getStatements() on common session; should find Valjean:
   Number of results: 1
   (<http://www.franz.com/lesmis#character11>, <http://purl.org/dc/elements/1.1/title>, "Valjean")

There are still no Kennedys in the common session:

Using getStatements() on common session; should find Kennedys:
Number of results: 0

There should be no Kennedys visible in the dedicated session:

Using getStatements() on dedicated session; should find Kennedys:
   Number of results: 0

And finally, we should suddenly see Valjean in the dedicated session:

Using getStatements() on dedicated session; should not find Valjean:
   Number of results: 1
   (<http://www.franz.com/lesmis#character11>, <http://purl.org/dc/elements/1.1/title>, "Valjean")

The rollback has succeeded in deleting the uncommitted triples from the dedicated session. It has also refreshed or resynched the dedicated session with the common session.

To set up the next test, we have to reload the Kennedy triples. Then we'll perform a commit.

    print "\nReload 1214 kennedy.ntriples into dedicated session."
dedicated.add(path1, base=baseURI, format=RDFFormat.NTRIPLES, contexts=None)
dedicated.commit()

This should make both types of triples visible in both sessions. Here are the four tests:

Using getStatements() on common session; should find Valjean:
   Number of results: 1
 (<http://www.franz.com/lesmis#character11>, <http://purl.org/dc/elements/1.1/title>, "Valjean")
Using getStatements() on common session; should find Kennedys:
   Number of results: 1
   (<http://www.franz.com/simple#person1>, <http://www.franz.com/simple#last-name>, "Kennedy")
Using getStatements() on dedicated session; should find Kennedys:
   Number of results: 1
   (<http://www.franz.com/simple#person1>, <http://www.franz.com/simple#last-name>, "Kennedy")
Using getStatements() on dedicated session; should find Valjean:
   Number of results: 1
   (<http://www.franz.com/lesmis#character11>, <http://purl.org/dc/elements/1.1/title>, "Valjean")

The Les Miserables triples are visible in both sessions. So too are the Kennedy triples.

Remember that when you are done with a session or connection, it is appropriate to close it:

    dedicated.closeSession()
dedicated.close()
common.close()

This recovers resources that would otherwise remain bound.

Duplicate Triples (example23)    Return to Top

Most people find it annoying when a query returns multiple copies of the same information. This can happen in mulitple ways, and there are multiple strategies for reducing or eliminating the problem. There are two broad strategies to pursue:

Tighten Up Your Queries

Triple patterns sometimes act in unexpected ways, resulting in "too many" matches.

Example23() revives the Kennedy family tree. It loads the kennedy.ntriples file, resulting in 1214 triples.

This example focuses on the three children of Ted Kennedy (person17). A simple getStatements() query shows us that all three are, in fact, present. They are person72, person74, and person76.

Using getStatements() find children of Ted Kennedy: three children.
Number of results: 3
(<http://www.franz.com/simple#person17>, <http://www.franz.com/simple#has-child>, <http://www.franz.com/simple#person72>)
(<http://www.franz.com/simple#person17>, <http://www.franz.com/simple#has-child>, <http://www.franz.com/simple#person74>)
(<http://www.franz.com/simple#person17>, <http://www.franz.com/simple#has-child>, <http://www.franz.com/simple#person76>)

Let's imagine that for some bureaucratic or legal reason we need to retrieve any two of the senator's three children from the triple store. We might begin with a SPARQL query like this one:

    SELECT ?o1 ?o2 
WHERE {kdy:person17 kdy:has-child ?o1 .
kdy:person17 kdy:has-child ?o2 .}

Since there are only three ways to retrieve a pair from a pool of three, we might be startled to get nine answers:

SPARQL matches for two children of Ted Kennedy, inept pattern.
<http://www.franz.com/simple#person72> and <http://www.franz.com/simple#person72>
<http://www.franz.com/simple#person72> and <http://www.franz.com/simple#person74>
<http://www.franz.com/simple#person72> and <http://www.franz.com/simple#person76>
<http://www.franz.com/simple#person74> and <http://www.franz.com/simple#person72>
<http://www.franz.com/simple#person74> and <http://www.franz.com/simple#person74>
<http://www.franz.com/simple#person74> and <http://www.franz.com/simple#person76>
<http://www.franz.com/simple#person76> and <http://www.franz.com/simple#person72>
<http://www.franz.com/simple#person76> and <http://www.franz.com/simple#person74>
<http://www.franz.com/simple#person76> and <http://www.franz.com/simple#person76>

Three of these matches involve the same triple matching both patterns, claiming that each person is a sibling of himself. The other six are permutations of the correct answers, in which we first discover that person72 is a sibling of person74, and then subsequently discover that person74 is a sibling of person72. We didn't need to receive that information twice.

Let's eliminate the useless duplications with a simple trick. Here it is in SPARQL:

    SELECT ?o1 ?o2 
WHERE {kdy:person17 kdy:has-child ?o1 .
kdy:person17 kdy:has-child ?o2 .
filter (?o1 < ?o2)}

And this is the equivalent query in Prolog. Note that lispp is correct, not a typo:

    (select (?o1 ?o2)
(q !kdy:person17 !kdy:has-child ?o1)
(q !kdy:person17 !kdy:has-child ?o2)
(lispp (upi< ?o1 ?o2)))

The inequality insures that the two object variables cannot be bound to the same value, eliminating the sibling-of-self issue, and also that only one of the persons can be bound to ?o1, which eliminates the duplication issue. Now we have the expected three results:

SPARQL matches for two children of Ted Kennedy, better pattern.
<http://www.franz.com/simple#person72> and <http://www.franz.com/simple#person74>
<http://www.franz.com/simple#person72> and <http://www.franz.com/simple#person76>
<http://www.franz.com/simple#person74> and <http://www.franz.com/simple#person76>

Our task, however, is to pluck "any two" of the children from the triple store and then stop. That's easy in SPARQL:

    SELECT ?o1 ?o2 
WHERE {kdy:person17 kdy:has-child ?o1 .
kdy:person17 kdy:has-child ?o2 .
filter (?o1 < ?o2)}
LIMIT 1

This query returns one result that is guaranteed to reference two different children.

The moral of this example is that triple patterns often return more information than you expected, and need to be rewritten to make them less exuberant. You may be able to reduce the incidence of duplicate results simply by writing better queries.

Tighten Up Your RDF

The Resource Description Framework (RDF) has many fine qualities, but there are also a few RDF features that create headaches for users. One of these is the ability to encode resources as "blank nodes." This is a convenience to the person who is generating the data, but it often creates problems for the data consumer.

Here's a simple example. These two RDF resource descriptions represent two books by the same author. The author is described using a "blank node" within the book resource description:

  <rdf:Description rdf:about="http://www.franz.com/tutorial#book1">
<dc:title>The Call of the Wild</dc:title>
<dc:creator>
<foaf:Person foaf:name="Jack London"/>
</dc:creator>

</rdf:Description>

<rdf:Description rdf:about="http://www.franz.com/tutorial#book2">
<dc:title>White Fang</dc:title>
<dc:creator>
<foaf:Person foaf:name="Jack London"/>
</dc:creator>

</rdf:Description>

This is a common RDF format used by programs that transcribe databases into RDF. In this format, there is no resource URI designated for the foaf:Person resource. Therefore, the RDF parser will generate a URI to use as the subject value of the new resource. The problem is that it generates a new URI each time it encounters an embedded resource. We wind up with eight triples and two different "Jack London" resources:

Two books, with one author as blank node in each book.
Number of results: 8

(<http://www.franz.com/tutorial#book1>, <http://purl.org/dc/elements/1.1/title>, "The Call of the Wild")
(<http://www.franz.com/tutorial#book1>, <http://purl.org/dc/elements/1.1/creator>, _:bE26D2A88x1)

(<http://www.franz.com/tutorial#book2>, <http://purl.org/dc/elements/1.1/title>, "White Fang")
(<http://www.franz.com/tutorial#book2>, <http://purl.org/dc/elements/1.1/creator>, _:bE26D2A88x2)

(_:bE26D2A88x1, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://xmlns.com/foaf/0.1/Person>)
(_:bE26D2A88x1, <http://xmlns.com/foaf/0.1/name>, "Jack London")

(_:bE26D2A88x2, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://xmlns.com/foaf/0.1/Person>)
(_:bE26D2A88x2, <http://xmlns.com/foaf/0.1/name>, "Jack London")

You can see the two books, each with a title and a creator. Then there are two "anonymous" resources with generated subject URIs, both of which represent the same author. This is very undesirable.

Duplicate resources are difficult to remove or fix. They have different subject values, which mean their triples are technically not duplicates of each other. The author resources are directly linked to book resources, one-to-one, meaning that you can't just delete the extra authors. That would leave book resources linked to authors that don't exist anymore.

Clearly we don't want to be in this situation. We have to back out and generate the RDF a different way.

We might decide to fix this problem by rewriting the resource descriptions. This time we'll be sure that the formerly anonymous nodes are set up with URIs. This way we'll get only one author resource. This format is what RDF calls "striped syntax."

  <rdf:Description rdf:about="http://www.franz.com/tutorial#book1">
<dc:title>The Call of the Wild</dc:title>
<dc:creator>
<foaf:Person rdf:about="#Jack">
<foaf:name>Jack London</foaf:name>
</foaf:Person>

</dc:creator>
</rdf:Description>

<rdf:Description rdf:about="http://www.franz.com/tutorial#book2">
<dc:title>White Fang</dc:title>
<dc:creator>
<foaf:Person rdf:about="#Jack">
<foaf:name>Jack London</foaf:name>
</foaf:Person>

</dc:creator>
</rdf:Description>

Even though we have embedded two resource descriptions, we gave them both the same URI ("#Jack"). Did this solve the problem? We now have only one "Jack London" resource, but we still have eight triples!

Two books, with one author identified by URI but in striped syntax in each book.
Number of results: 8

(<http://www.franz.com/tutorial#book1>, <http://purl.org/dc/elements/1.1/title>, "The Call of the Wild")
(<http://www.franz.com/tutorial#book1>, <http://purl.org/dc/elements/1.1/creator>, <http://www.franz.com/simple#Jack>)
(<http://www.franz.com/tutorial#book2>, <http://purl.org/dc/elements/1.1/title>, "White Fang")
(<http://www.franz.com/tutorial#book2>, <http://purl.org/dc/elements/1.1/creator>, <http://www.franz.com/simple#Jack>)

(<http://www.franz.com/simple#Jack>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://xmlns.com/foaf/0.1/Person>) (<http://www.franz.com/simple#Jack>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://xmlns.com/foaf/0.1/Person>)
(<http://www.franz.com/simple#Jack>, <http://xmlns.com/foaf/0.1/name>, "Jack London")
(<http://www.franz.com/simple#Jack>, <http://xmlns.com/foaf/0.1/name>, "Jack London")

We have only one "Jack London" author resource, which means we will have a more useful graph than before, but every triple in that resource is duplicated. If we had cataloged fifty books by Jack London, there would be fifty copies of each of these triples. The "striped syntax" generates floods of duplicate triples.

It is much better to avoid blank nodes and striped syntax altogether. Here are the same two books, using a separate author resource that is linked to the books by a URI:

  <rdf:Description rdf:about="http://www.franz.com/tutorial#book5">
<dc:title>The Call of the Wild</dc:title>
<dc:creator rdf:resource="http://www.franz.com/tutorial#author1"/>
</rdf:Description>

<rdf:Description rdf:about="http://www.franz.com/tutorial#book6">
<dc:title>White Fang</dc:title>
<dc:creator rdf:resource="http://www.franz.com/tutorial#author1"/>
</rdf:Description>

<rdf:Description rdf:about="http://www.franz.com/tutorial#author1">
<dc:title>Jack London</dc:title>
</rdf:Description>

This is arguably the "best" syntax to follow because one author resource is directly connected to all of that author's books. The graph is rich in nodes and connections, while avoiding duplicate triples. This example creates six triples, none of which are duplicates:

Two books, with one author linked by a URI.
Number of results: 6

(<http://www.franz.com/tutorial#book5>, <http://purl.org/dc/elements/1.1/title>, "The Call of the Wild")
(<http://www.franz.com/tutorial#book5>, <http://purl.org/dc/elements/1.1/creator>, <http://www.franz.com/tutorial#author1>)
(<http://www.franz.com/tutorial#book6>, <http://purl.org/dc/elements/1.1/title>, "White Fang")
(<http://www.franz.com/tutorial#book6>, <http://purl.org/dc/elements/1.1/creator>, <http://www.franz.com/tutorial#author1>)

(<http://www.franz.com/tutorial#author1>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://xmlns.com/foaf/0.1/Person>)
(<http://www.franz.com/tutorial#author1>, <http://purl.org/dc/elements/1.1/title>, "Jack London")

There is only one way to simplify the example from here. Perhaps you really don't need author resources at all. It could be sufficient to know the name of a book's author and never reify the author as a node in the graph. In that case, you can pare down the resource descriptions by including the author name as a literal string:

  <rdf:Description rdf:about="http://www.franz.com/tutorial#book3">
<dc:title>The Call of the Wild</dc:title>
<dc:creator>Jack London</dc:creator>
</rdf:Description>

<rdf:Description rdf:about="http://www.franz.com/tutorial#book4">
<dc:title>White Fang</dc:title>
<dc:creator>Jack London</dc:creator>
</rdf:Description>

This example generates only four triples, none of which are duplicates. We have two books, and no author resource.

Two books, with one author as a literal value.
Number of results: 4
(<http://www.franz.com/tutorial#book3>, <http://purl.org/dc/elements/1.1/title>, "The Call of the Wild")
(<http://www.franz.com/tutorial#book3>, <http://purl.org/dc/elements/1.1/creator>, "Jack London")

(<http://www.franz.com/tutorial#book4>, <http://purl.org/dc/elements/1.1/title>, "White Fang")
(<http://www.franz.com/tutorial#book4>, <http://purl.org/dc/elements/1.1/creator>, "Jack London")

The lesson here is to keep the representation simple. The fewer resources there are, the faster everything will work.

Don't Create Duplicate Triples

Now we'll consider the situation where we have multiple copies of the same information in the triple store. This is the classic "duplicate triples" situation.

True duplicate triples do not occur by accident. They occur only when you have loaded the same information into the triple store more than once.

This is easy to demonstrate. Let's load the Kennedy graph into an empty triple store:

Load 1214 kennedy.ntriples.
After loading, there are:
  1214 kennedy triples in context 'null';

Then just load the same file again:

Reload 1214 kennedy.ntriples.
After loading, there are:
  2428 kennedy triples in context 'null';

Now there are two copies of every Kennedy triple. If you add the same triple multiple times, you will get multiple copies of it.

Filter Out Duplicate Results

In practice, our advice "don't create duplicate triples" may be difficult to follow. Some data feeds contain duplicated information and there is no convenient or efficient way to filter it out. So now you have duplicated data in the system. How much trouble might that cause?

This simple query should return three matches, one for each of Ted Kennedy's three children:

    SELECT ?o WHERE {kdy:person17 kdy:has-child ?o}

However, because each of the expected triples has a duplicate, we get six answers instead of three:

SPARQL matches for children of Ted Kennedy.
<http://www.franz.com/simple#person72>
<http://www.franz.com/simple#person74>
<http://www.franz.com/simple#person76>
<http://www.franz.com/simple#person72>
<http://www.franz.com/simple#person74>
<http://www.franz.com/simple#person76>

That's a nuisance, but SPARQL provides an easy way to wipe out the duplicate answers:

    SELECT DISTINCT ?o WHERE {kdy:person17 kdy:has-child ?o}

The DISTINCT operator does not remove duplicate triples, but it detects and eliminates duplicate variable bindings in the query's output. The duplicate triples are still there in the triple store, but we don't see them in the output:

SPARQL DISTINCT matches for children of Ted Kennedy.
<http://www.franz.com/simple#person72>
<http://www.franz.com/simple#person76>
<http://www.franz.com/simple#person74>

DISTINCT works by sorting the bindings by each of the bound variables in the result. This forces duplicate results to be adjacent to each other in the result list. Then it runs over the list and eliminates bindings that are the same as the previous binding. As you can imagine, there are situations where forcing an exhaustive sort on a large binding set might use up a lot of time.

For this reason, SPARQL also offers a REDUCED operator:

    SELECT REDUCED ?o WHERE {kdy:person17 kdy:has-child ?o} ORDER BY ?o

REDUCED performs the same sweep for duplicates that DISTINCT performs, but it lets you control the sorting. In this example the values of the subject and predicate are fixed, so sorting by the object value is sufficient to force all duplicate bindings into contact with each other. REDUCED is often much faster than DISTINCT.

Filtering Duplicate Triples while Loading

AllegroGraph could check each incoming triple to see if that information is already present in the triple store, but this process would be very time-consuming, slowing down our extremely-fast load times by orders of magnitude. Loading speed is important to almost everyone, so AllegroGraph does not filter duplicates during the load.

If loading speed isn't critical in your application, you can add triples one at a time while checking for duplicates as shown here:

    if conn.getStatements(newParent, hasChild, newChild):
print "Did not add new triple."
else:
conn.add(newParent, hasChild, newChild)
print "Added new triple."

If getStatements() return any value, then the new triple is already present in the store. If not, then we can add it to the store knowing that it will be unique.

When you run this example twice it adds a triple the first time, but not the second time:

Test before adding triple, first trial: 
Added new triple.  


Test before adding triple, second trial: 
Did not add new triple.

Duplicates in Federated Stores

A federated store is a single connection to multiple AllegroGraph repositories, possibly on multiple servers at multiple sites. Such systems can have duplicate triples, but deleting them is unwise.

Presumably any existing repository has a purpose of its own, and has been federated with other respositories to serve some combined purpose. It would be a mistake to delete triples from one repository simply because they also occur elsewhere in the federated store. This could damage the integrity of the individual repositories when they are used in stand-alone mode. By similar reasoning, it is a mistake to add triples to a federated repository. Which repository gets the new triples?

For all of these reasons, federated stores are "read only" in AllegroGraph. Attempting to add or delete triples in a federated system result in error messages.

Federation changes the semantics of "duplicate triples." Be sure to take this into account in your project design.