Python API Tutorial for AllegroGraph 4.0

This is a introduction to the Python client API to AllegroGraph RDFStore™ version 4.0 from Franz Inc.

The Python API offers convenient and efficient access to an AllegroGraph server from a Python-based application. This API provides methods for creating, querying and maintaining RDF data, and for managing the stored triples.

The Python API deliberately emulates the Aduma Sesame API to make it easier to migrate from Sesame to AllegroGraph.  The Python API has also been extended in ways that make it easier and more intuitive than the Sesame API. 

Contents

 

Overview   Return to Top

The Python client tutorial rests on a simple architecture involving AllegroGraph, disk-based data files, Python, and a file of Python examples called tutorial_examples_40.py.

AllegroGraph 4.0 Server runs as a Windows service in this example. It contains the Python API, which is part of the AllegroGraph installation.

Python communicates with AllegroGraph through HTTP port 8080 in this example. Python and AllegroGraph may be installed on the same computer, but in practice one server is shared by multiple clients.

Load tutorial_examples_40.py into Python to view the tutorial examples.

Each lesson in tutorial_examples_40.py is encapsulated in a Python function, named exampleN(), where N ranges from 0 to 21 (or more). The function names are referenced in the title of each section to make it easier to compare the tutorial text and the living code of the examples file.

Prerequisites (Linux)   Return to Top

The tutorial examples can be run on a Linux system, running AllegroGraph and the examples on the same computer ("localhost"). The tutorial assumes that AllegroGraph and Python 2.5 have been installed and configured using the procedure posted on this webpage.

Terminology   Return to Top

We need to clarify some terminology before proceeding.

In the context of AllegroGraph Server:

  • A "catalog" is a list of repositories owned by an AllegroGraph server.
  • A "repository" is a collection of triples within a Catalog, stored and indexed on a hard disk.
  • A "context" is a subgraph of the triples in a repository.
  • If contexts are not in use, the triples are stored in the background (default) graph.

Creating Users with WebView    Return to Top

Each connection to an AllegroGraph server runs under the credentials of a registered AllegroGraph user account.

Initial Superuser Account

The installation instructions for AllegroGraph advise you to create a default superuser called "test", with password "xyzzy". This is the user (and password) expected by the tutorial examples. If you created this account as directed, you can proceed to the next section and return to this topic at a later time when you need to create non-superuser accounts.

If you created a different superuser account you'll have to edit the tutorial_examples_40.py file before proceeding. Modify these entries near the top of the file:

AG_USER = 'test'
AG_PASSWORD = 'xyzzy'

Otherwise you'll get an authentication failure when you attempt to connect to the server.

Users, Permissions, Access Rules, and Roles

AllegroGraph user accounts may be given any combination of the following three permissions:

In addition, a user account may be given read, write or read/write access to individual repositories.

You can also define a role (such as "librarian") and give the role a set of permissions and access rules. Then you can assign several users to a shared role. This lets you manage their permissions and access by editing the role instead of the individual user accounts.

A superuser automatically has all possible permissions and unlimited access. A superuser can also create, manage and delete other user accounts. Non-superusers cannot view or edit account settings.

A user with the Start Sessions permission can use the AllegroGraph features that require spawning a dedicated session, such as Transactions and Social Network Analysis. If you try to use these features without the appropriate permission, you'll encounter authentication errors.

A user with permission to Evaluate Arbitrary Code can run Prolog Rule Queries. This user can also do anything else that allows executing Lisp code, such as defining select-style generators, or doing eval-in-server, as well as loading server-side files.

WebView

WebView is AllegroGraph's HTTP-based graphical user interface for user and repository management. To connect to WevView, simply direct your Web browser to the AllegroGraph port of your server. If you have installed AllegroGraph locally (and used the default port number), use:

http://localhost:10035

You will be asked to log in. Use the superuser credentials described in the previous section.

The first page of WebView is a summary of your catalogs, repositories, and federations. Click the user account link in the lower left corner of the page. This exposes the Users and Roles page.

This is the environment for creating and managing user accounts.

To create a new user, click the [add a user] link. This exposes a small form where you can enter the username (one symbol) and password. Click OK to save the new account.

The new user will appear in the list of users. Click the [view permissions] link to open a control panel for the new user account:

Use the checkboxes to apply permissions to this account (superuser, start session, evaluate arbitrary code).

It is imporant that you set up access permissions for the new user. Use the form to create an access rule by selecting read, write or read/write access, naming a catalog (or * for all), and naming a repository within that catalog (or * for all). Click the [add] link. This creates an access rule for your new user. The access rule will appear in the permissions display:

This new user can log in and perform transactions on any repository in the system.

To repeat, the "test" superuser is all you need to run all of the tutorial examples. This section is for the day when you want to issue more modest credentials to some of your operators.

Running Python Tutorial Examples    Return to Top

The AllegroGraph Python Tutorial examples are in the tutorial subdirectory where you unpacked the Python client tar.gz file. Navigate to that directory and follow one of these examples:

     $ python tutorial_examples_40.py            runs all tests.
     $ python tutorial_examples_40.py all        runs all tests.
     $ python tutorial_examples_40.py 10         runs example10()
     $ python tutorial_examples_40.py 1 5 22     runs examples 1, 5, and 22

Creating a Repository (example1())    Return to Top

The first task is to our AllegroGraph Server and open a repository. This task is implemented in example1() from tutorial_examples_40.py.

In example1() we build a chain of Python objects, ending in a"connection" object that lets us manipulate triples in a specific repository. The overall process of generating the connection object follows this diagram:

The example1() function opens (or creates) a repository by building a series of client-side objects, culminating in a "connection" object. The connection object will be passed to other functions in tutorial_examples_40.py.

The connection object contains the methods that let us manipulate triples in a specific repository.

The example first connects to an AllegroGraph Server by providing the endpoint (host IP address and port number) of an already-launched AllegroGraph server. This creates a client-side server object, which can access the AllegroGraph server's list of available catalogs through the listCatalogs() method:.

def example1(accessMode=Repository.RENEW):
    server = AllegroGraphServer(AG_HOST, AG_PORT, 'user', 'password')
    print "Available catalogs", server.listCatalogs()

This is the output so far:

>>> example1()
Defining connnection to AllegroGraph server -- host:'localhost' port:8080
Available catalogs ['/', 'scratch']

This output says that the server has two catalogs, the default rootCatalog, '/', and a named catalog 'scratch' that someone has created for some experimentation.

In the next line of example1(), we use the server.openCatalog() method to create a client-side catalog object. In this example we will connect to the default rootCatalog by calling openCatalog() with no arguments. When we look inside the root catalog, we can see which repositories are available:

    catalog = server.openCatalog() 
    print "Available repositories in catalog '%s': %s" % (catalog.getName(), catalog.listRepositories()) 

The corresponding output lists the available repositories. (When you run the examples, you may see a different list of repositories.)

Available repositories in catalog 'None': ['tutorial']  

The next step is to create a client-side repository object representing the respository we wish to open, by calling the getRepository() method of the catalog object. We have to provide the name of the desired repository (AG_REPOSITORY, bound to python-tutorial in this case), and select one of four access modes:

Repository.RENEW is the default setting for the example1() function of tutorial_examples_40.py. It can be overridden by calling example1() with the appropriate argument, such as example1(Repository.OPEN).

    myRepository = catalog.getRepository(AG_REPOSITORY, accessMode)
    myRepository.initialize()

A new or renewed repository must be initialized, using the initialize() method of the repository object. If you try to initialize a respository twice you get a warning message in the Python window but no exception.

The goal of all this object-building has been to create a client-side connection object, whose methods let us manipulate the triples of the repository. The repository object's getConnection() method returns this connection object.

    connection = myRepository.getConnection()
    print "Repository %s is up! It contains %i statements." % (
    myRepository.getDatabaseName(), connection.size())
    return connection

The size() method of the connection object returns how many triples are present. In the example1() function, this number should always be zero because we "renewed" the repository. This is the output in the Python window:

   Repository pythontutorial is up! It contains 0 statements.
   <franz.openrdf.repository.repositoryconnection.RepositoryConnection object at 0x0127D710>
 >>> 

The last line is the pointer to the new connection object. This is the value returned by example1() when it is called by other functions in tutorial_examples_40.py. The other functions then use the connection object to access the repository.

Asserting and Retracting Triples (example2())    Return to Top

In example example2(), we show how to create resources describing two people, Bob and Alice, by asserting individual triples into the respository. The example also retracts and replaces a triple. Assertions and retractions to the triple store are executed by 'add' and 'remove' methods belonging to the connection object, which we obtain by calling the example1() function (described above).

Before asserting a triple, we have to generate the URI values for the subject, predicate and object fields. The Python API to AllegroGraph Server predefines a number of classes and predicates for the RDF, RDFS, XSD, and OWL ontologies. RDF.TYPE is one of the predefined predicates we will use.

The 'add' and 'remove' methods take an optional 'contexts' argument that specifies one or more subgraphs that are the target of triple assertions and retractions. When the context is omitted, triples are asserted/retracted to/from the background graph. In the example below, facts about Alice and Bob reside in the background graph.

The example2() function begins by calling example1() to create the appropriate connection object, which is bound to the variable conn.

def example2():
    conn = example1()

The next step is to begin assembling the URIs we will need for the new triples. The createURI() method generates a URI from a string. These are the subject URIs identifying the resources "Bob" and "Alice":

    alice = conn.createURI("http://example.org/people/alice")
    bob = conn.createURI("http://example.org/people/bob")

Bob and Alice will be members of the "person" class (RDF:TYPE person).

    person = conn.createURI("http://example.org/ontology/Person")

Both Bob and Alice will have a "name" attribute.

    name = conn.createURI("http://example.org/ontology/name")

The name attributes will contain literal values. We have to generate the Literal objects from strings:

    bobsName = conn.createLiteral("Bob")
    alicesName = conn.createLiteral("Alice")

The next line prints out the number of triples currently in the repository.

    print "Triple count before inserts: ", conn.size()
Triple count before inserts:  0

Now we assert four triples, two for Bob and two more for Alice, using the connection object's add() method. After the assertions, we count triples again (there should be four) and print out the triples for inspection.

    ## alice is a person
    conn.add(alice, RDF.TYPE, person)
    ## alice's name is "Alice"
    conn.add(alice, name, alicesName)
    ## bob is a person
    conn.add(bob, RDF.TYPE, person)
    ## bob's name is "Bob":
    conn.add(bob, name, bobsName)

    print "Triple count: ", conn.size()
    for s in conn.getStatements(None, None, None, None): print s 

The "None" arguments to the getStatements() method say that we don't want to restrict what values may be present in the subject, predicate, object or context positions. Just print out all the triples.

This is the output at this point. We see four triples, two about Alice and two about Bob:

Triple count:  4
(<http://example.org/people/alice>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/ontology/Person>)
(<http://example.org/people/alice>, <http://example.org/ontology/name>, "Alice")
(<http://example.org/people/bob>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/ontology/Person>)
(<http://example.org/people/bob>, <http://example.org/ontology/name>, "Bob")

We see two resources of type "person," each with a literal name.

The next step is to demonstrate how to remove a triple. Use the remove() method of the connection object, and supply a triple pattern that matches the target triple. In this case we want to remove Bob's name triple from the repository. Then we'll count the triples again to verify that there are only three remaining. Finally, we re-assert Bob's name so we can use it in subsequent examples, and we'll return the connection object..

    conn.remove(bob, name, bobsName)
    print "Triple count: ", conn.size()
    conn.add(bob, name, bobsName)    
    return conn
Triple count:  3
<franz.openrdf.repository.repositoryconnection.RepositoryConnection object at 0x01466830>

A SPARQL Query (example3())    Return to Top

SPARQL stands for the "SPARQL Protocol and RDF Query Language," a recommendation of the World Wide Web Consortium (W3C). SPARQL is a query language for retrieving RDF triples.

Our next example illustrates how to evaluate a SPARQL query. This is the simplest query, the one that returns all triples. Note that example3() continues with the four triples created in example2().

def example3():    
    conn = example2()
    try:
        queryString = "SELECT ?s ?p ?o  WHERE {?s ?p ?o .}"

The SELECT clause returns the variables ?s, ?p and ?o in the bindingSet. The variables are bound to the subject, predicate and objects values of each triple that satisfies the WHERE clause. In this case the WHERE clause is unconstrained. The dot (.) in the fourth position signifies the end of the pattern.

The connection object's prepareTupleQuery() method creates a query object that can be evaluated one or more times. (A "tuple" is an ordered sequence of data elements in Python.) The results are returned in an iterator that yields a sequence of bindingSets.

        tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
        result = tupleQuery.evaluate();

Below we illustrate one (rather heavyweight) method for extracting the values from a binding set, indexed by the name of the corresponding column variable in the SELECT clause.

        try:
            for bindingSet in result:
                s = bindingSet.getValue("s")
                p = bindingSet.getValue("p")
                o = bindingSet.getValue("o")              
                print "%s %s %s" % (s, p, o)
http://example.org/people/alice http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://example.org/ontology/Person
http://example.org/people/alice http://example.org/ontology/name "Alice"
http://example.org/people/bob http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://example.org/ontology/Person
http://example.org/people/bob http://example.org/ontology/name "Bob"

The Connection class is designed to be created for the duration of a sequence of updates and queries, and then closed. In practice, many AllegroGraph applications keep a connection open indefinitely. However, best practice dictates that the connection should be closed, as illustrated below. The same hygiene applies to the iterators that generate binding sets.

        finally:
            result.close();
    finally:
        conn.close();

Statement Matching (example4())    Return to Top

The getStatements() method of the connection object provides a simple way to perform unsophisticated queries. This method lets you enter a mix of required values and wildcards, and retrieve all matching triples. (If you need to perform sophisticated tests and comparisons you should use the SPARQL query instead.)

Below, we illustrate two kinds of 'getStatement' calls. The first mimics traditional Sesame syntax, and returns a Statement object at each iteration. This is the example4() function of tutorial_examples_40.py. It begins by calling example2() to create a connection object and populate the pythontutorial repository with four triples describing Bob and Alice. We're going to search for triples that mention Alice, so we have to create an "Alice" URI to use in the search pattern:

def example4():
conn = example2()
alice = conn.createURI("http://example.org/people/alice")

Now we search for triples with Alice's URI in the subject position. The "None" values are wildcards for the predicate and object positions of the triple.

    print "Searching for Alice using getStatements():"
    statements = conn.getStatements(alice, None, None)

The getStatements() method returns a repositoryResult object (bound to the variable "statements" in this case). This object can be iterated over, exposing one result statement at a time. It is sometimes desirable to screen the results for duplicates, using the enableDuplicateFilter() method. Note, however, that duplicate filtering can be expensive. Our example does not contain any duplicates, but it is possible for them to occur.

    statements.enableDuplicateFilter() 
for s in statements:
print s

This prints out the two matching triples for "Alice."

Searching for Alice using getStatements():
(<http://example.org/people/alice>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/ontology/Person>) (<http://example.org/people/alice>, <http://example.org/ontology/name>, "Alice")

At this point it is good form to close the respositoryResponse object because they occupy memory and are rarely reused in most programs.

	statements.close()

The example4() example continues with a second, more efficient way to perform simple retrievals. The second syntax borrows a trick from the JDBC API commonly used to access relational databases. The getJDBCStatements() method returns a ResultSet object that lets us iterate over the returned triples. A resultSet iterator does not materialize objects unless forced to. In this example it materializes only the object values of the returned triples. The getValue() method forces materialization of a resource or literal as an object, while the getString() call returns a string without creating an object. Developers who care about minimizing garbage will prefer to use the getJDBCStatements()' call, and they will usually call getString() in preference to getValue().

    print "Same thing using JDBC:"
resultSet = conn.getJDBCStatements(alice, None, None)
while resultSet.next():
print " ", resultSet.getValue(2), " ", resultSet.getString(2)

The output is:

Same thing using JDBC:
    <http://example.org/ontology/Person> http://example.org/ontology/Person
    "Alice"    Alice  

Literal Values (example5())   Return to Top

The next example, example5(), illustrates some variations on what we have seen so far. The example creates and asserts typed and plain literal values, including language-specific plain literals, and then conducts searches for them in three ways:

The getStatements() and SPARQL direct searches return exactly the datatype you ask for. The SPARQL filter queries can sometimes return multiple datatypes. This behavior will be one focus of this section.

If you are not explicit about the datatype of a value, either when asserting the triple or when writing a search pattern, AllegroGraph will deduce an appropriate datatype and use it. This is another focus of this section. This helpful behavior can sometimes surprise you with unanticipated results.

Setup

Example5() begins by obtaining a connection object from example1(), and then clears the repository of all existing triples.

def example5():
conn = example1()
conn.clear()

For sake of coding efficiency, it is good practice to create variables for namespace strings. We'll use this namespace again and again in the following lines. We have made the URIs in this example very short to keep the result displays compact.

    exns = "http://people/"

The example creates new resources describing seven people, named alphabetically from Alice to Greg. These are URIs to use in the subject field of the triples. The example shows how to enter a full URI string, or alternately how to combine a namespace with a local resource name.

    alice = conn.createURI("http://example.org/people/alice")
bob = conn.createURI("http://example.org/people/bob")
carol = conn.createURI("http://example.org/people/carol")
dave = conn.createURI(namespace=exns, localname="dave")
eric = conn.createURI(namespace=exns, localname="eric")
fred = conn.createURI(namespace=exns, localname="fred") greg = conn.createURI(namesapce=exns, localname="greg")

Numeric Literal Values

This section explores the behavior of numeric literals.

Asserting Numeric Data

The first section assigns ages to the participants, using a variety of numeric types. First we need a URI for the "age" predicate.

    age = conn.createURI(namespace=exns, localname="age")

The next step is to create a variety of values representing ages. Coincidentally, these people are all 42 years old, but we're going to record that information in multiple ways:

    fortyTwo = conn.createLiteral(42)             # creates long
    fortyTwoDecimal = conn.createLiteral(42.0)    # creates double
    fortyTwoInt = conn.createLiteral('42', datatype=XMLSchema.INT)
    fortyTwoLong = conn.createLiteral('42', datatype=XMLSchema.LONG) 
    fortyTwoFloat = conn.createLiteral('42', datatype=XMLSchema.FLOAT) 
    fortyTwoString = conn.createLiteral('42', datatype=XMLSchema.STRING)
    fortyTwoPlain = conn.createLiteral('42')      # creates plain literal

In four of these statements, we explicitly identified the datatype of the value in order to create an INT, a LONG, a FLOAT and a STRING. This is the best practice.

In three other statements, we just handed AllegroGraph numeric-looking values to see what it would do with them. As we will see in a moment, 42 creates a LONG, 42.0 becomes into a DOUBLE, and '42' becomes a "plain" (untyped) literal value. (Note that plain literals are not quite the same thing as typed literal strings. A search for a plain literal will not always match a typed string, and vice versa.)

Now we need to assemble the URIs and values into statements (which are client-side triples):

    stmt1 = conn.createStatement(alice, age, fortyTwo)
    stmt2 = conn.createStatement(bob, age, fortyTwoDecimal)    
    stmt3 = conn.createStatement(carol, age, fortyTwoInt)    
    stmt4 = conn.createStatement(dave, age, fortyTwoLong)    
    stmt5 = conn.createStatement(eric, age, fortyTwoFloat)
    stmt6 = conn.createStatement(fred, age, fortyTwoString)  
    stmt7 = conn.createStatement(greg, age, fortyTwoPlain)

And then add the statements to the triple store on the AllegroGraph server. We can use either add() or addStatement() for this purpose.

    conn.add(stmt1)
    conn.add(stmt2)
    conn.add(stmt3)
    conn.addStatement(stmt4)
    conn.addStatement(stmt5)
    conn.addStatement(stmt6)
    conn.addStatement(stmt7)

Now we'll complete the round trip to see what triples we get back from these assertions. This is how we use getStatements() in this example to retrieve and display triples for us:

    print "\nShowing all age triples using getStatements(). Seven matches."
    statements = conn.getStatements(None, age, None)
    for s in statements:
        print s

This loop prints all age triples to the interaction window. Note that the retrieved triples are of six types: two ints, a long, a float, a double, a string, and a "plain literal." All of them say that their person's age is 42. Note that the triple for Greg has the plain literal value "42", while the triple for Fred uses "42" as a typed string.

Showing all age triples using getStatements(). Seven matches.
(<http://people/greg>, <http://people/age>, "42")
(<http://people/fred>, <http://people/age>, "42"^^<http://www.w3.org/2001/XMLSchema#string>)
(<http://people/eric>, <http://people/age>, "4.2E1"^^<http://www.w3.org/2001/XMLSchema#float>)
(<http://people/dave>, <http://people/age>, "42"^^<http://www.w3.org/2001/XMLSchema#long>)
(<http://people/carol>, <http://people/age>, "42"^^<http://www.w3.org/2001/XMLSchema#int>)
(<http://people/bob>, <http://people/age>, "4.2E1"^^<http://www.w3.org/2001/XMLSchema#double>)
(<http://people/alice>, <http://people/age>, "42"^^<http://www.w3.org/2001/XMLSchema#long>)

If you ask for a specific datatype, you will get it. If you leave the decision up to AllegroGraph, you might get something unexpected such as an plain literal value.

Matching Numeric Data

This section explores getStatements() and SPARQL matches against numeric triples.

In the first example, we asked AllegroGraph to find an untyped number, 42.

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, 42) "42"^^<http://www.w3.org/2001/XMLSchema#long>
SPARQL direct match SELECT ?s ?p WHERE {?s ?p 42 .} No direct matches.
SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = 42)} "42"^^<http://www.w3.org/2001/XMLSchema#int>
"4.2E1"^^<http://www.w3.org/2001/XMLSchema#float>
"42"^^<http://www.w3.org/2001/XMLSchema#long>
"4.2E1"^^<http://www.w3.org/2001/XMLSchema#double>

The getStatements() query returned triples containing longs only. The SPARQL direct match didn't know how to interpret the untyped value and found zero matches. The SPARQL filter match, however, opened the doors to matches of multiple numeric types, and returned ints, floats, longs and doubles.

"Match 42.0" without explicitly declaring the number's type.

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, 42.0)

"42"^^<http://www.w3.org/2001/XMLSchema#double>
Matches a double but not a float.

SPARQL direct match SELECT ?s ?p WHERE {?s ?p 42.0 .} No direct matches.
SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = 42.0)} "42"^^<http://www.w3.org/2001/XMLSchema#int>
"4.2E1"^^<http://www.w3.org/2001/XMLSchema#float>
"42"^^<http://www.w3.org/2001/XMLSchema#long>
"4.2E1"^^<http://www.w3.org/2001/XMLSchema#double>

The getStatements() search returned a double but not the similar float. The filter match returned all numeric types that were equal to 42.0.

"Match '42'^^<http://www.w3.org/2001/XMLSchema#int>." Note that we have to use a variable (fortyTwoInt) bound to a Literal value in order to offer this int to getStatements(). We can't just type the value into the getStatements() method directly.

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, fortyTwoInt)

"42"^^<http://www.w3.org/2001/XMLSchema#int>

SPARQL direct match SELECT ?s ?p WHERE {?s ?p "42"^^<http://www.w3.org/2001/XMLSchema#int>} "42"^^<http://www.w3.org/2001/XMLSchema#int>
SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "42"^^<http://www.w3.org/2001/XMLSchema#int>)} "42"^^<http://www.w3.org/2001/XMLSchema#int>
"4.2E1"^^<http://www.w3.org/2001/XMLSchema#float>
"42"^^<http://www.w3.org/2001/XMLSchema#long>
"4.2E1"^^<http://www.w3.org/2001/XMLSchema#double>

Both the getStatements() query and the SPARQL direct query returned exactly what we asked for: ints. The filter match returned all numeric types that matches in value.

"Match '42'^^<http://www.w3.org/2001/XMLSchema#long>." Again we need a bound variable to offer a Literal value to getStatements().

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, fortyTwoLong)

"42"^^<http://www.w3.org/2001/XMLSchema#long>

SPARQL direct match SELECT ?s ?p WHERE {?s ?p "42"^^<http://www.w3.org/2001/XMLSchema#long>} "42"^^<http://www.w3.org/2001/XMLSchema#long>
SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "42"^^<http://www.w3.org/2001/XMLSchema#long>)} "42"^^<http://www.w3.org/2001/XMLSchema#int>
"4.2E1"^^<http://www.w3.org/2001/XMLSchema#float>
"42"^^<http://www.w3.org/2001/XMLSchema#long>
"4.2E1"^^<http://www.w3.org/2001/XMLSchema#double>

Both the getStatements() query and the SPARQL direct query returned longs. The filter match returned all numeric types.

"Match '42'^^<http://www.w3.org/2001/XMLSchema#double>."

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, fortyTwoDouble)

"42"^^<http://www.w3.org/2001/XMLSchema#double>

SPARQL direct match SELECT ?s ?p WHERE {?s ?p "42"^^<http://www.w3.org/2001/XMLSchema#double>} "42"^^<http://www.w3.org/2001/XMLSchema#double>
SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "42"^^<http://www.w3.org/2001/XMLSchema#double>)} "42"^^<http://www.w3.org/2001/XMLSchema#int>
"4.2E1"^^<http://www.w3.org/2001/XMLSchema#float>
"42"^^<http://www.w3.org/2001/XMLSchema#long>
"4.2E1"^^<http://www.w3.org/2001/XMLSchema#double>

Both the getStatements() query and the SPARQL direct query returned doubles. The filter match returned all numeric types.

Matching Numeric Strings and Plain Literals

At this point we are transitioning from tests of numeric matches to tests of string matches, but there is a gray zone to be explored first. What do we find if we search for strings that contain numbers? In particular, what about "plain literal" values that are almost, but not quite, strings?

"Match '42'^^<http://www.w3.org/2001/XMLSchema#string>." This example asks for a typed string to see if we get any numeric matches back.

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, fortyTwoString)

"42"^^<http://www.w3.org/2001/XMLSchema#string>
It did not match the plain literal.

SPARQL direct match SELECT ?s ?p WHERE {?s ?p "42"^^<http://www.w3.org/2001/XMLSchema#string>}

"42"^^<http://www.w3.org/2001/XMLSchema#string>
"42" This is the plain literal value.

SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "42"^^<http://www.w3.org/2001/XMLSchema#string>)} "42"^^<http://www.w3.org/2001/XMLSchema#string>
"42" This is the plain literal value.

The getStatements() query matched a literal string only. The SPARQL queries returned matches that were both typed strings and plain literals. There were no numeric matches.

"Match plain literal '42'." This example asks for a plain literal to see if we get any numeric matches back.

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, fortyTwoPlain)

"42" This is the plain literal. It did not match the string.

SPARQL direct match SELECT ?s ?p WHERE {?s ?p "42"}

"42"^^<http://www.w3.org/2001/XMLSchema#string>
"42" This is the plain literal value.

SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "42")} "42"^^<http://www.w3.org/2001/XMLSchema#string>
"42" This is the plain literal value.

The getStatements() query matched the plain literal only, and did not match the string. The SPARQL queries returned matches that were both typed strings and plain literals. There were no numeric matches.

The interesting lesson here is that AllegroGraph distinguishes between strings and plain literals when you use getStatements(), but it lumps them together when you use SPARQL.

Matching Strings

In this section we'll set up a variety of string triples and experiment with matching them using getStatements() and SPARQL. Note that Free Text Search is a different topic. In this section we're doing simple matches of whole strings.

Asserting String Values

We're going to add a "favorite color" attribute to five of the person resources we have used so far. First we need a predicate.

    favoriteColor = conn.createURI(namespace=exns, localname="favoriteColor")

Now we'll create a variety of string values, and a single "plain literal" value.

    UCred = conn.createLiteral('Red', datatype=XMLSchema.STRING)
LCred = conn.createLiteral('red', datatype=XMLSchema.STRING)
RedPlain = conn.createLiteral('Red') #plain literal
rouge = conn.createLiteral('rouge', datatype=XMLSchema.STRING)
Rouge = conn.createLiteral('Rouge', datatype=XMLSchema.STRING)
RougePlain = conn.createLiteral('Rouge') #plain literal
FrRouge = conn.createLiteral('Rouge', language="fr") #plain literal with language tag

Note that in the last line we created a plain literal and assigned it a French language tag. You cannot assign a language tag to strings, only to plain literals. See typed and plain literal values for the specification.

Next we'll add these values to new triples in the triple store. For variety, we bypassed creating client-side statements and just asserted the triples in one step.

    conn.addTriples([(alice, favoriteColor, UCred),
(bob, favoriteColor, LCred),
(carol, favoriteColor, RedPlain),
(dave, favoriteColor, rouge),
(eric, favoriteColor, Rouge),
(fred, favoriteColor, RougePlain),
(greg, favoriteColor, FrRouge)])

If we run a getStatements() query for all favoriteColor triples, we get these values returned:

(<http://people/alice>, <http://people/favoriteColor>, "Red"^^<http://www.w3.org/2001/XMLSchema#string>)
(<http://people/bob>, <http://people/favoriteColor>, "red"^^<http://www.w3.org/2001/XMLSchema#string>)
(<http://people/carol>, <http://people/favoriteColor>, "Red")
(<http://people/dave>, <http://people/favoriteColor>, "rouge"^^<http://www.w3.org/2001/XMLSchema#string>)
(<http://people/eric>, <http://people/favoriteColor>, "Rouge"^^<http://www.w3.org/2001/XMLSchema#string>)
(<http://people/fred>, <http://people/favoriteColor>, "Rouge")
(<http://people/greg>, <http://people/favoriteColor>, "Rouge"@fr)

That's four typed strings, capitalized and lower case, plus three plain literals, one with a language tag.

Matching String Data

First let's search for "Red" without specifying a datatype.

"Match 'Red'." What happens if we search for "Red" without specifying a string datatype?

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, "Red")

"Red" This is the plain literal. It did not match the string.

SPARQL direct match SELECT ?s ?p WHERE {?s ?p "Red"}

"Red"^^<http://www.w3.org/2001/XMLSchema#string>
"Red" This is the plain literal value.

SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "Red")} "Red"^^<http://www.w3.org/2001/XMLSchema#string>
"Red" This is the plain literal value.

The getStatements() query matched the plain literal only, and did not match the similar string. The SPARQL queries matched both "Red" typed strings and "Red" plain literals, but they did not return the lower case "red" triple. The match was liberal regarding datatype but strict about case.

Let's try "Rouge".

"Match 'Rouge'." What happens if we search for "Rouge" without specifying a string datatype or language? Will it match the triple with the French tag?

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, "Rouge")

"Rouge" This is the plain literal. It did not match the string.

SPARQL direct match SELECT ?s ?p WHERE {?s ?p "Rouge"}

"Rouge"^^<http://www.w3.org/2001/XMLSchema#string>
"Rouge" This is the plain literal value.
Did not match the "Rouge"@fr triple.

SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "Rouge")} "Rouge"^^<http://www.w3.org/2001/XMLSchema#string>
"Rouge" This is the plain literal value.
Did not match the"Rouge"@fr triple.

The getStatements() query matched the plain literal only, and did not match the similar string. The SPARQL queries matched both "Rouge" typed strings and "Rouge" plain literals, but they did not return the "Rouge"@fr triple. The match was liberal regarding datatype but strict about language. We didn't ask for French, so we didn't get French.

"Match 'Rouge'@fr." What happens if we search for "Rouge"@fr? We'll have to bind the value to a variable (FrRouge) to use getStatements(). We can type the value directly into the SPARQL queries.

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, FrRouge)

"Rouge"@fr

SPARQL direct match SELECT ?s ?p WHERE {?s ?p "Rouge"@fr}

"Rouge"@fr

SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "Rouge"@fr)} "Rouge"@fr

If you ask for a specific language, that's exactly what you are going to get, in all three types of queries.

You may be wondering how to perform a string match where language and capitalization don't matter. You can do that with a SPARQL filter query using the str() function, which strips out the string portion of a literal, leaving behind the datatype or language tag. Then the lowercase() function eliminates case issues:

PREFIX fn: <http://www.w3.org/2005/xpath-functions#> 
SELECT ?s ?p ?o 
WHERE {?s ?p ?o . filter (fn:lower-case(str(?o)) = "rouge")}

This query returns a variety of "Rouge" triples:

<http://people/greg> <http://people/favoriteColor> "Rouge"@fr
<http://people/fred> <http://people/favoriteColor> "Rouge"
<http://people/eric> <http://people/favoriteColor> "Rouge"^^<http://www.w3.org/2001/XMLSchema#string>
<http://people/dave> <http://people/favoriteColor> "rouge"^^<http://www.w3.org/2001/XMLSchema#string>

This query matched all triples containing the string "rouge" regardless of datatype or language tag. Remember that the SPARQL "filter" queries are powerful, but they are also the slowest queries. SPARQL direct queries and getStatements() queries are faster.

Matching Booleans

In this section we'll assert and then search for Boolean values.

Asserting Boolean Values

The Python language includes predefined True and False symbols for use in your decision-making logic. These Boolean constants are fine for programming, but they are not appropriate for asserting Boolean values into triples. They do not result in legal RDF Boolean values, and this creates subsequent matching issues.

We'll be adding a new attribute to the person resources in our example. Are they, or are they not, seniors?

    senior = conn.createURI(namespace=exns, localname="seniorp")    

The correct way to create Boolean values for use in triples is to create literal values of type Boolean:

    trueValue = conn.createLiteral("true", datatype=XMLSchema.BOOLEAN)
falseValue = conn.createLiteral("false", datatype=XMLSchema.BOOLEAN)

Note that "true" and "false" must be lower case.

We'll only need two triples:

    conn.addTriple(alice, senior, trueValue)
conn.addTriple(bob, senior, falseValue)

When we retrieve the triples (using getStatements()) we see:

(<http://people/bob>, <http://people/seniorp>, "false"^^<http://www.w3.org/2001/XMLSchema#boolean>)
(<http://people/alice>, <http://people/seniorp>, "true"^^<http://www.w3.org/2001/XMLSchema#boolean>)

These are RDF-legal Boolean values that work with the AllegroGraph query engine. Note that "true" and "false" are lower case.

As an object lesson, this getStatements() query uses the Python "True" Boolean, which does not match the values in the RDF store:

conn.getStatements(None, senior, True)     # no matches

"Match 'true'." There are three correct ways to perform a Boolean search. One is to use the varible trueValue (defined above) to pass a Boolean literal value to getStatements(). SPARQL queries will recognize true and false, and of course the fully-typed "true"^^<http://www.w3.org/2001/XMLSchema#boolean> format is also respected by SPARQL:

Query Type Query Matches which types?
getStatements() conn.getStatements(None, senior, trueValue)

"true"^^<http://www.w3.org/2001/XMLSchema#boolean>

SPARQL direct match SELECT ?s ?p WHERE {?s ?p true}

"true"^^<http://www.w3.org/2001/XMLSchema#boolean>

SPARQL direct match SELECT ?s ?p WHERE {?s ?p "true"^^<http://www.w3.org/2001/XMLSchema#boolean> "true"^^<http://www.w3.org/2001/XMLSchema#boolean>
SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = true)} "true"^^<http://www.w3.org/2001/XMLSchema#boolean>
SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "true"^^<http://www.w3.org/2001/XMLSchema#boolean>} "true"^^<http://www.w3.org/2001/XMLSchema#boolean>

All of these queries correctly match Boolean values.

Dates, Times and Datetimes

In this final section of example5(), we'll assert and retrieve dates, times and datetimes.

In this context, you might be surprised by the way that AllegroGraph handles time zone data. If you assert (or search for) a timestamp that includes a time-zone offset, AllegroGraph will "normalize" the expression to Greenwich (zulu) time before proceeding. This normalization greatly speeds up searching and happens transparently to you, but you'll notice that the matched values are all zulu times.

Asserting Date, Time and Datetime Values

We're going to add birthdates to our personnel records. We'll need a birthdate predicate:

    birthdate = conn.createURI(namespace=exns, localname="birthdate")  

We'll also need four types of literal values: a date, a time, a datetime, and a datetime with a time-zone offset.

    date = conn.createLiteral('1984-12-06', datatype=XMLSchema.DATE)     
datetime = conn.createLiteral('1984-12-06T09:00:00', datatype=XMLSchema.DATETIME)
time = conn.createLiteral('09:00:00Z', datatype=XMLSchema.TIME)
datetimeOffset = conn.createLiteral('1984-12-06T09:00:00+01:00', datatype=XMLSchema.DATETIME)

It is interesting to notice that these lilteral values print out exactly as we defined them.

Printing out Literals for date, datetime, time, and datetime with Zulu offset.
"1984-12-06"^^<http://www.w3.org/2001/XMLSchema#date>
"1984-12-06T09:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>
"09:00:00Z"^^<http://www.w3.org/2001/XMLSchema#time>
"1984-12-06T09:00:00+01:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>

Now we'll add them to the triple store:

    conn.addTriples([(alice, birthdate, date),
(bob, birthdate, datetime),
(carol, birthdate, time),
(dave, birthdate, datetimeOffset)])

And then retrieve them using getStatements():

Showing all birthday triples using getStatements(). Should be four.
(<http://people/alice>, <http://people/birthdate>, "1984-12-06"^^<http://www.w3.org/2001/XMLSchema#date>)
(<http://people/bob>, <http://people/birthdate>, "1984-12-06T09:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
(<http://people/carol>, <http://people/birthdate>, "09:00:00Z"^^<http://www.w3.org/2001/XMLSchema#time>)
(<http://people/dave>, <http://people/birthdate>, "1984-12-06T08:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>)

If you look sharply, you'll notice that the zulu offset has been normalized:

Was:"1984-12-06T09:00:00+01:00"
Now:"1984-12-06T08:00:00Z"

Note that the one-hour zulu offset has been applied to the timestamp. "9:00" turned into "8:00."

Matching Date, Time, and Datetime Literals

"Match date." What happens if we search for the date literal we defined? We'll use the "date" variable with getStatements(), but just type the expected value into the SPARQL queries.

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, date)

"1984-12-06"
^^<http://www.w3.org/2001/XMLSchema#date>

SPARQL direct match SELECT ?s ?p WHERE {?s ?p '1984-12-06'^^<http://www.w3.org/2001/XMLSchema#date>

"1984-12-06"
^^<http://www.w3.org/2001/XMLSchema#date>

SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o =
'1984-12-06'
^^<http://www.w3.org/2001/XMLSchema#date>)}
"1984-12-06"
^^<http://www.w3.org/2001/XMLSchema#date>

All three queries match narrowly, meaning the exact date and datatype we asked for is returned.

"Match datetime." What happens if we search for the datetime literal? We'll use the "datetime" variable with getStatements(), but just type the expected value into the SPARQL queries.

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, datetime)

"1984-12-06T09:00:00Z"
^^<http://www.w3.org/2001/XMLSchema#dateTime>

SPARQL direct match SELECT ?s ?p WHERE {?s ?p '1984-12-06T09:00:00Z'
^^<http://www.w3.org/2001/XMLSchema#dateTime> .}

"1984-12-06T09:00:00Z"
^^<http://www.w3.org/2001/XMLSchema#dateTime>

SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = '1984-12-06T09:00:00Z'^^<http://www.w3.org/2001/XMLSchema#dateTime> "1984-12-06T09:00:00Z"
^^<http://www.w3.org/2001/XMLSchema#dateTime>

The matches are specific for the exact date, time and type.

"Match time." What happens if we search for the time literal? We'll use the "time" variable with getStatements(), but just type the expected value into the SPARQL queries.

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, time)

"09:00:00Z"
^^<http://www.w3.org/2001/XMLSchema#time>

SPARQL direct match SELECT ?s ?p WHERE {?s ?p "09:00:00Z"
^^<http://www.w3.org/2001/XMLSchema#time> .}

"09:00:00Z"
^^<http://www.w3.org/2001/XMLSchema#time>

SPARQL filter match SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "09:00:00Z"^^<http://www.w3.org/2001/XMLSchema#time>)} "09:00:00Z"
^^<http://www.w3.org/2001/XMLSchema#time>

The matches are specific for the exact time and type.

"Match datetime with offset." What happens if we search for a datetime with zulu offset?

Query Type Query Matches which types?
getStatements() conn.getStatements(None, age, datetimeOffset)

"1984-12-06T08:00:00Z"
^^<http://www.w3.org/2001/XMLSchema#dateTime>

SPARQL direct match SELECT ?s ?p WHERE {?s ?p "1984-12-06T09:00:00+01:00"
^^<http://www.w3.org/2001/XMLSchema#dateTime> .}

"1984-12-06T08:00:00Z"
^^<http://www.w3.org/2001/XMLSchema#dateTime>

SPARQL filter match

SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = "1984-12-06T09:00:00+01:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>)}

"1984-12-06T08:00:00Z"
^^<http://www.w3.org/2001/XMLSchema#dateTime>

Note that we searched for "1984-12-06T09:00:00+01:00" but found "1984-12-06T08:00:00Z". It is the same moment in time.

Importing Triples (example6() and example7())    Return to Top

The Python API client can load triples in either RDF/XML format or NTriples format. The example below calls the connection object's add() method to load an NTriples file, and addFile() to load an RDF/XML file. Both methods work, but the best practice is to use addFile().

Note: If you get a "file not found" error while running this example, it means that Python is looking in the wrong directory for the data files to load. The usual explanation is that you have moved the tutorial_examples_40.py file to an unexpected directory. You can clear the issue by putting the data files in the same directory as tutorial_examples_40.py, or by setting the Python current working directory to the location of the data files using os.setcwd().

The RDF/XML file contains a short list of v-cards (virtual business cards), like this one:

  <rdf:Description rdf:about="http://somewhere/JohnSmith/">
<vCard:FN>John Smith</vCard:FN>
<vCard:N rdf:parseType="Resource">
<vCard:Family>Smith</vCard:Family>
<vCard:Given>John</vCard:Given>
</vCard:N>
</rdf:Description>

The NTriples file contains a graph of resources describing the Kennedy family, the places where they were each born, their colleges, and their professions. A typical entry from that file looks like this:

<http://www.franz.com/simple#person1> <http://www.franz.com/simple#first-name> "Joseph" . 
<http://www.franz.com/simple#person1> <http://www.franz.com/simple#middle-initial> "Patrick" . 
<http://www.franz.com/simple#person1> <http://www.franz.com/simple#last-name> "Kennedy" . 
<http://www.franz.com/simple#person1> <http://www.franz.com/simple#suffix> "none" . 
<http://www.franz.com/simple#person1> <http://www.franz.com/simple#alma-mater> <http://www.franz.com/simple#Harvard> . 
<http://www.franz.com/simple#person1> <http://www.franz.com/simple#birth-year> "1888" . 
<http://www.franz.com/simple#person1> <http://www.franz.com/simple#death-year> "1969" . 
<http://www.franz.com/simple#person1> <http://www.franz.com/simple#sex> <http://www.franz.com/simple#male> . 
<http://www.franz.com/simple#person1> <http://www.franz.com/simple#spouse> <http://www.franz.com/simple#person2> . 
<http://www.franz.com/simple#person1> <http://www.franz.com/simple#has-child> <http://www.franz.com/simple#person3> . 
<http://www.franz.com/simple#person1> <http://www.franz.com/simple#profession> <http://www.franz.com/simple#banker> . 
<http://www.franz.com/simple#person1> <http://www.franz.com/simple#birth-place> <http://www.franz.com/simple#place5> . 
<http://www.franz.com/simple#person1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.franz.com/simple#person> . 

Note that AllegroGraph can segregate triples into contexts (subgraphs) by treating them as quads, but the NTriples and RDF/XML formats can not include context information. They deal with triples only, so there is no place to store a fourth field in those formats. In the case of the add() call, we have omitted the context argument so the triples are loaded the default background graph (sometimes called the "null context.") The addFile() call includes an explicit context setting, so the fourth argument of each vcard triple will be the context named "/tutorial/vc_db_1_rdf". The connection size() method takes an optional context argument. With no argument, it returns the total number of triples in the repository. Below, it returns the number '16' for the 'context' context argument, and the number '28' for the null context (None) argument.

The example6() function of tutorial_examples_40.py creates a dedicated session connection to AllegroGraph, using methods you have seen before, plus the Connection object's openSession() method:

def example6():
server = AllegroGraphServer(AG_HOST, AG_PORT, 'test', 'xyzzy')
catalog = server.openCatalog('scratch')
myRepository = catalog.getRepository(AG_REPOSITORY, Repository.RENEW)
myRepository.initialize()
conn = myRepository.getConnection()
conn.openSession()

The dedicated session is not immediately pertinent to the examples in this section, but will become important in later examples that reuse this connection to demonstrate Prolog Rules and Social Network Analysis.

The variables path1 and path2 are bound to the RDF/XML and NTriples files, respectively.

    path1 = "./python-vcards.rdf"    
path2 = "./python-kennedy.ntriples"

Both examples need a base URI as one of the required arguments to the asserting methods:

    baseURI = "http://example.org/example/local"

The NTriples about the vcards will be added to a specific context, so naturally we need a URI to identify that context.

    context = conn.createURI("http://example.org#vcards")

In the next step we use addFile() to load the vcard triples into the #vcards context:

    conn.addFile(path1, baseURI, format=RDFFormat.RDFXML, context=context);

Then we use add() to load the Kennedy family tree into the null context:

    conn.add(path2, base=baseURI, format=RDFFormat.NTRIPLES, contexts=None)

Now we'll ask AllegroGraph to report on how many triples it sees in the null context and in the #vcards context:

    print "After loading, repository contains %i vcard triples in context '%s'\n    
           and   %i kennedy triples in context '%s'." % 
(conn.size(context), context, conn.size('null'), 'null')
return conn

The output of this report was:

After loading, repository contains 16 vcard triples in context 'http://example.org#vcards'
and 1214 kennedy triples in context 'null'.

The SPARQL query below is found in example7() of tutorial_examples_40.py. It borrows the same triples we loaded in example6(), above, and runs two unconstrained retrievals. The first uses getStatement, and prints out the subject URI and context of each triple.

def example7():    
conn = example6()
print "Match all and print subjects and contexts"
result = conn.getStatements(None, None, None, None, limit=25)
for row in result: print row.getSubject(), row.getContext()

This loop prints out a mix of triples from the null context and from the #vcards context. We set a limit of 25 triples because the Kennedy dataset contains over a thousand triples.

The following loop, however, does not produce the same results. This is a SPARQL query that should match all available triples, printing out the subject and context of each triple. We limited this query by using the DISTINCT keyword. Otherwise there would be many duplicate results.

    print "\nSame thing with SPARQL query (can't retrieve triples in the null context)"
queryString = "SELECT DISTINCT ?s ?c WHERE {graph ?c {?s ?p ?o .} }"
tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
result = tupleQuery.evaluate();
for i, bindingSet in enumerate(result):
print bindingSet[0], bindingSet[1]
conn.close()

In this case, the loop prints out only v-card results. The SPARQL query is not able to access the null context when a non-null context is also present.

Exporting Triples (example8() and example9())    Return to Top

The next examples show how to write triples out to a file in either NTriples format or RDF/XML format. The output of either format may be optionally redirected to standard output (the Python command window) for inspection.

Example example8() begins by obtaining a connection object from example6(). This means the repository contains v-card triples in the #vcards context, and Kennedy family tree triples in the null context.

def example8():
conn = example6()

In this example, we'll export the triples in the #vcards context.

    context = conn.createURI("http://example.org#vcards")

To write triples in NTriples format, call NTriplesWriter(). You have to tell it the path and file name of the exported file. If the output file argument is 'None', the writers write to standard output. You can uncomment that line if you'd like to see it work.

    outputFile = "/tmp/temp.nt"
#outputFile = None
if outputFile == None:
print "Writing RDF to Standard Out instead of to a file"
ntriplesWriter = NTriplesWriter(outputFile)
conn.export(ntriplesWriter, context);

To write triples in RDF/XML format, call RDFXMLWriter().

    outputFile2 = "/tmp/temp.rdf"
#outputFile2 = None
if outputFile2 == None:
print "Writing NTriples to Standard Out instead of to a file"
rdfxmlfWriter = RDFXMLWriter(outputFile2)
conn.export(rdfxmlfWriter, context)

The export() method writes out all triples in one or more contexts. This provides a convenient means for making local backups of sections of your RDF store. If two or more contexts are specified, then triples from all of those contexts will be written to the same file. Since the triples are "mixed together" in the file, the context information is not recoverable. If the context argument is omitted, all triples in the store are written out, and again all context information is lost.

Finally, if the objective is to write out a filtered set of triples, the exportStatements() method can be used. The example below (from example9()) writes out all RDF:TYPE declaration triples to standard output.

conn.exportStatements(None, RDF.TYPE, None, False, RDFXMLWriter(None))

Datasets and Contexts (example10())    Return to Top

We have already seen contexts at work when loading and saving files. In example10() we provide more realistic examples of contexts, and we introduce the dataset object. A dataset is a list of contexts that should all be searched simultaneously.

To set up the example, we create six statements, and add two of each to three different contexts: context1, context2, and the null context. The process of setting up the six statements follows the same pattern as we used in the previous examples:

conn.clear()
exns = "http://example.org/people/"
alice = f.createURI(namespace=exns, localname="alice")
bob = f.createURI(namespace=exns, localname="bob")
ted = f.createURI(namespace=exns, localname="ted")
person = f.createURI(namespace=exns, localname="Person")
name = f.createURI(namespace=exns, localname="name")    
alicesName = f.createLiteral("Alice")    
bobsName = f.createLiteral("Bob")
tedsName = f.createLiteral("Ted")    
context1 = f.createURI(namespace=exns, localname="cxt1")      
context2 = f.createURI(namespace=exns, localname="cxt2")          
conn.add(alice, RDF.TYPE, person, context1)
conn.add(alice, name, alicesName, context1)
conn.add(bob, RDF.TYPE, person, context2)
conn.add(bob, name, bobsName, context2)
conn.add(ted, RDF.TYPE, person)
conn.add(ted, name, tedsName)

The first test uses getStatements() to return all triples in all contexts (cxt1, cxt2, and null).

statements = conn.getStatements(None, None, None, False)
print "All triples in all contexts:"
for s in statements:
    print s

The output of this loop is shown below. The context URIs are in the fourth position. Triples from the null context have no context value.

All triples in all contexts:
(<http://example.org/people/alice>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Person>, <http://example.org/people/cxt1>) (<http://example.org/people/alice>, <http://example.org/people/name>, "Alice", <http://example.org/people/cxt1>)
(<http://example.org/people/bob>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Person>, <http://example.org/people/cxt2>)
(<http://example.org/people/bob>, <http://example.org/people/name>, "Bob", <http://example.org/people/cxt2>)
(<http://example.org/people/ted>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Person>)
(<http://example.org/people/ted>, <http://example.org/people/name>, "Ted")

The next match explicitly lists 'context1' and 'context2' as the only contexts to participate in the match. It returns four statements.

statements = conn.getStatements(None, None, None, False, [context1, context2])
print "Triples in contexts 1 and 2:"
for s in statements:
    print s

The output of this loop shows that the triples in the null context have been excluded.

Triples in contexts 1 or 2:
(<http://example.org/people/alice>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Person>, <http://example.org/people/cxt1>) (<http://example.org/people/alice>, <http://example.org/people/name>, "Alice", <http://example.org/people/cxt1>)
(<http://example.org/people/bob>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Person>, <http://example.org/people/cxt2>)
(<http://example.org/people/bob>, <http://example.org/people/name>, "Bob", <http://example.org/people/cxt2>)

This time we use getStatements() to search explicitly for triples in the null context and in context 2.

statements = conn.getStatements(None, None, None, ['null', context2])
print "Triples in contexts null or 2:"
for s in statements:
print s

The output of this loop is:

Triples in contexts null or 2:
(<http://example.org/people/ted>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Person>) (<http://example.org/people/ted>, <http://example.org/people/name>, "Ted")
(<http://example.org/people/bob>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/people/Person>, <http://example.org/people/cxt2>) (<http://example.org/people/bob>, <http://example.org/people/name>, "Bob", <http://example.org/people/cxt2>)

Next, we switch to SPARQL queries. Named contexts may be included in the FROM and FROM-NAMED clauses in a SPARQL query. Below, we illustrate the procedural equivalent, which is to create a dataset object, add the contexts to that, and then to attach the dataset to the query object. The query is (again) restricted to only those statements in contexts 1 and 2.

queryString = """
SELECT ?s ?p ?o ?c
WHERE { GRAPH ?c {?s ?p ?o . } } 
"""
ds = Dataset()
ds.addNamedGraph(context1)
ds.addNamedGraph(context2)
tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
tupleQuery.setDataset(ds)
result = tupleQuery.evaluate();    
print "Query over contexts 1 and 2."
for bindingSet in result:
    print bindingSet.getRow()

The output of this loop contains four triples, as expected.

Query over contexts 1 and 2.
['<http://example.org/people/alice>', '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>', '<http://example.org/people/Person>', '<http://example.org/people/cxt1>'] ['<http://example.org/people/alice>', '<http://example.org/people/name>', '"Alice"', '<http://example.org/people/cxt1>']
['<http://example.org/people/bob>', '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>', '<http://example.org/people/Person>', '<http://example.org/people/cxt2>']
['<http://example.org/people/bob>', '<http://example.org/people/name>', '"Bob"', '<http://example.org/people/cxt2>']

Currently, its not possible to combine the null context with other contexts in a SPARQL query. Below, we illustrate how to evaluate a query against only the null context.

queryString = """
SELECT ?s ?p ?o    
WHERE {?s ?p ?o . } 
"""
ds = Dataset()
ds.addDefaultGraph(None)
tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
tupleQuery.setDataset(ds)
result = tupleQuery.evaluate();    
print "Query over the null context."
for bindingSet in result:
    print bindingSet.getRow()

The output of this loop is:

Query over the null context.
['<http://example.org/people/ted>', '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>', '<http://example.org/people/Person>'] ['<http://example.org/people/ted>', '<http://example.org/people/name>', '"Ted"']

Namespaces (example11())    Return to Top

A namespace is that portion of a URI that preceeds the last '#', '/', or ':' character, inclusive. The remainder of a URI is called the localname. For example, with respect to the URI "http://example.org/people/alice", the namespace is "http://example.org/people/" and the localname is "alice". When writing SPARQL queries, it is convenient to define prefixes or nicknames for the namespaces, so that abbreviated URIs can be specified. For example, if we define "ex" to be a nickname for "http://example.org/people/", then the string "ex:alice" is a recognized abbreviation for "http://example.org/people/alice". This abbreviation is called a qname.

In the SPARQL query in the example below, we see two qnames, "rdf:type" and "ex:alice". Ordinarily, we would expect to see "PREFIX" declarations in SPARQL that define namespaces for the "rdf" and "ex" nicknames. However, the Connection and Query machinery can do that job for you. The mapping of prefixes to namespaces includes the built-in prefixes RDF, RDFS, XSD, and OWL. Hence, we can write "rdf:type" in a SPARQL query, and the system already knows its meaning. In the case of the 'ex' prefix, we need to instruct it. The setNamespace() method of the connection object registers a new namespace. In the example below, we first register the 'ex' prefix, and then submit the SPARQL query. It is legal, although not recommended, to redefine the built-in prefixes RDF, etc..

The example example11() begins by borrowing a connection object from example1().

def example11():
conn = example1()

We need a namespace string (bound to the variable exns) to use when generating the alice and person URIs.

    exns = "http://example.org/people/"
alice = conn.createURI(namespace=exns, localname="alice")
person = conn.createURI(namespace=exns, localname="Person")

Now we can assert Alice's RDF:TYPE triple. Then we have to remind AllegroGraph to index it.

    conn.add(alice, RDF.TYPE, person)
conn.indexTriples(all=True, asynchronous=True)

Now we register the exns namespace with the connection object, so we can use it in a SPARQL query. The query looks for triples that have "rdf:type" in the predicate position, and "ex:Person" in the object position.

    conn.setNamespace('ex', exns)
queryString = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o . FILTER ((?p = rdf:type) && (?o = ex:Person) ) }
"""
tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
result = tupleQuery.evaluate();
print
for bindingSet in result:
print bindingSet[0], bindingSet[1], bindingSet[2]

The output shows the single triple with its fully-expanded URIs. This demonstrates that the qnames in the SPARQL query successfully matched the fully-expanded URIs in the triple.

http://example.org/people/alice http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://example.org/people/Person

It is worthwhile to briefly discuss performance here. In the current AllegroGraph system, queries run more efficiently if constants appear inside of the "where" portion of a query, rather than in the "filter" portion. For example, the SPARQL query below will evaluate more efficiently than the one in the above example. However, in this case, you have lost the ability to output the constants "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" and "http://example.org/people/alice". Occasionally you may find it useful to output constants in the output of a 'select' clause; in general though, the above code snippet illustrates a query syntax that is discouraged.

SELECT ?s  
WHERE { ?s rdf:type ex:person } 

It is common for users to build RDF applications that combine some form of "keyword search" with their queries. For example, a user might want to retrieve all triples for which the string "Alice" appears as a word within the third (object) argument to the triple. AllegroGraph provides a capability for including free text matching within a SPARQL query. It requires, however, that you register the predicates that will participate in text searches so they can be indexed.

The example example12() begins by borrowing the connection object from example1(). Then it creates a namespace string and registers the namespace with the connection object, as in the previous example.

def example12():
conn = example1()
exns = "http://example.org/people/"
conn.setNamespace('ex', exns)

We have to register the predicates that will participate in text indexing. In the example12() example below, we have called the connection method registerFreeTextPredicate() to register the predicate "http://example.org/people/fullname" for text indexing. Generating the predicate's URI is a separate step.

    conn.registerFreeTextPredicate(namespace=exns, localname='fullname')
fullname = conn.createURI(namespace=exns, localname="fullname")

The next step is to create two new resources, "Alice1" named "Alice B. Toklas," and "book1" with the title "Alice in Wonderland." Notice that we did not register the book title predicate for text indexing.

    alice = conn.createURI(namespace=exns, localname="alice1")
persontype = conn.createURI(namespace=exns, localname="Person")
alicename = conn.createLiteral('Alice B. Toklas')
book = conn.createURI(namespace=exns, localname="book1")
booktype = conn.createURI(namespace=exns, localname="Book")
booktitle = conn.createURI(namespace=exns, localname="title")
wonderland = conn.createLiteral('Alice in Wonderland')

Clear the repository, so our new triples are the only ones available.

    conn.clear()    

Add the resource for the new person, Alice B. Toklas:

    conn.add(alice, RDF.TYPE, persontype)
conn.add(alice, fullname, alicename)

Add the new book, Alice in Wonderland.

    conn.add(book, RDF.TYPE, booktype)    
conn.add(book, booktitle, wonderland)

Now we set up the SPARQL query that looks for triples containing "Alice" in the object position.

The text match occurs through a "magic" predicate called fti:match. This is not an RDF "predicate" but a LISP "predicate," meaning that it behaves as a true/false test. This predicate has two arguments. One is the subject URI of the resources to search. The other is the string pattern to search for, such as "Alice". Only registered text predicates will be searched. Only full-word matches will be found.

    conn.setNamespace('ex', exns)
queryString = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o . ?s fti:match 'Alice' . }
"""
tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
result = tupleQuery.evaluate();

There is no need to include a prefix declaration for the 'fti' nickname. That is because 'fti' is included among the built-in namespace/nickname mappings in AllegroGraph.

When we execute our SPARQL query, it matches the "Alice" within the literal "Alice B. Toklas" because that literal occurs in a triple having the registered fullname predicate, but it does not match the "Alice" in the literal "Alice in Wonderland" because the booktitle predicate was not registered for text indexing. This query returns all triples of a resource that had a successful match in at least one object value.

    print "Found %i query results" % len(result)    
count = 0
for bindingSet in result:
print bindingSet
count += 1
if count > 5: break

The output of this loop is:

Found 2 query results
{'p': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 's': 'http://example.org/people/alice1', 'o': 'http://example.org/people/Person'} {'p': 'http://example.org/people/fullname', 's': 'http://example.org/people/alice1', 'o': '"Alice B. Toklas"'}

The text index supports simple wildcard queries. The asterisk (*) may be appended to the end of the pattern to indicate "any number of additional characters." For instance, this query looks for whole words that begin with "Ali":

    queryString = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o . ?s fti:match 'Ali*' . }
"""

It finds the same two triples as before.

There is also a single-character wildcard, the questionmark. You can add as many question marks as you need to the string pattern. This query looks for a five-letter word that has "l" in the second position, and "c" in the fourth position:

    queryString = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o . ?s fti:match '?l?c?' . }
"""

This query finds the same two triples as before.

This time we'll do something a little different. The free text indexing matches whole words only, even when using wildcards. What if you really need to match a substring in a word of unknown length. You can write a SPARQL query that performs a regex match against object values. This can be inefficient compared to indexed search, and the match is not confined to the registered free-text predicates. The following query looks for the substring "lic" in literal object values:

    queryString = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o . FILTER regex(?o, "lic") }
"""

This query returns two triples, but they are not quite the same as before:

Substring match for 'lic'
Found 2 query results
{'p': 'http://example.org/people/fullname', 's': 'http://example.org/people/alice1', 'o': '"Alice B. Toklas"'}
{'p': 'http://example.org/people/title', 's': 'http://example.org/people/book1', 'o': '"Alice in Wonderland"'}

As you can see, the regex match found "lic" in "Alice in Wonderland," which was not a registered free-text predicate. It made this match by doing a string comparison against every object value in the triple store. Even though you can streamline the SPARQL query considerably by writing more restrictive patterns, this is still inherently less efficient than using the indexed approach.

Ask, Describe, and Construct Queries (example13())   Return to Top

SPARQL provides alternatives to the standard SELECT query. Example example13() exercises these alternatives to show how AllegroGraph Server handles them.

The example begins by borrowing a connection object from example2(). Then it registers two namespaces for use in the SPARQL queries.:

def example13():
conn = example2()
conn.setNamespace('ex', "http://example.org/people/")
conn.setNamespace('ont', "http://example.org/ontology/")

The example begins with an unconstrained SELECT query so we can see what triples are available for matching.

    queryString = """select ?s ?p ?o where { ?s ?p ?o} """
tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
result = tupleQuery.evaluate();
print "SELECT result"
for r in result: print r
The output for the SELECT query was four triples about Alice and Bob:
SELECT result
{'p': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 's': 'http://example.org/people/alice', 'o': 'http://example.org/ontology/Person'} {'p': 'http://example.org/ontology/name', 's': 'http://example.org/people/alice', 'o': '"Alice"'}
{'p': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 's': 'http://example.org/people/bob', 'o': 'http://example.org/ontology/Person'}
{'p': 'http://example.org/ontology/name', 's': 'http://example.org/people/bob', 'o': '"Bob"'}

The ASK query returns a Boolean, depending on whether the triple pattern matched any triples. In this case it looks for any ont:name triplecontaining the value "Alice." Note that the ASK query uses a different construction method than the SELECT query: prepareBooleanQuery().

    queryString = """ask { ?s ont:name "Alice" } """
booleanQuery = conn.prepareBooleanQuery(QueryLanguage.SPARQL, queryString)
result = booleanQuery.evaluate();
print "Boolean result", result

The output of this loop is:

Boolean result True

The CONSTRUCT query contructs a statement object out of the matching values in the triple pattern. A "statement" is a client-side triple. Construction queries use prepareGraphQuery(). The point is that the query can bind variables from existing triples and then "construct" a new triple by recombining the values.

    queryString = """construct {?s ?p ?o} where { ?s ?p ?o . filter (?o = "Alice") } """
constructQuery = conn.prepareGraphQuery(QueryLanguage.SPARQL, queryString)
result = constructQuery.evaluate();
for st in result:
print "Construct result, S P O values in statement:", st.getSubject(), st.getPredicate(), st.getObject()

The output of this loop is below. It has created a statement from values found in the repository.

Construct result, S P O values in statement: <http://example.org/people/alice> <http://example.org/ontology/name> "Alice"
The DESCRIBE query returns a "graph," meaning all triples of the matching resources. It uses prepareGraphQuery().
    queryString = """describe ?s where { ?s ?p ?o . filter (?o = "Alice") } """
describeQuery = conn.prepareGraphQuery(QueryLanguage.SPARQL, queryString)
result = describeQuery.evaluate();
print "Describe result"
for st in result: print st

The output of this loop is:

Describe result
(<http://example.org/people/alice>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/ontology/Person>) (<http://example.org/people/alice>, <http://example.org/ontology/name>, "Alice")

Parametric Queries (example14())   Return to Top

The Python API to AllegroGraph Server lets you set up a SPARQL query and then fix the value of one of the query variables prior to matching the triples. This is more efficient than testing for the same value in the body of the query.

In example14() we set up two-triple resources for Bob and Alice, and then use an unconstrained SPARQL query to retrieve the triples. Normally this query would find all four triples, but by binding the subject value ahead of time, we can retrieve the "Bob" triples separately from the "Alice" triples.

The example begins by borrowing a connection object from example2(). This means there are already Bob and Alice resources in the repository. We do need to recreate the URIs for the two resources, however.

def example14():
conn = example2()
alice = conn.createURI("http://example.org/people/alice")
bob = conn.createURI("http://example.org/people/bob")

The SPARQL query is the simple, unconstrained query that returns all triples. We use prepareTupleQuery() to create the query object.

    queryString = """select ?s ?p ?o where { ?s ?p ?o} """
tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)

Before evaluating the query, however, we'll use the query object's setBinding() method to assign Alice's URI to the "s" variable in the query. This means that all matching triples are required to have Alice's URI in the subject position of the triple.

    tupleQuery.setBinding("s", alice)
result = tupleQuery.evaluate()
print "Facts about Alice:"
for r in result: print r

The output of this loop consists of all triples that describe Alice:

Facts about Alice:
{'p': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 's': 'http://example.org/people/alice', 'o': 'http://example.org/ontology/Person'} {'p': 'http://example.org/ontology/name', 's': 'http://example.org/people/alice', 'o': '"Alice"'}

Now we'll run the same query again, but this time we'll constrain "s" to be Bob's URI. The query will return all triples that describe Bob.

    tupleQuery.setBinding("s", bob)
print "Facts about Bob:"
result = tupleQuery.evaluate()
for r in result: print r
The output of this loop is:
Facts about Bob:
{'p': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 's': 'http://example.org/people/bob', 'o': 'http://example.org/ontology/Person'} {'p': 'http://example.org/ontology/name', 's': 'http://example.org/people/bob', 'o': '"Bob"'}

Range Matches (example15())   Return to Top

Example15() demonstrates how to set up a query that matches a range of values. In this case, we'll retrieve all people between 30 and 50 years old (inclusive). We can accomplish this using the connection object's createRange() method.

This example begins by getting a connection object from example1(), and then clearing the repository of the existing triples.

def example15():
conn = example1()
conn.clear()

Then we register a namespace to use in the query.

    exns = "http://example.org/people/"
conn.setNamespace('ex', exns)

Next we need to set up the URIs for Alice, Bob, Carol and the predicate "age".

    alice = conn.createURI(namespace=exns, localname="alice")
bob = conn.createURI(namespace=exns, localname="bob")
carol = conn.createURI(namespace=exns, localname="carol")
age = conn.createURI(namespace=exns, localname="age")

In this step, we use the connection's createRange() method to generate a range object with limits 30 and 50:

    range = conn.createRange(30, 50)

The next two lines are essential to the experiment, but you can take your pick of which one to use. The range comparison requires that all of the matching values must have the same datatype. In this case, the values must all be ints. The connection object lets us force this uniformity on the data through the registerDatatypeMapping() method. This conversion comes in two versions. You can force all values of a specific predicate to be internally represented as one datatype, as we do here:

    if True: conn.registerDatatypeMapping(predicate=age, nativeType="int")

This line declares that all values of the age predicate will be represented in Python as ints.

We can also map one datatype into another. In this line, all values represented in the XMLSchema as INTs will be treated as ints in Python.

    if True: conn.registerDatatypeMapping(datatype=XMLSchema.INT, nativeType="int")    

If we turn off both mappings, the range comparison fails with internal errors. Why? Because the example deliberately uses inconsistent data.

    conn.add(alice, age, 42)
    conn.add(bob, age, 24) 
conn.add(carol, age, "39")

Carol's age is initially represented as a string instead of an int. This breaks the range comparison. The datatype mapping forces the string value to become an int.

The next step is to use getStatements() to retrieve all triples where the age value is between 30 and 50.

    rows = conn.getStatements(None, age, range)
for r in rows:
print r

The output of this loop is:

(<http://example.org/people/alice>, <http://example.org/people/age>, "42"^^<http://www.w3.org/2001/XMLSchema#int>)
(<http://example.org/people/carol>, <http://example.org/people/age>, "39"^^<http://www.w3.org/2001/XMLSchema#int>)

It has matched 42 and "39", but not 24.

Federated Repositories (example16())   Return to Top

AllegroGraph lets you split up your triples among repositories on multiple servers and then search them all in parallel. To do this we query a single "federated" repository that automatically distributes the queries to the secondary repositories and combines the results. From the point of view of your Python code, it looks like you are working with a single repository.

Example16() begins by defining a small output function that we'll use at the end of the lesson. It prints out responses from different repositories. This example is about red apples and green apples, so the output function talks about apples.

def example16():
def pt(kind, rows, expected):
print "\n%s Apples:\t" % kind.capitalize(),
for r in rows: print r[0].getLocalName(),

In the next block of code, we open connections to a redRepository and a greenRepository on the local server. In a typical federation scenario, these respositories would be distributed across multiple servers.

    server = AllegroGraphServer(AG_HOST, AG_PORT, 'test', 'xyzzy')
catalog = server.openCatalog(AG_CATALOG)
## create two ordinary stores, and one federated store:
redConn = catalog.getRepository("redthings", Repository.RENEW).initialize().getConnection()
greenConn = catalog.getRepository("greenthings", Repository.RENEW).initialize().getConnection()

Now we create a "federated" respository, which is connected to the distributed repositories at the back end.

    federated = server.openCatalog(server.FEDERATED)
try:
federated.deleteRepository("rainbowthings")
finally:
pass
federated.createRepository("rainbowthings", repos=[ AG_CATALOG + ":redthings", AG_CATALOG + ":greenthings"])
rainbowConn = federated.getRepository("rainbowthings").initialize().getConnection()

It is necessary to register the "ex" namespace in all three repositories so we can use it in the upcoming query.

    ex = "http://www.demo.com/example#"
redConn.setNamespace('ex', ex)
greenConn.setNamespace('ex', ex)
rainbowConn.setNamespace('ex', ex)

The next step is to populate the Red and Green repositories with a few triples.

    redConn.add(redConn.createURI(ex+"mcintosh"), RDF.TYPE, redConn.createURI(ex+"Apple"))
redConn.add(redConn.createURI(ex+"reddelicious"), RDF.TYPE, redConn.createURI(ex+"Apple"))
greenConn.add(greenConn.createURI(ex+"pippin"), RDF.TYPE, greenConn.createURI(ex+"Apple"))
greenConn.add(greenConn.createURI(ex+"kermitthefrog"), RDF.TYPE, greenConn.createURI(ex+"Frog"))

Now we write a query that retrieves Apples from the Red repository, the Green repository, and the federated repository, and prints out the results.

    queryString = "select ?s where { ?s rdf:type ex:Apple }"
## query each of the stores; observe that the federated one is the union of the other two:
pt("red", redConn.prepareTupleQuery(QueryLanguage.SPARQL, queryString).evaluate())
pt("green", greenConn.prepareTupleQuery(QueryLanguage.SPARQL, queryString).evaluate())
pt("federated", rainbowConn.prepareTupleQuery(QueryLanguage.SPARQL, queryString).evaluate())

The output is shown below. The federated response combines the individual responses.

Red Apples:   reddelicious mcintosh 
Green Apples: pippin
Federated Apples: reddelicious mcintosh pippin

Prolog Rule Queries (example17())   Return to Top

AllegroGraph Server lets us load Prolog backward-chaining rules to make query-writing simpler.  The Prolog rules let us write the queries in terms of higher-level concepts.  When a query refers to one of these concepts, Prolog rules become active in the background to determine if the concept is valid in the current context. 

For instance, in this example the query says that the matching resource must be a "man".  A Prolog rule examines the matching resources to see which of them are persons who are male.  The query can proceed for those resources. The rules provide a level of abstraction that makes the queries simpler to express.

The example17() example begins by borrowing a connection object from example example6(), which contains the Kennedy family tree.  Note that example6() creates a dedicated session for the rules to operate in, using the Connection object's openSession() method. Python rules cannot be loaded into the AllegroGraph common back end.

conn.openSession(); # in example6()

This converts the connection to a "dedicated" connection. After that step, all of the code is exactly the same as when using the common back end.

def example17():
conn = example6() #obtain dedicated connection from example6()

We will need the same namespace as we used in the Kennedy example.

    conn.setNamespace("kdy", "http://www.franz.com/simple#")

These are the "man" and "woman" rules.  A resource represents a "woman" if the resource contains a sex=female triple and an rdf:type = person triple.  A similar deduction identifies a "man".  The "q" at the beginning of each pattern simply stands for "query" and introduces a triple pattern.

    rules1 = """
(<-- (woman ?person) ;; IF
(q ?person !kdy:sex !kdy:female)
(q ?person !rdf:type !kdy:person))
(<-- (man ?person) ;; IF
(q ?person !kdy:sex !kdy:male)
(q ?person !rdf:type !kdy:person))
"""

The rules must be explicitly added to the dedicated session.

    conn.addRules(rules1)

This is the query.  This query locates all the "man" resources, and retrieves their first and last names.

    queryString = """
(select (?first ?last)
(man ?person)
(q ?person !kdy:first-name ?first)
(q ?person !kdy:last-name ?last)
)
"""

Here we perform the query and retrieve the result object. 

    tupleQuery = conn.prepareTupleQuery(QueryLanguage.PROLOG, queryString)
result = tupleQuery.evaluate();

The result object contains multiple bindingSets.  We can iterate over them to print out the values.

    for bindingSet in result:
f = bindingSet.getValue("first")
l = bindingSet.getValue("last")
print "%s %s" % (f, l)

The output contains many names; there are just a few of them.

"Robert" "Kennedy"
"Alfred" "Tucker"
"Arnold" "Schwarzenegger"
"Paul" "Hill"
"John" "Kennedy"

It is good form to close the dedicated session when you are finished with it.

    conn.closeDedicated()

Loading Prolog Rules (example18())   Return to Top

Example example18() demonstrates how to load a file of Prolog rules into the Python API of AllegroGraph Server.  It also demonstrates how robust a rule-augmented system can become.  The domain is the Kennedy family tree again, borrowed from example6().  After loading a file of rules (python-rules.txt), we'll pose a simple query.  The query asks AllegroGraph to list all the uncles in the family tree, along with each of their nieces or nephews.  This is the query:

(select (?person ?uncle) (uncle ?y ?x)(name ?x ?person)(name ?y ?uncle))

The problem is that the triple store contains no information about uncles.  The rules will have to deduce this relationship by finding paths across the RDF graph.. 

What's an "uncle," then?  Here's a rule that can recognize uncles:

(<-- (uncle ?uncle ?child) 
(man ?uncle)
(parent ?grandparent ?uncle)
(parent ?grandparent ?siblingOfUncle)
(not (= ?uncle ?siblingOfUncle))
(parent ?siblingOfUncle ?child))

The rule says that an "uncle" is a "man" who has a sibling who is the "parent" of a child.  (Rules like this always check to be sure that the two nominated siblings are not the same resource.) Note that none of these relationships are triple patterns. They all deal in higher-order concepts. We'll need additional rules to determine what a "man" is, and what a "parent" is.

What is a "parent?" It turns out that there are two ways to be classified as a parent:

(<-- (parent ?father ?child)
(father ?father ?child)) (<-- (parent ?mother ?child)
(mother ?mother ?child))

A person is a "parent" if a person is a "father."  Similarly, a person is a "parent" if a person is a "mother."

What's a "father?"

(<-- (father ?parent ?child)
(man ?parent)
(q ?parent !rltv:has-child ?child))

A person is a "father" if the person is "man" and has a child.   The final pattern is a triple match from the Kennedy family tree.

What's a "man?"

(<-- (man ?person)
(q ?person !rltv:sex !rltv:male)
(q ?person !rdf:type !rltv:person))

A "man" is a person who is male.  These patterns both match triples in the repository. 

The relative_rules.txt file contains many more Prolog rules describing relationships, including transitive relationships like "ancestor" and "descendant." Please examine this file for more ideas about how to use rules with AllegroGraph.

The example18() example begins by borrowing a connection object from example6(), which means the Kennedy family tree is already loaded into the repository, and we are dealing with a dedicated session.  

def example18():
conn = example6()

We need these two namespaces because they are used in the query and in the file of rules.

    conn.setNamespace("kdy", "http://www.franz.com/simple#")
conn.setNamespace("rltv", "http://www.franz.com/simple#")

The next step is to load the rule file:

    path = "./relative_rules.txt"
conn.loadRules(path)

The query asks for the full name of each uncle and each niece/nephew.  (The (name ?x ?fullname) relationship used in the query is provided by yet another Prolog rule, which concatenates a person's first and last names into a single string.)

    queryString = """(select (?person ?uncle) (uncle ?y ?x)(name ?x ?person)(name ?y ?uncle))"""

Here we execute the query and display the results:

    tupleQuery = conn.prepareTupleQuery(QueryLanguage.PROLOG, queryString)
result = tupleQuery.evaluate();
for bindingSet in result:
p = bindingSet.getValue("person")
u = bindingSet.getValue("uncle")
print "%s is the uncle of %s." % (u, p)

The output of this loop (in part) looks like this:

"{Edward} {Kennedy}" is the uncle of "{William} {Smith}".
"{Edward} {Kennedy}" is the uncle of "{Amanda} {Smith}".
"{John} {Kennedy}" is the uncle of "{Anthony} {Shriver}".
"{John} {Kennedy}" is the uncle of "{Mark} {Shriver}".
"{John} {Kennedy}" is the uncle of "{Timothy} {Shriver}".

As before, it is good form to close the dedicated session when you are finished with it.

    conn.closeDedicated()

RDFS++ Inference (example19())   Return to Top

The great promise of the semantic web is that we can use RDF metadata to combine information from multiple sources into a single, common model.  The great problem of the semantic web is that it is so difficult to recognize when two resource descriptions from different sources actually represent the same thing.  This problem arises because there is no uniform or universal way to generate URIs identifying resources.  As a result, we may create two resources, Bob and Robert, that actually represent the same person.

This problem has generated much creativity in the field. One way to approach the problem is through inference.  There are certain relationships and circumstances where an inference engine can deduce that two resource descriptions actually represent one thing, and then automatically merge the descriptions.  AllegroGraph's inference engine can be turned on or off each time you run a query against the triple store. 

In example example19(), we will create four resources: Bob, with son Bobby, and Robert with daughter Roberta. 

First we have to set up the data. We begin by generating four URIs for the new resources.

    ## Create URIs for Bob and Robert (and kids) 
    robert = conn.createURI("http://example.org/people/robert")
	roberta = conn.createURI("http://example.org/people/roberta")
	bob = conn.createURI("http://example.org/people/bob")
	bobby = conn.createURI("http://example.org/people/bobby")
The next step is to create URIs for the predicates we'll need (name and fatherOf), plus one for the Person class.
    ## create name and child predicates, and Person class.
name = conn.createURI("http://example.org/ontology/name")
fatherOf = conn.createURI("http://example.org/ontology/fatherOf")
person = conn.createURI("http://example.org/ontology/Person")
The names of the four people will be literal values.
    ## create literal values for names    
bobsName = conn.createLiteral("Bob")
bobbysName = conn.createLiteral("Bobby")
robertsName = conn.createLiteral("Robert")
robertasName = conn.createLiteral("Roberta")

Robert, Bob and the children are all instances of class Person.  It is good practice to identify all resources by an rdf:type link to a class.

    ## Robert, Bob, and children are people
conn.add(robert, RDF.TYPE, person)
conn.add(roberta, RDF.TYPE, person)
conn.add(bob, RDF.TYPE, person)
conn.add(bobby, RDF.TYPE, person)
The four people all have literal names.
    ## They all have names.
conn.add(robert, name, robertsName)
conn.add(roberta, name, robertasName)
conn.add(bob, name, bobsName)
conn.add(bobby, name, bobbysName)
Robert and Bob have links to the child resources:
    ## robert has a child
conn.add(robert, fatherOf, roberta)
## bob has a child
conn.add(bob, fatherOf, bobby)

SameAs

Now that the basic resources and relations are in place, we'll seed the triple store with a statement that "Robert is the same as Bob," using the owl:sameAs predicate.  The AllegroGraph inference engine recognizes the semantics of owl:sameAs, and automatically infers that Bob and Robert share the same attributes.  Each of them originally had one child.  When inference is turned on, however, they each have two children. 

Note that SameAs does not combine the two resources.  Instead it links each of the two resources to all of the combined children.  The red links in the image are "inferred" triples.  They have been deduced to be true, but are not actually present in the triple store.

This is the critical link that tells the inference engine to regard Bob and Robert as the same resource.

    ## Bob is the same person as Robert
conn.add(bob, OWL.SAMEAS, robert)
This is a simple getStatements() search asking for the children of Robert, with inference turned OFF. "Inference" is the fifth parameter to getStatements(), defaulting to "False".
    print "Children of Robert, inference OFF"
for s in conn.getStatements(robert, fatherOf, None, None): print s

The search returns one triple, which is the link from Robert to his direct child, Roberta.

Children of Robert, inference OFF
(<http://example.org/people/robert>, <http://example.org/ontology/fatherOf>, <http://example.org/people/roberta>)

This is a getStatements() search with inference turned ON.  This time we added the fifth parameter, True, to getStatements(). This turns on the inference engine.

    print "Children of Robert, inference ON"
for s in conn.getStatements(robert, fatherOf, None, None, True): print s
Children of Robert, inference ON
(<http://example.org/people/robert>, <http://example.org/ontology/fatherOF>, <http://example.org/people/roberta>)
(<http://example.org/people/robert>, <http://example.org/ontology/fatherOf>, <http://example.org/people/bobby>)

Note that with inference ON, Robert suddenly has two children because Bob's child has been included.  Also note that the final triple (robert fatherOf bobby) has been inferred.  The inference engine has determined that this triple logically must be true, even though it does not appear in the repository.

InverseOf

We can reuse the Robert family tree to see how the inference engine can deduce the presence of inverse relationships.

Up to this point in this tutorial, we have created new predicates simply by creating a URI and using it in the predicate position of a triple.  This time we need to create a predicate resource so we can set an attribute of that resource.  We're going to declare that the hasFather predicate is the owl:inverseOf the existing fatherOf predicate.

The first step is to remove the owl:sameAs link, because we are done with it.

    conn.remove(bob, OWL.SAMEAS, robert)

We'll need a URI for the new hasFather predicate:

    hasFather = conn.createURI("http://example.org/ontology/hasFather")

This is the line where we create a predicate resource.  It is just a triple that describes a property of the predicate.  The hasFather predicate is the inverse of the fatherOf predicate:

     conn.add(hasFather, OWL.INVERSEOF, fatherOf)

First, we'll search for hasFather triples, leaving inference OFF to show that there are no such triples in the repository:

    print "People with fathers, inference OFF"
    for s in conn.getStatements(None, hasFather, None, None): print s    
People with fathers, inference OFF

Now we'll turn inference ON.  This time, the AllegroGraph inference engine discovers two "new" hasFather triples.

    print "People with fathers, inference ON"
    for s in conn.getStatements(None, hasFather, None, None, True): print s  
People with fathers, inference ON
(<http://example.org/people/bobby>, <http://example.org/ontology/hasFather>, <http://example.org/people/bob>) (<http://example.org/people/roberta>, <http://example.org/ontology/hasFather>, <http://example.org/people/robert>)

Both of these triples are inferred by the inference engine.

SubPropertyOf

Invoking inference using the rdfs:subPropertyOf predicate lets us "combine" two predicates so they can be searched as one.  For instance, in our Robert/Bob example, we have explicit fatherOf relations.  Suppose there were other resources that used a parentOf relation instead of fatherOf.  By making fatherOf a subproperty of parentOf, we can search for parentOf triples and automatically find the fatherOf triples at the same time.

First we should remove the owl:inverseOf relation from the previous example. We don't have to, but it keeps things simple.

    ## Remove owl:inverseOf property.
conn.remove(hasFather, OWL.INVERSEOF, fatherOf)

We'll need a parentOf URI to use as the new predicate.  Then we'll add a triple saying that fatherOf is an rdfs:subPropertyOf the new predicate, parentOf:

    parentOf = conn.createURI("http://example.org/ontology/parentOf")
conn.add(fatherOf, RDFS.SUBPROPERTYOF, parentOf)

If we now search for parentOf triples with inference OFF, we won't find any. No such triples exist in the repository.

    print "People with parents, inference OFF"
for s in conn.getStatements(None, parentOf, None, None): print s
People with parents, inference OFF

With inference ON, however, AllegroGraph infers two new triples:

    print "People with parents, inference ON"
for s in conn.getStatements(None, parentOf, None, None, True): print s
People with parents, inference ON
(<http://example.org/people/bob>, <http://example.org/ontology/parentOf>, <http://example.org/people/bobby>)
(<http://example.org/people/robert>, <http://example.org/ontology/parentOf>, <http://example.org/people/roberta>)

The fact that two fatherOf triples exist means that two correponding parentOf triples must be valid.  There they are.

Before setting up the next example, we should clean up:

conn.remove(fatherOf, RDFS.SUBPROPERTYOF, parentOf) 

Domain and Range

When you declare the domain and range of a predicate, the AllegroGraph inference engine can infer the rdf:type of resources found in the subject and object positions of the triple.  For instance, in the triple <subject, fatherOf, object> we know that the subject is always an instance of class Parent, and the object is always an instance of class Child. 

In RDF-speak, we would say that the domain of the fatherOf predicate is rdf:type Parent.  The range of fatherOf is rdf:type Child. 

This lets the inference engine determine the rdf:type of every resource that participates in a fatherOf relationship. 

We'll need two new classes, Parent and Child.  Note that RDF classes are always capitalized, just as predicates are always lowercase.

    parent = conn.createURI("http://example.org/ontology/Parent")
child = conn.createURI("http://exmaple.org/ontology/Child")

Now we add two triples defining the domain and rage of the fatherOf predicate:

    conn.add(fatherOf, RDFS.DOMAIN, parent)
conn.add(fatherOf, RDFS.RANGE, child)

Now we'll search for resources of rdf:type Parent.  The inference engine supplies the appropriate triples:

    print "Who are the parents?  Inference ON."
for s in conn.getStatements(None, RDF.TYPE, parent, None, True): print s
Who are the parents?  Inference ON.
(<http://example.org/people/bob>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/ontology/Parent>
(<http://example.org/people/robert>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/ontology/Parent>)

Bob and Robert are parents.  Who are the children?

    print "Who are the children?  Inference ON."
for s in conn.getStatements(None, RDF.TYPE, child, None, True): print s
Who are the children?  Inference ON.
(<http://example.org/people/bobby>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://exmaple.org/ontology/Child>)
(<http://example.org/people/roberta>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://exmaple.org/ontology/Child>)

Bobby and Roberta are the children. 

AllegroGraph provides the ability to locate resources within a geospatial coordinate system. You can set up either a flat (X,Y Cartesian) or spherical (latitude, longitude) system. The systems are two-dimensional only. (There is no Z or altitude dimension available).

The purpose of the geospatial representation is to efficiently find all entities that are located within a specific circular, rectangular or polygonal area.

Cartesian System

A Cartesian system is a flat (X,Y) plane. Locations are designated by (X,Y) pairs. At this time, AllegroGraph does not support real-world measurement units (km, miles, latitude, etc.,) in the Cartesian system.

The first example uses a Cartesian (X,Y) system that is 100 units square, and contains three people located at various points along the X = Y diagonal.

The example is in the function example20(). After establishing a connection, it begins by creating URIs for the three people.

    exns = "http://example.org/people/"
conn.setNamespace('ex', exns)
alice = conn.createURI(exns, "alice")
bob = conn.createURI(exns, "bob")
carol = conn.createURI(exns, "carol")

Then we have the connection object generate a rectangular coordinate system for us to use. A rectangular (Cartesian) system can be used to represent anything that can be plotted using (X,Y) coordinates, such as the location of transistors on a silicon chip.

    conn.createRectangularSystem(scale=10, xMax=100, yMax=100) 

The size of the coordinate system is determined by the xMin, xMax, yMin and yMax parameters. The minimum values default to zero, so this system is 0 to 100 in the X dimension, and 0 to 100 in the Y dimension.

The scale parameter influences how the coordinate data is stored and retrieved, and impacts search performance. The task is to locate the people who are within a specific region. As a rule of thumb, set the scale parameter to approximately the same value as the height (Y-axis) of your typical search region. You can be off by a factor of ten without impacting performance too badly, but if your application will search regions that are orders of magnitude different in size, you'll want to create multiple coordinate systems that are scaled for different sized search regions. In this case, our search region is about 20 units high (Y), and we have set the scale parameter to 10 units. That's close enough.

The next step is to create a "location" predicate and enter the locations of the three people.

    location = conn.createURI(exns, "location")
conn.add(alice, location, conn.createCoordinate(30,30))
conn.add(bob, location, conn.createCoordinate(40, 40))
conn.add(carol, location, conn.createCoordinate(50, 50))

Note that the coordinate pairs need to be encapsulated in a GeoCoordinate object to facilitate indexing and retrieval, using the connection object's createCoordinate() method.

At this point we have a Cartesian coordinate system containing three entities at specific (X,Y) locations. The next step is to define a search region. The first example is a "box" that is twenty units square, with the upper left corner at position (20, 20). The createBox() method requires parameters for xMin, xMax, yMin, and yMax.

    box1 = conn.createBox(20, 40, 20, 40)  

The problem is to find the people whose locations lie within this box:

Locating the matching entities is remarkably easy to do:

    for r in conn.getStatements(None, location, box1) : print r 

This retrieves all the location triples whose coordinates fall within the box1 region. Here are the resulting triples:

(<http://example.org/people/alice>, <http://example.org/people/location>, 
"+30.000000004656613+30.000000004656613"^^<http://franz.com/ns/allegrograph/3.0/geospatial/cartesian/0.0/100.0/0.0/100.0/1.0>)
(<http://example.org/people/bob>, <http://example.org/people/location>, "+39.999999990686774+39.999999990686774"^^<http://franz.com/ns/allegrograph/3.0/geospatial/cartesian/0.0/100.0/0.0/100.0/1.0>)

AllegroGraph has located Alice and Bob, as expected. Note that Bob was exactly on the corner of the search area, showing that the boundaries are inclusive.

We can also find all objects within a circle with a known center and radius. Circle1 is centered at (35, 35) and has a radius of 10 units.

    circle1 = conn.createCircle(35, 35, radius=10)   

A search within circle1 finds Alice and Bob again:

    for r in conn.getStatements(None, location, circle1) : print r 
(<http://example.org/people/alice>, <http://example.org/people/location>, 
"+30.000000004656613+30.000000004656613"^^<http://franz.com/ns/allegrograph/3.0/geospatial/cartesian/0.0/100.0/0.0/100.0/1.0>)
(<http://example.org/people/bob>, <http://example.org/people/location>, "+39.999999990686774+39.999999990686774"^^<http://franz.com/ns/allegrograph/3.0/geospatial/cartesian/0.0/100.0/0.0/100.0/1.0>)

AllegroGraph can also locate points that lie within an irregular polygon. Just tell AllegroGraph the vertices of the polygon:

    polygon1 = conn.createPolygon([(10,40), (50,10), (35,40), (50,70)]) 

When we ask which people are within polygon1, AllegroGraph finds Alice.

    for r in conn.getStatements(None, location, polygon1) : print r 
(<http://example.org/people/alice>, <http://example.org/people/location>, 
"+30.000000004656613+30.000000004656613"^^<http://franz.com/ns/allegrograph/3.0/geospatial/cartesian/0.0/100.0/0.0/100.0/1.0>)

Spherical System

A spherical coordinate system projects (X,Y) locations on a spherical surface, simulating locations on the surface of the earth. AllegroGraph supports the usual units of latitude and longitude in the spherical system. The default unit of distance is the kilometer (km). (These functions presume that the sphere is the size of the planet earth. For spherical coordinate systems of other sizes, you will have to work with the Lisp radian functions that underlie this interface.)

To establish a global coordinate system, use the connection object's createLatLongSystem() method.

    latLongGeoType = conn.createLatLongSystem(scale=5, unit='degree')

Once again, the scale parameter is an estimate of the size of a typical search area, in the longitudinal direction this time. The default unit is the degree. For this system, we expect a typical search to cover about five degrees in the east-west direction. Actual search regions may be as much as ten times larger or smaller without significantly impacting performance. If the application will use search regions that are significantly larger or smaller, then you will want to create multiple coordinate systems that have been optimized for different scales.

First we set up the resources for the entities within the spherical system. We'll need these subject URIs:

    amsterdam = conn.createURI(exns, "amsterdam")
london = conn.createURI(exns, "london")
sanfrancisto = conn.createURI(exns, "sanfrancisco")
salvador = conn.createURI(exns, "salvador")

Then we'll need a geolocation predicate to describe the lat/long coordinates of each entity.

    location = conn.createURI(exns, "geolocation")

Now we can create the entities by asserting a geolocation for each one. Note that the coordinates have to be encapsulated in coordinate objects:

    conn.add(amsterdam, location, conn.createCoordinate(52.366665, 4.883333))
conn.add(london, location, conn.createCoordinate(51.533333, -0.08333333))
conn.add(sanfrancisto, location, conn.createCoordinate(37.783333, -122.433334))
conn.add(salvador, location, conn.createCoordinate(13.783333, -88.45))

The coordinates are decimal degrees. Northern latitudes and eastern longitudes are positive.

The next step is to create a box-shaped region, so we can see what entities lie within it.

    box2 = conn.createBox( 25.0, 50.0, -130.0, -70.0) 

This region corresponds roughly to the contiguous United States.

Now we retrieve all the triples located within the search region:

    for r in conn.getStatements(None, location, box2) : print r     

AllegroGraph has located San Francisco:

(<http://example.org/people/sanfrancisco>, <http://example.org/people/geolocation>,
"+374659.49909-1222600.00212"^^<http://franz.com/ns/allegrograph/3.0/geospatial/
spherical/degrees/-180.0/180.0/-90.0/90.0/5.0>) 

This time let's search for entities within 2000 kilometers of Mexico City, which is located at 19.3994 degrees north latitude, -99.08 degrees west longitude.

    circle2 = conn.createCircle(19.3994, -99.08, 2000, unit='km')

    for r in conn.getStatements(None, location, circle2) : print r
(<http://example.org/people/salvador>, <http://example.org/people/geolocation>, 
"+134659.49939-0882700"^^<http://franz.com/ns/allegrograph/3.0/geospatial/spherical/degrees/-180.0/180.0/-90.0/90.0/5.0>)

And AllegroGraph returns the triple representing El Salvador.

In the next example, the search area is a triangle roughly enclosing the United Kingdom:

    polygon2 = conn.createPolygon([(51.0, 2.00),(60.0, -5.0),(48.0,-12.5)])

We ask AllegroGraph to find all entities within this triangle:

    for r in conn.getStatements(None, location, polygon2) : print r 
(<http://example.org/people/london>, <http://example.org/people/geolocation>, 
"+513159.49909-0000459.99970"^^<http://franz.com/ns/allegrograph/3.0/geospatial/spherical/degrees/-180.0/180.0/-90.0/90.0/5.0>)

AllegroGraph returns the location of London, but not the nearby Amsterdam.

Social Network Analysis (example21())   Return to Top

AllegroGraph includes sophisticated algorithms for social-network analysis (SNA). It can examine an RDF graph of relationships among people (or similar entities, such as businesses) and discover:

This section has multiple subsections:

Most (but not all) of AllegroGraph's SNA features can be accessed from Python. We access them in multiple ways:

Example Network

The example file for this exercise is python-lesmis.rdf. It contains resources representing 80 characters from Victor Hugo's Les Miserables, a novel about Jean Valjean's search for redemption in 17th-century Paris.

The raw data behind the model measured the strength of relationships by counting the number of book chapters where two characters were both present. The five-volume novel has 365 chapters, so it was possible to create a relationship network that had some interesting features. This is a partial display of the graph in Franz's Gruff graphical browser.

There are four possible relationships between any two characters.

(The Gruff illustrations were made from a parallel repository in which the resources were altered to display the character's name in the graph node rather than his URI. That file is called lemisNames.rdf.)

Setting Up the Example

The SNA examples are in function example21() in tutorial_examples_40.py. This exercise begins by creating a dedicated session for the SNA Prolog queries to operation within. These are the same initializing steps we have used in previous examples, plus conn.openSession() to create a dedicated connection.

    server = AllegroGraphServer(AG_HOST, AG_PORT, 'username', 'password')
catalog = server.openCatalog('scratch')
myRepository = catalog.getRepository(AG_REPOSITORY, Repository.RENEW)
myRepository.initialize()
conn = myRepository.getConnection()
conn.openSession() # SNA requires dedicated session.

The next step is to load the python-lesmis.rdf file.

    path1 = "./python-lesmis.rdf"
    print "Load Les Miserables triples."
    conn.addFile(path1, None, format=RDFFormat.RDFXML);

There are three predicates of interest in the Les Miserables repository. We need to create their URIs and bind them for later use. These are the knows, barely_knows, and knows_well predicates.

    # Create URIs for relationship predicates.
    lmns = "http://www.franz.com/lesmis#"
    conn.setNamespace('lm', lmns)
    knows = conn.createURI(lmns, "knows")
    barely_knows = conn.createURI(lmns, "barely_knows")
    knows_well = conn.createURI(lmns, "knows_well")

We need to bind URIs for two characters: Valjean and Bossuet. Any analysis of Les Miserables will involve Valjean. Bossuet is someone who "barely knows" Valjean, but the two characters are linked through multiple characters who are more strongly connected. We will ask AllegroGraph to find paths from Valjean to Bossuet.

    # Create URIs for some characters.
    valjean = conn.createURI(lmns, "character11")
    bossuet = conn.createURI(lmns, "character64")

Creating SNA Generators

The SNA functions use "generators" to describe the relationships we want to analyze. A generator encapsulates a list of predicates to use in social network analysis. It also describes the directions in which each predicate is interesting.

In an RDF graph, two resources are linked by a single triple, sometimes called a "resource-valued predicate." This triple has a resource URI in the subject position, and a different one in the object position. For instance:

    (<Cosette>, knows_well, <Valjean>)

This triple is a one-way link unless we tell the generator to treat it as bidirectional. This is frequently necessary in RDF data, where inverse relations are often implied but not explicitly declared as triples.

For this exercise, we will declare three generators:

"Intimates" takes a narrow view of persons who know one another quite well. "Associates" follows both strong and medium relationships. "Everyone" follows all relationships, even the weak ones. This provides three levels of resolution for our analysis.

The connection object's registerSNAGenerator() method asks for a generator name (any label), and then for one or more predicates of interest. Each predicate should be assigned to the "subjectOf" direction, the "objectOf" direction, or the "undirected" direction (both ways at once). In addition, you may specify a "generator query," which is a Prolog "select" query that lets you be more specific about the links you want to analyze.

"Intimates" follows "knows_well" links only, but it treats them as bidirectional. If Cosette knows Valjean, then we'll assume that Valjean knows Cosette.

    conn.registerSNAGenerator("intimates", subjectOf=None, objectOf=None, 
undirected=knows_well, generator_query=None)

"Associates" follows "knows" and "knows_well" links.

    conn.registerSNAGenerator("associates", subjectOf=None, objectOf=None, 
undirected=[knows, knows_well], generator_query=None)

"Everyone" follows all three relationship links.

    conn.registerSNAGenerator("everyone", subjectOf=None, objectOf=None, 
undirected=[knows, knows_well, barely_knows], generator_query=None)

 

Creating Neighbor Matrices

A generator provides a powerful and flexible tool for examining a graph, but it performs repeated queries against the repository in order to extract the subgraph appropriate to your query. If your data is static, the generator will extract the same subgraph each time you use it. It is better to run the generator once and store the results for quick retrieval.

That is the purpose of a "neighbor matrix." This is a persistent, in-memory cache of a generator's output. You can substitute the matrix for the generator in AllegroGraph's SNA functions.

The advantage of using a matrix instead of a generator is a many-fold increase in speed. This benefit is especially visible if you are searching for paths between two nodes in your graph. The exact difference in speed is difficult to estimate because there can be complex trade-offs and scaling issues to consider, but it is easy to try the experiment and observe the effect.

To create a matrix, use the connection object's registerNeighborMatrix() method. You must supply a matrix name (any symbol), the name of the generator, the URI of a resource to serve as the starting point, and a maximum depth. The idea is to place limits on the subgraph so that the search algorithms can operate in a restricted space rather than forcing them to analyze the entire repository.

In the following excerpt, we are creating three matrices to match the three generators we created. In this example, "matrix1" is the matrix for generator "intimates," and so forth.

    conn.registerNeighborMatrix("matrix1", "intimates", valjean, max_depth=2)
conn.registerNeighborMatrix("matrix2", "associates", valjean, max_depth=2)
conn.registerNeighborMatrix("matrix3", "everyone", valjean, max_depth=2)

A matrix is a static snapshot of the output of a generator. If your data is dynamic, you should regenerate the matrix at intervals.

Deleting Generators and Matrices

There is no direct way to delete individual matrices and generators, but closing the dedicated session frees all of the resources formerly used by all of the objects and structures that were created there.

Our first search will enumerate Valjean's "ego group members." This is the set of nodes (characters) that can be found by following the interesting predicates out from Valjean's node of the graph to some specified depth. We'll use the "associates" generator ("knows" and "knows_well") to specify the predicates, and we'll impose a depth limit of one link. This is the group we expect to find:

The following Python code sends a Prolog query to AllegroGraph and returns the result to Python.

    print "Valjean's ego group (using associates)."
    queryString = """
    (select (?member ?name)
      (ego-group-member !lm:character11 1 associates ?member)
      (q ?member !dc:title ?name))
      """
    tupleQuery = conn.prepareTupleQuery(QueryLanguage.PROLOG, queryString)
    result = tupleQuery.evaluate();
    print "Found %i query results" % len(result)      
for bindingSet in result: p = bindingSet.getValue("member") n = bindingSet.getValue("name") print "%s %s" % (p, n)

This is the iconic block of code that is repeated in the SNA examples, below, with minor variations in the display of bindingSet values. To save virtual trees, we'll focus more tightly on the Prolog query from this point on:

    (select (?member ?name)
      (ego-group-member !lm:character11 1 associates ?member)
      (q ?member !dc:title ?name))

In this example, ego-group-member is an AllegroGraph SNA function that has been adapted for use in Prolog queries. There is a list of such functions on the AllegroGraph documentation reference page.

The query will execute ego-group-member, using Valjean (character11) as the starting point, following the predicates described in "associates," to a depth of 1 link. It binds each matching node to ?member. Then, for each binding of ?member, the query looks for the member's dc:title triple, and binds the member's ?name. The query returns multiple results, where each result is a (?member ?name) pair. The result object is passed back to Python, where we can iterate over the results and print out their values.

This is the output of the example:

Valjean's ego group (using associates).
Found 8 query results
<http://www.franz.com/lesmis#character27> "Javert"
<http://www.franz.com/lesmis#character24> "MmeThenardier"
<http://www.franz.com/lesmis#character25> "Thenardier"
<http://www.franz.com/lesmis#character28> "Fauchelevent"
<http://www.franz.com/lesmis#character11> "Valjean"
<http://www.franz.com/lesmis#character26> "Cosette"
<http://www.franz.com/lesmis#character23> "Fantine"
<http://www.franz.com/lesmis#character55> "Marius"

If you compare this list with the Gruff-generated image of Valjean's ego group, you'll see that AllegroGraph has found all eight expected nodes. You might be surprised that Valjean is regarded as a member of his own ego group, but that is a logical result of the definition of "ego group." The ego group is the set of all nodes within a certain depth of the starting point, and certainly the starting point must be is a member of that set.

We can perform the same search using a neighbor matrix, simply by substituting "matrix2" for "associates" in the query:

    (select (?member ?name)
      (ego-group-member !lm:character11 1 matrix2 ?member)
      (q ?member !dc:title ?name))

This produces the same set of result nodes, but under the right circumstances the matrix would run a lot faster than the generator.

This variation returns Valjean's ego group as a single list.

    (select (?group)
      (ego-group !lm:character11 1 associates ?group))

The result in the Python interaction window is: 

Valjean's ego group in one list depth 1 (using associates).
Found 1 query results
[ <http://www.franz.com/lesmis#character27> <http://www.franz.com/lesmis#character25>
<http://www.franz.com/lesmis#character28> <http://www.franz.com/lesmis#character23>
<http://www.franz.com/lesmis#character26> <http://www.franz.com/lesmis#character55>
<http://www.franz.com/lesmis#character11> <http://www.franz.com/lesmis#character24> ]

In the following examples, we explore the graph for the shortest path from Valjean to Bossuet, using the three generators to place restrictions on the quality of the path. These are the relevant paths between these two characters:

 

Our first query asks AllegroGraph to use intimates to find the shortest possible path between Valjean and Bossuet that is composed entirely of "knows_well" links. Those would be the green arrows in the diagram above. The breadth-first-search-path function asks for a start node and an end node, a generator, an optional maximum path length, and a variable to bind to the resulting path(s). Valjean is character11, and Bossuet is character64.

    (select (?path)
      (breadth-first-search-path !lm:character11 !lm:character64 intimates 10 ?path))

It is easy to examine the diagram and see that there is no such path. Valjean and Bossuet are not well-acquainted, and do not have any chain of well-acquainted mutual friends. AllegroGraph lets us know that.

Shortest breadth-first path connecting Valjean to Bossuet using intimates.
Found 1 query results
[ ]

It says there is one result, but that is actually the empty list.

This time we'll broaden the criteria. What is the shortest path from Valjean to Bossuet, using associates? We can follow either "knows_well" or "knows" links across the graph. Those are the green and the blue links in the diagram.

(select (?path)
      (breadth-first-search-path !lm:character11 !lm:character64 associates 10 ?path))

Although there are multiple such paths, there are only two that are "shortest" paths.

Shortest breadth-first path connecting Valjean to Bossuet using associates.
Found 2 query results
[ <http://www.franz.com/lesmis#character11> <http://www.franz.com/lesmis#character55>
<http://www.franz.com/lesmis#character58> <http://www.franz.com/lesmis#character64> ]
[ <http://www.franz.com/lesmis#character11> <http://www.franz.com/lesmis#character55>
<http://www.franz.com/lesmis#character62> <http://www.franz.com/lesmis#character64> ]

These are the paths "Valjean > Marius > Enjolras > Bossuet" and "Valjean > Marius > Courfeyrac > Bossuet." AllegroGraph returns two paths because they are of equal length. If one of the paths had been shorter, it would have returned only the short path.

Our third query asks for the shortest path from Valjean to Bossuet using everyone, which means that "barely-knows" links are permitted in addition to "knows" and "knows_well" links.

    (select (?path)
      (breadth-first-search-path !lm:character11 !lm:character64 everyone 10 ?path))

This time AllegroGraph returns a single, short path:

Shortest breadth-first path connecting Valjean to Bossuet using everyone.
Found 1 query results
[ <http://www.franz.com/lesmis#character11> <http://www.franz.com/lesmis#character64> ]

This is the "barely-knows" link directly from from Valjean to Bossuet.

The Prolog select query can also use depth-first-search-path() and bidirectional-search-path(). Their syntax is essentially identical to that shown above. These algorithms offer different efficiencies:

In addition, the depth-first algorithm uses less memory than the others, so a depth-first search may succeed when a breadth-first search would run out of memory.

Graph Measures

AllegroGraph provides several utility functions that measure the characteristics of a node, such as the number of connections it has to other nodes, and its importance as a communication path in a clique.

For instance, we can use the nodal-degree function to ask how many nodal neighbors Valjean has, using everyone to catalog all the nodes connected to Valjean by "knows," "barely_knows", and "knows_well" predicates. There are quite a few of them:

The nodal-degree function requires the URI of the target node (Valjean is character11), the generator, and a variable to bind the returned value to.

    print "\nHow many neighbors are around Valjean? (should be 36)."
queryString = """
(select ?neighbors
(nodal-degree !lm:character11 everyone ?neighbors))
"""
tupleQuery = conn.prepareTupleQuery(QueryLanguage.PROLOG, queryString)
result = tupleQuery.evaluate();
for bindingSet in result:
p = bindingSet.getValue("neighbors")
print "%s" % (p)
print "%s" % p.toPython()

Note that this function returns a string that describes an integer, which in its raw form is difficult for Python to use. We convert the raw value to a Python integer using the .toPython() method that is available to all literal values in the Python API to AllegroGraph. This example prints out both the string value and the converted number.

How many neighbors are around Valjean? (should be 36).
"36"^^<http://www.w3.org/2001/XMLSchema#integer>
36

If you want to see the names of these neighbors, you can use either the ego-group-member function described earlier on this page, or the nodal-neighbors function shown below:

    print "\nWho are Valjean's neighbors? (using everyone)."
queryString = """
(select (?name)
(nodal-neighbors !lm:character11 everyone ?member)
(q ?member !dc:title ?name))
"""
tupleQuery = conn.prepareTupleQuery(QueryLanguage.PROLOG, queryString)
result = tupleQuery.evaluate();
count = 0
for bindingSet in result:
count = count + 1
n = bindingSet.getValue("name")
print "%s. %s " % (count,n.toPython())

This example enumerates all immediate neighbors of Valjean and returns their names. This example prints a numbered list with 36 entries:

Who are Valjean's neighbors? (using everyone).
1. Isabeau
2. Fantine
3. Labarre
4. Bossuet
5. Brevet ...

Another descriptive statistic is graph-density, which measures the density of connections within a subgraph.

For instance, this is Valjean's ego group with all associates included.

Only 9 of 28 possible links are in place in this subgraph, so the graph density is 0.32. The following query asks AllegroGraph to calculate this figure for Valjean's ego group:

    (select ?density
      (ego-group !lm:character11 1 associates ?group)
      (graph-density ?group associates ?density))

We used the ego-group function to return a list of Valjean's ego-group members, bound to the variable ?group, and then we used ?group to feed that subgraph to the graph-density function. The return value, ?density, came back as a string describing a float, and had to be converted to a Python float using .toPython().

Graph density of Valjean's ego group? (using associates).
"3.2142857e-1"^^<http://www.w3.org/2001/XMLSchema#double>
3.2142857e-1

Cliques

A "clique" is a subgraph where every node is connected to every other node by predicates specified in some generator. AllegroGraph, using everyone ("knows," "knows_well," and "barely_knows"), found that Valjean participates in 239 cliques!

It is counterintuitive that a "clique" should be composed mainly of people who "barely_know" each other, so let's try the same experiment using "associates," which restricts the cliques to people Valjean "knows" or "knows_well." In this case, AllegroGraph returns two cliques. One contains Valjean, Cosette, and Marius. The other contains Valjean and the Thenardiers.

This is the query that finds Valjean's "associates" cliques:

    (select (?clique)
      (clique !lm:character11 associates ?clique))

AllegroGraph returns two cliques:

Valjean's cliques? Should be two (using associates).
[ <http://www.franz.com/lesmis#character11> <http://www.franz.com/lesmis#character25>
<http://www.franz.com/lesmis#character24> ]
[ <http://www.franz.com/lesmis#character11> <http://www.franz.com/lesmis#character55>
<http://www.franz.com/lesmis#character26> ]

The first list is the clique with Marius and Cosette. The second one represents the Thernardier clique.

Actor Centrality

AllegroGraph lets us measure the relative importance of a node in a subgraph using the actor-degree-centrality() function. For instance, it should be obvious that Valjean is very "central" to his own ego group (depth of one link), because he is linked directly to all other links in the subgraph. In that case he is linked to 7 of 7 possible nodes, and his actor-degree-centrality value is 7/7 = 1.

However, we can regenerate Valjean's ego group using a depth of 2. This adds three nodes that are not directly connected to Valjean. How "central" is he then?

In this subgraph, Valjean's actor-degree-centrality is 0.70, meaning that he is connected to 70% of the nodes in the subgraph.

This example asks AllegroGraph to generate the expanded ego group, and then to measure Valjean's actor-degree-centrality:

    (select (?centrality)
      (ego-group !lm:character11 2 associates ?group)
      (actor-degree-centrality !lm:character11 ?group associates ?centrality))

Note that we asked ego-group() to explore to a depth of two links, and then fed its result (?group) to actor-degree-centrality(). This is the output:

Valjean's centrality to his ego group at depth 2 (using associates).
"7.0e-1"^^
0.7

This confirms our expectation that Valjean's actor-degree-centrality should be 0.70 in this circumstance.

We can also measure actor centrality by calculating the average path length from a given node to the other nodes of the subgraph. This is called actor-closeness-centrality. For instance, we can calculate the average path length from Valjean to the ten nodes of his ego group (using associates and depth 2). Then we take the inverse of the average, so that bigger values will be "more central."

The actor-closeness-centrality for Valjean is 0.769. For Marius it is 0.60, showing that Valjean is more central and important to the group than is Marius.

This example calculates Valjean's actor-closeness-centrality for the associates ego group of depth 2.

    (select (?centrality)
      (ego-group !lm:character11 2 associates ?group)
      (actor-closeness-centrality !lm:character11 ?group associates ?centrality))
Valjean's actor-closeness-centrality to his ego group at depth 2 (using associates).
"7.692308e-1"^^<http://www.w3.org/2001/XMLSchema#double>
0.7692308

That is the expected value of 0.769.

Another approach to centrality is to count the number of information paths that are "controlled" by a specific node. This is called actor-betweenness-centrality. For instance, there are 45 possible "shortest paths" between pairs of nodes in Valjean's associates depth-2 ego group. Valjean can act as an information valve, potentially cutting off communication on 34 of these 45 paths. Therefore, he controls 75% of the communication in the group.

This example calculates Valjean's actor-betweenness-centrality:

    (select (?centrality)
      (ego-group !lm:character11 2 associates ?group)
      (actor-betweenness-centrality !lm:character11 ?group associates ?centrality))
Valjean's actor-betweenness-centrality to his ego group at depth 2 (using associates).
"7.5555557e-1"^^<http://www.w3.org/2001/XMLSchema#double>
7.5555557

That's the expected result of 0.755.

Group Centrality

Group-centrality measures express the "cohesiveness" of a group. There are three group-centrality measures in AllegroGraph: group-degree-centrality(), group-closeness-centrality(), and group-betweenness-centrality().

To demonstrate these measures, we'll use Valjean's ego group, first at radius 1 and then at radius 2. As you recall, the smaller ego group is radially symmetrical, but the larger one is quite lop-sided. That makes the smaller group "more cohesive" than the larger one.

Group-degree-centrality() measures group cohesion by finding the maximum actor centrality in the group, summing the difference between this and each other actor's degree centrality, and then normalizing. It ranges from 0 (when all actors have equal degree) to 1 (when one actor is connected to every other and no other actors have connections.

The prolog query takes this form:

 (select (?centrality)
   (ego-group !lm:character11 1 associates ?group)
   (group-degree-centrality ?group associates ?centrality))

The query first generates Valjean's (character11) ego group at radius 1, and binds that list of characters to ?group. Then it calls group-degree-centrality() on the group and returns the answer as ?centrality.

The group-degree-centrality for Valjean's radius-1 ego group is 0.129. When we expand to radius 2, the group-degree-centrality drops to 0.056. The larger group is less cohesive than the smaller one.

The following examples were all generated from queries that strongly resemble the one above.

Group-closeness-centrality() is measured by first finding the actor whose `closeness-centrality`
is maximized and then summing the difference between this maximum value and the actor-closeness-centrality of all other actors. This value is then normalized so that it ranges between 0 and 1.

The group-closeness-centrality of Valjean's smaller ego group is 0.073. The expanded ego group has a group-closeness-centrality of 0.032. Again, the smaller group is more cohesive.

Group-betweenness-centrality() is measured by first finding the actor whose actor-betweenness-centrality
is maximized and then summing the difference between this maximum value and the actor-betweenness-centrality of all other actors. This value is then normalized so that it ranges between 0 and 1.

Valjean's smaller ego group has a group-betweenness-centrality of 0.904. The value for the larger ego group is 0.704. Even by this measure, the larger group is less cohesive.

Transaction Control (example22())   Return to Top

Most of the time, multiple AllegroGraph users pool triples in a common session. Their data may be segregated into multiple contexts (subgraphs) for convenience, but in fact any part of this data may be retrieved by any of the users.

There are situations where a little more security is required. You may open a "dedicated session" and populate it with triples that the other users cannot view. These triples may be searched and manipulated in the usual ways, but do not become visible to other users until you "commit" them to the common session.

In order to open a dedicated session, the user must have "session" privileges with AllegroGraph Server. This is an elevated level of privilege. Therefore certain features that also require an elevated level of trust are required to use dedicated sessions. For instance, Prolog rules and Social Network Analysis functions cannot be run in the common session. You must open a dedicated session in order to use them.

Session management makes use of a transaction model. Changes in your dedicated session remain private until you are ready to commit them to the common session.

"Commit" means to make the triples in the dedicated session visible in the common session. At the time of the commit, triples in the common session are also made visible to queries in the dedicated session. The two sessions are "synched up" by the commit.

"Rollback" means to discard the contents of the dedicated session. It also makes all common-session triples visible in the dedicated session.

"Closing" the dedicated session deletes all uncommitted triples, and all rules, generators and matrices that were created in the dedicated session. Rules, generators and matrices cannot be committed. They persist as long as the dedicated session persists, and are deleted when the dedicated session is closed.

Example example22() performs some simple data manipulations on a dedicated session to demonstrate the rollback and commit features. It begins by creating a connection to a repository (the common session), and a second connection to a dedicated session. To do that, all you need is a second connection object, and invoke its openSession() method.

    server = AllegroGraphServer(AG_HOST, AG_PORT, 'test', 'xyzzy')
catalog = server.openCatalog('scratch')
myRepository = catalog.getRepository(AG_REPOSITORY, Repository.RENEW)
myRepository.initialize()
common = myRepository.getConnection()
dedicated = myRepository.getConnection()
dedicated.openSession() # open dedicated session

We'll reuse the Kennedy and Les Miserables data. The Les Miserables data goes in the common session, and the Kennedy data goes in the dedicated session.

    path1 = "./python-kennedy.ntriples"
path2 = "./python-lesmis.rdf"
baseURI = "http://example.org/example/local"
## read kennedy triples into the dedicated session:
print "Load 1214 python-kennedy.ntriples into dedicated session."
dedicated.add(path1, base=baseURI, format=RDFFormat.NTRIPLES, contexts=None)
## read lesmis triples into the common session:
print "Load 916 lesmis triples into the common session."
common.addFile(path2, baseURI, format=RDFFormat.RDFXML, context=None);

The two sessions should now have independent content. When we look in the common session we should see only Les Miserables triples. The dedicated session could contain only Kennedy triples. We set up a series of simple tests similar to this one:

    print "\nUsing getStatements() on common session; should find Valjean:"
    Valjean = common.createLiteral("Valjean")
    statements = common.getStatements(None, None, Valjean, 'null', limit=1)
    print "Number of results: %s" % len(statements)
    for s in statements:
    print s

This test looks for our friend Valjean in the common session. He should be there. This is the output:

Using getStatements() on common session; should find Valjean:
   Number of results: 1
   (<http://www.franz.com/lesmis#character11>, <http://purl.org/dc/elements/1.1/title>, "Valjean")

However, there should not be anyone in the common session named "Kennedy." The code of the test is almost identical to that shown above, so we'll skip straight to the output.

Using getStatements() on common session; should not find Kennedy:
   Number of results: 0

There should be a Kennedy (at least one) visible in the dedicated session:

Using getStatements() on dedicated session; should find Kennedys:
   Number of results: 1
 (<http://www.franz.com/simple#person1>, <http://www.franz.com/simple#last-name>, "Kennedy") 

And finally, we should not see Valjean in the dedicated session:

Using getStatements() on dedicated session; should not find Valjean:
   Number of results: 0  

The next step in the demonstration is to roll back the data in the dedicated session. This will make the Kennedy data disappear. It will also make the Les Miserables data visible in both sessions. We'll perform the same four tests, with slightly different expectations.

First we roll back the transaction:

    dedicated.rollback() 

Valjean is still visible in the common session:

Using getStatements() on common session; should find Valjean:
   Number of results: 1
   (<http://www.franz.com/lesmis#character11>, <http://purl.org/dc/elements/1.1/title>, "Valjean")

There are still no Kennedys in the common session:

Using getStatements() on common session; should find Kennedys:
Number of results: 0

There should be no Kennedys visible in the dedicated session:

Using getStatements() on dedicated session; should find Kennedys:
   Number of results: 0

And finally, we should suddenly see Valjean in the dedicated session:

Using getStatements() on dedicated session; should not find Valjean:
   Number of results: 1
   (<http://www.franz.com/lesmis#character11>, <http://purl.org/dc/elements/1.1/title>, "Valjean")

The rollback has succeeded in deleting the uncommitted triples from the dedicated session. It has also refreshed or resynched the dedicated session with the common session.

To set up the next test, we have to reload the Kennedy triples. Then we'll perform a commit.

    print "\nReload 1214 python-kennedy.ntriples into dedicated session."
dedicated.add(path1, base=baseURI, format=RDFFormat.NTRIPLES, contexts=None)
dedicated.commit()

This should make both types of triples visible in both sessions. Here are the four tests:

Using getStatements() on common session; should find Valjean:
   Number of results: 1
 (<http://www.franz.com/lesmis#character11>, <http://purl.org/dc/elements/1.1/title>, "Valjean")
Using getStatements() on common session; should find Kennedys:
   Number of results: 1
   (<http://www.franz.com/simple#person1>, <http://www.franz.com/simple#last-name>, "Kennedy")
Using getStatements() on dedicated session; should find Kennedys:
   Number of results: 1
   (<http://www.franz.com/simple#person1>, <http://www.franz.com/simple#last-name>, "Kennedy")
Using getStatements() on dedicated session; should find Valjean:
   Number of results: 1
   (<http://www.franz.com/lesmis#character11>, <http://purl.org/dc/elements/1.1/title>, "Valjean")

The Les Miserables triples are visible in both sessions. So too are the Kennedy triples.

Remember that when you are done with a session or connection, it is appropriate to close it:

    dedicated.closeSession()
dedicated.close()
common.close()

This recovers resources that would otherwise remain bound.

Duplicate Triples (example23())    Return to Top

Most people find it annoying when a query returns multiple copies of the same information. This can happen in more than one way. The strategies for reducing this behavior differ depending on your situation. We have collected those mechanisms here under the banner "Duplicate Triples."

Tighten Up Your Queries

Triple patterns sometimes act in unexpected ways, resulting in "too many" matches.

Example23() revives the Kennedy family tree. It loads the python-kennedy.ntriples file, resulting in 1214 triples.

This example focuses on the three children of Ted Kennedy (person17). A simple getStatements() query shows us that all three are, in fact, present. They are person72, person74, and person76.

Using getStatements() find children of Ted Kennedy: three children.
Number of results: 3
(<http://www.franz.com/simple#person17>, <http://www.franz.com/simple#has-child>, <http://www.franz.com/simple#person72>)
(<http://www.franz.com/simple#person17>, <http://www.franz.com/simple#has-child>, <http://www.franz.com/simple#person74>)
(<http://www.franz.com/simple#person17>, <http://www.franz.com/simple#has-child>, <http://www.franz.com/simple#person76>)

Let's imagine that for some bureaucratic or legal reason we need to retrieve any two of the senator's three children from the triple store. We might begin with a SPARQL query like this one:

    SELECT ?o1 ?o2 
WHERE {kdy:person17 kdy:has-child ?o1 .
kdy:person17 kdy:has-child ?o2 .}

Since there are only three ways to retrieve a pair from a pool of three, we might be startled to get nine answers:

SPARQL matches for two children of Ted Kennedy, inept pattern.
<http://www.franz.com/simple#person72> and <http://www.franz.com/simple#person72>
<http://www.franz.com/simple#person72> and <http://www.franz.com/simple#person74>
<http://www.franz.com/simple#person72> and <http://www.franz.com/simple#person76>
<http://www.franz.com/simple#person74> and <http://www.franz.com/simple#person72>
<http://www.franz.com/simple#person74> and <http://www.franz.com/simple#person74>
<http://www.franz.com/simple#person74> and <http://www.franz.com/simple#person76>
<http://www.franz.com/simple#person76> and <http://www.franz.com/simple#person72>
<http://www.franz.com/simple#person76> and <http://www.franz.com/simple#person74>
<http://www.franz.com/simple#person76> and <http://www.franz.com/simple#person76>

Three of these matches involve the same triple matching both patterns, claiming that each person is a sibling of himself. The other six are permutations of the correct answers, in which we first discover that person72 is a sibling of person74, and then subsequently discover that person74 is a sibling of person72. We didn't need to receive that information twice.

Let's eliminate the useless duplications with a simple trick:

    SELECT ?o1 ?o2 
WHERE {kdy:person17 kdy:has-child ?o1 .
kdy:person17 kdy:has-child ?o2 .
filter (?o1 < ?o2)}

The filter insures that the two object variables cannot be bound to the same value, eliminating the sibling-of-self issue, and also that the lower-numbered person must be bound to ?o1, which eliminates the duplication issue. Now we have the expected three results:

SPARQL matches for two children of Ted Kennedy, better pattern.
<http://www.franz.com/simple#person72> and <http://www.franz.com/simple#person74>
<http://www.franz.com/simple#person72> and <http://www.franz.com/simple#person76>
<http://www.franz.com/simple#person74> and <http://www.franz.com/simple#person76>

Our task, however, is to pluck "any two" of the children from the triple store and then stop. That's easy:

    SELECT ?o1 ?o2 
WHERE {kdy:person17 kdy:has-child ?o1 .
kdy:person17 kdy:has-child ?o2 .
filter (?o1 < ?o2)}
LIMIT 1

This query returns one result that is guaranteed to reference two different children.

The moral of this example is that triple patterns often return more information than you expected, and need to be rewritten to make them less exuberant. You may be able to reduce the incidence of duplicate results simply by writing better queries.

Don't Create Duplicate Triples

Now we'll consider the situation where we have multiple copies of the same information in the triple store. This is the classic "duplicate triples" situation.

You don't get duplicate triples by accident. They occur only when you have loaded the same information into the triple store more than once.

This is easy to demonstrate. Let's load the Kennedy graph into an empty triple store:

Load 1214 python-kennedy.ntriples.
After loading, there are:
  1214 kennedy triples in context 'null';

Then just load the same file again:

Reload 1214 python-kennedy.ntriples.
After loading, there are:
  2428 kennedy triples in context 'null';

Now there are two copies of every Kennedy triple. If you add the same triple multiple times, you will get multiple copies of it.

In practice this means that your user-interface forms have to be smart about saving only the fields that have been changed. At the same time, you have to delete the triples that the change made obsolete (using conn.remove(), demonstrated in example2().)

Filter Out Duplicate Results

In practice, our advice "don't create duplicate triples" may be difficult to follow. Some data feeds contain duplicated information and there is no convenient or efficient way to filter it out. So now you have duplicated data in the system. How much trouble might that cause?

This simple query should return three matches, one for each of Ted Kennedy's three children:

    SELECT ?o WHERE {kdy:person17 kdy:has-child ?o}

However, because each of the expected triples has a duplicate, we get six answers instead of three:

SPARQL matches for children of Ted Kennedy.
<http://www.franz.com/simple#person72>
<http://www.franz.com/simple#person74>
<http://www.franz.com/simple#person76>
<http://www.franz.com/simple#person72>
<http://www.franz.com/simple#person74>
<http://www.franz.com/simple#person76>

That's a nuisance, but SPARQL provides an easy way to wipe out the duplicate answers:

    SELECT DISTINCT ?o WHERE {kdy:person17 kdy:has-child ?o}

The DISTINCT operator does not remove duplicate triples, but it detects and eliminates duplicate variable bindings in the query's output. The duplicate triples are still there in the triple store, but we don't see them in the output:

SPARQL DISTINCT matches for children of Ted Kennedy.
<http://www.franz.com/simple#person72>
<http://www.franz.com/simple#person76>
<http://www.franz.com/simple#person74>

In some situations this is all you have to do to tame the "duplicate triple" problem.

Filtering Duplicate Triples while Loading

AllegroGraph could check each incoming triple to see if that information is already present in the triple store, but this process would be very time-consuming, slowing down our extremely-fast load times by orders of magnitude. Loading speed is important to almost everyone, so AllegroGraph does not filter duplicates during the load.

If loading speed isn't critical in your application, you can add triples one at a time while checking for duplicates as shown here:

    if conn.getStatements(newParent, hasChild, newChild):
print "Did not add new triple."
else:
conn.add(newParent, hasChild, newChild)
print "Added new triple."

If getStatements() return any value, then the new triple is already present in the store. If not, then we can add it to the store knowing that it will be unique.

When you run this example twice it adds a triple the first time, but not the second time:

Test before adding triple, first trial: 
Added new triple.  


Test before adding triple, second trial: 
Did not add new triple.

Automatic Duplicate Removal

AllegroGraph does, in fact, filter out duplicate triples. This does not occur until you have loaded 200,000 triples. It occurs again after the next 200,000 triples, and so on. This is an side-effect of managing and refining the triple indexes.

AllegroGraph builds its triple indexes in memory until the index reaches 100,000 triples. An index of 100,000 triples is called a "chunk," and may or may not contain some duplicate triples. When the in-memory index reaches this size, it is written to disk. AllegroGraph then wipes the in-memory chunk and starts building a new one. If the user queries the triple store at this point, AllegroGraph consults both the in-memory chunk and the chunk on disk to find all matching triples.

When another 100,000 triples have been loaded, a second "chunk" is written to disk. AllegroGraph begins to build a third in-memory "chunk" at that point. Any of these three chunks might contain triples that are duplicated within that chunk, or that duplicate triples that are indexed in the other chunks.

When searching for triples, it isn't very efficient to consult two separate chunks on disk. Instead, a background process takes these two chunks and merges them into a single chunk that contains the indexes for 200,000 triples. The merging process gives us an opportunity to eliminate duplicate triples very efficiently. The merged double-chunk index is guaranteed to have only one copy of each triple. Merging also removes all "deleted" triples from the index, which recovers some disk space.

At this point the merged index is duplicate-free, but the in-memory index might still have duplicates in it.

AllegroGraph actively eliminates duplicate triples, but the mechanism is never applied to the entire triple store at once. Newer data in chunks that have not been merged can still contain duplicates. It is also possible that a merged chunk might contain triples that are also present in another merged chunk. As the size of the triple store grows, AllegroGraph gradually merges larger and larger chunks into duplicate-free super-chunks. This process is ongoing, and gradually weeds the duplicates out of the system.

Search-and-Destroy Queries

It is easy to write a query that finds all duplicate triples and deletes them, but it is the kind of query that becomes exponentially expensive as the triple store grows. If you try to run such a query your system will hang. A generic search-and-destroy query is not practical in real-world applications.

However, it can be practical to expunge duplicate triples from a specific resource, particularly if you can also identify a predicate that is prone to duplicates. A highly-specific query can be very efficient. For instance, you might run this query to "weed" the duplicates from a resource that you are about to reference in another query.

The classic search-and-destroy query uses Prolog SELECT, because it needs to reach into Lisp briefly. Prolog queries allow the execution of arbitrary code within AllegroGraph. Remember that the AllegroGraph user needs special privileges to run Prolog queries for this very reason. Queries do not normally have "side effects" on the triple store, but that is exactly what we need in this case.

You may recall that AllegroGraph's "triples" are actually "quints." In addition to the subject, predicate and object of a typical triple, the quints contains a fourth field to identify the subgraph (if any), and a fifth field which contains the triple ID value. The triple ID is an internal value not normally exposed to the AllegroGraph users. We can use this ID to target a triple and delete it from the store.

Here is a highly-specific search-and-destroy query. It deletes duplicate triples from the kdy:has-child attributes of the kdy:person17 resource:

    queryString = """
(select (?id2)
(q !kdy:person17 !kdy:has-child ?o ?g ?id1)
(q !kdy:person17 !kdy:has-child ?o ?g ?id2)
(lispp (when (upi< ?id1 ?id2)
(delete-triple (upi->value ?id2)))))
"""
tupleQuery = conn.prepareTupleQuery(QueryLanguage.PROLOG, queryString)
result = tupleQuery.evaluate();
for bindingSet in result:
id2 = bindingSet.getValue("id2")
print "Deleted triple number: %s" % (id2)

This query reaches for any two has-child triples belonging to person17. Both triple patterns use the same ?o variable, insuring that both triples refer to the same child. They have the same ?g variable, too, insuring that both triples are part of the same subgraph. Then the ID of one triple is bound to ?id1, and the ID of the other triple is bound to ?id2. (Note that it is perfectly possible for the same triple to match both patterns.)

The two triple patterns are followed by a lispp test. This is how you set up a "filter" query in Prolog. The query will succeed (and return a result) in any circumstance where two triples are found (not just the same triple twice). We achieve this by using the upi< predicate to see if the two IDs are different. If the IDs are not the same, the test succeeds. The upi< predicate also insures that the larger ID number (the newer triple) is bound to ?id2.

The test also has a side effect. When the test succeeds, it calls the Lisp function delete-triple on ?id2, which is the newer of the two triples. The query reports the ID of the deleted triple in the query results.

To demonstrate this, we'll first run a query asking for all children of person17, reporting back the object value and the triple ID. There should be three children, but we see six responses due to duplicate triples.

Prolog SELECT matches for children of Ted Kennedy, before triple deletion.
<http://www.franz.com/simple#person72> "1547"
<http://www.franz.com/simple#person74> "1549"
<http://www.franz.com/simple#person76> "1551"
<http://www.franz.com/simple#person72> "333"
<http://www.franz.com/simple#person74> "335"
<http://www.franz.com/simple#person76> "337"

Triples number 333, 335, and 337 are the original triples. Numbers 1547, 1549, and 1551 are the duplicates.

Now we run the search-and-destroy query. This is the output:

Prolog SELECT matches for children of Ted Kennedy, deleting duplicate triples.
Deleted triple number: "1547"
Deleted triple number: "1549"
Deleted triple number: "1551"

When we run the first query again, the duplicates are gone:

Prolog SELECT matches for children of Ted Kennedy, after triple deletion.
<http://www.franz.com/simple#person72> "333"
<http://www.franz.com/simple#person74> "335"
<http://www.franz.com/simple#person76> "337"

Before closing, we will reiterate the two warnings. One, a search-and-destroy query must be as specific as possible or it will overwhelm your system with combinatoric matches. Two, a Prolog query that executes Lisp code has the power to do anything... anything good, and anything bad. There is no safety net, so use this feature with caution.

Duplicates in Federated Stores

A federated store is a single connection to multiple AllegroGraph repositories, possibly on multiple servers at multiple sites. Such systems can have duplicate triples, but deleting them is unwise.

Presumably any existing repository has a purpose of its own, and has been federated with other respositories to serve some combined purpose. It would be a mistake to delete triples from one repository simply because they also occur elsewhere in the federated store. This could damage the integrity of the individual repositories when they are used in stand-alone mode. By similar reasoning, it is a mistake to add triples to a federated repository. Which repository gets the new triples?

For all of these reasons, federated stores are "read only" in AllegroGraph. Attempting to add or delete triples in a federated system result in error messages.

Federation changes the semantics of "duplicate triples." Be sure to take this into account in your project design.

February 11, 2010 4.0 M3 release