AllegroGraph 2.2 Java Tutorial and Reference

Table of Contents

Introduction

The big picture

Accessing AllegroGraph from Java

Preparing the Triple Store

Testing the interface

Stopping the AllegroGraph Server Application

Tutorial

Connecting Java to the Triple Store

Buffered Operations

Simple Database Operations

Optimization notes

The OpenRDF Model

More complex queries using Prolog

More complex queries using SPARQL

How to use text indexing from Java

Reference

The AllegroGraph server application

Setting the location of the AllegroGraph Server application

AllegroGraph Java sources

Introduction

This document introduces AllegroGraph. It assumes that you are somewhat familiar with RDF (Resource Description Framework), RDFS (RDF Schema), and OWL (Web Ontology Language). If you are not very familiar with RDF, RDFS, and OWL, we suggest that you start with A Semantic Web Primer by Grigoris Antoniou and Frank van Harmelen (2001, Cambridge MA, MIT Press; available, e.g., from www.amazon.com). It is a very gentle introduction to these new technologies. For a quick introduction, see these Wikipedia entries: OWL, RDF, and RDFS.

The big picture

AllegroGraph is a pure triple store that you can use for storing RDFS/OWL triples but also as an on-disk graph database.

Accessing AllegroGraph from Java

The Java API to the AllegroGraph triple store allows Java applications to access and manipulate triple store databases.

This tutorial introduces some of the Java AllegroGraph API objects and methods through simple examples. The full documentation of the Java API is here.

Preparing the Triple Store

The Java API to the AllegroGraph Triple Store is a client-server implementation where the Java application is the client. In the Java-only edition of AllegroGraph, there are two distinct modes of operation possible:

Starting the AllegroGraph server from a Java application

In this mode of operation the Java application calls the startServer() method in the AllegroGraphConnection class. The only preparation needed for this mode of operation is to know where the AllegroGraph server executable was installed.

The Java application can specify the location of the server executable explicitly with a call to setDefaultCommand() or setCommand().

The Java application may also be started with a property setting for the property com.franz.ag.exec.

The most convenient mode is to set a user or system Java Preferences value with the utility in the main() method of the AllegroGraphConnection class. See the section Setting the location of the AllegroGraph Server application for full details. A Preferences setting persists from one session to the next and needs to be set only once in an installation.

Starting the AllegroGraph server as a separate application

The AllegroGraph server application is started from its installation location. The startup parameters specify the port numbers. The Java application must use these same parameters to connect to the server.

The section The AllegroGraph server application describes the AllegroGraph server application in detail.
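For the first mode, the com.franz.ag.exec property mentioned above is an ordinary Java system property, so it can be supplied with -D on the java command line. The following is a minimal sketch of how such a property can be read with a fallback. Only the property name comes from this document; the class name, the helper method, and the example paths are hypothetical, and the sketch does not show how AllegroGraphConnection itself consumes the property.

```java
// Hypothetical helper (not part of the AllegroGraph API): resolves the server
// executable location from the com.franz.ag.exec system property, e.g. when
// the JVM is started with
//   java -Dcom.franz.ag.exec=/opt/agraph/AllegroGraphJavaServer ...
class ServerLocation {
    static final String PROPERTY = "com.franz.ag.exec";

    // Returns the property value, or the supplied fallback when the
    // property is unset or blank.
    static String resolve(String fallback) {
        String value = System.getProperty(PROPERTY);
        return (value == null || value.trim().isEmpty()) ? fallback : value;
    }

    public static void main(String[] args) {
        // The fallback path below is an assumption made for this example.
        System.out.println("Server executable: "
                + resolve("/usr/local/agraph/AllegroGraphJavaServer"));
    }
}
```

A value resolved this way could then be passed to setDefaultCommand() or setCommand() as described above.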

Testing the interface

We include a sample program, AGExample.java, in the AllegroGraph distribution. This program may be used to verify the installation and to demonstrate that the connection between Java and AllegroGraph is working. Furthermore, the source code provides examples of using Java AllegroGraph. Please take a look at AGExample.java.

Before you run the client Java program, its classpath must include one important file: com.franz.agraph-2-2-5.jar, which resides in the AllegroGraph installation directory.

The full pathname to this file can be included in the Java classpath, or the file may be copied to a more convenient location. When using Eclipse, it may be specified as a library in the project properties.

Testing the interface on Windows

The first step is to start the AllegroGraph server by selecting the AllegroGraph server item on the AllegroGraph Start Menu entry (or double-click on AllegroGraphJavaServer in the AllegroGraph installation directory).

The second step is to open a command window in the folder where AllegroGraph was installed.

At this point, the following command will start the sample application, but it will terminate immediately with an error message because the program needs the location of the database work area:

java -cp .;com.franz.agraph-2-2-5.jar AGExample 

The full command line parameters of the sample program are described in a comment in the program source. The most important argument is "-d", a required argument which specifies an existing directory to hold the database files:

java -cp .;com.franz.agraph-2-2-5.jar AGExample -d /tmp/ag/ -n tst 

Other command examples:

Load the Wilbur example OWL ontology:

java -cp .;com.franz.agraph-2-2-5.jar AGExample -d /tmp/ag/ -n tst -r wilburwine.rdf 

The above command assumes you are in the AllegroGraph installation directory, as wilburwine.rdf is distributed with AllegroGraph.

Load the ntriples version of the Wilbur OWL ontology:

java -cp .;com.franz.agraph-2-2-5.jar AGExample -d /tmp/ag/ -n tst -t wilburwine.ntriples 

NOTE: when a large data file is specified, there may be a delay before the sample program shows any output.

Testing the interface on Linux and Unix

The first step is to open a shell in the AllegroGraph installation directory.

The second step is to start the AllegroGraphJavaServer executable. You may want to put it into the background and redirect the output from the program to a file.

At this point, the following command will start the sample application, but it will terminate immediately with an error message because the program needs the location of the database work area:

java -cp '.:com.franz.agraph-2-2-5.jar' AGExample 

The full command line parameters of the sample program are described in a comment in the program source. The most important argument is "-d", a required argument which specifies an existing directory to hold the database files:

java -cp .:com.franz.agraph-2-2-5.jar AGExample -d /tmp/ag/ -n tst 

Other command examples:

Load the Wilbur example OWL ontology:

java -cp .:com.franz.agraph-2-2-5.jar AGExample -d /tmp/ag/ -n tst -r wilburwine.rdf 

The above command assumes you are in the AllegroGraph installation directory, as wilburwine.rdf is distributed with AllegroGraph.

Load the ntriples version of the Wilbur OWL ontology:

java -cp .:com.franz.agraph-2-2-5.jar AGExample -d /tmp/ag/ -n tst -t wilburwine.ntriples 

NOTE: when a large data file is specified, there may be a delay before the sample program shows any output.

More advanced uses of the sample application

The sample application accepts several other command-line arguments that modify its behavior. These arguments are described in comments in the source code.

The application can also start the server if the "-x" argument is added to the command.

Stopping the AllegroGraph Server Application

Once the AllegroGraph server application is running, it can be terminated in several ways:

We supply a small Java application that stops the AllegroGraph server. The application is run with a command such as the following.

On Windows:

java -cp .;com.franz.agraph-2-2-5.jar AGStop [-p port] [-h host] 

On Unix:

java -cp '.:com.franz.agraph-2-2-5.jar' AGStop [-p port] [-h host] 

Tutorial

Connecting Java to the Triple Store

The first thing you might have noticed reading through the test program, AGExample.java, is that each Java application must connect to the server before any part of the API can be used. Connect to the server by creating a new instance of the class AllegroGraphConnection. The AllegroGraphConnection class implements methods open(), create(), and others that open databases and return instances of the class AllegroGraph. Each open database is represented by a new instance of the class AllegroGraph.

If the Java application disconnects from the server, all AllegroGraph instances become invalid and must be discarded.

Buffered Operations

The communication between the Java application and the AllegroGraph server takes place through a socket. In order to minimize the delays that may be imposed by operating system overheads, it is good practice to operate on many data items in each interaction between the Java client application and the server.

We facilitate this buffering by providing array operations for most of the database accessors. The array operations create or retrieve many database elements in a single interaction and therefore are much more time efficient.
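The idea behind buffering can be sketched independently of the AllegroGraph API: instead of one interaction per item, the application groups its items into arrays and sends one array per interaction. The class and method below are hypothetical and use only the JDK; the batch size of 1000 is an arbitrary choice for illustration, not a recommended value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only (not part of the AllegroGraph API): splits a long array of
// items into fixed-size batches so that each batch can be handed to an array
// operation in a single client-server interaction.
class Batcher {
    static List<String[]> batches(String[] items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String[]> out = new ArrayList<>();
        for (int start = 0; start < items.length; start += batchSize) {
            int end = Math.min(start + batchSize, items.length);
            out.add(Arrays.copyOfRange(items, start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] subjects = new String[2500];
        for (int i = 0; i < subjects.length; i++) {
            subjects[i] = "<http://www.franz.com/things#item" + i + ">";
        }
        // 2500 items in batches of 1000 need 3 interactions instead of 2500.
        System.out.println(batches(subjects, 1000).size() + " batches");
    }
}
```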

Simple Database Operations

Opening a database

A database is opened by creating an AllegroGraph instance.

// Connect to the server, then create a database named "test" in the given directory.
AllegroGraphConnection sv = new AllegroGraphConnection();  
sv.enable();  
AllegroGraph ts = sv.create("test", "/s/ja/temp/"); 

The database is closed with the closeDatabase() method. Once the database is closed, the AllegroGraph instance should be discarded since it cannot be used for further interactions.

To re-open a database, create a new AllegroGraph instance.

Creating triples

Triples can be created one at a time by naming the components with strings in ntriples syntax.

ts.addStatement("<http://www.franz.com/things#Dog>",  
		"<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>",  
		"<http://www.w3.org/2002/07/owl#Class>"); 

The application can also save the details of the newly created triple by creating a new Triple instance with the newTriple() method.

Triple tr2 = ts.newTriple(  
               "<http://www.franz.com/things#Dog>",  
               "<http://www.w3.org/2000/01/rdf-schema#subClassOf>",  
               "<http://www.franz.com/things#Mammal>"); 

When many triples are created, it is more efficient to buffer the operation by grouping the triple components into arrays. The following statement creates three triples from corresponding elements of the arrays.

ts.addStatements(  
new String[]{  
    "<http://www.franz.com/things#Cat>",  
    "<http://www.franz.com/things#Giraffe>",  
    "<http://www.franz.com/things#Lion>" },  
new String[]{  
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>",  
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>",  
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>" },  
new String[]{  
    "<http://www.w3.org/2002/07/owl#Class>",  
    "<http://www.w3.org/2002/07/owl#Class>",  
    "<http://www.w3.org/2002/07/owl#Class>" }  
);                             

When an array consists of identical elements, it can be shortened to a single element. The following statement creates three triples where the predicate and object components are identical.

ts.addStatements(  
    new String[]{  
        "<http://www.franz.com/things#Cat>",  
        "<http://www.franz.com/things#Giraffe>",  
        "<http://www.franz.com/things#Lion>" },  
    new String[]{"<http://www.w3.org/2000/01/rdf-schema#subClassOf>"},  
    new String[]{"<http://www.franz.com/things#Mammal>"}  
); 
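To make the shortening rule concrete, the following stand-alone sketch models how a single-element array stands for a value repeated in every triple. It is a model of the rule as described here, not the AllegroGraph implementation; the class and method names are hypothetical.

```java
// Stand-alone model of the array shortening rule; not part of the AllegroGraph API.
class TripleArrays {
    // Picks element i, or the only element when the array has length 1.
    static String pick(String[] parts, int i) {
        return parts.length == 1 ? parts[0] : parts[i];
    }

    // Expands three component arrays into rows of {subject, predicate, object}.
    static String[][] expand(String[] s, String[] p, String[] o) {
        int n = Math.max(s.length, Math.max(p.length, o.length));
        for (String[] a : new String[][] { s, p, o }) {
            if (a.length != 1 && a.length != n) {
                throw new IllegalArgumentException("array length must be 1 or " + n);
            }
        }
        String[][] rows = new String[n][];
        for (int i = 0; i < n; i++) {
            rows[i] = new String[] { pick(s, i), pick(p, i), pick(o, i) };
        }
        return rows;
    }

    public static void main(String[] args) {
        String[][] rows = expand(
            new String[] { "<http://www.franz.com/things#Cat>",
                           "<http://www.franz.com/things#Giraffe>",
                           "<http://www.franz.com/things#Lion>" },
            new String[] { "<http://www.w3.org/2000/01/rdf-schema#subClassOf>" },
            new String[] { "<http://www.franz.com/things#Mammal>" });
        // Prints one ntriples-style line per expanded triple.
        for (String[] r : rows) {
            System.out.println(r[0] + " " + r[1] + " " + r[2] + " .");
        }
    }
}
```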

Querying for triples

Triples are retrieved from the database with a Cursor instance. The Cursor instance can iterate through all the triples in the search result. The following statement will retrieve the four triples about subclasses of the "Mammal" class created earlier.

String wild = null;  
Cursor cc = ts.getStatements(  
    wild,  
    "<http://www.w3.org/2000/01/rdf-schema#subClassOf>",  
    "<http://www.franz.com/things#Mammal>" ); 

When a Cursor instance is created, it is not positioned at a result. The step() method advances the Cursor instance to the first or next result. When a Cursor has been advanced, the returned value is true. When a Cursor is exhausted, the returned value is false.

if ( cc.step() ) {  
    Triple tr = cc.getTriple();  
} 

When the Cursor is positioned at a result, we can retrieve the component of interest without creating a Triple instance.

Value s = cc.getSubject(); 

We can also retrieve several results in one operation. The following statement retrieves an array of at most 6 elements:

Triple[] trc = cc.step(6);  
int n = trc.length; 
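The stepping behavior described above lends itself to a simple loop: call step() until it returns false. The sketch below uses a toy stand-in class built on a list of strings to show that pattern. It is not the AllegroGraph Cursor class, and its names and internals are assumptions; it only mirrors the behavior described in this section.

```java
import java.util.Arrays;
import java.util.List;

// Toy stand-in for a cursor over query results; NOT the AllegroGraph Cursor class.
class ToyCursor {
    private final List<String> results;
    private int position = -1; // -1 means "not yet positioned at a result"

    ToyCursor(List<String> results) {
        this.results = results;
    }

    // Advances to the first or next result; returns false once exhausted.
    boolean step() {
        if (position + 1 >= results.size()) return false;
        position++;
        return true;
    }

    // Returns the result the cursor is currently positioned at.
    String current() {
        if (position < 0) throw new IllegalStateException("call step() first");
        return results.get(position);
    }

    // Advances up to n times and returns the results visited (at most n).
    String[] step(int n) {
        String[] buffer = new String[n];
        int count = 0;
        while (count < n && step()) {
            buffer[count++] = current();
        }
        return Arrays.copyOf(buffer, count);
    }

    public static void main(String[] args) {
        ToyCursor cc = new ToyCursor(Arrays.asList("Dog", "Cat", "Giraffe", "Lion"));
        // Typical iteration pattern: step until the cursor is exhausted.
        while (cc.step()) {
            System.out.println(cc.current());
        }
    }
}
```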

Optimization notes

Maximum Index Chunk Size parameter

This parameter, settable and gettable by the setChunkSize() and getChunkSize() methods, controls the maximum number of records that are sorted at a time during index merging. (Indexing happens by calling indexAll() or indexTriples() methods.)

The initial value of this parameter is believed to be good for machines with 1-2GB of RAM. If your computer has significantly more memory than this, you might improve indexing performance by using a larger value (e.g., double the initial value or more).

Expected Unique Resources parameter

This parameter, settable and gettable by the setDefaultExpectedResources() and getDefaultExpectedResources() methods, controls the default value for the expected number of unique resources in a new triple store. This number is the expected number of distinct URIs and literals in the triple store database. If the number is too small, performance may suffer during database creation. A rough rule of thumb is to specify a number that is one third of the number of triples.
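The rule of thumb above is simple arithmetic. A minimal sketch, assuming the application already has an estimate of the number of triples it will load; the class and method names are hypothetical, and only the "one third" rule comes from this document.

```java
// Hypothetical helper; not part of the AllegroGraph API.
class ResourceEstimate {
    // Rule of thumb from the text: about one third of the number of triples.
    static long expectedResources(long tripleCount) {
        if (tripleCount < 0) {
            throw new IllegalArgumentException("tripleCount must not be negative");
        }
        return tripleCount / 3;
    }

    public static void main(String[] args) {
        // 30 million triples suggest about 10 million expected resources.
        System.out.println(expectedResources(30_000_000L));
    }
}
```

The resulting number is the kind of value that would be passed to setDefaultExpectedResources() before a new triple store is created.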

The OpenRDF Model

We implement most of the interfaces in the OpenRDF model defined at http://openrdf.org/.

The current implementation does not implement the interface Graph.

More complex queries using Prolog

AllegroGraph includes a Prolog implementation that may be used to search a database. The select() and selectValues() methods allow searches that return triples or database nodes and literals.

ValueObject[][] v =  
    ts.selectValues  
        ("(?x ?y ?z) " +  
         "  (and (q ?x " +  
         "      !http://www.w3.org/1999/02/22-rdf-syntax-ns#type " +  
         "      ?y) " +  
         "   (q ?y " +  
         "      !http://www.w3.org/2000/01/rdf-schema#subClassOf " +  
         "   ?z))",  
        new Object[0], ""); 

The result v will be an array of sub-arrays. Each sub-array represents one successful match of the query. Each sub-array will be of length 3: the first element in the sub-array will be the binding of the variable ?x, the second ?y and the third ?z.
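The shape of such a result can be walked with two nested loops: the outer loop visits one match per sub-array and the inner loop visits one binding per variable. The sketch below uses plain Object arrays with made-up values as a stand-in for ValueObject[][], so it runs without a server; the class name, method name, and sample data are hypothetical.

```java
// Illustrative only: formats rows of variable bindings. Plain Object arrays
// stand in for the ValueObject[][] result described in the text.
class BindingRows {
    // Formats one row of bindings as "?x=... ?y=... ?z=...".
    static String format(String[] names, Object[] row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < names.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(names[i]).append('=').append(row[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] names = { "?x", "?y", "?z" };
        // Made-up sample data: one sub-array per successful match.
        Object[][] v = {
            { "things#Rex", "things#Dog", "things#Mammal" },
            { "things#Tom", "things#Cat", "things#Mammal" }
        };
        for (Object[] row : v) {
            System.out.println(format(names, row));
        }
    }
}
```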

It may also be desirable to substitute values from the Java application into the query string. This can be done by simply concatenating the required strings, but we do allow a more convenient option.

URI typePred = ts.addURI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type");  
URI classPred = ts.addURI("http://www.w3.org/2000/01/rdf-schema#subClassOf");  
ValueObject[][] w = ts.selectValues  
    ("(?x ?y ?z) (and (q ?x ?a ?y) (q ?y ?b ?z))",  
     new Object[]{ typePred, classPred },  
     "?a ?b"); 

This query returns the same result as the previous example, but we have substituted values from the program into the query.

A query can return a mixture of nodes, literals and triples. The query

URI typePred = ts.addURI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type");  
URI classPred = ts.addURI("http://www.w3.org/2000/01/rdf-schema#subClassOf");  
ValueObject[][] w = ts.selectValues  
    ("(?x ?y ?z ?t ?u) (and (q ?x ?a ?y ? ?t) (q ?y ?b ?z ? ?u))",  
     new Object[]{ typePred, classPred },  
     "?a ?b"); 

returns an array where each sub-array is of length 5. The fourth and fifth elements in the sub-array are the triples that satisfied the query. The lone question marks in the pattern skip the graph position of each triple to allow unification with the triple ids.

If all the results of interest are triples, a select() method can be used to return a Cursor instance. The Cursor instance is an iterator that returns the triples in order.

Cursor tv = ts.select  
              ("(?t ?u) (and (q ?x " +  
			      "  !http://www.w3.org/1999/02/22-rdf-syntax-ns#type " +  
			      "	?y ? ?t) " +  
			   "  (q ?y " +  
                 "   !http://www.w3.org/2000/01/rdf-schema#subClassOf " +  
			     "	?z ? ?u))",  
                new Object[0], ""); 

The cursor in variable tv will return triples t1, u1, t2, u2,... where t1 is the triple matching ?t in the first match of the query, and u1 is the triple matching ?u in the first match of the query.

If the list of result variables includes variables that are not bound to triples, those variables are ignored. Thus the query

Cursor tw = ts.select  
              ("(?t ?x ?u) (and (q ?x " +  
		                 "  !http://www.w3.org/1999/02/22-rdf-syntax-ns#type " +  
				 "  ?y ? ?t) " +  
				" (q ?y " +  
                   " !http://www.w3.org/2000/01/rdf-schema#subClassOf " +  
				  "  ?z ? ?u))",  
                new Object[0], ""); 

returns exactly the same value as the previous query. Additional select() methods are provided to allow data to be substituted into the query.

More complex queries using SPARQL

AllegroGraph includes a SPARQL implementation that may be used to search a database. The methods twinqlAsk(), twinqlSelect(), twinqlFind(), and twinqlQuery() allow searches that return a true/false result, an array of objects, a Cursor instance, or a result serialized into an XML string.

For notes on twinql's conformance to the W3C specification please see this document.

How to use text indexing from Java

If you want to know how this all works it is worthwhile to look at the tutorial after this section. The Javadocs also describe all the main methods.

The main methods:

public Cursor getFreetextStatements(String pattern) 

will return a cursor of all the triples that match pattern.

The input pattern for getFreetextStatements is described in the JavaDocs but here is a summary of the syntax for the input patterns.

_pattern_ -> _string-pattern_ | _composite-pattern_  
_string-pattern_ -> _string_ | _phrase-string_  
_string_ -> _char_\*  
_char_ -> *?*    -- denotes a wild card that matches any single character  
_char_ -> *\**   -- denotes a wild card that matches any sequence of characters  
_char_ -> _any_  -- most other characters denote themselves  
_phrase-string_ -> `'this is a phrase'`   no wild cards allowed  
_composite-pattern_ -> (and _pattern_\*) | (or _pattern_\*)  
  
  
public ValueObject[] getFreetextUniqueSubjects(String pattern) 

will return an array of ValueObject instances containing all the unique triple subjects that match pattern.

public String[] getFreetextPredicates() 

returns a string array of the predicates that you registered for freetext indexing.

public void registerFreetextPredicate(Object predicate) 

registers a predicate for indexing. Freetext indexing predicates must be registered before any triples are added to the triple store. We will relax this constraint in future versions.
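The two wild cards in the pattern syntax above behave like the familiar "?" and "*" of file name matching. The following sketch illustrates those semantics by translating a simple string-pattern into a java.util.regex.Pattern and matching it against a single word. It is only an illustration of the wildcard rules: the class is hypothetical, it ignores phrase strings and composite patterns, and it makes no claim about how AllegroGraph tokenizes text or matches patterns internally.

```java
import java.util.regex.Pattern;

// Illustrative model of the '?' and '*' wild cards; not the AllegroGraph matcher.
class WildcardSketch {
    // Translates a string-pattern into a regular expression:
    // '?' matches any single character, '*' matches any sequence of
    // characters, and every other character matches itself.
    static Pattern toRegex(String pattern) {
        StringBuilder sb = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '?') {
                sb.append('.');
            } else if (c == '*') {
                sb.append(".*");
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    // True when the whole word matches the pattern.
    static boolean matches(String pattern, String word) {
        return toRegex(pattern).matcher(word).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("gir*", "giraffe"));
        System.out.println(matches("c?t", "cat"));
        System.out.println(matches("c?t", "cart"));
    }
}
```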

Reference

The AllegroGraph server application

The AllegroGraph server is started with a call to the AllegroGraph Lisp function start-agj-server.

Starting Lisp and the AllegroGraph Server

We assume that you installed Lisp on one of your machines according to the instructions that came with Allegro Common Lisp. See the Franz documentation for installation instructions for Allegro CL.

We also assume that you know how to start up Lisp. See the Franz documentation on starting Lisp for more information.

On Windows, select the menu item `Start | Programs | Allegro CL 8.1 | Modern ACL Images | Allegro CL 8.1 (Modern)` (you can also start the one with the IDE if you want to play with the interactive Lisp version of the AllegroGraph).

On Linux, Solaris, Mac OS X, or any other non-Windows platform, the recommended way to start Lisp is as a subprocess of Emacs (XEmacs or GNU Emacs). Lisp may also be started from a shell, but then the editing and other features of the Emacs-Lisp interface are not available. The command for starting in a shell (assuming the Allegro directory is in your PATH) is:

mlisp 

When Lisp is started, an interactive session (similar to a Unix shell, or DOS shell) is opened. Lisp expressions are entered, evaluated and the results printed out. Some expressions may be evaluated for their side-effects. It is also possible to package a Lisp application so that it simply starts and does its thing without any interactions, but that is an advanced topic. In these examples we use the interactive mode for the flexibility it affords. AllegroGraph is an optional module that is loaded (enabled) by evaluating the following expression:

(require :agraph)        

You now can do the Lisp tutorial as described in agraph-tutorial.html or you can continue with the following:

The AllegroGraph server Lisp function

Start AllegroGraph Java Server.

To start the server, evaluate the following expression in the Lisp application:

(db.agraph:start-agj-server) 

or the more complex form

(db.agraph:start-agj-server  
  :port 1776 :root "e:/tmp" :limit 3 :ender 'my-end-function :nanny 5) 

The second form starts a server at port number 1776; the default directory will be "e:/tmp"; three connections will be allowed before the server shuts down; the function my-end-function will be called whenever a connection is terminated, and when the server shuts down; a separate process will check for dead connections every 5 seconds.

The arguments of the call specify how the server should be configured. The Java application must use these same parameters to connect to the server. All the arguments are described in detail in the AllegroGraph Reference Guide.

Setting the location of the AllegroGraph Server application

The main() method of the AllegroGraphConnection class is a utility that sets the Java Preferences value used by the subsequent application.

java -cp '.:com.franz.agraph-2-2-5.jar' com.franz.ag.AllegroGraphConnection [-user uuu] [-system sss] 

If the method is run without any arguments, it simply lists the current settings on the console.

The -user argument sets a user preference; the -system argument sets a system preference.

Setting a system preference normally requires administrator permission.

The value of each argument is the absolute pathname of the AllegroGraphJavaServer executable distributed with AllegroGraph.

We have not tested Preferences settings with all possible Java and OS combinations. On Windows XP, both user and system preferences are set reliably with Java 1.4.2 and Java 5. On Linux (Fedora 5), Java 5 sets user preferences but GNU Java 1.4.2 did not.

AllegroGraph Java sources

The Java code for the AllegroGraph Java API is open source under the terms of the Mozilla Public License Version 1.1. The source code is distributed with AllegroGraph and is installed with the other AllegroGraph files. The main source files are in agsrc-2-2-5.jar. The file agsrctbc-2-2-5.jar contains additional classes that are used by TopBraidComposer.