Introduction
This document introduces AllegroGraph. It assumes that you are somewhat familiar with RDF (Resource Description Framework), RDFS (RDF Schema), and OWL (Web Ontology Language). If you are not, we suggest that you start with A Semantic Web Primer by Grigoris Antoniou and Frank van Harmelen (2001, Cambridge MA, MIT Press; available, e.g., from www.amazon.com), a very gentle introduction to these technologies. For a quick introduction, see these Wikipedia entries: OWL, RDF, and RDFS.
The big picture
AllegroGraph is a pure triple store that you can use for storing RDFS/OWL triples but also as an on-disk graph database.
Triples: For conventional reasons we call AllegroGraph a triple store, but it actually stores quints. A triple is a structure with five slots: the first three are the usual subject (s), predicate (p), and object (o); in addition, a triple has a named-graph slot (g) and a unique id (i) assigned by AllegroGraph. If you are not familiar with named graphs or their usage, please see http://www.w3.org/2004/03/trix/ for more information. You may also want to look at the paper "Named Graphs, Provenance and Trust" by Carroll et al. at http://www2005.org/cdrom/docs/p613.pdf (PDF), where they were introduced.
Loading: There are several ways to load data into the triple store. Currently we support the N-Triples and RDF/XML formats, and you can also insert triples programmatically.
Dictionary: Resources, blank-nodes and literals are stored in a dictionary and accessed by a hash we call the Unique Part Identifier (UPI).
Indices: AllegroGraph is indexed in such a way that any combination of s, p, o, and g can always be found with one disk access. We provide a cursor on the index to optimize memory usage.
First-class triples for reification: AllegroGraph assigns each triple a unique id and allows triples to point to other triples. This makes reification (making statements about a triple) very efficient, i.e., it consumes less space and time than the original RDF model of reification (see the RDF Semantics document for all the details).
And more: AllegroGraph also includes an RDFS++ reasoner, freetext indexing, full SPARQL support, and Prolog integration.
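The five-slot structure and the triple-to-triple references described above can be illustrated with a small in-memory model. This is only a sketch: the class names Quint and QuintStore, and the "#<id>" convention for pointing at another statement, are assumptions made for illustration and are not part of the AllegroGraph API. The point it shows is that a statement about a statement costs one extra entry, whereas standard RDF reification needs four extra triples (rdf:type rdf:Statement, rdf:subject, rdf:predicate, rdf:object).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a five-slot statement; not the AllegroGraph Triple class. */
final class Quint {
    final long id;          // i: unique id assigned by the store
    final String subject;   // s: a resource, or "#<id>" to point at another statement (assumed convention)
    final String predicate; // p
    final String object;    // o
    final String graph;     // g: named-graph slot, null for the default graph

    Quint(long id, String subject, String predicate, String object, String graph) {
        this.id = id;
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
        this.graph = graph;
    }
}

/** Toy in-memory store that assigns ids sequentially, starting at 1. */
final class QuintStore {
    private final List<Quint> statements = new ArrayList<>();
    private long nextId = 1;

    /** Adds a statement and returns it with its newly assigned id. */
    Quint add(String s, String p, String o, String g) {
        Quint q = new Quint(nextId++, s, p, o, g);
        statements.add(q);
        return q;
    }

    /** Makes a statement about an existing statement by referring to its id. */
    Quint addAbout(Quint target, String p, String o) {
        return add("#" + target.id, p, o, null);
    }

    int size() {
        return statements.size();
    }
}
```

With this model, describing who asserted a statement takes a single call to addAbout() on the statement returned by add().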
Accessing AllegroGraph from Java
The Java API to the AllegroGraph triple store allows Java applications to access and manipulate triple store databases.
This tutorial introduces some of the AllegroGraph Java API objects and methods in simple examples. The full documentation of the Java API is here.
Preparing the Triple Store
The Java API to the AllegroGraph Triple Store is a client-server implementation where the Java application is the client. In the Java-only edition of AllegroGraph, there are two distinct modes of operation possible:
- The Java application starts an AllegroGraph server when it needs one and discards it when done.
- The AllegroGraph server is started as a separate application, and the Java application connects when necessary.
Starting the AllegroGraph server from a Java application
In this mode of operation the Java application calls the startServer() method in the AllegroGraphConnection class. The only preparation needed for this mode of operation is to know where the AllegroGraph server executable was installed.
The Java application can specify the location of the server executable explicitly with a call to setDefaultCommand() or setCommand().
The Java application may also be started with a property setting for the property com.franz.ag.exec.
The most convenient mode is to set a user or system Java Preferences value with the utility in the main() method of the AllegroGraphConnection class. See the section Setting the location of the AllegroGraph Server application for full details. A Preferences setting persists from one session to the next and needs to be set only once in an installation.
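The three ways of telling the client where the server executable lives can be thought of as a lookup over candidate values. The sketch below is a stand-alone helper, not AllegroGraph code: the class name ServerLocation and the precedence order (explicit setting first, then the com.franz.ag.exec property, then the stored preference) are assumptions, since the text does not say which source wins when several are set.

```java
/** Hypothetical helper that picks the server executable path from three sources. */
final class ServerLocation {
    /** Property name taken from the text above. */
    static final String EXEC_PROPERTY = "com.franz.ag.exec";

    /**
     * Returns the first non-blank candidate, trimmed, or null if none is set.
     * The precedence (explicit, then property, then preference) is an assumption.
     */
    static String resolve(String explicitCommand, String propertyValue, String preferenceValue) {
        String[] candidates = { explicitCommand, propertyValue, preferenceValue };
        for (String candidate : candidates) {
            if (candidate != null && !candidate.trim().isEmpty()) {
                return candidate.trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Example launch: java -Dcom.franz.ag.exec=/path/to/AllegroGraphJavaServer ServerLocation
        String found = resolve(null, System.getProperty(EXEC_PROPERTY), null);
        System.out.println(found == null ? "no server location configured" : found);
    }
}
```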
Starting the AllegroGraph server as a separate application
The AllegroGraph server application is started from its installation location. The startup parameters specify the port numbers. The Java application must use these same parameters to connect to the server.
The section The AllegroGraph server application describes the AllegroGraph server application in detail.
Testing the interface
We include a sample program, AGExample.java, in the AllegroGraph distribution. This program may be used to verify the installation and to demonstrate that the connection between Java and AllegroGraph is working. The source code also provides examples of using the AllegroGraph Java API, so please take a look at AGExample.java.
Before you run the client Java program, it must be informed about the location of one important file: com.franz.agraph-2-2-5.jar resides in the AllegroGraph installation directory.
The full pathname to this file can be included in the Java classpath, or the file may be copied to a more convenient location. When using Eclipse, it may be specified as a library in the project properties.
Testing the interface on Windows
The first step is to start the AllegroGraph server by selecting the AllegroGraph server item on the AllegroGraph Start Menu entry (or double-click on AllegroGraphJavaServer in the AllegroGraph installation directory).
The second step is to open a command window in the folder where AllegroGraph was installed.
At this point, the following command will start the sample application, but it will terminate immediately with an error message because the program needs the location of the database work area:
java -cp .;com.franz.agraph-2-2-5.jar AGExample
The full command line parameters of the sample program are described in a comment in the program source. The most important argument is "-d", a required argument which specifies an existing directory to hold the database files:
java -cp .;com.franz.agraph-2-2-5.jar AGExample -d /tmp/ag/ -n tst
Other command examples:
Load the Wilbur example OWL ontology:
java -cp .;com.franz.agraph-2-2-5.jar AGExample -d /tmp/ag/ -n tst -r wilburwine.rdf
The above command assumes you are in the AllegroGraph installation directory, as wilburwine.rdf is distributed with AllegroGraph.
Load the ntriples version of the Wilbur OWL ontology:
java -cp .;com.franz.agraph-2-2-5.jar AGExample -d /tmp/ag/ -n tst -t wilburwine.ntriples
NOTE: when a large data file is specified, there may be a delay before the sample program shows any output.
Testing the interface on Linux and Unix
The first step is to open a shell in the AllegroGraph installation directory.
The second step is to start the AllegroGraphJavaServer executable. You may want to put it into the background and redirect the output from the program to a file.
At this point, the following command will start the sample application, but it will terminate immediately with an error message because the program needs the location of the database work area:
java -cp '.:com.franz.agraph-2-2-5.jar' AGExample
The full command line parameters of the sample program are described in a comment in the program source. The most important argument is "-d", a required argument which specifies an existing directory to hold the database files:
java -cp .:com.franz.agraph-2-2-5.jar AGExample -d /tmp/ag/ -n tst
Other command examples:
Load the Wilbur example OWL ontology:
java -cp .:com.franz.agraph-2-2-5.jar AGExample -d /tmp/ag/ -n tst -r wilburwine.rdf
The above command assumes you are in the AllegroGraph installation directory, as wilburwine.rdf is distributed with AllegroGraph.
Load the ntriples version of the Wilbur OWL ontology:
java -cp .:com.franz.agraph-2-2-5.jar AGExample -d /tmp/ag/ -n tst -t wilburwine.ntriples
NOTE: when a large data file is specified, there may be a delay before the sample program shows any output.
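The Windows and Unix commands in the two sections above differ only in the classpath separator (";" versus ":"). A launcher script written in Java could build the command portably, as in the following sketch. The class name SampleCommand is hypothetical; the jar name, the program name, and the -d, -n, -r, and -t arguments are the ones shown above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that assembles the AGExample command line. */
final class SampleCommand {
    static final String JAR = "com.franz.agraph-2-2-5.jar";

    /**
     * Builds the argument list for running AGExample.
     * pathSeparator is ";" on Windows and ":" on Unix; extra holds optional
     * trailing arguments such as "-r", "wilburwine.rdf".
     */
    static List<String> build(String pathSeparator, String dbDir, String dbName, String... extra) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add("." + pathSeparator + JAR);
        cmd.add("AGExample");
        cmd.add("-d");
        cmd.add(dbDir);
        cmd.add("-n");
        cmd.add(dbName);
        cmd.addAll(Arrays.asList(extra));
        return cmd;
    }

    public static void main(String[] args) {
        // File.pathSeparator supplies the separator of the platform running this code.
        System.out.println(String.join(" ", build(File.pathSeparator, "/tmp/ag/", "tst")));
    }
}
```

The returned list can be passed directly to java.lang.ProcessBuilder, which avoids shell quoting differences between platforms.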
More advanced uses of the sample application
The sample application tests several other command-line arguments that modify the behavior of the application. These arguments are described in comments in the source code.
The application can also start the server if the "-x" argument is added to the command.
Stopping the AllegroGraph Server Application
Once the AllegroGraph server application is running, it can be terminated in several ways:
- a Java application may call one of the stopServer() methods,
- the server lease may expire, or
- an operator may send an operating system kill signal from a console or window.
We supply a small Java application that stops the AllegroGraph server. The application is run with a command such as the following.
On Windows:
java -cp .;com.franz.agraph-2-2-5.jar AGStop [-p port] [-h host]
On Unix:
java -cp '.:com.franz.agraph-2-2-5.jar' AGStop [-p port] [-h host]
Tutorial
Connecting Java to the Triple Store
The first thing you might have noticed reading through the test program, AGExample.java, is that each Java application must connect to the server before any part of the API can be used. Connect to the server by creating a new instance of the class AllegroGraphConnection. The AllegroGraphConnection class implements methods open(), create(), and others that open databases and return instances of the class AllegroGraph. Each open database is represented by a new instance of the class AllegroGraph.
If the Java application disconnects from the server, all AllegroGraph instances become invalid and must be discarded.
Buffered Operations
The communication between the Java application and the AllegroGraph server takes place through a socket. In order to minimize the delays that may be imposed by operating system overheads, it is good practice to operate on many data items in each interaction between the Java client application and the server.
We facilitate this buffering by providing array operations for most of the database accessors. The array operations create or retrieve many database elements in a single interaction and therefore are much more time efficient.
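To see why the array operations matter, it helps to count client-server interactions. The sketch below is plain arithmetic and makes no claim about actual AllegroGraph timings; the class name RoundTrips and the idea of a fixed batch size per call are assumptions for illustration.

```java
/** Hypothetical helper that counts socket interactions for a given batch size. */
final class RoundTrips {
    /** Number of interactions needed to transfer itemCount items, batchSize items per call. */
    static long interactions(long itemCount, int batchSize) {
        if (itemCount < 0 || batchSize < 1) {
            throw new IllegalArgumentException("itemCount must be >= 0 and batchSize >= 1");
        }
        // Ceiling division: a partially filled last batch still costs one interaction.
        return (itemCount + batchSize - 1) / batchSize;
    }
}
```

Under these assumptions, 1000 triples sent one at a time need 1000 interactions, while arrays of 100 need only 10, so the per-interaction overhead is paid far less often.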
Simple Database Operations
Opening a database
A database is opened by creating an AllegroGraph instance.
AllegroGraphConnection sv = new AllegroGraphConnection();
sv.enable();
AllegroGraph ts = sv.create("test", "/s/ja/temp/");
The database is closed with the closeDatabase() method. Once the database is closed, the AllegroGraph instance should be discarded since it cannot be used for further interactions.
To re-open a database, create a new AllegroGraph instance.
Creating triples
Triples can be created one at a time by naming the components with strings in ntriples syntax.
ts.addStatement("<http://www.franz.com/things#Dog>",
"<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>",
"<http://www.w3.org/2002/07/owl#Class>");
The application can also save the details of the newly created triple by creating a new Triple instance with the newTriple() method.
Triple tr2 = ts.newTriple(
"<http://www.franz.com/things#Dog>",
"<http://www.w3.org/2000/01/rdf-schema#subClassOf>",
"<http://www.franz.com/things#Mammal>");
When many triples are created, it is more efficient to buffer the operation by grouping the triple components into arrays. The following statement creates three triples from corresponding elements of the arrays.
ts.addStatements(
new String[]{
"<http://www.franz.com/things#Cat>",
"<http://www.franz.com/things#Giraffe>",
"<http://www.franz.com/things#Lion>" },
new String[]{
"<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>",
"<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>",
"<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>" },
new String[]{
"<http://www.w3.org/2002/07/owl#Class>",
"<http://www.w3.org/2002/07/owl#Class>",
"<http://www.w3.org/2002/07/owl#Class>" }
);
When an array consists of identical elements, it can be shortened to a single element. The following statement creates three triples where the predicate and object components are identical.
ts.addStatements(
new String[]{
"<http://www.franz.com/things#Cat>",
"<http://www.franz.com/things#Giraffe>",
"<http://www.franz.com/things#Lion>" },
new String[]{"<http://www.w3.org/2000/01/rdf-schema#subClassOf>"},
new String[]{"<http://www.franz.com/things#Mammal>"}
);
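The single-element shorthand can be pictured as a broadcast over the longest array. The following stand-alone sketch models that convention on plain strings; it is not the library's implementation, and the class name ArrayShorthand and its length check (every array must have length 1 or the common length) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the "single element stands for all" array convention. */
final class ArrayShorthand {
    /** Returns element i, or the only element when the array has length 1. */
    static String pick(String[] values, int i) {
        return values.length == 1 ? values[0] : values[i];
    }

    /** Expands the three component arrays into explicit {s, p, o} rows. */
    static List<String[]> expand(String[] s, String[] p, String[] o) {
        int n = Math.max(s.length, Math.max(p.length, o.length));
        for (String[] values : new String[][] { s, p, o }) {
            if (values.length != 1 && values.length != n) {
                throw new IllegalArgumentException("array length must be 1 or " + n);
            }
        }
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            rows.add(new String[] { pick(s, i), pick(p, i), pick(o, i) });
        }
        return rows;
    }
}
```

Applied to the arrays in the statement above, expand() yields three rows that share the same predicate and object.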
Querying for triples
Triples are retrieved from the database with a Cursor instance. The Cursor instance can iterate through all the triples in the search result. The following statement will retrieve the four triples about subclasses of the "Mammal" class created earlier.
String wild = null;
Cursor cc = ts.getStatements(
wild,
"<http://www.w3.org/2000/01/rdf-schema#subClassOf>",
"<http://www.franz.com/things#Mammal>" );
When a Cursor instance is created, it is not positioned at a result. The step() method advances the Cursor instance to the first or next result. When a Cursor has been advanced, the returned value is true. When a Cursor is exhausted, the returned value is false.
if ( cc.step() ) {
    // A declaration must sit inside a block to be a valid if-body in Java.
    Triple tr = cc.getTriple();
}
When the Cursor is positioned at a result, we can retrieve the component of interest without creating a Triple instance.
Value s = cc.getSubject();
We can also retrieve several results in one operation. The following statement retrieves an array of at most 6 elements:
Triple[] trc = cc.step(6);
int n = trc.length;
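The stepping protocol described above (not positioned at first, true while a result is available, false when exhausted, and a bulk step that returns at most n results) can be modelled over an in-memory list. The class ListCursor below is a hypothetical stand-in that holds strings instead of triples; it sketches the protocol and is not the AllegroGraph Cursor class. In particular, the behavior of the bulk step when fewer results remain is an assumption.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical in-memory cursor that mimics the step() protocol. */
final class ListCursor {
    private final List<String> results;
    private int position = -1; // -1 means "not yet positioned at a result"

    ListCursor(List<String> results) {
        this.results = results;
    }

    /** Advances to the first or next result; returns false when exhausted. */
    boolean step() {
        if (position + 1 < results.size()) {
            position++;
            return true;
        }
        position = results.size();
        return false;
    }

    /** Returns the result the cursor is positioned at. */
    String current() {
        if (position < 0 || position >= results.size()) {
            throw new IllegalStateException("cursor is not positioned at a result");
        }
        return results.get(position);
    }

    /** Returns up to max further results and leaves the cursor on the last one returned. */
    String[] step(int max) {
        int from = Math.min(position + 1, results.size());
        int to = (int) Math.min((long) from + Math.max(max, 0), results.size());
        position = (to > from) ? to - 1 : results.size();
        return results.subList(from, to).toArray(new String[0]);
    }

    public static void main(String[] args) {
        ListCursor cursor = new ListCursor(Arrays.asList("Dog", "Cat", "Giraffe", "Lion"));
        while (cursor.step()) {
            System.out.println(cursor.current());
        }
    }
}
```

The while loop in main() is the usual way to drain a cursor of this shape one result at a time.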
Optimization notes
Maximum Index Chunk Size parameter
This parameter, settable and gettable by the setChunkSize() and getChunkSize() methods, controls the maximum number of records that are sorted at a time during index merging. (Indexing is triggered by calling the indexAll() or indexTriples() methods.)
The initial value of this parameter is believed to be good for machines with 1-2GB of RAM. If your computer has significantly more memory than this, you might improve indexing performance by using larger values (e.g., twice the initial value or more).
Expected Unique Resources parameter
This parameter, settable and gettable by the setDefaultExpectedResources() and getDefaultExpectedResources() methods, controls the default value for the expected number of unique resources in a new triple store. This number is the expected number of distinct URIs and literals in the triple store database. If the number is too small, performance may suffer during database creation. A rough rule of thumb is to specify a number that is one third of the number of triples.
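The rule of thumb above is simple arithmetic. The sketch below computes it; the class name ExpectedResources is hypothetical, and rounding up is a choice made here on the grounds that the text warns against values that are too small.

```java
/** Hypothetical helper that applies the one-third rule of thumb. */
final class ExpectedResources {
    /** Returns one third of the expected triple count, rounded up. */
    static long estimate(long expectedTriples) {
        if (expectedTriples < 0) {
            throw new IllegalArgumentException("expectedTriples must be >= 0");
        }
        return (expectedTriples + 2) / 3;
    }
}
```

For a load of 3,000,000 triples this gives 1,000,000 expected unique resources, a value that could then be passed to setDefaultExpectedResources() before the store is created.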
The OpenRDF Model
We implement most of the interfaces in the OpenRDF model defined at http://openrdf.org/.
The current implementation does not implement the interface Graph.
More complex queries using Prolog
AllegroGraph includes a Prolog implementation that may be used to search a database. The select() and selectValues() methods allow searches that return triples or database nodes and literals.
ValueObject[][] v =
ts.selectValues
("(?x ?y ?z) " +
" (and (q ?x " +
" !http://www.w3.org/1999/02/22-rdf-syntax-ns#type " +
" ?y) " +
" (q ?y " +
" !http://www.w3.org/2000/01/rdf-schema#subClassOf " +
" ?z))",
new Object[0], "");
The result v will be an array of sub-arrays. Each sub-array represents one successful match of the query. Each sub-array will be of length 3: the first element in the sub-array will be the binding of the variable ?x, the second ?y and the third ?z.
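The shape of the result can be made concrete by pairing each sub-array with the query variables in order. The sketch below uses plain Object arrays and made-up values as a stand-in for the ValueObject[][] returned by selectValues(); the class name BindingRows is hypothetical.

```java
/** Hypothetical helper that labels each element of a result row with its variable. */
final class BindingRows {
    /** Formats one row as "?x = ..., ?y = ..., ?z = ...". */
    static String describe(String[] variables, Object[] row) {
        if (variables.length != row.length) {
            throw new IllegalArgumentException("row length must match the number of variables");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < variables.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(variables[i]).append(" = ").append(row[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] vars = { "?x", "?y", "?z" };
        // Stand-in data; a real result would come from selectValues().
        Object[][] v = { { "Rex", "Dog", "Mammal" } };
        for (Object[] row : v) {
            System.out.println(describe(vars, row));
        }
    }
}
```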
It may also be desirable to substitute values from the Java application into the query string. This can be done by simply concatenating the required strings, but we do allow a more convenient option.
URI typePred = ts.addURI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type");
URI classPred = ts.addURI("http://www.w3.org/2000/01/rdf-schema#subClassOf");
// The values in the Object array are substituted for ?a and ?b, in that order.
ValueObject[][] w = ts.selectValues
("(?x ?y ?z) (and (q ?x ?a ?y) (q ?y ?b ?z))",
new Object[]{ typePred, classPred },
"?a ?b");
This query returns the same result as the previous example, but we have substituted values from the program into the query.
A query can return a mixture of nodes, literals and triples. The query
URI typePred = ts.addURI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type");
URI classPred = ts.addURI("http://www.w3.org/2000/01/rdf-schema#subClassOf");
ValueObject[][] w = ts.selectValues
("(?x ?y ?z ?t ?u) (and (q ?x ?a ?y ? ?t) (q ?y ?b ?z ? ?u))",
new Object[]{ typePred, classPred },
"?a ?b");
returns an array where each sub-array is of length 5. The fourth and fifth elements in the sub-array are the triples that satisfied the query. The lone question marks in the pattern skip the graph position of each triple to allow unification with the triple ids.
If all the results of interest are triples, a select() method can be used to return a Cursor instance. The Cursor instance is an iterator that returns the triples in order.
Cursor tv = ts.select
("(?t ?u) (and (q ?x " +
" !http://www.w3.org/1999/02/22-rdf-syntax-ns#type " +
" ?y ? ?t) " +
" (q ?y " +
" !http://www.w3.org/2000/01/rdf-schema#subClassOf " +
" ?z ? ?u))",
new Object[0], "");
The cursor in variable tv will return triples t1, u1, t2, u2,... where t1 is the triple matching ?t in the first match of the query, and u1 is the triple matching ?u in the first match of the query.
If the list of result variables includes variables that are not bound to triples, those variables are ignored. Thus the query
Cursor tw = ts.select
("(?t ?x ?u) (and (q ?x " +
" !http://www.w3.org/1999/02/22-rdf-syntax-ns#type " +
" ?y ? ?t) " +
" (q ?y " +
" !http://www.w3.org/2000/01/rdf-schema#subClassOf " +
" ?z ? ?u))",
new Object[0], "");
returns exactly the same value as the previous query. Additional select() methods are provided to allow data to be substituted into the query.
More complex queries using SPARQL
AllegroGraph includes a SPARQL implementation that may be used to search a database. The methods twinqlAsk(), twinqlSelect(), twinqlFind(), and twinqlQuery() allow searches that return, respectively, a true/false result, an array of objects, a Cursor instance, or a result serialized into an XML string.
For notes on twinql's conformance to the W3C specification please see this document.
How to use text indexing from Java
If you want to know how this all works it is worthwhile to look at the tutorial after this section. The Javadocs also describe all the main methods.
The main methods:
public Cursor getFreetextStatements(String pattern)
will return a cursor of all the triples that match pattern.
The input pattern for getFreetextStatements is described in the JavaDocs but here is a summary of the syntax for the input patterns.
_pattern_ -> _string-pattern_ | _composite-pattern_
_string-pattern_ -> _string_ | _phrase-string_
_string_ -> _char_\*
_char_ -> *?* -- denotes a wild card that matches any single character
_char_ -> *\** -- denotes a wild card that matches any sequence of characters
_char_ -> _any_ -- most other characters denote themselves
_phrase-string_ -> `'this is a phrase'` no wild cards allowed
_composite-pattern_ -> (and _pattern_\*) | (or _pattern_\*)
public ValueObject[] getFreetextUniqueSubjects(String pattern)
will return an array of ValueObject instances that contains all the unique triple subjects that match pattern.
public String[] getFreetextPredicates()
returns a string array of the predicates that you registered for freetext indexing.
public void registerFreetextPredicate(Object predicate)
registers a predicate for indexing. Freetext indexing predicates must be registered before any triples are added to the triple store. We will relax this constraint in future versions.
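The two wild cards in the pattern summary above behave like familiar glob characters. The sketch below translates a single string pattern into a java.util.regex.Pattern to show the intended matching of one word. It illustrates the syntax only and is not how AllegroGraph evaluates patterns against its index: the class name WildcardPattern is hypothetical, and the choices that * may match an empty sequence and that matching is case-sensitive are assumptions. Phrase strings and composite (and ...)/(or ...) patterns are not handled.

```java
import java.util.regex.Pattern;

/** Hypothetical translator from the ?/* wild-card syntax to a regular expression. */
final class WildcardPattern {
    /** Compiles a string pattern: ? matches one character, * matches any sequence. */
    static Pattern compile(String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '?') {
                regex.append('.');
            } else if (c == '*') {
                regex.append(".*");
            } else {
                // Every other character denotes itself, so quote it for the regex engine.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Returns true when the whole word matches the pattern. */
    static boolean matches(String pattern, String word) {
        return compile(pattern).matcher(word).matches();
    }
}
```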
Reference
The AllegroGraph server application
The AllegroGraph server is started with a call to the AllegroGraph Lisp function start-agj-server.
Starting Lisp and the AllegroGraph Server
We assume that you installed Lisp on one of your machines according to the instructions that came with Allegro Common Lisp. See the Franz documentation for installation instructions for Allegro CL.
We also assume that you know how to start up Lisp. See the Franz documentation on starting Lisp for more information.
On Windows, select the menu item `Start | Programs | Allegro CL 8.1 | Modern ACL Images | Allegro CL 8.1 (Modern)` (you can also start the one with the IDE if you want to play with the interactive Lisp version of AllegroGraph).
On Linux, Solaris, Mac OS X, or any other non-Windows platform, the recommended way to start Lisp is as a subprocess of Emacs (XEmacs or GNU Emacs). However, Lisp may also be started from a shell. The disadvantage of starting Lisp from a shell is that the editing and other features of the Emacs-Lisp interface are not available. The command for starting in a shell (assuming the Allegro directory is in your PATH) is:
mlisp
When Lisp is started, an interactive session (similar to a Unix shell, or DOS shell) is opened. Lisp expressions are entered, evaluated and the results printed out. Some expressions may be evaluated for their side-effects. It is also possible to package a Lisp application so that it simply starts and does its thing without any interactions, but that is an advanced topic. In these examples we use the interactive mode for the flexibility it affords. AllegroGraph is an optional module that is loaded (enabled) by evaluating the following expression:
(require :agraph)
You now can do the Lisp tutorial as described in agraph-tutorial.html or you can continue with the following:
The AllegroGraph server Lisp function
Start AllegroGraph Java Server.
To start the server, evaluate the following expression in the Lisp application:
(db.agraph:start-agj-server)
or the more complex form
(db.agraph:start-agj-server
:port 1776 :root "e:/tmp" :limit 3 :ender 'my-end-function :nanny 5)
The second form starts a server at port number 1776; the default directory will be "e:/tmp"; three connections will be allowed before the server shuts down; the function my-end-function will be called whenever a connection is terminated, and when the server shuts down; a separate process will check for dead connections every 5 seconds.
The arguments of the call specify how the server should be configured. The Java application must use these same parameters to connect to the server. All the arguments are described in detail in the AllegroGraph Reference Guide.
Setting the location of the AllegroGraph Server application
The main() method of the AllegroGraphConnection class is a utility that sets the Java Preferences value used by the subsequent application.
java -cp '.:com.franz.agraph-2-2-5.jar' com.franz.ag.AllegroGraphConnection [-user uuu] [-system sss]
If the method is run without any arguments, it simply lists the current settings on the console.
The -user argument sets a user preference; the -system argument sets a system preference.
Setting a system preference normally requires administrator permission.
The value of each argument is the absolute pathname of the AllegroGraphJavaServer executable distributed with AllegroGraph.
We have not tested Preferences settings with all possible Java and OS combinations. On Windows XP, both user and system preferences are set reliably with Java 1.4.2 and Java 5. On Linux (Fedora 5), Java 5 sets user preferences but GNU Java 1.4.2 did not.
AllegroGraph Java sources
The Java code for the AllegroGraph Java API is open source under the terms of the Mozilla Public License Version 1.1. The source code is distributed with AllegroGraph and is installed with the other AllegroGraph files. The main source files are in agsrc-2-2-5.jar. The file agsrctbc-2-2-5.jar contains additional classes that are used by TopBraidComposer.