
AllegroGraph RDFStore™

AllegroGraph RDFStore is a modern, high-performance, persistent RDF graph database. AllegroGraph uses disk-based storage, enabling it to scale to billions of triples while maintaining superior performance. AllegroGraph supports SPARQL, RDFS++, and Prolog reasoning from Java applications.

AllegroGraph New V3.3 Features

Added `filtered-triple-stores` and `graph-filtered-triple-stores`
Added support for named triple-stores to `select` `q` functor family
Added support for N-Quads
Increased maximum size of string-tables to 2³⁹ bytes (512 Gigabytes)
SPARQL – Significant performance improvements throughout, particularly in left joins. Many other improvements.
Prolog Query Optimizer – Improved handling of several Prolog functors and query planner.
Social Network Analysis (SNA) – Improved efficiency, more robust, extended API.
JAVA Jena – Improved handling of graphs.
JAVA API – Improved condition hierarchy.
RDFS++ Reasoning – Dynamic Materialization: perform queries immediately after triples are loaded and indexed. Improved hasValue reasoning, transitive property inference, and transitive inverse property inference.
AGWebView – Provides access to your data via an ordinary Web browser.
Gruff – A Grapher-Based Triple-Store Browser, now supports Mac.
Learning Center – The change log lets you know what’s new.
LUBM Benchmarks – Updated for this release.
TopBraid Composer – Integration with TBC release 3.2.

Performance Improvements – We are constantly working on the performance of AllegroGraph, and this release is again faster. For a complete listing of the improvements and new features, please see our change history.

High-performance Storage

AllegroGraph is designed for maximum loading speed and query speed. (See here for LUBM query results.) Loading of triples, through its highly optimized RDF/XML and N-Triples parsers, is best-of-breed, particularly with large files. Using standard x86 64-bit hardware, it can load gigabytes of RDF data in minutes. The following table displays AllegroGraph's performance in loading and indexing a variety of commonly used benchmark RDF files and ontologies.

Load Test	# Triples	Size*	Time, Single CPU	Load Rate**	Time, Federated	Load Rate**
LUBM(50)	6.88 M	1.6 GB	7.83 Min.	14.64	2.46 Min.	41.4
Wikipedia	47.1 M	11 GB	42.7 Min.	18.38	15.0 Min.	52.3
Uniprot	234 M	28.6 GB	3.35 Hr.	19.4	1.18 Hr.	55.3
LUBM(8000)	1,106 M	155 GB	22.47 Hr.	13.67	7.8 Hr.	39.4

* Size = size of triple file, **Load Rate = KTPS = Thousand Triples per second
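The load rates follow from dividing the triple count by the elapsed time. As a small worked check, the sketch below recomputes the single-CPU column from the figures in the table; the function name is illustrative only.

```python
def load_rate_ktps(triples_millions: float, minutes: float) -> float:
    """Return the load rate in thousands of triples per second (KTPS)."""
    seconds = minutes * 60.0
    return triples_millions * 1_000_000 / seconds / 1_000

# Single-CPU figures from the table: (millions of triples, minutes).
# Times given in hours are converted to minutes here.
single_cpu = {
    "LUBM(50)": (6.88, 7.83),
    "Wikipedia": (47.1, 42.7),
    "Uniprot": (234.0, 3.35 * 60),
    "LUBM(8000)": (1106.0, 22.47 * 60),
}

for name, (millions, minutes) in single_cpu.items():
    print(f"{name}: {load_rate_ktps(millions, minutes):.2f} KTPS")
```

The same function applies to the federated column by substituting the federated times.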

Hardware - Linux based, dual CPUs (x86-64) 1.8 GHz, with 16 GB RAM. Hardware for Federated is a quad CPU machine with the same specs as dual CPU machine. Hardware for the LUBM(8000) is a quad core Xeon, 2.33 GHz, with 64 GB RAM.

On Amazon's EC2, AllegroGraph loaded and indexed 10 Billion Triples derived from 1 Billion Telecom CDRs (Call Detail Records) into 10 large EC2 Instances, 4 parallel loads per Instance, in 6.19 hours.

Storage is persistent, including between application launches, in on-disk binary trees. There is no additional serialization or deserialization overhead.
Querying is both flexible and performant. With multiple indices, it supports very fast access through a simple triple-level API, Allegro Prolog, or SPARQL (the W3C standard RDF query language). When querying for a particular subject with ten triples, AllegroGraph can retrieve about 40,000 triples per second, from disk.

AllegroGraph RDFStore Architecture

There are many ways to work with AllegroGraph:

Java. There are several ways to work with AllegroGraph from Java: we supply Java classes that allow direct access to all the features of AllegroGraph; we also supply adapters that allow access through Jena library calls, and through Sesame 2.x library calls. Another way is through the HTTP facilities in the next paragraph.
Sesame HTTP client protocol. It is possible for developers to interact with AllegroGraph using the Sesame 2.1 HTTP protocol to add and delete triples, to query for individual triples and to do SPARQL and Prolog selects. We extended the protocol so that it can do additional database management functions.
Python, Ruby, JavaScript, etc. The HTTP interface can be used from any language that knows how to make HTTP client requests. This way, you can easily use AllegroGraph from Ruby, Python and many other languages.
Lisp. The AllegroGraph client API is another view of the triple store for a Lisp application. The server runs in another process and may be located on a separate host located far from the client.
AllegroGraph Lisp Edition - Standalone. In addition to using Lisp as a client to the AllegroGraph Server, as noted above, you have the full power of a Lisp-REPL Development Environment.
TopBraid Composer. This is a commercially supported tool for modeling and editing ontologies. You can connect TopBraid Composer to AllegroGraph and visually inspect your RDF/OWL data. For details see TopBraid Composer.
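
As a rough illustration of the HTTP route described above, the sketch below builds (without sending) a SPARQL query request in the general shape of the Sesame 2 HTTP protocol: a GET on a repository URL with `query` and `queryLn` parameters. The server URL and repository name are hypothetical placeholders, and the exact paths an AllegroGraph server exposes should be checked against its documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical values: host, port, path prefix, and repository name are
# placeholders, not taken from the AllegroGraph documentation.
SERVER_URL = "http://localhost:8080/sesame"
REPOSITORY_ID = "example-store"

def build_sparql_request(query: str) -> Request:
    """Build (but do not send) a Sesame-style SPARQL query request."""
    params = urlencode({"query": query, "queryLn": "sparql"})
    url = f"{SERVER_URL}/repositories/{REPOSITORY_ID}?{params}"
    # Ask for SPARQL results in JSON; other result formats are possible.
    return Request(url, headers={"Accept": "application/sparql-results+json"})

request = build_sparql_request("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10")
print(request.get_method(), request.full_url)
```

Because the request is plain HTTP, the same call can be reproduced from Ruby, JavaScript, or any other language with an HTTP client.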

Powerful and Expressive Reasoning and Querying

AllegroGraph provides the broadest array of mechanisms to query and access knowledge in an RDF datastore:

RDFS++ Reasoning

Description logics or OWL-DL reasoners are good at handling complex ontologies. They tend to be complete (give all the possible answers to a query) but can be totally unpredictable with respect to execution time when the number of individuals increases beyond millions. AllegroGraph offers a less complete but very fast and practical RDFS++ reasoner. We support all the RDF and RDFS predicates and some OWL ones. The supported predicates are rdf:type, rdfs:subClassOf, rdfs:range, rdfs:domain, rdfs:subPropertyOf, owl:sameAs, owl:inverseOf, owl:TransitiveProperty, and owl:hasValue.

AllegroGraph's RDFS++ engine dynamically maintains the ontological entailments required for reasoning: it has no explicit materialization phase. Materialization is the pre-computation and storage of inferred triples so that future queries run more efficiently. The central problem with materialization is its maintenance: changes to the triple-store's ontology or facts usually change the set of inferred triples. In static materialization, any change in the store requires complete re-processing before new queries can run. AllegroGraph's Dynamic Materialization simplifies store maintenance and reduces the time required between data changes and querying.
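
To make the contrast with static materialization concrete, here is a toy sketch (not AllegroGraph's implementation) of a store that derives rdf:type entailments through rdfs:subClassOf at query time. Nothing is pre-computed, so a change to the ontology is visible to the very next query. The class and method names are hypothetical.

```python
class ToyStore:
    """Toy store that infers rdf:type via rdfs:subClassOf on demand."""

    def __init__(self):
        self.superclasses = {}  # class -> set of direct superclasses
        self.types = {}         # individual -> set of asserted classes

    def add(self, s, p, o):
        if p == "rdfs:subClassOf":
            self.superclasses.setdefault(s, set()).add(o)
        elif p == "rdf:type":
            self.types.setdefault(s, set()).add(o)

    def types_of(self, individual):
        """Return asserted plus inferred classes, computed at query time."""
        seen = set()
        stack = list(self.types.get(individual, ()))
        while stack:
            cls = stack.pop()
            if cls not in seen:
                seen.add(cls)
                stack.extend(self.superclasses.get(cls, ()))
        return seen

store = ToyStore()
store.add("ex:Dog", "rdfs:subClassOf", "ex:Mammal")
store.add("ex:rex", "rdf:type", "ex:Dog")
before = store.types_of("ex:rex")

# An ontology change needs no re-processing step before the next query.
store.add("ex:Mammal", "rdfs:subClassOf", "ex:Animal")
after = store.types_of("ex:rex")
print(sorted(before), sorted(after))
```

A statically materialized store would instead write the inferred triples to disk and have to recompute them after the second `add`.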

SPARQL Queries on Named Graphs

SPARQL, the W3C standard RDF query language, returns RDF, XML and other formats in response to queries. AllegroGraph's SPARQL, one of the W3C's "interoperable implementations", includes a query optimizer, and has full support for named graphs. It can be used with the RDFS++ reasoning turned on (i.e., query over real and inferred triples). SPARQL can be used with every available AllegroGraph interface mentioned in the previous section.

Prolog
AllegroGraph's RDF Prolog provides concise, powerful, industry-standard, domain-specific reasoning to build high-level concepts (that require complex rules or numerical processing) on top of RDF data. Such concepts are difficult (or very cumbersome) to model with only RDF/RDFS and OWL. Prolog can also be used on top of the RDFS++ reasoner as a rule-based system.

RacerPro and RacerPorter
The Semantic Web reasoning system, RacerPro, has been integrated with AllegroGraph, exposing RDF data in AllegroGraph to Racer's highly optimized Description Logic (DL) reasoner. It is most suitable for ontology-driven applications or theorem proving. RacerPro's interfaces also include DIG over HTTP and support for rules (SWRL).

Low-level APIs
Allow fast, 'close-to-the-metal' access to triples by subject, predicate, and object.

Additional Features

Other essential Triple-Store features:

Geospatial and Temporal Reasoning
AllegroGraph stores geospatial and temporal data types as native data structures. Combined with its indexing and range query mechanisms, AllegroGraph lets you do geospatial and temporal reasoning efficiently.
Social Network Analysis
An SNA library that has functions for treating a triple-store as a graph of relations, with functions for measuring importance and centrality as well as several families of search functions. Example algorithms are in-degree, out-degree, nodal-degree, density, actor-degree-centrality, group-degree-centrality, actor-closeness-centrality, group-closeness-centrality, actor-betweenness-centrality, group-betweenness-centrality, connected-p, and find-clique-around. Geospatial and temporal primitives combined with SNA functions form an activity recognition framework for flexibly analyzing networks and events in large volumes of structured and unstructured data.
Native Data Types and Efficient Range Queries
AllegroGraph stores a wide range of data types directly in its low level triple representation. This allows for very efficient range queries and significant reduction in triple-store data size. With other triple-stores that only store strings, the only way to do a range query is to go through all the values for a particular predicate. This works well if everything fits in memory; but if the predicate has millions of triples, it will need costly machines with huge amounts of RAM. AllegroGraph supports most XML Schema types (native numeric types, dates, times, longitudes, latitudes, durations and telephone numbers).
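
The difference between the two approaches can be sketched with a toy example. This is an illustration of the general idea only, not AllegroGraph's storage format; the data and function names are hypothetical. A string-only store has to parse and test every value, while a store that keeps typed values in a sorted index can answer a range query with two binary searches.

```python
import bisect

# Hypothetical data: (subject, value) pairs for one predicate, e.g. ex:age.
ages = [("ex:alice", 34), ("ex:bob", 9), ("ex:carol", 101), ("ex:dave", 27)]

# String-only store: every value is parsed and tested in a full scan.
as_strings = [(s, str(v)) for s, v in ages]

def range_scan(lo, hi):
    return sorted(s for s, text in as_strings if lo <= int(text) <= hi)

# Typed store: values are kept as numbers in a sorted index, so a range
# query is two binary searches plus a slice.
index = sorted((v, s) for s, v in ages)
keys = [v for v, _ in index]

def range_index(lo, hi):
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_right(keys, hi)
    return sorted(s for _, s in index[start:end])

print(range_index(10, 50))  # ['ex:alice', 'ex:dave']
```

Note that sorting the string forms would not help the first store: as text, "101" sorts before "27", so string order does not match numeric order.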
Free-text Indexing
AllegroGraph supports free-text indexing on the objects of triples whose predicates have been registered for indexing. Once indexed, triples can be found using a simple but robust query language. Free-text indexing support includes functions to register predicates and see which predicates are registered.
Named Graphs for Weights, Trust Factors, Provenance
AllegroGraph actually stores quints. A triple in AllegroGraph contains 5 slots, the first three being subject (s), predicate (p), and object (o). The remaining two are a named-graph slot (g) and a unique id assigned by AllegroGraph. The id slot is used for internal administrative purposes, but can also be referred to by other triples directly.

The W3C proposal is to use the 'named-graph' slot for clustering triples. So for example, you load a file with triples into AllegroGraph and you use the filename as the named-graph. This way, if there are changes to the triple file, you just update those triples in the named graph that came from the original file. However, with AllegroGraph, you can also put other attributes such as weights, trust factors, times, latitudes, longitudes, etc., into the named graph slot.
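
A minimal sketch of the five-slot layout and the "filename as named graph" pattern described above. The class and method names are hypothetical and this is not the AllegroGraph API; it only shows how the graph slot lets one file's triples be replaced without touching the rest of the store.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class Quint:
    s: str   # subject
    p: str   # predicate
    o: str   # object
    g: str   # named-graph slot (here: the source filename)
    id: int  # unique id assigned by the store

class GraphStore:
    def __init__(self):
        self._ids = count(1)
        self.quints = []

    def add(self, s, p, o, g):
        quint = Quint(s, p, o, g, next(self._ids))
        self.quints.append(quint)
        return quint

    def reload_graph(self, g, triples):
        """Replace all triples in graph g, leaving other graphs untouched."""
        self.quints = [q for q in self.quints if q.g != g]
        return [self.add(s, p, o, g) for s, p, o in triples]

store = GraphStore()
store.add("ex:a", "ex:knows", "ex:b", "people.nt")
store.add("ex:c", "ex:locatedIn", "ex:d", "places.nt")

# people.nt changed on disk: swap out only the triples that came from it.
store.reload_graph("people.nt", [("ex:a", "ex:knows", "ex:e")])
print([(q.s, q.o, q.g, q.id) for q in store.quints])
```

The same slot could hold a weight or trust factor instead of a filename, at the cost of no longer grouping triples by source.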
Direct Reification
AllegroGraph allows triple-ids to be the subject or object of another triple. This is beyond the scope of pure RDF. The advantage of this approach is that you can reduce the total number of triples in the store to a more manageable size, and, even more importantly, dramatically reduce query time because a single query can retrieve more data.
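
The reduction in triple count can be seen by comparing the two approaches side by side. Standard RDF reification describes a statement with four triples (rdf:type rdf:Statement, rdf:subject, rdf:predicate, rdf:object) before anything can be said about it; referring to a triple id directly needs only the annotation itself. The sketch below is illustrative: the function names and the `triple:42` notation for a triple id are assumptions, not AllegroGraph syntax.

```python
def standard_reification(statement_node, s, p, o, annotation):
    """Standard RDF reification: four triples describe the statement,
    then one more attaches the annotation to the statement node."""
    return [
        (statement_node, "rdf:type", "rdf:Statement"),
        (statement_node, "rdf:subject", s),
        (statement_node, "rdf:predicate", p),
        (statement_node, "rdf:object", o),
        (statement_node, *annotation),
    ]

def direct_reification(triple_id, annotation):
    """Direct reification: the triple id itself is the subject."""
    return [(triple_id, *annotation)]

annotation = ("ex:assertedBy", "ex:alice")
standard = standard_reification("_:stmt1", "ex:bob", "ex:knows", "ex:carol", annotation)
direct = direct_reification("triple:42", annotation)  # hypothetical id notation
print(len(standard), len(direct))  # 5 1
```

A query for the annotation also touches one triple instead of joining across five.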
Clustering
When loading a large set of data on a single processor system, roughly 60% of the time is spent in loading triples, 40% is spent in indexing. If you 'bulk load' your data on a multiple processor system or a cluster of independent machines, you can do nearly all indexing parallel to the loading process. And, while running interactively, it can index newly added triples in the background.
Federation
AllegroGraph supports queries with distributed databases. You can group multiple triple-stores, both local and remote into a single virtual store. It allows thread-safe opening of multiple triple-databases from one application (for the read only parts of the database). Queries over multiple databases are easy with direct data access from applications. It also supports physical merging of databases.

Professional Services

Make the most of your use of semantic technologies by utilizing our consulting services.

We provide:

Data Migration - help migrating data from RDBMS or CSV files into AllegroGraph
Pilot and Evaluations - with Semantic Technologies in general
Migration Assessment - moving to ontology-based systems
Milestone Review - we can help verify and reality-check your project
Performance Analysis - getting the most out of AllegroGraph
Deployment Options - advise on deployment options
Application-specific coding

Contact a Franz Product Applications Manager for information about getting started today, at 1-888-256-7669, ext. 300; outside of Canada and the US call +1-510-452-2000, ext. 300 or email: [email protected]

AllegroGraph RDFStore Documentation

System Requirements

Though designed for 64-bit architectures, AllegroGraph runs on the 32-bit and 64-bit operating systems listed below. The 32-bit platforms, depending on the database, may reach architectural limits in as few as ten million triples. Though fully featured and compatible with 64-bit databases, the 32-bit versions are appropriate for moderate size databases and for exploration of the technology. Please read the notes on performance tuning here.

32-Bit	64-Bit
Apple Mac OSX (x86) 10.4	Apple Mac OSX (x86-64) 10.5
Linux (x86), glibc 2.3	Linux (x86-64), glibc 2.4
Microsoft Windows 2000/XP/Vista/7/Server 2003	Microsoft Windows XP/Vista/7/Server 2003 (x86-64)
FreeBSD 6.x (x86)	Sun Solaris (x86-64) 2.10
	Amazon EC2 (Linux and Solaris x86-64)
