AllegroGraph RDFStore is a modern, high-performance, persistent RDF graph database. AllegroGraph uses disk-based storage, enabling it to scale to billions of triples while maintaining superior performance. AllegroGraph supports SPARQL, RDFS++, and Prolog reasoning from Java applications.
AllegroGraph is designed for maximum loading and query speed. (See here for LUBM query results.) Its highly optimized RDF/XML and N-Triples parsers make triple loading best-of-breed, particularly with large files. On standard x86 64-bit hardware, it can load gigabytes of RDF data in minutes. The following table shows AllegroGraph's performance in loading and indexing a variety of commonly used benchmark RDF files and ontologies.
| Load Test | # Triples | Size* | Time, Single CPU | Load Rate** | Time, Federated | Load Rate** |
|---|---|---|---|---|---|---|
| | 6.88 M | 1.6 GB | 7.83 min | 14.64 | 2.46 min | 41.4 |
| | 47.1 M | 11 GB | 42.7 min | 18.38 | 15.0 min | 52.3 |
| | 234 M | 28.6 GB | 3.35 hr | 19.4 | 1.18 hr | 55.3 |
| | 1,106 M | 155 GB | 22.47 hr | 13.67 | 7.8 hr | 39.4 |
* Size = size of triple file, **Load Rate = KTPS = Thousand Triples per second
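The load rate follows from the number of triples and the elapsed time: triples divided by seconds, expressed in thousands. A minimal sketch of that arithmetic, using the single-CPU figures for the 6.88 M triple load (the function name `load_rate_ktps` is ours, for illustration only):

```python
def load_rate_ktps(triples: float, seconds: float) -> float:
    """Return the load rate in thousands of triples per second (KTPS)."""
    return triples / seconds / 1000.0


# Single CPU: 6.88 million triples loaded and indexed in 7.83 minutes.
rate = load_rate_ktps(6.88e6, 7.83 * 60)
print(f"{rate:.2f} KTPS")  # 14.64 KTPS
```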
Hardware: Linux-based, dual-CPU (x86-64) 1.8 GHz machine with 16 GB RAM. The federated runs used a quad-CPU machine with otherwise the same specs. The LUBM(8000) run used a quad-core Xeon, 2.33 GHz, with 64 GB RAM.
On Amazon's EC2, AllegroGraph loaded and indexed 10 Billion Triples derived from 1 Billion Telecom CDRs (Call Detail Records) into 10 large EC2 Instances, 4 parallel loads per Instance, in 6.19 hours.
There are many ways to work with AllegroGraph:
AllegroGraph provides the broadest array of mechanisms to query and access knowledge in an RDF datastore:
Description logics or OWL-DL reasoners are good at handling complex ontologies. They tend to be complete (give all the possible answers to a query) but can be totally unpredictable with respect to execution time when the number of individuals increases beyond millions. AllegroGraph offers a less complete but very fast and practical RDFS++ reasoner. We support all the RDF and RDFS predicates and some OWL ones. The supported predicates are rdf:type, rdfs:subClassOf, rdfs:range, rdfs:domain, rdfs:subPropertyOf, owl:sameAs, owl:inverseOf, owl:TransitiveProperty, and owl:hasValue.
AllegroGraph's RDFS++ engine dynamically maintains the ontological entailments required for reasoning: it has no explicit materialization phase. Materialization is the pre-computation and storage of inferred triples so that future queries run more efficiently. The central problem with materialization is its maintenance: changes to the triple-store's ontology or facts usually change the set of inferred triples. In static materialization, any change in the store requires complete re-processing before new queries can run. AllegroGraph's Dynamic Materialization simplifies store maintenance and reduces the time required between data changes and querying.
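To make the contrast with static materialization concrete, here is a minimal sketch of answering an rdf:type query by following rdfs:subClassOf links at query time, with no inferred triples stored. It is only an illustration of the idea in plain Python, not AllegroGraph's engine or API; the function names and the `ex:` data are invented for the example.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def superclasses(store: Set[Triple], cls: str) -> Set[str]:
    """Return cls plus every class reachable from it via rdfs:subClassOf."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in store:
            if s == current and p == "rdfs:subClassOf" and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def instances_of(store: Set[Triple], cls: str) -> Set[str]:
    """Answer 'which subjects have rdf:type cls?' by reasoning at query time."""
    return {
        s
        for s, p, o in store
        if p == "rdf:type" and cls in superclasses(store, o)
    }


store = {
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
    ("ex:rex", "rdf:type", "ex:Dog"),
}
print(instances_of(store, "ex:Animal"))  # {'ex:rex'}

# A change to the ontology is visible to the very next query:
# there is no stored set of inferred triples to rebuild first.
store.add(("ex:tweety", "rdf:type", "ex:Bird"))
store.add(("ex:Bird", "rdfs:subClassOf", "ex:Animal"))
print(sorted(instances_of(store, "ex:Animal")))  # ['ex:rex', 'ex:tweety']
```

A statically materialized store would instead add `(ex:rex, rdf:type, ex:Mammal)` and `(ex:rex, rdf:type, ex:Animal)` up front and would have to recompute such triples after every change.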
SPARQL, the W3C standard RDF query language, returns RDF, XML and other formats in response to queries. AllegroGraph's SPARQL, one of the W3C's "interoperable implementations", includes a query optimizer and has full support for named graphs. It can be used with RDFS++ reasoning turned on (i.e., querying over both real and inferred triples). SPARQL can be used with every available AllegroGraph interface mentioned in the previous section.
AllegroGraph's RDF Prolog provides concise, powerful, industry-standard, domain-specific reasoning to build high-level concepts (that require complex rules or numerical processing) on top of RDF data. Such concepts are difficult (or very cumbersome) to model with only RDF/RDFS and OWL. Prolog can also be used on top of the RDFS++ reasoner as a rule-based system.
The Semantic Web reasoning system RacerPro has been integrated with AllegroGraph, exposing RDF data in AllegroGraph to Racer's highly optimized Description Logic (DL) reasoner. It is most suitable for ontology-driven applications or theorem proving. RacerPro's interfaces also include DIG over HTTP and support for rules (SWRL).
Other essential Triple-Store features:
AllegroGraph stores geospatial and temporal data types as native data structures. Combined with its indexing and range query mechanisms, AllegroGraph lets you do geospatial and temporal reasoning efficiently.
AllegroGraph includes a social network analysis (SNA) library with functions for treating a triple-store as a graph of relations, for measuring importance and centrality, and for several families of search. Example algorithms are in-degree, out-degree, nodal-degree, density, actor-degree-centrality, group-degree-centrality, actor-closeness-centrality, group-closeness-centrality, actor-betweenness-centrality, group-betweenness-centrality, connected-p, and find-clique-around. Geospatial and temporal primitives combined with SNA functions form an activity recognition framework for flexibly analyzing networks and events in large volumes of structured and unstructured data.
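As a rough illustration of what treating a triple-store as a graph of relations means, the sketch below builds a directed graph from the triples of one predicate and computes in-degree, out-degree, and density using their textbook definitions. The function names and `ex:` data are hypothetical, and AllegroGraph's own SNA functions may define or expose these measures differently.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def build_graph(triples: Iterable[Triple], predicate: str) -> Dict[str, Set[str]]:
    """Build a directed adjacency map from the triples that use one predicate."""
    graph: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == predicate:
            graph.setdefault(s, set()).add(o)
            graph.setdefault(o, set())  # nodes with no outgoing edges still count
    return graph


def out_degree(graph: Dict[str, Set[str]], node: str) -> int:
    """Number of edges leaving node."""
    return len(graph[node])


def in_degree(graph: Dict[str, Set[str]], node: str) -> int:
    """Number of edges arriving at node."""
    return sum(1 for targets in graph.values() if node in targets)


def density(graph: Dict[str, Set[str]]) -> float:
    """Edges present divided by edges possible in a directed graph."""
    n = len(graph)
    edges = sum(len(targets) for targets in graph.values())
    return edges / (n * (n - 1)) if n > 1 else 0.0


triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:knows", "ex:carol"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:age", "30"),  # ignored: different predicate
]
g = build_graph(triples, "ex:knows")
print(out_degree(g, "ex:alice"), in_degree(g, "ex:carol"), density(g))  # 2 2 0.5
```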
AllegroGraph stores a wide range of data types directly in its low level triple representation. This allows for very efficient range queries and significant reduction in triple-store data size. With other triple-stores that only store strings, the only way to do a range query is to go through all the values for a particular predicate. This works well if everything fits in memory; but if the predicate has millions of triples, it will need costly machines with huge amounts of RAM. AllegroGraph supports most XML Schema types (native numeric types, dates, times, longitudes, latitudes, durations and telephone numbers).
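The difference between string-only and typed storage can be sketched in a few lines. With strings, sort order is lexicographic, so every value of the predicate has to be parsed and tested; with typed values kept in a sorted index, a range query is two binary searches. The data and function names below are invented for illustration and say nothing about AllegroGraph's actual index layout.

```python
import bisect

# Hypothetical object values of a single predicate, e.g. ex:age.
ages_as_strings = ["9", "10", "25", "100", "42"]


def range_scan(values, low, high):
    """String storage: parse and test every value of the predicate."""
    return sorted(int(v) for v in values if low <= int(v) <= high)


# Typed storage: values are kept as numbers in a sorted index.
sorted_ages = sorted(int(v) for v in ages_as_strings)


def range_lookup(index, low, high):
    """Typed storage: two binary searches, then a slice."""
    start = bisect.bisect_left(index, low)
    end = bisect.bisect_right(index, high)
    return index[start:end]


print(sorted(ages_as_strings))               # ['10', '100', '25', '42', '9']
print(range_scan(ages_as_strings, 10, 50))   # [10, 25, 42]
print(range_lookup(sorted_ages, 10, 50))     # [10, 25, 42]
```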
AllegroGraph supports free-text indexing on the objects of triples whose predicates have been registered for indexing. Once indexed, triples can be found using a simple but robust query language. Free-text indexing support includes functions to register predicates and see which predicates are registered.
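The register-then-index workflow can be pictured with a small inverted index. This is a sketch under our own assumptions: the class and method names (`FreeTextIndex`, `register_predicate`, `registered_predicates`, `search`) are hypothetical stand-ins, not AllegroGraph's API, and the single-word lookup is far simpler than a real query language.

```python
class FreeTextIndex:
    """Toy inverted index over the objects of registered predicates."""

    def __init__(self):
        self._registered = set()
        self._index = {}  # lower-cased word -> set of (s, p, o) triples

    def register_predicate(self, predicate):
        """Mark a predicate so that triples added later are text-indexed."""
        self._registered.add(predicate)

    def registered_predicates(self):
        """List the predicates currently registered for indexing."""
        return sorted(self._registered)

    def add_triple(self, s, p, o):
        """Index the object's words, but only for registered predicates."""
        if p not in self._registered:
            return
        for word in o.lower().split():
            self._index.setdefault(word, set()).add((s, p, o))

    def search(self, word):
        """Return the indexed triples whose object contains the word."""
        return sorted(self._index.get(word.lower(), set()))


index = FreeTextIndex()
index.register_predicate("rdfs:comment")
index.add_triple("ex:doc1", "rdfs:comment", "Fast graph database")
index.add_triple("ex:doc1", "ex:title", "Graph notes")  # predicate not registered
print(index.registered_predicates())  # ['rdfs:comment']
print(index.search("graph"))  # [('ex:doc1', 'rdfs:comment', 'Fast graph database')]
```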
AllegroGraph actually stores quints. A triple in AllegroGraph contains 5 slots, the first three being subject (s), predicate (p), and object (o). The remaining two are a named-graph slot (g) and a unique id assigned by AllegroGraph. The id slot is used for internal administrative purposes, but can also be referred to by other triples directly.
The W3C proposal is to use the 'named-graph' slot for clustering triples. So for example, you load a file with triples into AllegroGraph and you use the filename as the named-graph. This way, if there are changes to the triple file, you just update those triples in the named graph that came from the original file. However, with AllegroGraph, you can also put other attributes such as weights, trust factors, times, latitudes, longitudes, etc, into the named graph slot.
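The five-slot layout and the file-as-named-graph pattern can be sketched as follows. `Quint`, `TinyStore`, and `replace_graph` are illustrative names of our own, and the in-memory list stands in for a real store; none of this reflects AllegroGraph's actual interfaces.

```python
from itertools import count
from typing import List, NamedTuple


class Quint(NamedTuple):
    s: str   # subject
    p: str   # predicate
    o: str   # object
    g: str   # named-graph slot
    id: int  # unique id assigned by the store


class TinyStore:
    """Minimal in-memory stand-in for a store of five-slot triples."""

    def __init__(self):
        self.quints: List[Quint] = []
        self._next_id = count(1)

    def add(self, s, p, o, g):
        quint = Quint(s, p, o, g, next(self._next_id))
        self.quints.append(quint)
        return quint

    def replace_graph(self, g, triples):
        """Drop every triple in named graph g, then load the new ones."""
        self.quints = [q for q in self.quints if q.g != g]
        return [self.add(s, p, o, g) for s, p, o in triples]


store = TinyStore()
store.add("ex:a", "ex:knows", "ex:b", "people.nt")
store.add("ex:x", "ex:locatedIn", "ex:y", "places.nt")

# The file people.nt changed: reload only the triples that came from it.
store.replace_graph("people.nt", [("ex:a", "ex:knows", "ex:c")])
print([(q.s, q.o, q.g, q.id) for q in store.quints])
# [('ex:x', 'ex:y', 'places.nt', 2), ('ex:a', 'ex:c', 'people.nt', 3)]
```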
AllegroGraph allows triple-ids to be the subject or object of another triple. This is beyond the scope of pure RDF. The advantage of this approach is that you can reduce the total number of triples in the store to a more manageable size, and, even more importantly, dramatically reduce query time because a single query can retrieve more data.
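The saving is easiest to see next to standard RDF reification, which describes a statement with an rdf:Statement node plus rdf:subject, rdf:predicate, and rdf:object triples before anything can be said about it. The sketch below counts the rows needed in each approach; the row layout `(id, s, p, o)` and both function names are assumptions made for the example, not AllegroGraph's representation.

```python
def reified(s, p, o, note_p, note_o, node="_:stmt1"):
    """Pure RDF: describe the statement with a reification node, then annotate it."""
    return [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
        (node, note_p, note_o),
    ]


def with_triple_id(s, p, o, note_p, note_o, triple_id=1):
    """Triple ids: the annotation's subject is simply the id of the first row."""
    return [
        (triple_id, s, p, o),
        (triple_id + 1, triple_id, note_p, note_o),
    ]


a = reified("ex:alice", "ex:knows", "ex:bob", "ex:trust", "0.9")
b = with_triple_id("ex:alice", "ex:knows", "ex:bob", "ex:trust", "0.9")
print(len(a), len(b))  # 5 2
```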
When loading a large set of data on a single-processor system, roughly 60% of the time is spent loading triples and 40% indexing them. If you 'bulk load' your data on a multiple-processor system or a cluster of independent machines, you can do nearly all indexing in parallel with the loading process. And, while running interactively, AllegroGraph can index newly added triples in the background.
AllegroGraph supports queries over distributed databases. You can group multiple triple-stores, both local and remote, into a single virtual store. It allows thread-safe opening of multiple triple-databases from one application (for the read-only parts of the database). Queries over multiple databases are easy with direct data access from applications. It also supports physical merging of databases.
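Conceptually, a virtual store forwards each query to its member stores and presents the combined results as one. A minimal sketch of that idea, with plain Python lists standing in for local and remote stores and with hypothetical names (`match`, `FederatedStore`) that are not part of any AllegroGraph interface:

```python
from itertools import chain


def match(store, s=None, p=None, o=None):
    """Yield triples from one store that fit the pattern (None is a wildcard)."""
    for triple in store:
        if ((s is None or triple[0] == s)
                and (p is None or triple[1] == p)
                and (o is None or triple[2] == o)):
            yield triple


class FederatedStore:
    """Present several stores as a single virtual store for queries."""

    def __init__(self, *stores):
        self.stores = stores

    def match(self, s=None, p=None, o=None):
        return chain.from_iterable(match(st, s, p, o) for st in self.stores)


local = [("ex:alice", "ex:knows", "ex:bob")]
remote = [  # stand-in for a store that would live on another machine
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:bob", "ex:age", "41"),
]
federated = FederatedStore(local, remote)
print(list(federated.match(p="ex:knows")))
# [('ex:alice', 'ex:knows', 'ex:bob'), ('ex:bob', 'ex:knows', 'ex:carol')]
```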
Make the most of your use of semantic technologies by utilizing our consulting services.
We provide:
Contact a Franz Product Applications Manager for information about getting started today, at 1-888-256-7669, ext. 300; outside of Canada and the US call +1-510-452-2000, ext. 300 or email: [email protected]
Though designed for 64-bit architectures, AllegroGraph runs on the 32-bit and 64-bit operating systems listed below. The 32-bit platforms, depending on the database, may reach architectural limits in as few as ten million triples. Though fully featured and compatible with 64-bit databases, the 32-bit versions are appropriate for moderate size databases and for exploration of the technology. Please read the notes on performance tuning here.
32-Bit | 64-Bit |
---|---|
Apple Mac OSX (x86) 10.4 | Apple Mac OSX (x86-64) 10.5 |
Linux (x86), glibc 2.3 | Linux (x86-64), glibc 2.4 |
Microsoft Windows 2000/XP/Vista/7/Server 2003 | Microsoft Windows XP/Vista/7/Server 2003 (x86-64) |
FreeBSD 6.x (x86) | Sun Solaris (x86-64) 2.10 |
| Amazon EC2 (Linux and Solaris x86-64) |
Copyright © Franz Inc., All Rights Reserved