AllegroGraph RDFStore Performance Tuning

AllegroGraph RDFStore is a persistent database for RDF triples. While it runs on 32-bit hardware, it was designed especially for 64-bit systems in order to support huge databases. We have successfully loaded databases containing several billion triples using an AMD Opteron™ 844 running at 1.8 GHz, with 128 KB of L1 cache, 1 MB of L2 cache, 16 GB of ECC RAM, running Linux 2.6.18.

Please read the following guidelines for configuring your system.

For 32-bit Systems

AllegroGraph is limited by the amount of memory available on most 32-bit systems, especially on Windows, so don't expect to store much beyond a few million triples on a 1 GB machine.

We do easily load the 7-million-triple LUBM(50) benchmark on a 32-bit machine. On a typical laptop with a 1.8 GHz CPU and 1 GB of RAM running Windows XP, we get the following performance numbers with the AllegroGraph server.

With the improved memory utilization on 32-bit systems in AllegroGraph, the effective size limit for a triple store containing many long resource and literal strings is still just a few million triples. A store that needs only a modest number of unique strings and primarily uses encoded UPIs, however, can now hold in excess of 100 million triples. Speed on 32-bit platforms will still be substantially lower than on 64-bit platforms.

Table 1: LUBM(50), 7 million triples (32-bit Windows XP, 1 GB RAM)

  Load time             8 minutes
  Memory consumption    340 MB
  Disk space            510 MB
  Query times
For this run, DefaultExpectedResources was set to 3,000,000 and ChunkSize to 2,000,000.

Please note that the Java client talking to the server will also consume some memory.

Performance Tuning Tips

There are two important variables that you can use to tune memory use.

[1] ags.setDefaultExpectedResources()

The main factor determining memory usage is the number of unique resources and literals present in the RDF input file(s) and their total size. One way to optimize for the number of triples in your system is to set the number of expected resources when opening a triple store. This immediately allocates the right amount of memory, and your image size won't grow much beyond that. If you set the initial value too small, the string table will have to be rebuilt, possibly many times, and more memory will be used.
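As a sketch, pre-sizing the string table might look like the following. Only the setDefaultExpectedResources() call comes from this document; the package, class names, and surrounding connection code are assumptions about the pre-4.0 Java client and may differ in your version.

```java
// Hedged sketch: class and package names below are assumed, not taken
// from this document; check your AllegroGraph Java client's API docs.
import com.franz.agbase.AllegroGraphConnection;

public class TuningExample {
    public static void main(String[] args) throws Exception {
        AllegroGraphConnection ags = new AllegroGraphConnection();
        ags.enable();  // connect to the server (assumed call)

        // Pre-size the string table: roughly one slot per unique
        // resource or literal expected in the input files. A store
        // opened after this call allocates that memory up front,
        // avoiding repeated rebuilds of the string table.
        ags.setDefaultExpectedResources(3000000);
    }
}
```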

[2] ags.setChunkSize()

Before triples are committed to disk, they are indexed in memory. This variable determines how many triples will be indexed at a time. The current default should be appropriate for a 1 GB Windows machine but might be too large for a 512 MB one.
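A corresponding sketch, assuming `ags` is the same connection object as in [1]; the value shown is illustrative for a smaller machine, not a recommendation from this document.

```java
// Illustrative only: on a 512 MB machine, index fewer triples per
// in-memory chunk before each commit to disk. Only the setChunkSize()
// method name comes from this document.
ags.setChunkSize(1000000);  // index 1,000,000 triples at a time
```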

For 64-bit Systems

Even 64-bit systems are limited by the amount of available physical memory, and the same tuning parameters apply. The default ChunkSize is set much higher on 64-bit systems, so if you have only 4 GB of RAM you might consider reducing this setting somewhat.
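For example, a 4 GB 64-bit machine might use values between the 32-bit laptop of Table 1 and the 16 GB machine of Table 2. The specific numbers below are illustrative assumptions, not figures from this document.

```java
// Hypothetical settings for a 4 GB 64-bit machine, scaled down from
// the 16 GB run (which used 10,000,000 for both parameters).
ags.setDefaultExpectedResources(5000000);  // expected unique strings
ags.setChunkSize(4000000);                 // triples indexed per chunk
```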

A typical run loading the 44-million-triple Wikipedia RDF file on a 16 GB AMD Opteron running Linux yields the following results:

Table 2: Wikipedia RDF file, 44 million triples (64-bit Linux, 16 GB RAM)

  Load time                    51 minutes
  Memory consumption (RSS*)    3.6 GB
  Disk space (data only)       2.8 GB
  Disk space (with indices)    10.0 GB

  * RSS is the Resident Set Size

For this run, DefaultExpectedResources and ChunkSize were both set to 10,000,000.

Copyright © 2014 Franz Inc., All Rights Reserved