AllegroGraph 4.0 Performance Tuning
For users of large databases (for example 1 billion triples), we recommend using hardware with many cores (8 or more), plentiful RAM (64 GB), and fast hard drives (15K RPM, arranged in RAID arrays employing striping).
Regardless of the hardware you have available, this guide offers tips on using AllegroGraph's tunable parameters to maximize database performance on that hardware. These settings are applied in the agraph.cfg file. Documentation of all agraph.cfg settings appears on the Server Configuration page.
Adjust the Backends Setting
The Backends setting in agraph.cfg determines the maximum number of (non-dedicated) back ends that the server may spawn. On a machine with many CPU cores, heavy concurrent access can be faster when more back ends are allowed to be spawned.
As a rule of thumb, set the Backends setting to the number of CPU cores on your machine.
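For example, on an eight-core server the line in agraph.cfg might read as follows (the value is illustrative; match it to your own core count):
Backends 8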
Use Dedicated Sessions
When using non-dedicated back ends, all communications between the client and back ends are funneled through the AllegroGraph service daemon, adding overhead. To maximize client/server communication speed, clients should use dedicated sessions instead.
However, be aware that each dedicated back end uses resources on the server. Dedicated back ends increase performance up to an optimum, after which resource issues begin to degrade performance again.
Catalog Parameters that Improve Performance
You can design catalogs in agraph.cfg to improve disk-access performance. Such a catalog description might resemble this one:
<Catalog fast>
ExpectedStoreSize 2000000
CheckpointInterval 1h
TransactionLogDir /mnt/disk3/ag4/fast
Main /var/lib/ag4/fast
Aux /mnt/disk2/ag4/fast
</Catalog>
Aux Directories
Load performance (and to a lesser degree, some types of concurrent query performance) can be improved by specifying one or more Aux directories. If you have filesystems on multiple hard drives (or multiple RAID arrays appearing as virtual drives), you can use Aux parameters to specify directories on those filesystems. AllegroGraph will spread database index files amongst the specified directories (in addition to the Main directory).
That said, it is our experience that using a single RAID array which employs some form of striping (RAID 0, RAID 5, RAID 6, RAID 10, etc.) offers simpler management, better storage efficiency, and often better performance than using the same drives as individual Aux filesystems.
Be aware that each of the Main and Aux filesystems must be large enough to hold twice the amount of data required to represent the database.
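As an illustration, a catalog that spreads index files across two additional drives might include entries like these (only the directory parameters are shown, and the paths are placeholders for your own filesystem layout):
<Catalog fast>
Main /var/lib/ag4/fast
Aux /mnt/disk2/ag4/fast
Aux /mnt/disk4/ag4/fast
</Catalog>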
TransactionLogDir
Load performance can be improved by specifying a TransactionLogDir on a filesystem that is physically separate from the filesystems on which the Main and Aux directories are located.
ExpectedStoreSize
Load performance can be improved by using the ExpectedStoreSize catalog parameter. The value should be the maximum number of triples you expect to add to the triple store. During normal operation, AllegroGraph may resize data structures as triples are added to the triple store. The ExpectedStoreSize setting allows AllegroGraph to pre-size certain data structures, reducing the number of resizes required and thereby improving overall load performance.
The expected store size can be reset whenever a database is created or opened by clients. The new value will override the value specified for the catalog.
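For the billion-triple scenario described at the top of this guide, the catalog entry might read as follows (the figure is simply an assumption about the expected load, not a recommendation):
ExpectedStoreSize 1000000000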
CheckpointInterval
Load performance can be improved by setting the CheckpointInterval catalog parameter. By default, checkpoints occur every 5 minutes. While a checkpoint is operating, commits are blocked. On a large database, checkpoints can take several tens of seconds to complete. Setting the CheckpointInterval to a longer interval reduces how often checkpoints occur, thereby reducing the impact on commits.
Note that increasing the CheckpointInterval may increase the amount of time it takes to recover after an unclean database shutdown.
Note that certain background database operations trigger checkpoints regardless of the CheckpointInterval setting. So, in effect, the CheckpointInterval setting sets an upper bound on the amount of time that may elapse before a checkpoint occurs.
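For example, to allow up to two hours between checkpoints, the catalog entry might read as follows (the h suffix matches the format used in the catalog example above; the value itself is illustrative):
CheckpointInterval 2h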