Introduction

AllegroGraph can store very large amounts of data and can query this data in an efficient manner, but to do these things, AllegroGraph needs access to large amounts of storage space (disk space and RAM) including shared memory in /dev/shm. AllegroGraph's efficiency comes from the ability to store temporary data which can be quickly retrieved. If AllegroGraph is unable to do this, queries or even the server may fail.

Here are three areas where AllegroGraph needs sufficient space:

  1. Storage space (disk or RAM or both)

  2. Shared memory segment (/dev/shm) space

  3. Temporary file space

We deal with each in turn. Then we discuss paging issues and RAM requirements.

Storage space

As described in Triple Indices, the storage space needed per triple is approximately 100 bytes.

Repositories have string tables. The string table size is specified by the StringTableSize directive in a catalog specification (see the Catalog directives section of the Server Configuration and Control document, and also see the discussion of StringTableSize in the Performance Tuning document). The size specifies the number of slots. The default is 16 M and the size can be as much as 512 M (M = 2^20). Each slot uses 4 bytes, so the string table requires (4 x StringTableSize) bytes. With the default of 16 M slots, for example, the string table needs 64 Mbytes.

Other storage needs: (1) Each repository uses about 400 Mbytes of disk space in addition to the space for triples and the string table. (2) Transaction log files are created when a repository is created, and others may be created later (see the associated configuration directives such as TransactionLogSize and DesiredTlogFiles in the Catalog directives section of the Server Configuration and Control document). These files can be quite large. If the TransactionLogArchive configuration directive (specific to each catalog) has a value, transaction logs are never deleted: all of them are saved, and the space they use can grow without bound. See the Transaction Log Archiving document.
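The figures in this section can be combined into a rough disk estimate. The following Python sketch is only an illustration: the function name estimate_disk_bytes is hypothetical, "Mbytes" is assumed to mean 2^20 bytes, and transaction logs are deliberately left out because their size depends on the directives mentioned above.

```python
# Rough disk estimate for one repository, using the approximate figures
# quoted in this section. Transaction logs are NOT included.
MIB = 2**20  # "M" in this document means 2^20

BYTES_PER_TRIPLE = 100          # approximate, per the Triple Indices discussion
BYTES_PER_SLOT = 4              # each string table slot uses 4 bytes
REPO_OVERHEAD_BYTES = 400 * MIB # assumption: "400 Mbytes" read as 400 * 2^20


def estimate_disk_bytes(triples: int, string_table_slots: int) -> int:
    """Return an approximate byte count: triples + string table + overhead."""
    triple_bytes = triples * BYTES_PER_TRIPLE
    string_table_bytes = string_table_slots * BYTES_PER_SLOT
    return triple_bytes + string_table_bytes + REPO_OVERHEAD_BYTES


# Example: one billion triples with the default StringTableSize of 16 M slots.
estimate = estimate_disk_bytes(1_000_000_000, 16 * MIB)
print(f"{estimate / 2**30:.1f} GiB")
```

The result is a lower bound for planning purposes; archived transaction logs and temporary files come on top of it.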

If AllegroGraph needs to write data to disk (for example, because a file of new triples has been loaded) and sufficient disk space is not available, AllegroGraph may hang or fail. Even when AllegroGraph cannot write to the disk, no data is lost other than, possibly, data from the file being loaded, and the repository remains in the state of the last committed transaction.

Both New WebView and Traditional WebView can display graphs showing disk space used by a repository. See the Report pages section in Traditional WebView. In New WebView, click on Storage Overview in the Repository menu to the left. Note that most graphs show the total space used by AllegroGraph, not the available storage space.

Shared memory segment (/dev/shm) space

When a repository is opened, its string table is copied to a shared memory segment (/dev/shm). Additionally, about 400 Mbytes (more if the configuration directive ExpectedStoreSize is large) is reserved as work space. This is not the same 400 Mbytes mentioned in the storage space section above. As noted above, the string table uses 4 bytes per slot, that is, 4 times the declared string table size. So suppose we have this specification (in the agraph.cfg file) for the root catalog:

<RootCatalog>   
  Main /tmp/ag4/root   
  StringTableSize 128M  
</RootCatalog> 

That says that each repository in the root catalog has a string table needing 512 Mbytes (4x128M). When such a repository is opened, its string table (512 Mbytes or 0.5 Gbytes) is copied to shared memory.

Each open repository needs its shared memory space. If two repos in the root catalog are open,

512 Mbytes + 400 Mbytes + 512 Mbytes + 400 Mbytes = 1824 Mbytes 

is required, a bit under 2 Gbytes. Additionally, queries generate temporary data structures, and a reasonably complex query can require several hundred Mbytes. That should be added to the repo requirements, so opening one repo needs perhaps 2.5 GB (2 GB for the repo and 0.5 GB for queries), opening two needs 4.5 GB (2 GB more for the second repo), and so on. Opening a federated repository is equivalent to opening its constituent repos individually, so a federation of two repos needs 4.5 GB. (These are all generous estimates for reasonable normal operation, not minimums that might work.)
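The base arithmetic above (string table plus work space per open repo) can be reproduced with a short Python sketch. The function name shm_estimate_mib and its parameters are hypothetical and used only for illustration; the 400 Mbytes of work space is the approximate figure from this section and can be larger when ExpectedStoreSize is large.

```python
# Shared memory needed for a set of open repositories, in Mbytes (2^20 bytes).
BYTES_PER_SLOT = 4  # each string table slot uses 4 bytes


def shm_estimate_mib(string_table_sizes_m, work_mib=400, query_mib=0):
    """Estimate /dev/shm use.

    string_table_sizes_m: StringTableSize of each open repo, in M (2^20) slots.
    work_mib: per-repo work space (about 400 Mbytes, per this section).
    query_mib: optional allowance for temporary query data.
    """
    per_repo = [BYTES_PER_SLOT * size + work_mib for size in string_table_sizes_m]
    return sum(per_repo) + query_mib


# Two open repos in a catalog with StringTableSize 128M, as in the example.
print(shm_estimate_mib([128, 128]))                 # 1824
# The same two repos with a 512 Mbyte allowance for queries.
print(shm_estimate_mib([128, 128], query_mib=512))  # 2336
```

These are the computed minimums; the 2.5 GB and 4.5 GB figures quoted in the text are deliberately more generous.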

On modern hardware, this requirement is usually not a problem. The shared memory segment is typically limited to half of RAM, and if RAM is 96 Gbytes, a requirement of 2.5 or 4.5 or even 10 Gbytes is easily accommodated. But if there is only a small quantity of RAM, say 4 or 8 Gbytes, AllegroGraph may be able to open only one repository, or two at the most.

The largest allowed value for string table size is 512M, requiring 2 Gbytes of space. Two such repos would need about 5 Gbytes of shared memory space (2 x (2 GB + 400 MB)) plus an additional 0.5 GB for queries.

Temporary file space

AllegroGraph occasionally writes temporary files. These can be quite large, though not exceptionally so, and they are deleted soon after use. This is rarely a problem so long as the temporary directory chosen has a reasonable amount of free space.

The temporary directory is specified by the TempDir configuration directive (see the Top-level directives section in the Server Configuration and Control document). Its default value is the designated temporary directory (typically /tmp) on the machine running the AllegroGraph server.

It does happen that users have placed their data files on some large storage area (perhaps in the cloud) and have plenty of shared memory space, but do not notice that the server machine's /tmp folder is small, and have a problem as a result. The fix is easy: specify a disk with plenty of space as the value of the TempDir configuration option.
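Before choosing a TempDir value, it can help to check how much free space a candidate directory actually has. The following Python sketch uses the standard shutil.disk_usage call. The helper names free_bytes and has_room are hypothetical, the 10 GiB threshold is an arbitrary illustration rather than an AllegroGraph requirement, and tempfile.gettempdir() reports Python's view of the system temporary directory, which may differ from the server's configured TempDir.

```python
import shutil
import tempfile

GIB = 2**30


def free_bytes(path: str) -> int:
    """Return the free space, in bytes, on the filesystem containing path."""
    return shutil.disk_usage(path).free


def has_room(path: str, needed_bytes: int) -> bool:
    """Return True if the filesystem containing path has at least needed_bytes free."""
    return free_bytes(path) >= needed_bytes


# Check the system temporary directory (typically /tmp).
tmp_dir = tempfile.gettempdir()
needed = 10 * GIB  # arbitrary example threshold, not an AllegroGraph figure
print(f"{tmp_dir}: {free_bytes(tmp_dir) / GIB:.1f} GiB free")
if not has_room(tmp_dir, needed):
    print("Consider pointing TempDir at a filesystem with more free space.")
```

The same check can be pointed at /dev/shm or at the directory holding the repository data.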

Swap space and paging

If there is sufficient RAM, so that repository data, string tables, miscellaneous needs, and query temporary data all fit in RAM, then AllegroGraph will perform in a maximally efficient manner. The general rule is, the more RAM the better!

However, the system will work with less RAM, paging data to and from disk as necessary. Paging does slow down query performance, perhaps significantly.

It is possible to map the shared memory segment (/dev/shm) so parts reside on disk (that is, increase the shared memory segment so it exceeds the available RAM), but this is strongly discouraged. Doing so will very significantly degrade query performance for any serious use of the database.