Backing up AllegroCache data

AllegroCache is a high-performance, dynamic object caching database system. With any database, there is a concern about object persistence, that is, whether the data will be there when needed. In this note, we review the persistence issues and show how AllegroCache 1.1 handles them more effectively than earlier versions.

Databases differ in the degree to which they support object persistence. Here, in brief, are various persistence strategies, arranged more or less from least persistent to most:

  • In-memory databases: these are the least persistent. When the application ends, whether intentionally or unintentionally, the data is gone.
  • Databases that permit the data in memory to be saved to disk and restored later. If the application ends unexpectedly, then only those objects created or modified since the last database save will be lost.
  • Simple disk-based databases. Data objects are placed in tables kept on disk. This permits more objects to be in the database than can be stored in memory, but runs the risk of the application failing before it has finished writing data to the disk. Such a failure can corrupt the disk tables so badly that some or all of the data in the table is lost.
  • Disk-based logging databases. In this type of database, changes are written to the end of a log file before modifications are made to the tables on disk. If the application should die while writing the tables on disk, the data is still in the log file and the tables can be rebuilt from the log data. The only data ever at risk is data that has not yet been written to the log file. The bulk of the data is protected.
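
The log-before-table ordering of the fourth strategy can be illustrated with a small sketch. This is not AllegroCache code: the class name `LoggingStore`, the JSON-lines record format, and the in-memory dictionary standing in for the on-disk tables are all assumptions made for illustration, and Python is used only because it is compact.

```python
import json
import os


class LoggingStore:
    """Toy key/value store (hypothetical): every change is appended to a
    log file before the table is modified."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.table = {}  # stands in for the tables kept on disk
        self._replay()

    def _replay(self):
        """Rebuild the table from the complete records found in the log."""
        if not os.path.exists(self.log_path):
            return
        good_end = 0  # byte offset just past the last complete record
        with open(self.log_path, "r+b") as log:
            for raw in log:
                try:
                    if not raw.endswith(b"\n"):
                        raise ValueError("record cut off before its newline")
                    record = json.loads(raw)
                except ValueError:
                    break  # torn final record: everything before it is intact
                self.table[record["key"]] = record["value"]
                good_end += len(raw)
            # Drop the torn tail so later appends start on a record boundary.
            log.truncate(good_end)

    def put(self, key, value):
        # Step 1: append the change to the end of the log and force it to disk.
        with open(self.log_path, "ab") as log:
            line = json.dumps({"key": key, "value": value}) + "\n"
            log.write(line.encode("utf-8"))
            log.flush()
            os.fsync(log.fileno())
        # Step 2: only now modify the table.
        self.table[key] = value
```

If the process dies between the two steps of `put`, or partway through the log write, reopening the store replays the log: a change that was fully logged reappears in the table, and only a record that never finished reaching the log is lost.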

AllegroCache version 1.0 was a database of the third type: objects were stored inside tables on the disk. AllegroCache 1.1 (the current version at the time of writing is 1.1.4) is a database of the fourth and most persistent type: all data objects are stored in log files, which are written once and always at the end. All subsequent references to log files are read-only. Tables exist on disk to help locate the data objects in the log, as well as to index the data objects based on the values of their slots.
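
To make the division of labor between log and tables concrete, here is a minimal sketch in which one table maps each object to its position in the log and another indexes objects by the value of a slot. The record layout (one JSON object per line with an `id` field) and the function names `rebuild_tables` and `read_object` are hypothetical; the actual AllegroCache file formats are not described in this note.

```python
import json
from collections import defaultdict


def rebuild_tables(log_path, indexed_slot):
    """Scan an append-only log and derive both lookup tables from it.

    Returns (locations, slot_index):
      locations  maps object id -> byte offset of its newest record
      slot_index maps slot value -> set of ids whose newest record has it
    """
    locations = {}
    newest_value = {}
    with open(log_path, "rb") as log:
        while True:
            offset = log.tell()
            line = log.readline()
            if not line:
                break
            record = json.loads(line)
            # A later record for the same id supersedes the earlier one;
            # the earlier bytes stay in the log untouched.
            locations[record["id"]] = offset
            newest_value[record["id"]] = record.get(indexed_slot)
    slot_index = defaultdict(set)
    for object_id, value in newest_value.items():
        slot_index[value].add(object_id)
    return locations, dict(slot_index)


def read_object(log_path, offset):
    """Fetch one object from the log with a read-only seek."""
    with open(log_path, "rb") as log:
        log.seek(offset)
        return json.loads(log.readline())
```

Both tables here are pure functions of the log contents, so losing them costs a rescan rather than any data.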

While the tables are critical for database operation, they can all be recreated from the data in the log files. Therefore the only points of failure are the log files. The only susceptible part of the log files is the end of the current log file, where new data is appended. Upon startup, AllegroCache can verify the integrity of the files on disk in one of four ways:

  1. No tests at all. This would be the right thing to do if you know you have closed the database cleanly the last time it was used, but always doing at least a simple test (see 2) is preferred.
  2. A simple quick test to see if the last entries in the last log file are valid. This is so fast that this amount of testing is the least you should do.
  3. A scan of the whole last log file looking for errors. This can be time-consuming for large log files, but if you are not sure whether the database was closed cleanly, you should do this.
  4. A scan of the last log file (just like method 3) and also a scan of all tables on disk. This can be very time-consuming, but it will find problems in the tables before they cause an application failure.

    Note that if a database is not closed cleanly, there is a slight chance of table corruption, so if there is any reason to believe the previous shutdown was not clean, using method 4 is recommended.
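
The cost difference between methods 2 and 3 can be seen in a small sketch. It assumes a hypothetical log format in which each record is one line carrying a CRC-32 checksum of its payload; the checks AllegroCache actually performs, and how they are selected, are not shown here.

```python
import os
import zlib


def encode_record(payload: bytes) -> bytes:
    """Frame a payload as '<crc32 in hex> <payload>' plus a newline.
    Assumes the payload itself contains no newline."""
    return b"%08x %s\n" % (zlib.crc32(payload), payload)


def record_is_valid(line: bytes) -> bool:
    """True if the line is complete and its checksum matches its payload."""
    if not line.endswith(b"\n") or len(line) < 10 or line[8:9] != b" ":
        return False
    try:
        expected = int(line[:8], 16)
    except ValueError:
        return False
    return zlib.crc32(line[9:-1]) == expected


def quick_check(path, tail_bytes=4096):
    """Like method 2: validate only the final record, reading just the tail.
    Assumes no single record is longer than tail_bytes."""
    size = os.path.getsize(path)
    if size == 0:
        return True
    with open(path, "rb") as log:
        log.seek(max(0, size - tail_bytes))
        tail = log.read()
    if not tail.endswith(b"\n"):
        return False  # the last write was cut off
    last_line = tail[:-1].rsplit(b"\n", 1)[-1] + b"\n"
    return record_is_valid(last_line)


def full_scan(path):
    """Like method 3: validate every record in the log file."""
    with open(path, "rb") as log:
        return all(record_is_valid(line) for line in log)
```

`quick_check` reads a bounded number of bytes regardless of log size, which is why it is cheap enough to run on every startup. `full_scan` reads the whole file, so it also catches damage in the middle of the log that the quick check cannot see.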

Even with this design, you will lose your objects if the disk drive fails mechanically and you have neither a RAID system nor another copy of the data on another disk. You can minimize the loss by regularly backing up your AllegroCache database. The log file design in the current version makes this easier than it was in version 1.0.

To do a backup in version 1.0, you had to stop all clients committing to the database and then copy all the files that make up the database to your backup media. As time went on and the database grew, this copy took longer and longer. With version 1.1, you need only copy the log files, and you can do the copy while the database is running. Furthermore, every log file but the current one is read-only. Thus, once a log file reaches its maximum size and AllegroCache closes it and creates a new one, you can copy the old log to your backup media just once, since you know it will never change again.
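
A backup script built on this property might look like the following sketch. The directory layout, the `.log` suffix, and the function name `backup_logs` are assumptions for illustration, not AllegroCache's actual file naming. Because log files only grow by appending, a size comparison is used here as the test for whether the backup copy is out of date.

```python
import os
import shutil


def backup_logs(db_dir, backup_dir, suffix=".log"):
    """Copy new or grown log files to backup_dir.

    Returns the names of the files copied on this run.
    """
    os.makedirs(backup_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(db_dir)):
        if not name.endswith(suffix):
            continue  # tables are skipped: only the log files are backed up
        source = os.path.join(db_dir, name)
        target = os.path.join(backup_dir, name)
        # A closed log never changes, so once its full length has been
        # copied it is skipped on every later run. The current log is
        # recopied whenever it has grown since the previous backup.
        if (not os.path.exists(target)
                or os.path.getsize(target) != os.path.getsize(source)):
            shutil.copy2(source, target)
            copied.append(name)
    return copied
```

Run periodically, this copies each closed log once in full and afterwards touches only the current log, so the cost of a backup tracks the amount of new data rather than the size of the whole database.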

What we've just shown is how changes in AllegroCache 1.1 have improved the persistence of your objects in cases where an unexpected machine or application failure prevents the complete sequence of disk writes necessary to keep the database consistent on disk.

Copyright © 2023 Franz Inc., All Rights Reserved