Introduction
AllegroGraph supports online full backups. A full backup contains all data files and information required to restore a repository to the state it was in at the time of the backup. Backups are performed while the repository is open in a server.
The backup program is used both for backing up data for use later by the same server and for backing up data which will be upgraded to a new (later) version.
The backup utility backs up the data (all the triples) and the server configuration. This makes upgrading straightforward. See below and Repository Upgrading.
Backing up/restoring AllegroGraph releases prior to 4.12.2
Please contact [email protected] if you wish to back up data from a running AllegroGraph release prior to 4.12.2 or restore data backed up from such a release. Various programmatic changes mean the procedure differs from what is described in this document.
Backing up/restoring AllegroGraph releases prior to 6.2.0
In releases prior to 6.2.0, the program for backing up and restoring data was agraph-backup. Starting in release 6.2.0, it is the agtool archive program. (Most command-line programs have been folded into the single agtool program, whose first argument specifies the utility to run.) The calling template for the older agraph-backup and the new agtool archive are the same. Anywhere in this document where it may say to run agtool archive in a release prior to 6.2.0, run agraph-backup instead.
Nomenclature and repository specification
In AllegroGraph, data are stored in repositories, or repos for short. These are occasionally informally called databases but repository is the correct name. Repositories are organized into catalogs. There is a special, unnamed catalog called the root catalog which can contain repositories.
A repository is identified by a repository specification, abbreviated repo-spec. See the Repository Specification document for a general discussion of repo-specs.
Some agtool archive commands work on remote servers (that is, servers running on different hosts than that running agtool or servers running on the same host but by a different user). Others must be run on the same host as the server and by the same user who started the server. In either case, the repo-spec can contain information on the host running the server, on the port on which the server is listening, and the scheme (http or https) being used. For remote operations, the repo-spec can also specify the username and password. Remote operations can only be made by users with AllegroGraph superuser privileges.
The simplest repo-spec for a repository in the root catalog is
repository-name
The simplest repo-spec for a repository in a named catalog is
catalog-name:repository-name
When a host, port, scheme, and username and password must be specified, the format is:
[username:password@][host][:[port][s]]/[catalog:]repo-name
The 's' after the port encodes the scheme ('s' for https, no 's' for http).
Here is an example:
test:xyzzy@myhost:12345s/my-cat:my-repo
The server is running on myhost, listening on port 12345 with scheme https. The username is test and the password is xyzzy. User test must have AllegroGraph superuser privileges. The repository is my-repo in catalog my-cat.
See the Repository Specification document for many other examples.
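The format above can be assembled mechanically. The following is a minimal sketch, not part of agtool: the function name make_repo_spec and its argument order are hypothetical, and it only covers the case where the 's' scheme marker follows an explicit port.

```shell
# Build a repo-spec of the form
#   [username:password@][host][:[port][s]]/[catalog:]repo-name
# Hypothetical helper; pass "" for any component you want to omit.
make_repo_spec() {
    user=$1 pass=$2 host=$3 port=$4 scheme=$5 catalog=$6 repo=$7
    spec=""
    # Credentials are only needed for remote operations.
    if [ -n "$user" ]; then
        spec="${user}:${pass}@"
    fi
    spec="${spec}${host}"
    # The port is followed by "s" when the scheme is https.
    if [ -n "$port" ]; then
        spec="${spec}:${port}"
        if [ "$scheme" = "https" ]; then
            spec="${spec}s"
        fi
    fi
    # Connection information, when present, is separated from the repository by "/".
    if [ -n "$spec" ]; then
        spec="${spec}/"
    fi
    # A repository in the root catalog has no catalog prefix.
    if [ -n "$catalog" ]; then
        spec="${spec}${catalog}:"
    fi
    printf '%s\n' "${spec}${repo}"
}
```

Called as make_repo_spec test xyzzy myhost 12345 https my-cat my-repo, this prints test:xyzzy@myhost:12345s/my-cat:my-repo. With every component except the repository name left empty, it prints just the repository name.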
The agtool archive program
The single program agtool archive does both backing up and restoring, and a few additional bookkeeping tasks. The general calling template for agtool archive is:
agtool archive [options] command [command-args]
options are prefixed with double dashes (single dash in some cases) and may also take arguments.
The commands which write to or read from files (backup, backup-all, backup-settings, restore, restore-all, and restore-settings) are passed a directory and perhaps additional information like a repo-spec. The specific files are located and named within that directory following standard rules described below. Unless the --supersede option is specified to backup commands, the archive directory must either not exist or be empty. See below for more information.
The supported commands are as follows:
- backup ground-repo-spec archive-dir
- This command saves the contents of the repository ground-repo-spec to a file in the directory named by archive-dir. See more on this command, including details of the options, below.
- backup-all archive-dir
- This command backs up all repositories in the running AllegroGraph server to the directory archive-dir. It also backs up settings (unless the --nosettings option is specified). See below for more on this command, including details of the options.
- restore ground-repo-spec archive-dir [archive-repo-spec]
- This command restores the data for archive-repo-spec from the archive archive-dir into ground-repo-spec. If archive-repo-spec is omitted, the archived repository searched for is ground-repo-spec. archive-dir can also be a specific backup file, which will be restored into ground-repo-spec. When archive-dir is a file, archive-repo-spec, even if supplied, is ignored. Note that ground-repo-spec need not be the same as archive-repo-spec. See below for more on this command, including details of the options.
- restore-all archive-dir
- This command restores multiple repositories and settings information previously archived in archive-dir by the backup-all command. See below for more on this command, including details of the options.
- backup-settings archive-dir
- This command backs up the settings information (users, stored queries/procedures, etc.). The settings themselves are copied directly from the filesystem. The location of the settings is obtained either from a running server or from the config file. See below for more on this command, including details of the options.
- restore-settings archive-dir
- This command restores the settings information (users, stored queries/procedures, etc.) stored in the settings subdirectory of archive-dir. If settings are restored, the server must be restarted for them to take effect. See below for more on this command, including details of the options.
- list archive-file
- This command shows the contents of archive-file, which must be a backup file that contains repository data. See below for more details.
- no options or commands specified
- agtool archive called with no options and no command prints help documentation.
Archives stored on Amazon S3
If a file is to be written into or read from Amazon S3, you must call agtool archive with AWS authentication on the command line as specified in the section Accessing and operating on files on Amazon S3 in the agtool document. Files in S3 must be prefaced by s3://, like the following:
s3://bucketname/a/b/c/filename
Backing up and restoring distributed repository data
Distributed repositories (see the Distributed Repositories Setup document) allow data to be distributed over several repositories called shards, typically stored on different AllegroGraph servers.
To back up or restore data, all servers hosting the distributed repo must be up and running. If any is not running, the backup or restore will fail with an error. The distributed repository has a name, and while the individual shards have names as well, users should not access the shards directly. It is the distributed repo name which is passed to agtool archive.
Backing up data from a distributed repository works just as it does for a regular repository and all data from all shards is backed up. However, the backed up data includes the fact that it came from a distributed repo with a specific number of shards and so can only be restored to a distributed repo with that same number of shards. The shards can be distributed over servers in a different fashion and the distributed repo name can be different but the data cannot be restored to a regular (non-distributed) repo nor to a distributed repo with fewer or more shards.
If you want to transfer data from a distributed repo to a regular repo or a distributed repo with a different number of shards, export the data using agtool export (see Repository Export) and then import the resulting file with agtool load (see Data Import).
Distributed repositories may have one or more associated knowledge base repos. These are federated with shards when running SPARQL queries on the distributed repo (see the Distributed Repositories Setup document). Knowledge base repos are not included in a distributed repository backup. You must back them up separately if desired.
See the section on backing up in the Distributed Repositories Setup document for an example of backing up and restoring.
Backup archive directory structure
Most commands take a directory as a location argument. The <archive-dir> directory has two subdirectories, archives/ and settings/. Here is the structure of the archives/ directory for a repository in a named catalog:
<archive-dir>/archives/<catalog-name>/<repo-name>/
For a repository in the root catalog, the archive directory is
<archive-dir>/archives/root/<repo-name>/
This directory contains repository data for <repo-name> as well as settings specific to that repository. Triples data are stored in a file named
<repo-name>.agbackup
Server settings information is stored in files and subdirectories of
<archive-dir>/settings/
agtool archive can find the files it needs based on the supplied archive directory, catalog name (which defaults to 'root'), and repository name. There is in general no need to refer to specific filenames, although the restore command will accept a file argument instead of a directory.
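If a script does need to locate a backup file, for instance to check that a backup was written, the layout above can be turned into a path. This is a sketch only: the function names are hypothetical, and the .agbackup file name follows the naming given later in this document for the backup commands.

```shell
# Print the directory that holds the archive of one repository.
# An empty catalog argument means the root catalog.
archive_repo_dir() {
    archive_dir=${1%/}      # strip one trailing slash, if any
    catalog=${2:-root}
    repo=$3
    printf '%s/archives/%s/%s\n' "$archive_dir" "$catalog" "$repo"
}

# Print the expected path of the repository data file inside that directory.
archive_backup_file() {
    printf '%s/%s.agbackup\n' "$(archive_repo_dir "$1" "$2" "$3")" "$3"
}
```

For example, archive_backup_file /backups/ag my-cat my-repo prints /backups/ag/archives/my-cat/my-repo/my-repo.agbackup.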
Note that this structure was new in release 4.12.2. agtool archive knows the directory structures of earlier releases, so upgrading also only requires a directory name, but the information in this document does not apply to earlier releases.
Overwriting files and repositories
If a non-empty directory is given as an argument to the backup, backup-all, or backup-settings commands, these commands will fail (with an error message) unless the --supersede option is specified. If --supersede is specified, the contents of the <archive-dir> directory will be removed entirely, and then the new backup files will be written. Warning: you cannot use the backup command to update the backup of a specific repository, or the backup-settings command to update saved settings, in an existing archive directory, because specifying --supersede deletes all data in the archive directory.
Similarly, if you try to restore an existing repository, that restore will fail unless the --supersede option is specified. When the --supersede option is specified, repositories that exist on the server but not in the archive will not be removed.
The restore-settings command ignores the --supersede option because there are always settings, so restoring settings of necessity overwrites existing settings.
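Because --supersede empties the whole archive directory, a script may prefer to check the target first and refuse to continue rather than pass --supersede by default. A minimal sketch of such a check (the function name is hypothetical, and the check runs entirely in the shell without calling agtool):

```shell
# Succeed (exit status 0) when the directory can be used as a backup target
# without --supersede: it either does not exist, or exists and is empty.
archive_dir_is_fresh() {
    dir=$1
    if [ ! -e "$dir" ]; then
        return 0
    fi
    # A regular file or other non-directory is never a valid target here.
    if [ ! -d "$dir" ]; then
        return 1
    fi
    # ls -A lists every entry except "." and ".."; no output means empty.
    [ -z "$(ls -A "$dir")" ]
}
```

A caller could then write: archive_dir_is_fresh "$dir" || { echo "refusing to overwrite $dir" >&2; exit 1; }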
agtool archive notes
Unless a command is described as working with remote servers (see the backup command below), agtool archive must be run on the same machine as the AllegroGraph server and by the user id under which the server is running (commonly the agraph user).
There are many options and most apply to only certain commands. Specifying an option not relevant to the command specified is not an error. The irrelevant option will be ignored.
Using agtool archive to upgrade
When you have a new version of AllegroGraph, you can migrate your repositories from the old version to the new using agtool archive. Let us say you are upgrading from AllegroGraph 6.1 to AllegroGraph 7.0.3. The steps are:
1. Start the AllegroGraph 6.1 server using port P. Choose a directory D (which must not exist) for the backup archive.
2. su to the user id under which the AllegroGraph server is running (usually the agraph user).
3. Run the 6.1 version of agraph-backup with the backup-all command (for additional options, see the AllegroGraph 6.1 documentation; note that we call agraph-backup because agtool was not introduced until version 6.2.0):
agraph-backup --port P backup-all D
4. Stop the AllegroGraph 6.1 server.
5. Start the 7.0.3 server, using port P1.
6. Run the 7.0.3 version of agtool archive using port P1 with the restore-all command (see below for the full set of command options):
agtool archive --port P1 restore-all D
Please contact [email protected] if you plan to restore AllegroGraph v4.0 backups into AllegroGraph v4.1 or newer, or any backups from a pre-4.0 version of AllegroGraph.
The backup and backup-all commands
The backup and backup-all commands back up a single repository or all repositories, respectively. backup-all additionally backs up settings unless the --nosettings option is specified.
Backing up a single repository can be done on a remote server (one running on a different host from that on which agtool is running, or started by a different user than the one running agtool). Use the connection information part of the ground-repo-spec to specify the host, port, scheme, username, and password as needed. The user must have AllegroGraph superuser privileges.
backup-all must be run on the same host as the server and by the user who started the server. The connection information is specified using a SERVER-SPEC (see the SERVER SPECs section of the agtool document), which can be used to specify the port and scheme if necessary.
backup takes a repo-spec and an archive directory as its arguments.
backup-all takes a server-spec and an archive directory as its arguments.
The archive directory must either not exist or be empty unless the --supersede option is specified. If --supersede is specified, and the archive directory exists, the archive directory will be cleared of all contents before new data is written to it. Therefore, you cannot update the backup of a single repository to an existing archive directory with backup because other data in the directory will be deleted.
For backup only, <archive-dir> can be -, which means send the output to standard output rather than to a file. See below for details.
With either backup command, repository data archives are written to files named <archive>/archives/<catalog>/<repo-name>/<repo-name>.agbackup. <catalog> is root for repos in the root catalog.
Here are the relevant options. Unless indicated, they apply to either command:
- --port port | -p port
- The port with which to communicate with the running server. This argument is deprecated but still supported. The port should be specified as part of a repo-spec or a server-spec.
- --config config-file
- If supplied, settings location will come from the config file. This argument can be supplied but is not necessary as the config file location is available from the server, which must be running and listening on the specified port.
- --catalog
- This argument is no longer supported. Specify a catalog as part of the repo-spec.
- --supersede
- If specified, an existing archive directory will be emptied (all existing files and subdirectories removed) before its standard directory structure is reestablished, and the desired backup files are written (for a single repository for backup, for all repositories and for settings for backup-all).
- --nosettings
- For backup-all, do not save settings (such as users, roles and stored procedures). For backup, do not save settings for the specified repository. This is an unusual option.
So, the template for a call to agtool archive using the backup command is:
agtool archive \
[--supersede] \
backup repo-spec <archive-dir>
See the Repository specification section for the format of repo-specs.
The template using the backup-all command is
agtool archive \
[--supersede] [--nosettings] \
backup-all server-spec <archive-dir>
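Since an existing archive directory cannot be updated in place, one common pattern is to write each backup-all run into a new directory. The sketch below assumes that approach; the function name and the timestamp layout are hypothetical, and the server-spec is passed through unchanged (see the agtool document for its format).

```shell
# Run backup-all into a new, timestamped directory under base_dir, so that
# earlier archives are never touched and --supersede is never needed.
# Prints the directory that was written on success.
backup_all_dated() {
    server_spec=$1
    base_dir=${2%/}
    # Create only the parent; the target itself must not exist yet.
    mkdir -p "$base_dir" || return 1
    target="${base_dir}/$(date +%Y%m%d-%H%M%S)"
    agtool archive backup-all "$server_spec" "$target" || return 1
    printf '%s\n' "$target"
}
```

As with any backup-all call, this must be run on the server host by the user who started the server.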
The restore and restore-all commands
The restore and restore-all commands restore one or more repositories archived with agtool archive with the backup or backup-all commands. restore-all also restores settings if present, unless --nosettings is specified. If settings are restored, you must restart the server before continuing.
The AllegroGraph server must run on the same host and as the same OS user as the user executing the restore command, otherwise the restore will fail with an error message indicating the user mismatch.
During a restore, a progress report is periodically printed to stdout, showing the fraction of the archive which has been processed and an estimate of the completion time.
The catalog specified by the repo-specs must exist for restore. A repo-spec name with no catalog prefix is taken to be in the root catalog. For restore-all, all archived catalogs must exist (the subdirectories of <archive>/archives/ are catalog names) in the running server. The root catalog always exists.
Here are the relevant options. Unless indicated, they apply to either command:
- --port port | -p port
- The port with which to communicate with the running server.
- --config config-file
- If supplied, settings location will come from the config file. This argument can be supplied but is not necessary as the config file location is available from the server, which must be running and listening on the specified port.
- --nocommit
- When specified, the newly restored repository will be set to no-commit mode. No-commit mode means that the repository will not accept commits. Replicas must be created in no-commit mode. See Replication.
- --newuuid
- When specified, a new uuid will be generated for the repository.
- --recover
- Shorthand for specifying both --nocommit and --newuuid.
- --replica
- When specified, the restored repository is assumed to be a replication secondary (warm standby client). It will cause --nocommit to be set. This flag is incompatible with --newuuid and --recover. Warm standby is discussed in Replication Details.
- --noconvert
- Only relevant when the AllegroGraph version which created the archive is earlier than the one which is restoring the archive. Normally, the restore operation converts such archives to the new version. When --noconvert is specified, it does not do that conversion. If you specify this option, you will need to run the agtool upgrade command before accessing the restored repository.
- --catalog
- This argument is no longer supported. The repository specification contains the catalog name (no name for the root catalog).
- --nosettings
- With restore-all, when specified, do not restore settings information from the restored archive. (If backup-all was called with the --nosettings option, the settings will not be available in any case.) With restore, when specified, do not restore repository settings information.
- --supersede
- If specified, existing repositories in the server will be overwritten if backup data exists in <archive> (for the single repository being restored by restore or all repositories for restore-all).
So, the template for a call to agtool archive using the restore command is:
agtool archive \
[--port port | -p port] [--nocommit] [--newuuid] \
[--recover] [--replica] [--supersede] \
restore <repo-spec> <archive-dir> [repo-spec-from-archive]
Note that you can change the repository name and catalog containing it from what it is in the archive to what you want in the server by using the final optional [repo-spec-from-archive] argument. So if the required <repo-spec> argument is my-cat:my-repo and no final optional argument is specified, that repo-spec will be looked for in and restored from the archive. But if the final argument is supplied as my-old-cat:my-old-repo, my-old-repo will be looked for in the my-old-cat subdirectory and that repository will be restored to my-repo in catalog my-cat. (Of course, repo-spec and repo-spec-from-archive can be the same, but in that case specifying repo-spec-from-archive is redundant.) Repository specifications are described above.
For restore, <archive> may also name a backup file or may be - (a dash), which means get the input from standard input rather than from a file. See below for details. If <archive> is a file or -, <repo-spec-from-archive> is ignored even if supplied.
And the template using the restore-all command is
agtool archive \
[--port port | -p port] [--nocommit] [--newuuid] \
[--recover] [--replica] [--nosettings] [--supersede] \
restore-all <archive-dir>
The archive-dir argument for restore and restore-all should name a directory created by the backup command or the backup-all command. (Directories created by earlier AllegroGraph versions will also work.)
For restore-all the repo names and catalogs will be taken from the various archive files. All catalogs must exist in the server before their repositories can be restored.
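Before running restore-all, it can help to see which named catalogs the archive expects, since each subdirectory of <archive>/archives/ is a catalog name. The following sketch only inspects the archive directory on disk and does not contact the server; the function name is hypothetical.

```shell
# Print the named catalogs found in an archive directory, one per line.
# "root" is skipped because the root catalog always exists in the server.
archived_catalogs() {
    archive_dir=${1%/}
    for path in "$archive_dir"/archives/*/; do
        # With no subdirectories the glob stays unexpanded, so test each entry.
        [ -d "$path" ] || continue
        name=$(basename "$path")
        [ "$name" = "root" ] || printf '%s\n' "$name"
    done
}
```

Each name printed must already exist as a catalog in the target server before restore-all is run.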
Here are some restore examples:
agtool archive [options] restore my-repo archive-dir
This command will search archive-dir for data in the my-repo repository in the root catalog and restore it. (We are assuming my-repo is a repo-spec without a catalog specified.)
agtool archive [options] restore my-repo agbackup-file
This command restores data from the file agbackup-file and puts it into the my-repo repository in the root catalog. (We are assuming my-repo is a repo-spec without a catalog specified.)
agtool archive [options] restore my-repo-spec agbackup-file
This command restores data from the file agbackup-file and puts it into the catalog and repository specified by my-repo-spec.
agtool archive [options] restore my-new-repo-spec archive-dir my-old-repo-spec
This command finds the stored data for my-old-repo-spec in archive-dir and restores it into the catalog and repository specified by my-new-repo-spec.
The backup-settings and restore-settings commands
The backup-settings command stores settings information (users, stored queries/procedures, etc.). The <archive> directory must either not exist or be empty unless the --supersede option is specified, in which case all files and subdirectories of an existing directory will be deleted before the settings information is written. Therefore, you cannot specify an existing backup archive directory and just have the settings superseded. Settings are written to the settings/ subdirectory of <archive>. You can manually replace the settings/ subdirectory of one backup archive with a different settings/ subdirectory if you want to change the saved settings in an archive.
The options are:
- --config
- Must be the path of an AllegroGraph config file. If specified (and no value is specified for --port), agtool archive will look in the config file for the location on disk of settings information. The server need not be running.
- --port port | -p port
- The port with which to communicate with the running server. Not needed if the optional config-file argument is specified. If both are specified, the port takes precedence.
- --supersede
- If specified, the <archive> directory, if it exists and is non-empty, will be cleared (all subdirectories and files deleted) before the standard subdirectory structure is reestablished and settings information is written out. (Do not specify --supersede if you just want to update settings information in an existing backup archive as doing so will cause all other data to be deleted.)
The call template thus is:
agtool archive \
[--port port ] [--config <config-file>] [--supersede] \
backup-settings <archive>
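The manual replacement of a settings/ subdirectory mentioned above can be scripted. This is a sketch under stated assumptions rather than a supported procedure: the function name is hypothetical, the port is passed with the --port option from the template, and the scratch directory comes from mktemp -d, which creates an empty directory that backup-settings should accept.

```shell
# Replace only the settings/ part of an existing archive.
# backup-settings writes into a fresh scratch directory, and its settings/
# subdirectory is then copied over the one in the existing archive.
refresh_archive_settings() {
    archive_dir=${1%/}
    port=$2
    scratch=$(mktemp -d) || return 1
    # Only touch the existing archive once new settings have been written.
    if agtool archive --port "$port" backup-settings "$scratch" &&
        [ -d "$scratch/settings" ]; then
        rm -rf "$archive_dir/settings"
        cp -R "$scratch/settings" "$archive_dir/settings"
        status=$?
    else
        status=1
    fi
    rm -rf "$scratch"
    return "$status"
}
```

The repository data under archives/ in the existing archive is left as it was, which is the point of avoiding --supersede here.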
The restore-settings command replaces settings in a running server. It takes an <archive> argument, finds the settings data in that archive, and replaces the existing stored settings of the running server with the new settings. (If <archive> does not have any settings data, no change is made.) The --supersede option is ignored since there are always settings, so restoring settings to a running server of necessity supersedes the existing ones.
You must restart the server for the new settings to take effect.
The call template thus is:
agtool archive [--port port ] restore-settings <archive>
The list command
The list command displays information about the contents of an archive file, that is a single dbase.agbackup file, or an archive directory, that is a directory containing such files. The structure of an archive directory is
<archive>/archives/<catalog-name>/<repo-name>/
The backup files have names and types
<repo-name>.agbackup
The call template for a single file is:
agtool archive list <archive-file>
The call template for an archive directory is:
agtool archive [options] list <archive-directory> [repo-in-archive]
If repo-in-archive is supplied, it must be a repo-spec and only information on that repository will be displayed (just as if its single backup file had been specified).
Only one list option is available when calling agtool archive list on an archive directory:
- --summary
- if supplied, repositories present in the archive will be listed without further details.
agtool archive list examples
A user has the following subdirectories of the user's home directory, and each has a backup file:
~/db-archive/archives/cat1/my-db-1/my-db-1.backup
~/db-archive/archives/cat2/my-db-2/my-db-2.backup
~/db-archive/archives/cat2/my-db-3/my-db-3.backup
~/db-archive/archives/root/my-db-4/my-db-4.backup
The <archive> directory is ~/db-archive/. There are two named catalogs, cat1 and cat2. cat1 contains one repository named my-db-1. cat2 contains two repositories named my-db-2 and my-db-3. There is also a repository in the root catalog named my-db-4.
Here are some sample list commands. We have pared the output down by removing some information to make the examples more readable. It is the commands and the general look of the output that we are showing rather than the actual repository information.
The old style listing of a single backup file:
$ agtool archive list ~/db-archive/archives/cat1/my-db-1/my-db-1.backup
Triple store : cat1:my-db-1
UUID : b9c3592c-9c83-1814-b8a9-61ec65c4b3b0
AllegroGraph Server version : 6.4.2
Database format version : 41
Backup format version : 1
Backed up at 2018-04-23 13:56:57
9 files:
tlog-b9c3592c-9c83-1814-b8a9-61ec65c4b3b0-1 (~/agtmp/cat1/my-db-1/tlog-b9c3592c-9c83-1814-b8a9-61ec65c4b3b0-1) 1351680 bytes
ckpt (~/agtmp/cat1/my-db-1/ckpt) 8 bytes
tlmgr (~/agtmp/cat1/my-db-1/tlmgr) 108 bytes
version (~/agtmp/cat1/my-db-1/version) 4 bytes
parameters.dat (~/agtmp/cat1/my-db-1/parameters.dat) 602 bytes
uuid (~/agtmp/cat1/my-db-1/uuid) 36 bytes
sstab-strings (~/agtmp/cat1/my-db-1/sstab-strings) 4194304 bytes
sstab-large-strings (~/agtmp/cat1/my-db-1/sstab-large-strings) 4194304 bytes
sstab-chunk (~/agtmp/cat1/my-db-1/sstab-chunk) 1572864 bytes
reading archive file: 10.8 MB/s
The same information specifying the archive directory and using the optional repo-in-archive argument:
$ agtool archive list ~/db-archive/ cat1:my-db-1
Triple store : cat1:my-db-1
UUID : b9c3592c-9c83-1814-b8a9-61ec65c4b3b0
AllegroGraph Server version : 6.4.2
Database format version : 41
[etc. -- same as above]
Command listing all repositories in the archive:
$ agtool archive list ~/db-archive/
Triple store : cat1:my-db-1
[...]
Triple store : cat2:my-db-2
[...]
Triple store : cat2:my-db-3
[...]
Triple store : my-db-4
[...]
Command providing a summary of the repositories in the archive:
$ agtool archive --summary list ~/db-archive/
cat1:my-db-1
cat2:my-db-2
cat2:my-db-3
my-db-4
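The summary listing lends itself to scripting, for example to confirm that a repository is present before attempting a restore. The sketch below assumes the listing prints one repo-spec per line with no surrounding whitespace, as in the pared-down output above; the function name is hypothetical.

```shell
# Succeed when repo_spec (for example cat1:my-db-1) appears as a whole line
# in the --summary listing of the archive directory.
archive_has_repo() {
    archive_dir=$1
    repo_spec=$2
    # -F: fixed string, -x: match the entire line, -q: no output.
    agtool archive --summary list "$archive_dir" | grep -Fqx -- "$repo_spec"
}
```

Because the whole line must match, a repository in a named catalog is only found when the catalog prefix is included in the repo-spec.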
Reading from standard input/ writing to standard output
If you specify - (a dash) in place of an archive with the backup and restore commands, the backup command will write to standard output and the restore command will read from standard input. This is useful when you wish to copy a repository, as we describe in the next section, and also for streaming backups over a network.
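Streaming also makes it possible to post-process a backup as it is written. As one illustration, the sketch below compresses a single-repository backup with gzip and restores from the compressed file; the function names are hypothetical, and gzip is an arbitrary choice of filter.

```shell
# Write a compressed backup of one repository to a single file by letting
# the backup command write to standard output.
# Note: in plain sh the pipeline's status is gzip's, so a failure of the
# backup command itself is not reported through the return value.
backup_to_gzip() {
    repo_spec=$1
    out_file=$2
    agtool archive backup "$repo_spec" - | gzip -c > "$out_file"
}

# Restore from such a file by feeding the decompressed stream to the
# restore command on standard input.
restore_from_gzip() {
    repo_spec=$1
    in_file=$2
    gzip -dc "$in_file" | agtool archive restore "$repo_spec" -
}
```

Replacing gzip with a command such as ssh would send the same stream to another machine instead of a local file.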
Using agtool archive to copy repositories
agtool archive can be used to copy a repository to a new name and/or catalog. This is achieved by setting up a pipe with agtool archive backup writing the repository to standard output, and agtool archive restore reading the backup archive into a different repository. For example, to make a copy of the repository lubm1 under the name lubm1 in the catalog experiments, the following command line could be used:
agtool archive backup lubm1 - \
| agtool archive --newuuid restore experiments:lubm1 -
Note that a copy of a triple store needs --newuuid if it is to run on the same server as the original triple store. Alternatively, you could specify the --port option to send the copy to a different server on the same computer. In that case --newuuid isn't necessary.
Restoring a Pre-v4.1 backup
AllegroGraph versions prior to v4.1 used a different backup mechanism. Please contact [email protected] if you require assistance restoring v4.0 backups into AllegroGraph v4.1 or newer.
agtool archive help
Calling agtool archive --help displays help information.