Introduction

This document continues the Replication document and gives specific instructions on how to initiate database replication.

Replication is the process by which one or more databases can be kept in sync with a single master database. We refer to the master database as the primary database and the replicants as the secondary databases.

High availability refers to the ability to switch between the primary and secondary at will, so if the primary is suddenly unavailable (for whatever reason), the secondary can be promoted to become the new primary.

The primary database can handle normal read and write operations while it is acting as the replication master. The secondary databases can only handle read operations from clients while they are replicating the master. (More precisely, the clients are free to add and delete triples but they cannot commit these changes to the secondary databases during replication.)

Replication occurs across the network, so any set of AllegroGraph servers connected by a network can participate in replication.

Replication occurs in real time. As commits are made to the primary database they are sent as soon as possible to the secondaries.

Replication can only be done between two instances of the same database. The primary will have at least the same commits as the replica and possibly more. The agtool archive program helps one create databases that can be used as secondaries.

The ReplicationPorts configuration option (see Top-level directives in Server Configuration and Control) allows specification of a range of port numbers from which the primary will select listening port numbers for incoming connections from replicas. If the option is specified and none of the ports in the specified range is available, the request to establish a replica will fail. If ReplicationPorts is not specified, the port number will be chosen by the OS.
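The selection rule in the paragraph above can be illustrated with a short Python sketch. This is not AllegroGraph's implementation: the function name pick_listening_port and the loopback host are hypothetical and chosen only for illustration. The sketch tries each port in a range in turn and fails if none can be bound; with no range, it asks the OS to choose a port.

```python
import socket
from typing import Optional, Tuple


def pick_listening_port(port_range: Optional[Tuple[int, int]] = None,
                        host: str = "127.0.0.1") -> socket.socket:
    """Return a listening socket chosen by the rule described above.

    Hypothetical helper for illustration; not part of AllegroGraph.
    """
    if port_range is None:
        # No range configured: binding to port 0 lets the OS choose.
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind((host, 0))
        sock.listen()
        return sock

    low, high = port_range
    for port in range(low, high + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            sock.close()  # port unavailable; try the next one
            continue
        sock.listen()
        return sock

    # Every port in the range is taken, so the request fails.
    raise RuntimeError(f"no free port in range {low}-{high}")
```

The returned socket's getsockname()[1] gives the port that was actually selected.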

UUIDs

Every database is assigned a uuid (Universally Unique IDentifier) when it is created. It is a string like de7f021d-b191-99f4-0181-001517d76b50. This is like a fingerprint for the database and can be used to identify it even if the database name changes. The uuid is stored in the file uuid in the database directory. The uuid is important for replication and for point-in-time recovery (see Point-in-Time Recovery).
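As a small illustration, the Python sketch below reads the uuid file from a database directory and parses it. It assumes the file holds the uuid as plain text in the form shown above, which this document does not state explicitly; the function name read_database_uuid is hypothetical.

```python
import uuid
from pathlib import Path


def read_database_uuid(database_dir: str) -> uuid.UUID:
    """Read and parse the `uuid` file in a database directory.

    Assumption: the file contains the uuid as plain text.
    """
    text = (Path(database_dir) / "uuid").read_text().strip()
    # uuid.UUID raises ValueError if the text is not a well-formed uuid.
    return uuid.UUID(text)


def same_database(dir_a: str, dir_b: str) -> bool:
    """Two directories hold instances of the same database if their uuids match."""
    return read_database_uuid(dir_a) == read_database_uuid(dir_b)
```

Comparing uuids rather than names is what makes the check robust when a database has been renamed.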

Transaction Logs

Transaction logs record important information about database state. The state of the database is maintained persistently using files on one or more disks. A commit changes the database from one state to another. A commit will likely involve changing two or more files and this means that there will be a period of time during which one file was updated and another is yet to be updated. If the machine crashes at this point the database would be left in an inconsistent state.

Therefore AllegroGraph stores the changes it would make to the files in the transaction log first and then it updates the files. Thus if the machine crashes during the file update, AllegroGraph can look in the transaction log for the set of steps still needed to be done to complete the state change for the commit.

A further optimization is that the database files are not updated on each commit. Instead the commit is only reflected in the transaction log and the in-memory copy of the file data. Periodically an operation called a checkpoint is done. A checkpoint updates all the database files on disk and writes a record in the transaction log to note that it has done this.

While a database is active, transaction logs are written but never read, except during replication, which is described later.

When a database is opened the most recent transaction logs are read to ensure that all commits after the last checkpoint have been applied to the database files. If a database is closed normally then a checkpoint is the last operation performed so there will be no commits after the checkpoint.
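The log-first, checkpoint, and recover-on-open behavior described above can be modeled with a toy Python class. This is only a sketch of the general technique, not AllegroGraph's storage engine: the class ToyDatabase, its dictionaries standing in for files, and the record layout are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, Optional[str], Optional[str]]  # (kind, key, value)


@dataclass
class ToyDatabase:
    files: Dict[str, str] = field(default_factory=dict)   # stands in for on-disk database files
    memory: Dict[str, str] = field(default_factory=dict)  # in-memory copy of the file data
    tlog: List[Record] = field(default_factory=list)      # stands in for the transaction log

    def commit(self, key: str, value: str) -> None:
        # The change is recorded in the log first, then applied in memory only.
        self.tlog.append(("commit", key, value))
        self.memory[key] = value

    def checkpoint(self) -> None:
        # Bring the files up to date and note that fact in the log.
        self.files = dict(self.memory)
        self.tlog.append(("checkpoint", None, None))

    def crash(self) -> None:
        # Memory is lost; the files and the log survive.
        self.memory = {}

    def open(self) -> None:
        # Replay every commit recorded after the last checkpoint.
        start = 0
        for i, (kind, _, _) in enumerate(self.tlog):
            if kind == "checkpoint":
                start = i + 1
        for kind, key, value in self.tlog[start:]:
            if kind == "commit" and key is not None and value is not None:
                self.files[key] = value
        self.memory = dict(self.files)
```

In this model, a commit made after the last checkpoint survives a crash because open() finds it in the log, which is also why log records written before the last checkpoint are not needed for ordinary recovery.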

What one can conclude from this is that if you're not interested in replication or point-in-time recovery then you can safely get rid of most of the transaction log files that accumulate on your disk. AllegroGraph provides an automatic way to remove or archive unneeded transaction logs using a process called the Transaction Log Archiver. There are more details on how to configure it in the Transaction Log Archiving document.

Transaction logs are named "tlog-uuid-N" where uuid is the uuid of the database and N is a number starting with 0 and incrementing each time AllegroGraph moves to a new transaction log.
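A script that inspects a transaction log directory might split these names as in the Python sketch below. It assumes the uuid appears in the lowercase hexadecimal form shown in the UUIDs section and that N is written in decimal; the function name parse_tlog_name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed layout: tlog-<uuid>-<N>, uuid in 8-4-4-4-12 lowercase hex form.
TLOG_NAME = re.compile(
    r"^tlog-"
    r"(?P<uuid>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r"-(?P<n>\d+)$"
)


def parse_tlog_name(name: str) -> Optional[Tuple[str, int]]:
    """Return (uuid, N) for a transaction log file name, or None if it does not match."""
    match = TLOG_NAME.match(name)
    if match is None:
        return None
    return match.group("uuid"), int(match.group("n"))
```

Sorting on the parsed number rather than on the raw name keeps log 10 after log 9.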

Replication

As commits are done, a database moves from one state to the next. If you have two copies of the same database, one after commit 10 was done and one after commit 20 was done, then replication allows you to move the first database from commit 10 to commit 20 using the transaction logs of the second database. Further, as more commits are added to the second database, replication will cause the first database to see the effect of those commits in its state as well.

There are two requirements for replication:

  1. You can only replicate to the same database. This means that the database uuids must match.

  2. The transaction logs must be available that contain the commit records needed to do the replication.
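The two requirements can be made concrete with a toy Python model of the commit 10 to commit 20 example. Everything here is hypothetical and for illustration only: ToyRepo, catch_up, and the dictionary of available commit records are not AllegroGraph interfaces.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyRepo:
    uuid: str
    commits: List[str] = field(default_factory=list)  # commit records applied so far, in order


def catch_up(replica: ToyRepo, primary_uuid: str,
             available: Dict[int, str], target: int) -> None:
    """Advance `replica` to commit number `target`.

    `available` maps commit numbers to the records still present in the
    primary's transaction logs.
    """
    # Requirement 1: both sides must be the same database.
    if replica.uuid != primary_uuid:
        raise ValueError("uuid mismatch: not the same database")

    # Requirement 2: every commit the replica lacks must still be in the logs.
    needed = range(len(replica.commits) + 1, target + 1)
    missing = [n for n in needed if n not in available]
    if missing:
        raise LookupError(f"transaction log records missing for commits {missing}")

    for n in needed:
        replica.commits.append(available[n])
```

If the logs holding commits 11 through 20 have already been removed, the replica at commit 10 cannot be brought forward, which is why the logs must be retained.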

The steps for replication are as follows.

Assume we have two AllegroGraph servers, which we refer to as prime-server and second-server (servers do not actually have names -- their host and port specify them, but we are using names to make this example clearer). prime-server is running on machine prime-host and listening at port 20000, and second-server is running on machine second-host and listening at port 30000. prime-server has a repository Sales that we wish to replicate.

We first register a replication job for prime-server:

% agtool replicate \
  --primary-host prime-host --primary-port 20000 \
  --user USER --password PW \
  --name Sales \
  --jobname repl-1 \
  --register

The --uuid argument could have been used instead of the --name argument. Note we do not specify the secondary server at this time.

This tells the system that all Sales transaction log files recording commits subsequent to the registration of replication job repl-1 must be kept around until that replication job indicates it is done with them.

We wish to replicate the Sales repository on second-server.

We make a backup of the Sales repository. This backup must be done after the replication job repl-1 is registered. Otherwise some commits after the backup but before the registration may be lost. On machine prime-host do:

% agtool archive --port 20000 backup Sales <sales-dir>

<sales-dir> must be the path of a non-existent or empty directory. You can do the backup while the Sales database is in use.

Now restore that backup to second-server:

% agtool archive --port 30000 --replica restore Sales.sec <sales-dir> 

Here we have chosen to call the restored database Sales.sec instead of Sales just to illustrate when we use the name on the secondary machine and when we use the name on the primary machine.

We pass the --replica argument to ensure that no processes open this database and modify it before we have a chance to start replication.

Warning! Failure to restore the database in --replica mode can cause database corruption if any operations are performed on the secondary before or during replication.

Finally, we set up second-server as the repl-1 replica of Sales:

% agtool replicate \
  --primary-host prime-host --primary-port 20000 \
  --secondary-host second-host --secondary-port 30000 \
  --name Sales \
  --user USER --password PW \
  --jobname repl-1

Once replication starts, it runs until it is explicitly stopped. Should either or both machines (prime-host or second-host) go down, replication will resume when the machines and their AllegroGraph servers are running again. To stop replication, run agtool replicate again and pass it the --stop argument. You can also stop replication using the AGWebView browser interface.

agtool replicate will mark the Sales.sec repository on second-server as no-commit so that no changes other than from replication will be made to this repository. This is done because if such changes were permitted, then second-server's repository would no longer be a copy of prime-server's repository.

At this point we could also restore <sales-dir> on another server (third-server) and do replication from prime-server to third-server using the same steps shown above, except with a different jobname.
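When the same steps are repeated for several replicas, it can help to build the command lines programmatically. The Python sketch below assembles agtool replicate argument lists using only the options shown in the commands above. The function name replicate_command is hypothetical, and the host third-host, port 40000, and jobname repl-2 in the usage note are made-up example values.

```python
from typing import List, Optional


def replicate_command(primary_host: str, primary_port: int,
                      name: str, user: str, password: str, jobname: str,
                      secondary_host: Optional[str] = None,
                      secondary_port: Optional[int] = None,
                      register: bool = False) -> List[str]:
    """Build an argument list for `agtool replicate` (hypothetical helper)."""
    argv = ["agtool", "replicate",
            "--primary-host", primary_host,
            "--primary-port", str(primary_port)]
    # The secondary is omitted when a job is only being registered.
    if secondary_host is not None:
        argv += ["--secondary-host", secondary_host]
    if secondary_port is not None:
        argv += ["--secondary-port", str(secondary_port)]
    argv += ["--name", name,
             "--user", user, "--password", password,
             "--jobname", jobname]
    if register:
        argv.append("--register")
    return argv
```

For a third server, one would first build the registration command with replicate_command("prime-host", 20000, "Sales", "USER", "PW", "repl-2", register=True), take and restore a backup as before, and then build the start command with secondary_host="third-host" and secondary_port=40000. Each list can be passed to subprocess.run(argv, check=True) on a machine where agtool is installed.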

High Availability

With replication running as prepared above you can change the roles of prime-server and second-server, thus bringing second-server online as a read/write repository. This only works if there is exactly one secondary replicating the primary.

% agtool replicate  --primary-host prime-host  --primary-port 20000 \
                    --secondary-host second-host --secondary-port 30000 \
                    --user USER --password PW  \
                    --name Sales --switch-roles --become-client

Note that the primary and secondary arguments are the same as when we started the replication. In this case the command is sent to the primary machine so we use the database name on the primary.

--switch-roles causes AllegroGraph to put the prime-server repository in no-commit mode. Then all commits that the prime-server has not sent yet are sent to the secondary so that it is totally up to date before the switch. Then the role switch occurs and the no-commit flag is removed from the Sales.sec database on second-server.

--become-client tells AllegroGraph on prime-server to start replicating from Sales.sec on second-server. If --become-client is not passed in, then second-server will still become a read/write server for Sales.sec but prime-server will not attempt to follow commits made on second-server.

The agtool replicate program will initiate the role switch and will exit right away, before the role switch has been completed. However, a role switch can take some time (ten seconds or more) depending on how many outstanding commits are still to be sent. During this time, both databases will be in no-commit mode. A process that attempts to commit during this period will receive an error message saying that commits are not possible at this time.
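Because commits are refused for a short period during a role switch, a client may want to retry rather than fail immediately. The Python sketch below shows one generic way to do that. It is an assumption-laden illustration: CommitRefused is a hypothetical stand-in for whatever error your client library raises, and commit is any callable that attempts the commit against the server expected to be the primary after the switch.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CommitRefused(Exception):
    """Hypothetical stand-in for the 'commits are not possible' error."""


def commit_with_retry(commit: Callable[[], T],
                      attempts: int = 6,
                      delay: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `commit` until it succeeds or `attempts` tries have been made."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return commit()
        except CommitRefused:
            if attempt == attempts:
                raise  # still refused after the last try; give up
            sleep(delay)  # wait for the role switch to finish
    raise AssertionError("unreachable")
```

The default of six tries two seconds apart covers roughly the ten seconds mentioned above; a switch with many outstanding commits may need a longer window.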

Now, if we wish to switch the roles back to their original state, with prime-server as the primary and second-server as the secondary, we issue the same command but must change the primary and secondary host, port, and database name arguments to reflect the current roles (second-server is the primary at this point).

% agtool replicate  --primary-host second-host  --primary-port 30000 \
                    --secondary-host prime-host --secondary-port 20000 \
                    --user USER --password PW  \
                    --name Sales.sec --switch-roles --become-client

Command Reference

The replication program is one of the agtool utilities. The agtool program is the general program for AllegroGraph command-line operations. (In earlier releases, there was a separate agraph-replicate program. A change from earlier releases is that -u is a short synonym for --user and -c is a short synonym for --catalog.)

agtool replicate [--primary-host host] [--primary-port port]  
[--secondary-host host] [--secondary-port port]  
[--catalog|-c cat] [--name name] [--uuid uuid]  
[--user|-u user] [--password password]  
[--jobname jobname] [--stop]  
[--status] [--switch-roles] [--become-client]  
[--list] 

agtool replicate can start and stop replication, switch primary and secondary roles, and get the status from the primary and secondary.

Databases can be named by catalog and name or by their uuid.

The primary-port and secondary-port arguments default to 10035. The primary-host and secondary-host arguments do not have defaults and must be explicitly supplied when needed.

The jobname names this particular replication job. This is important for transaction log archiving as described below. If a jobname argument is not passed in but one is needed, then one will be created.

Replication is started if none of the --stop, --status, or --switch-roles arguments is passed.

If --stop is passed in then replication will stop. For stopping replication only these arguments need be provided:

agtool replicate  
  [--secondary-host host] [--secondary-port port]  
  [--catalog|-c cat] [--name name] [--uuid uuid]  
  [--user|-u user] [--password password]  
  --stop 

For a status report pass the --status argument. You will see status from both the primary and secondary sides.

--switch-roles causes AllegroGraph to put the primary database in no-commit mode. Then all commits that the secondary has not seen yet are sent to the secondary so that it is totally up to date before the switch. Then the role switch occurs and the no-commit flag is removed from the former secondary database.

--list causes the list of jobnames associated with the specified server(s) to be printed. The --primary-host/primary-port and --secondary-host/secondary-port arguments specify servers, so associated jobnames will be printed for whichever one is specified, or for both if both are specified. For example:

$ agtool replicate \
     --primary-host localhost --primary-port 10443 \
     --name test --user test --password xyzzy \
     --list
Jobnames on primary:  
replica-1  
replica-2
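A script that needs the jobnames can parse this output. The Python sketch below is based only on the sample above: it assumes each group starts with a header line of the form "Jobnames on <side>:" followed by one jobname per line, and that a secondary group, if present, uses the same layout. The function name parse_job_list is hypothetical.

```python
import re
from typing import Dict, List, Optional

# Header format taken from the sample output; other formats are not handled.
HEADER = re.compile(r"^Jobnames on (\w+):$")


def parse_job_list(output: str) -> Dict[str, List[str]]:
    """Group the jobnames printed by --list under the side named in each header."""
    jobs: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            jobs[current] = []
        elif current is not None:
            jobs[current].append(line)
    return jobs
```

Lines that appear before any header, such as the echoed command, are ignored.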