Introduction

Beginning in version 8.0, AllegroGraph added a second distributed triple store implementation. The new implementation, called Fedshard Triple Store, is more flexible than the original. With Fedshard you can define distributed repositories while the server is running. You can also split shards that have grown too large.

Fedshard is described in New Dynamic Cluster Setup and New Dynamic Cluster Tutorial.

In AllegroGraph version 9.0 the original distributed triple store implementation has been removed. In this document we describe how to convert from the old distributed triple store to the new Fedshard triple store.

The conversion has four parts:

  1. export all triples into a text file
  2. write a Fedshard definition of the repository
  3. upload the Fedshard definition to the server
  4. load the exported triples from step 1 into the Fedshard repository.

You can't migrate a distributed repository using backup/restore, because the shard repositories are named differently and the triples are divided between shards differently. You must reload the repository from a text file listing all the triples.
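
As a rough checklist, the commands used in the rest of this document can be collected into one place. The sketch below only prints the commands, it does not run them. The function name migration_commands and its three arguments are hypothetical. Step 2 (writing the definition file) is done by hand and so has no command.

```shell
# Hypothetical helper: print the agtool commands for steps 1, 3 and 4.
# Nothing is executed; review the output and run each line yourself.
migration_commands() {
    repo=$1      # repository name, e.g. myrepo
    server=$2    # server spec, e.g. username:password@mymachine:10077
    def=$3       # Fedshard definition file, e.g. myrepo.def

    # Step 1: run against the pre-9.0 server that still has the old repo.
    echo "agtool export -o nquads $repo $repo.nq"
    # Step 3: run against the new server once the definition file is written.
    echo "agtool fedshard define --server $server $def"
    # Step 4: Fedshard repos always live in the fedshard catalog.
    echo "agtool load $server/fedshard:$repo $repo.nq"
}
```

For example, migration_commands myrepo username:password@mymachine:10077 myrepo.def prints the three command lines shown in the sections below.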

Exporting triples

You can use agtool export to export the triples to a text file. You must do this from a version of AllegroGraph earlier than 9.0, since an old-style distributed repository cannot be opened in version 9.0 of AllegroGraph. A distributed repository is very likely storing quads, so you'll want to export in N-Quads format with something like this:

agtool export -o nquads myrepo myrepo.nq 

If the repo is large, you'll want to use multiple workers when doing the export:

agtool export -o nquads --workers 10 myrepo myrepo.nq 

In this case the triples are written to files myrepo-1.nq, myrepo-2.nq, etc., and you can combine the output files with:

cat myrepo-*.nq > myrepo.nq 
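
Before deleting the per-worker files, it may be worth confirming that nothing was lost when combining them. The following is a minimal sketch, assuming the per-worker files follow the myrepo-N.nq naming shown above. The function name check_combined is hypothetical.

```shell
# Hypothetical helper: compare the line count of the combined file with
# the total line count of the per-worker files it was built from.
check_combined() {
    prefix=$1      # e.g. "myrepo" for myrepo-1.nq, myrepo-2.nq, ...
    combined=$2    # e.g. "myrepo.nq"

    # $(( ... )) normalizes any padding that wc adds on some systems.
    parts_total=$(( $(cat "$prefix"-*.nq | wc -l) ))
    combined_total=$(( $(wc -l < "$combined") ))

    if [ "$parts_total" -eq "$combined_total" ]; then
        echo "ok: $combined_total lines"
    else
        echo "mismatch: parts=$parts_total combined=$combined_total" >&2
        return 1
    fi
}
```

Run it in the export directory as check_combined myrepo myrepo.nq. The glob myrepo-*.nq requires the hyphen, so it does not pick up myrepo.nq itself.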

Write a Fedshard definition file

You now have a text file containing all the triples. You can choose to load them into a single repository and not make the repository distributed. However, if you want to continue using a distributed repository, you will need to define a Fedshard repository, which could follow your previous design or a different one. The repositories in the old distributed triple store implementation were defined in the agcluster.cfg file (in the lib directory of the server), so that file is the reference for the previous design.

Next we show two examples of writing a Fedshard definition that matches the design of the old distributed triple store.

Example 1

Here is a simple definition of a distributed repo with three shards:

User     test  
Password xyzzy  
Catalog tests  
Port 10035  
Scheme http  
server shardmachine.com  local  
 
db myrepo  
  key part graph  
  server local  
  shardsPerServer 3 

The Fedshard equivalent definition is:

fedshard  
  repo myrepo  
  key part  
  secondary-key graph  
  shards-per-server 3  
  scheme http  
  port 10035  
  user test  
  password xyzzy  
 
server  
  host shardmachine.com  
  catalog tests  
 
 

Example 2

One can federate each shard with one or more normal repositories, which we call knowledgebases. The reason for this is to supply triples that can be used by SPARQL queries over the shard data.

Here is a definition of a distributed repo with a knowledgebase (assuming all the top-level directives from Example 1):

db myrepo-with-kb  
  key part graph  
  server local  
  shardsPerServer 3  
  kb tests:myrepo-kb 

The Fedshard equivalent definition is:

fedshard  
  repo myrepo-with-kb  
  key part  
  secondary-key graph  
  shards-per-server 3  
  scheme http  
  port 10035  
  user test  
  password xyzzy  
 
server  
  host 127.0.0.1  
  catalog tests  
 
kb  
  repo myrepo-kb  
  catalog tests  
  host 127.0.0.1  

Upload the Fedshard repo definition to the server

Put the Fedshard definition in a file (e.g. myrepo.def) and send it to the AllegroGraph server (e.g. mymachine:10077) using:

agtool fedshard define --server username:password@mymachine:10077 myrepo.def 

In the old distributed triple store implementation the distributed repo could be in any catalog. The Fedshard implementation puts all Fedshard repos in the fedshard catalog. The shard repositories are put in the catalog specified by the server declaration.

Loading triples into the new Fedshard repo

Note that the name of the Fedshard repo is fedshard:myrepo, since all Fedshard repos are in the fedshard catalog. To load the exported quads into this repo, run:

agtool load username:password@mymachine:10077/fedshard:myrepo myrepo.nq
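
After the load finishes, a simple sanity check is to compare the number of statements in the export file with the triple count the server reports for fedshard:myrepo. The sketch below covers only the file side. It assumes the export is plain N-Quads with one statement per line, and the function name count_statements is hypothetical.

```shell
# Hypothetical helper: count statements in an N-Quads file by counting
# the lines that are neither blank nor comments.
count_statements() {
    grep -c -v -E '^[[:space:]]*(#|$)' "$1"
}
```

For example, count_statements myrepo.nq prints a single number to compare against the repository size shown by the server.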