Introduction

The AllegroGraph Exporter (agexport) is a command-line utility for exporting data from a triple-store. It can use multiple CPU cores to export in parallel.

Usage

agexport [OPTIONS] DBNAME FILE 

where DBNAME is the name of an AllegroGraph triple-store and FILE is the name of the output file.

For example, this command exports the triples from the lesmis triple-store into a file named lesmis.rdf using the RDF/XML format:

./agexport --port 10035 --output rdfxml lesmis lesmis.rdf 

The FILE argument

Note that if you use a dash (-) for the FILE argument, then agexport will send the data to standard output. Parallel export is not possible in this case.
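
For example, a minimal sketch of streaming an export to another program via standard output (the lesmis store, the turtle format, and the gzip pipeline are illustrative choices, not requirements of agexport):

./agexport --port 10035 --output turtle lesmis - | gzip > lesmis.ttl.gz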

If exporting in parallel, then the FILE argument is used as a template for the output file names: each worker writes to its own output file whose name is derived from FILE. For example, if exporting with 5 workers to /data/output/lubm.nt, then agexport will split the data across five such files.
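
A command matching that example might look like the following sketch (the turtle format is chosen only for illustration):

./agexport --parallel --workers 5 --output turtle lubm /data/output/lubm.nt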

Options

The following options may be used with agexport:

Triple-store options

-c CATALOG, --catalog CATALOG
Specify the catalog name of the triple-store. If the store is in the root catalog, then either omit this option or use the empty string ("") [1]. The default is to use the root catalog.
--server SERVER
Specify the name of the server where the triple-store resides.
-p PORT, --port PORT
Set this to the front-end port of the server where the triple-store resides. agexport can run either on that server or on a remote machine; if run remotely, then you must also specify a username and password (see the example after this list of options). The default port is 10035.
-u USERNAME, --username USERNAME
Specify a username for the triple-store when accessing it remotely; use with --password.
--password PASSWORD
Specify the password for the triple-store when accessing it remotely; use with --username.
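
For example, a remote export that also names a catalog might look like this sketch (the catalog name, store name, credentials, and file name are all placeholders):

./agexport --server http://www.example.com --port 10035 \
   --username USER --password PASSWORD \
   --catalog mycatalog \
   --output rdfxml mystore mystore.rdf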

Main options

--blank-node-handling STYLE
Determine how blank nodes are treated when exporting in parallel.

This can be together or distribute. The first places all triples with blank nodes into the same export file, whereas the second allows blank nodes to be distributed across multiple files. Note that if blank nodes are distributed, then the import process must be told to treat them as if they all come from the same context (cf. agload's job-based blank node strategy). The default is together.

-i IF-EXISTS, --if-exists IF-EXISTS
Control how agexport behaves when output files already exist. IF-EXISTS can be one of:

append - If an export file exists, then append the new data to it.
overwrite - If an export file exists, then delete it and write the new data.
fail - If an export file exists, then do not export any data.

The default is to fail if any export files exist. Note that when exporting in parallel, all of the output files are checked and the if-exists behavior applies to them as a group; that is, if if-exists is fail, then the export will fail if any of the output files already exists.
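
For example, a sketch of appending new data to an existing export file (the store and file names are placeholders):

./agexport --if-exists append --output turtle lesmis lesmis.ttl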

-o FORMAT, --output FORMAT
Set the output format. The examples in this document use the rdfxml and turtle formats.

Other options

--compress
If specified, then the output file or files will be compressed using gzip (see the example after this list of options). The default is to have no compression.
-n, --namespaces
Use namespace abbreviations when exporting (for Turtle and RDF/XML). The default is to not use namespaces.
--parallel
Use multiple output files and export workers (see --workers for greater control). The default is to export to a single file.
--workers COUNT
Specify the number of workers to use when exporting in parallel. The default value depends on the number of CPU cores in the machine doing the export.
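
For example, a sketch combining several of these options to write a single compressed file that uses namespace abbreviations (the store and file names are placeholders):

./agexport --namespaces --compress --output turtle lesmis lesmis.ttl.gz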

Notes and examples

Export the lubm-50 triple-store in turtle format in parallel with 15 workers. Any triples with a blank node will be written to the same file:

./agexport --port 9002 --output turtle --workers 15 lubm-50 /disk1/gwking/DATA/l50.nt 

Export the lubm-50 triple-store on http://www.example.com:10035 in parallel with blank nodes distributed across multiple files. Because the number of workers is not specified, agexport will make its own determination based on the number of CPU cores. Any existing output files will be overwritten. Output data will be compressed.

./agexport --if-exists overwrite --server http://www.example.com \
   --output rdfxml \
   --parallel \
   --blank-node-handling distribute \
   --compress \
   lubm-50 /disk1/gwking/DATA/l50.nt.gz

Footnotes

  1. Note that the root catalog is not named 'root'; rather, it is the catalog with no name.