Introduction

The AllegroGraph Exporter (agtool export) is a command-line utility for exporting data from a repository. It can use multiple CPU cores to export in parallel.

The agtool program is the general program for AllegroGraph command-line operations. (In earlier releases, there was a separate agexport program.)

Usage

agtool export [OPTIONS] REPO-SPEC FILE 

where REPO-SPEC identifies an AllegroGraph repository, along with the catalog, the host, and so on; see the REPO SPECs section of the agtool document for details. FILE is the name of the output file.

For example, this command exports the triples from the lesmis repository into a file named lesmis.rdf using the RDF/XML format:

% agtool export --output rdfxml http://user1:my-pw@agmachine/repositories/lesmis lesmis.rdf 

The FILE argument

Note that if you use a dash (-) for the FILE argument, then agtool export will send the data to standard output. Parallel export is not possible in this case.
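
For instance, output sent to standard output can be piped directly into another program. A minimal sketch, reusing the lesmis repository from the example above and compressing the stream with gzip:

% agtool export --output ntriples http://user1:my-pw@agmachine/repositories/lesmis - | gzip > lesmis.nt.gz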

If exporting in parallel, then the FILE argument is used as a template for the output file names. For example, if exporting with 5 workers to /data/output/lubm.nt, then agtool export will send data to:

/data/output/lubm-1.nt
/data/output/lubm-2.nt
/data/output/lubm-3.nt
/data/output/lubm-4.nt
/data/output/lubm-5.nt

Files on Amazon S3

If a file is to be written to Amazon S3, you must call agtool export with AWS authentication supplied on the command line, as specified in the section Accessing and operating on files on Amazon S3 in the agtool document. File names on Amazon S3 must be prefaced with s3://, like the following:

s3://bucketname/a/b/c/filename  
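
For example, a complete S3 export might look like the following sketch, where bucketname and the path are placeholders and AWS-AUTH-OPTIONS stands for the AWS authentication options described in the agtool document:

% agtool export --output ntriples AWS-AUTH-OPTIONS http://user1:my-pw@agmachine/repositories/lesmis s3://bucketname/a/b/c/lesmis.nt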
 

Options

The following options may be used with agtool export:

Repository options

In earlier releases, information now encoded in the REPO-SPEC argument was provided by the individual options --catalog, --server, --port, --user, and --password. These options are deprecated but are kept for backward compatibility. Using them signals a warning. If any is specified, the value must be the same as specified by the REPO-SPEC argument.
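
For example, this invocation (a sketch) passes the deprecated --user and --password options redundantly alongside the REPO-SPEC; it signals a warning, and the option values must match the values embedded in the REPO-SPEC:

% agtool export --user user1 --password my-pw --output rdfxml http://user1:my-pw@agmachine/repositories/lesmis lesmis.rdf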

Main options

--save-metadata FILENAME
Save attribute definitions and the static filter to FILENAME. See Triple Attributes for information on attributes.
--blank-node-handling STYLE
Determine how blank nodes are treated when exporting in parallel.

This can be together or distribute. The first places all triples with blank nodes into the same export file, whereas the second allows triples with blank nodes to be distributed across multiple files. Note that if blank nodes are distributed, then the import process must be told to treat them as if they all come from the same context (cf. agtool load's job-based blank node strategy). The default is together.
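
For example, a sketch (with a hypothetical server and output path) that exports in parallel while keeping all triples with blank nodes in a single file, which is also the default behavior:

% agtool export --parallel --blank-node-handling together http://user1:my-pw@agmachine/repositories/lubm-50 /data/output/lubm.nt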

-i IF-EXISTS, --if-exists IF-EXISTS
Controls how agtool export behaves when output files already exist. IF-EXISTS can be one of:

append
If an export file exists, then append the new data to it.
overwrite
If an export file exists, then delete it and write the new data.
fail
If an export file exists, then do not export any data.

The default is fail: the export fails if any export files already exist. Note that when exporting in parallel, all of the output files are checked and the if-exists behavior applies to them as a group; that is, if if-exists is fail, then the export will fail if any of the output files already exists.
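
For instance, a sketch (with a hypothetical output file) that appends to any existing export file rather than failing:

% agtool export --if-exists append --output ntriples http://user1:my-pw@agmachine/repositories/lesmis lesmis.nt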

-o FORMAT, --output FORMAT
Set the output format. This can be one of:

ntriples (the default)
nquads
rdfxml
trix
turtle

Other options:

--compress
If specified, then the output file or files will be compressed using gzip. The default is to have no compression.
-n, --namespaces
Use namespace abbreviations when exporting (for Turtle and RDF/XML). The default is to not use namespaces. (See the example after this list.)
--parallel
Use multiple output files and export workers (see --workers for greater control). The default is to export to a single file.
--workers COUNT
Specify the number of workers to use when exporting in parallel. The default value depends on the number of CPU cores in the machine doing the export.
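
Here is the --namespaces example mentioned above: a sketch exporting the lesmis repository as Turtle with namespace abbreviations:

% agtool export --output turtle --namespaces http://user1:my-pw@agmachine/repositories/lesmis lesmis.ttl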

Exporting distributed repository data

Distributed repositories (see the Distributed Repositories Setup document) allow data to be distributed over several repositories called shards, typically stored on different AllegroGraph servers.

In order to export data, all of the distributed repo's servers must be up and running; if any is not running, the export will fail with an error. The distributed repository has a name, and while the individual shards have names as well, the shards should not be accessed directly by users. It is the distributed repo name that is passed to agtool export. Exporting data from a distributed repository works just as it does for a regular repository, and all data from all shards is exported. The resulting file is a normal data file with no indication that it came from a distributed repo; it can be loaded into either a distributed repo or a regular repo.

Distributed repositories may have one or more associated knowledge base repos. These are federated with shards when running SPARQL queries on the distributed repo (see the Distributed Repositories Setup document). Knowledge base repos are not exported when a distributed repository is exported; they must be exported separately from the distributed repo, as sketched below.
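
For example, a sketch exporting a knowledge base repo on its own, where the repo name mykb and the server details are hypothetical:

% agtool export --output ntriples http://user1:my-pw@agmachine/repositories/mykb mykb.nt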

Notes and examples:

example 1: Export the lubm-50 repository in turtle format in parallel with 15 workers. Several files will be written with names derived from the specified name l50.ttl, such as l50-1.ttl, l50-2.ttl, and so on. Any triples with a blank node will be written to the same file:

agtool export --output turtle --workers 15 http://user1:u1pw@localhost:9002/repositories/lubm-50 /disk1/mydir/DATA/l50.ttl 

example 2: Export the lubm-50 repository in the root catalog on www.example.com:10035 in parallel, with blank nodes distributed across multiple files. Because the number of workers is not specified, agtool export will make its own determination based on the number of CPU cores. Any existing output files will be overwritten. The output format is not specified, so it will be ntriples, the default. Output data will be compressed.

agtool export --if-exists overwrite \  
   --parallel \  
   --blank-node-handling distribute \  
   --compress \  
   http://test:[email protected]/repositories/lubm-50 \  
   /disk1/mydir/DATA/l50.nt.gz  
