Introduction
The AllegroGraph Exporter (agtool export) is a command-line utility for exporting data from a repository. It can use multiple CPU cores to export in parallel.
The agtool program is the general program for AllegroGraph command-line operations. (In earlier releases, there was a separate agexport program.)
Usage
agtool export [OPTIONS] REPO-SPEC FILE
where REPO-SPEC identifies an AllegroGraph repository, along with the catalog, the host, and so on (see the REPO SPECs section of the agtool document for details), and FILE is a file name.
For example, this command exports the triples from the lesmis repository into a file named lesmis.rdf using the RDF/XML format:
% agtool export --output-format rdfxml http://user1:my-pw@agmachine/repositories/lesmis lesmis.rdf
The FILE argument
Note that if you use a dash (-) for the FILE argument, then agtool export will send the data to standard output. Parallel export is not possible in this case.
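For instance, this sketch (reusing the repository URL from the earlier example) streams the default N-Triples output to standard output and pipes it through gzip:
% agtool export http://user1:my-pw@agmachine/repositories/lesmis - | gzip > lesmis.nt.gz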
If exporting in parallel, the FILE argument is used as a template for the output file names. For example, if exporting with 5 workers to /data/output/lubm.nt, then agtool export will send data to:
- /data/output/lubm-0.nt
- /data/output/lubm-1.nt
- /data/output/lubm-2.nt
- /data/output/lubm-3.nt
- /data/output/lubm-4.nt
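A command along the following lines would produce that layout (the repository URL is illustrative; see the descriptions of --parallel and --workers below):
% agtool export --parallel --workers 5 http://user1:my-pw@agmachine/repositories/lubm /data/output/lubm.nt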
Files on Amazon S3
If a file is to be written to Amazon S3, you must call agtool export with AWS authentication on the command line, as specified in the section Accessing and operating on files on Amazon S3 in the agtool document. File names on Amazon S3 must be prefaced by s3://, like the following:
s3://bucketname/a/b/c/filename
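For example, assuming AWS authentication has been supplied as described in that section (the bucket, path, and repository URL here are illustrative), an export to S3 could look like this:
% agtool export http://user1:my-pw@agmachine/repositories/lesmis s3://bucketname/a/b/c/lesmis.nt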
Options
The following options may be used with agtool export:
Repository options
In earlier releases, information now encoded in the REPO-SPEC argument was provided by the individual options --catalog, --server, --port, --user, and --password. These options are deprecated but are kept for backward compatibility. Using them signals a warning. If any of them is specified, its value must be the same as the value specified by the REPO-SPEC argument.
Main options
- --save-metadata FILENAME
- Save attribute definitions and the static filter to FILENAME. See Triple Attributes for information on attributes. An example using this option appears after the option lists below.
- --blank-node-handling STYLE
- Determine how blank nodes are treated when exporting in parallel. This can be together or distribute. The first places all triples with blank nodes into the same export file, whereas the second allows blank nodes to be distributed across multiple files. Note that if blank nodes are distributed, then the import process must be told to treat them as if they all come from the same context (cf. agtool load's job-based blank node strategy). The default is together.
- -i IF-EXISTS, --if-exists IF-EXISTS
- Controls how agtool export behaves when output files already exist. The possible values are:
  - append: if an export file exists, append the new data to it.
  - overwrite: if an export file exists, delete it and write the new data.
  - fail: if an export file exists, do not export any data.
  The default is to fail if any export files exist. Note that when exporting in parallel, all of the output files are checked and the if-exists behavior applies to them as a group; i.e., if if-exists is fail, then the export will fail if any of the output files already exists.
- -o FORMAT, --output-format FORMAT
- Set the output format. The formats used in this document's examples are ntriples (the default), turtle, and rdfxml.
Other options:
- --compress
- If specified, then the output file or files will be compressed using gzip. The default is to have no compression.
- -n, --namespaces
- Use namespace abbreviations when exporting (for Turtle and RDF/XML). The default is to not use namespaces.
- --parallel
- Use multiple output files and export workers (see --workers for greater control). The default is to export to a single file.
- --workers COUNT
- Specify the number of workers to use when exporting in parallel. The default value depends on the number of CPU cores in the machine doing the export.
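As a combined illustration (the repository URL and file names are hypothetical), the following would export a repository as gzip-compressed Turtle with namespace abbreviations while saving its attribute metadata to a separate file:
% agtool export --output-format turtle --namespaces --compress --save-metadata /data/output/lesmis-metadata http://user1:my-pw@agmachine/repositories/lesmis /data/output/lesmis.ttl.gz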
Exporting distributed repository data
Distributed repositories (see the Distributed Repositories Setup document) allow data to be distributed over several repositories called shards, typically stored on different AllegroGraph servers.
In order to export data, all servers hosting the distributed repository must be up and running; if any is not, the export will fail with an error. The distributed repository has a name, and while the individual shards have names as well, the shards should not be accessed directly by users. It is the distributed repo name that is passed to agtool export. Exporting data from a distributed repository works just as it does for a regular repository, and all data from all shards is exported. The resulting file is a normal data file with no indication that it came from a distributed repo; it can be loaded into either a distributed repo or a regular repo.
Distributed repositories may have one or more associated knowledge base repos. These are federated with the shards when running SPARQL queries on the distributed repo (see the Distributed Repositories Setup document). Knowledge base repos are not exported when a distributed repository is exported; they must be exported separately.
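For example, if a distributed repository named lubm-dist is served from www.example.com:10035 (the repository name is hypothetical), the export command is the same as for a regular repository:
% agtool export http://test:[email protected]/repositories/lubm-dist /disk1/mydir/DATA/lubm-dist.nt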
Notes and examples:
Example 1: Export the lubm-50 repository in turtle format in parallel with 15 workers. Several files will be written, with names derived from the specified name l50.ttl, such as l50-1.ttl, l50-2.ttl, and so on. Any triples with a blank node will be written to the same file:
agtool export --output-format turtle --workers 15 http://user1:u1pw@localhost:9002/repositories/lubm-50 /disk1/mydir/DATA/l50.ttl
Example 2: Export the lubm-50 repository in the root catalog on www.example.com:10035 in parallel, with blank nodes distributed across multiple files. Because the number of workers is not specified, agtool export will make its own determination based on the number of CPU cores. Any existing output files will be overwritten. The output type is not specified, so it will be NTriples. Output data will be compressed:
agtool export --if-exists overwrite \
--parallel \
--blank-node-handling distribute \
--compress \
http://test:[email protected]/repositories/lubm-50 \
/disk1/mydir/DATA/l50.nt.gz