Introduction
The AllegroGraph Exporter (agexport) is a command-line utility for exporting data from a triple-store. It can use multiple CPU cores to export in parallel.
Usage
agexport [OPTIONS] DBNAME FILE
where DBNAME is the name of an AllegroGraph triple-store and FILE is a file name.
For example, this command exports the triples from the lesmis triple-store into a file named lesmis.rdf using the RDF/XML format:
./agexport --port 10035 --output rdfxml lesmis lesmis.rdf
The FILE argument
Note that if you use a dash (-) for the FILE argument, then agexport will send the data to standard output. Parallel export is not possible in this case.
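For example, the following sketch streams the lesmis triple-store to standard output in Turtle and pipes it through gzip (the pipe and the lesmis.ttl.gz file name are illustrative, not part of agexport itself):
./agexport --port 10035 --output turtle lesmis - | gzip > lesmis.ttl.gz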
If exporting in parallel, then the FILE argument is used as a template for the output file names. For example, if exporting with 5 workers to /data/output/lubm.nt, then agexport will send data to:
- /data/output/lubm-0.nt
- /data/output/lubm-1.nt
- /data/output/lubm-2.nt
- /data/output/lubm-3.nt
- /data/output/lubm-4.nt
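As a sketch, a command along these lines would produce the five files listed above (the lubm store name and the turtle output format are illustrative):
./agexport --port 10035 --output turtle --workers 5 lubm /data/output/lubm.nt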
Options
The following options may be used with agexport:
Triple-store options
- -c CATALOG, --catalog CATALOG
  Specify the catalog name of the triple-store. If the store is in the root catalog, then either omit this option or use the empty string ("")¹. The default is to use the root catalog.
- --server SERVER
  Specify the name of the server where the triple-store resides.
- -p PORT, --port PORT
  Set this to the front-end port of the server where the triple-store resides. agexport can run either on the server on which the triple-store resides or remotely. If run remotely, then you must also specify a username and password (see the example after this list). The default value for the port is 10035.
- -u USERNAME, --username USERNAME
  Specify a username for the triple-store when accessing it remotely; use with --password.
- --password PASSWORD
  Specify the password for the triple-store when accessing it remotely; use with --username.
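For example, a remote export might look like the following sketch (the server name, credentials, and the store and file names are illustrative):
./agexport --server http://www.example.com --port 10035 \
           --username USER --password PASSWORD \
           --output rdfxml lesmis lesmis.rdf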
Main options
- --blank-node-handling STYLE
  Determine how blank nodes are treated when exporting in parallel. This can be together or distribute. The first places all triples with blank nodes into the same export file, whereas the second allows blank nodes to be distributed across multiple files. Note that if blank nodes are distributed, then the import process must be told to treat them as if they all come from the same context (cf. agload's job based bulk node strategy). The default is together. See the example following this list.
- -i IF-EXISTS, --if-exists IF-EXISTS
  Controls how agexport behaves when output files already exist:
  - append: If an export file exists, then append the new data to it.
  - overwrite: If an export file exists, then delete it and write the new data.
  - fail: If an export file exists, then do not export any data.
  The default is to fail if any export files exist. Note that when exporting in parallel, all of the output files are checked and the if-exists behavior applies to them as a group; i.e., if if-exists is fail, then the export will fail if any of the output files already exists.
- -o FORMAT, --output FORMAT
  Set the output format. The examples in this document use rdfxml and turtle.
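For instance, the following sketch (the store name and output path are illustrative) exports in parallel, keeps all blank-node triples together in one file, and appends to any output files that already exist:
./agexport --parallel --blank-node-handling together \
           --if-exists append \
           --output turtle lubm-50 /data/output/lubm.nt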
Other options
- --compress
  If specified, then the output file or files will be compressed using gzip. The default is to have no compression.
- -n, --namespaces
  Use namespace abbreviations when exporting (for Turtle and RDF/XML). The default is to not use namespaces. See the example following this list.
- --parallel
  Use multiple output files and export workers (see --workers for greater control). The default is to export to a single file.
- --workers COUNT
  Specify the number of workers to use when exporting in parallel. The default value depends on the number of CPU cores in the machine doing the export.
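For example, this sketch (the store and file names are illustrative) exports Turtle with namespace abbreviations and gzip compression:
./agexport --port 10035 --namespaces --compress --output turtle lesmis lesmis.ttl.gz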
Notes and examples
Export the lubm-50 triple-store in turtle format in parallel with 15 workers. Any triples with a blank node will be written to the same file:
./agexport --port 9002 --output turtle --workers 15 lubm-50 /disk1/gwking/DATA/l50.nt
Export the lubm-50 triple-store on http://www.example.com:10035 in parallel with blank nodes distributed across multiple files. Because the number of workers is not specified, agexport will make its own determination based on the number of CPU cores. Any existing output files will be overwritten. Output data will be compressed:
./agexport --if-exists overwrite --server http://www.example.com \
--output rdfxml \
--parallel \
--blank-node-handling distribute \
--compress \
lubm-50 /disk1/gwking/DATA/l50.nt.gz
Footnotes
1. Note that the root catalog is not named 'root'; rather, it is the catalog with no name.