Table of Contents

Top-level directives

Catalog definitions

Catalog directives

Example

Changing database parameters

The AllegroGraph server requires a configuration file in order to start up. Usually, this file is specified using the --config command-line argument. A minimal file could look like this:

SettingsDirectory /tmp/ag4/settings  
SuperUser test:xyzzy  
 
<RootCatalog>  
  Main /tmp/ag4/root  
</RootCatalog> 

An AllegroGraph configuration file consists of a set of top-level directives and one or more catalog definitions. The syntax is straightforward: each directive (both top-level and within a catalog definition) consists of an alphanumeric word, whitespace, and then the value of the directive. Indentation is ignored. A directive can span multiple lines by escaping newlines with a backslash, in which case both the backslash and the newline are treated as if they were not there. Catalog definitions are delimited by pseudo-XML markers like <RootCatalog>. Lines starting with a # are treated as comments.

The directories named in this file must either already exist and be writable by the user running the server, or that user must be able to create them. You will usually not want to use temporary paths as in the example, of course.

Parameter values must not be quoted. Spaces are not allowed in parameter values.
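
The syntax rules above can be illustrated with a small reader. This is a minimal sketch in Python and not the server's own parser: the function name parse_config and the returned data layout are made up for illustration, and it assumes that line continuations are joined before comments are recognized, which the description above does not spell out.

```python
import re

# Hypothetical helper, not part of AllegroGraph.
BLOCK_OPEN = re.compile(r"<(\w+)(?:\s+([^>]+?))?\s*>$")
BLOCK_CLOSE = re.compile(r"</(\w+)\s*>$")


def parse_config(text):
    """Return (top_level, catalogs) for the given configuration text.

    top_level is a list of (name, value) pairs. catalogs is a list of
    dicts with the keys "type", "name" and "directives".
    """
    # A backslash directly before a newline joins the two lines.
    # Assumption: this happens before comment lines are recognized.
    text = text.replace("\\\n", "")
    top_level, catalogs, current = [], [], None
    for raw in text.splitlines():
        line = raw.strip()                      # indentation is ignored
        if not line or line.startswith("#"):
            continue                            # blank line or comment
        if BLOCK_CLOSE.match(line):
            current = None                      # e.g. </RootCatalog>
            continue
        opened = BLOCK_OPEN.match(line)
        if opened:                              # e.g. <Catalog temporary>
            current = {"type": opened.group(1),
                       "name": opened.group(2),
                       "directives": []}
            catalogs.append(current)
            continue
        # Directive: a word, whitespace, then the rest of the line as value.
        parts = line.split(None, 1)
        value = parts[1] if len(parts) > 1 else ""
        target = top_level if current is None else current["directives"]
        target.append((parts[0], value))
    return top_level, catalogs


if __name__ == "__main__":
    sample = "SettingsDirectory /tmp/ag4/settings\n<RootCatalog>\n  Main /tmp/ag4/root\n</RootCatalog>\n"
    print(parse_config(sample))
```

The value is kept as a single string, so a directive that takes two arguments (such as FilePlacementRule, described later) would still need to be split by the caller.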

Top-level directives

SettingsDirectory
Required setting. Specifies the directory in which the server stores persistent information such as user accounts.
Port
Should be an integer. Used to set the port on which the daemon runs its HTTP server. When not given, this defaults to 10035.
HostName
Determines the host on which the HTTP server listens. Can be left out to have the server listen on all interfaces. Set to localhost to listen only locally.
AllowHTTP
A boolean (yes/no) that can be used to turn off HTTP access to the server. Default is yes.
HTTPProxy
This parameter can be used to make HTTP requests made by the server (for example, when a SPARQL query loads data from an external URL) go through a proxy. Valid values are hostname.net, hostname.net:8888, or, when the proxy requires authentication, user:password@hostname.net:8888.
SSLPort
An integer. If given, an SSL HTTP server will be run on this port.
SSLCertificate
When an SSLPort is given, this should be used to point at a file containing a server certificate and private key, PEM-encoded.
SuperUser
If given, should be a string in name:password format. The server will ensure, on startup, that a superuser with this name and password exists. Note that this means anyone who can read your configuration file has full access to the server. It is recommended to use the server setup script to create a superuser instead or, if you do use this directive, to remove it after the first run of the server has created the user.
Backends
Specifies the maximum number of processes spawned to handle HTTP requests (note that session processes do not count towards this limit). Default is 3.
SessionPorts
If given, should be an integer range like 8000-8020. Defines the ports that will be used for sessions. Useful when these need to be opened in a firewall or similar. When not specified, random ports will be used.
LogDir
Specifies the directory where the server log file is written.
TempDir
Specifies the directory in which AllegroGraph may create temporary files. Defaults to the system's designated temp dir (typically /tmp).
RunAs
Have the server, if started by root, run as the given user instead (defaults to agraph).
PidFile
A file to which the server writes out its process id.
SPARQLBaseURL
If given, the HTTP server will use this value as the base URL when parsing SPARQL queries. When not given, the URL of the request is used instead.
QueryEngine
Specifies the query processor used for SPARQL queries. This parameter can be overridden for specific queries (in an API-specific manner). See the discussion in the SPARQL documentation for more details on the available engines and on how to choose among them.
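
To make the value formats of a few of these directives concrete, here is a minimal Python sketch that interprets Port, AllowHTTP and SessionPorts from a dictionary of raw string values. The function names are hypothetical, and two details are assumptions rather than documented behavior: that yes/no must be lowercase, and that both ends of the SessionPorts range are included.

```python
def parse_bool(value):
    """Map the yes/no form used by AllowHTTP to a Python bool."""
    if value not in ("yes", "no"):      # assumption: lowercase only
        raise ValueError(f"expected yes or no, got {value!r}")
    return value == "yes"


def parse_port_range(value):
    """Turn a SessionPorts value such as '8000-8020' into a range of ports."""
    low, sep, high = value.partition("-")
    if not sep:
        raise ValueError(f"expected LOW-HIGH, got {value!r}")
    low, high = int(low), int(high)
    if not 0 < low <= high <= 65535:
        raise ValueError(f"invalid port range {value!r}")
    return range(low, high + 1)         # assumption: both ends included


def http_settings(directives):
    """Interpret a dict of top-level directive names to raw string values."""
    ports = directives.get("SessionPorts")
    return {
        "port": int(directives.get("Port", "10035")),            # documented default
        "allow_http": parse_bool(directives.get("AllowHTTP", "yes")),
        "session_ports": parse_port_range(ports) if ports else None,
    }


if __name__ == "__main__":
    print(http_settings({"AllowHTTP": "no", "SessionPorts": "8080-8083"}))
```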

Catalog definitions

Catalogs are locations on disk where AllegroGraph keeps its triple-stores. These locations are specified in the configuration file, along with some optional default settings for stores in the catalogs. Most of the time, you will want to specify all catalogs directly in the configuration file, but it is also possible to enable dynamic catalogs, which can be created and deleted through the HTTP interface.

Catalog definitions in the server configuration file serve as templates for creating databases. The parameters defined in the catalog definition are copied to the database when it is created. Changes to the catalog definition do not affect the settings of existing databases. To modify the parameters of an existing database, edit the file 'parameters.dat' in the database's 'Main' directory and restart the database.

There are three types of catalog definitions that can occur in an AllegroGraph configuration file: a root catalog, named catalogs, and a dynamic catalog specification. The first was seen in the example above (<RootCatalog> ... </RootCatalog>), and is used to determine where stores live that do not have a catalog specified. Named catalogs look similar:

<Catalog temporary>  
  Main /tmp/catalog  
</Catalog> 

Their opening line specifies their name, which can contain any characters except slashes, backslashes, colons, and tildes. This name can then be used as catalog name when creating or accessing triple-stores.

Finally, a dynamic catalog definition is used to provide the settings for catalogs created over HTTP. If no dynamic catalog is defined, this feature is disabled.

<DynamicCatalogs>  
  Main /tmp/dynamic  
</DynamicCatalogs> 

The directory (as well as any other catalog directories, see below) given for dynamic catalogs will be extended with a catalog name when such a catalog is created. For example, given the above configuration, a dynamic catalog named scratch would end up in /tmp/dynamic/scratch.

Catalog directives

Some of the directives allowed within a catalog definition (those marked as inheritable) can also be specified at the top level, where they act as a default value inherited by catalogs that do not explicitly specify that setting.

Main
Required for every catalog. Specifies the directory in which the triple-stores for the catalog are stored.
TransactionLogDir

Specifies the directory in which transaction log subdirectories will be created for triple-stores in this catalog. The directory will be extended with the name of a triple-store. For example, if TransactionLogDir is /tmp/tlogs, then transaction logs for triple-store example will be stored in /tmp/tlogs/example. This parameter is optional and defaults to the value supplied for the Main parameter.

For instance, the example below contains the line

 TransactionLogDir /mnt/disk3/ag4-transaction-logs 
which says that transaction logs should be placed in the /mnt/disk3/ag4-transaction-logs/[triple-store-name]/ directory.
StringTableDir
Specifies the directory in which string table subdirectories will be created for triple-stores in this catalog. See TransactionLogDir for information on how directory names are constructed. This parameter is optional and defaults to the value supplied for the Main parameter.
FilePlacementRule

Specifies additional rules that control file placement. Takes two arguments: a regular expression, and a directory root where AllegroGraph puts files whose names match the regular expression. Neither argument may be quoted. This is an optional parameter.

This entry, for example, places files whose names begin with index-posgi in the directory /mnt/disk6/ag4-posgi/:

FilePlacementRule ^index-posgi /mnt/disk6/ag4-posgi 
The example below is more complex. It says where to put the various index-spogi files (there are usually several index files, whose names start with index-[index-type] and also contain a number). The first argument is a regular expression. In the simpler example above, the caret (^) anchors the match at the beginning of the file name. The patterns below also anchor the end of the name with $, and they require the last digit of the number in the file name (the digit just before the . that separates the name from the type, which is either cidx or midx) to be 2, 4, 6, 8, or 0 in the first rule and 1, 3, 5, 7, or 9 in the second. Files with matching names are put in the indicated directories.
FilePlacementRule ^index-spogi-.*[02468]\..idx$ /mnt/disk4/ag4-spogi-even  
FilePlacementRule ^index-spogi-.*[13579]\..idx$ /mnt/disk5/ag4-spogi-odd 
For information on regular expressions, see this Wikipedia entry (which has a link to examples) or the introduction to this Allegro CL document.
ExpectedStoreSize inheritable
An integer. If given, it is used by the server to guess suitable values for things like internal table sizes. You should only worry about this when trying to squeeze out more performance. Setting it too high can waste some resources; setting it too low can lead to sub-optimal performance.
MaxRecoveryTime inheritable
A time (value like 10s, 5m, 1h). This parameter, if set, enables dynamic checkpointing for a database. By default, AllegroGraph writes a checkpoint at regular intervals, as configured by the CheckpointInterval parameter. If dynamic checkpointing is enabled, a checkpoint is also written whenever recovery from the transaction log file would take longer than the time to which MaxRecoveryTime is set. This is useful on databases with little write activity. Note that the MinimumCheckpointInterval is still observed.
CheckpointInterval inheritable
A time (value like 10s, 5m, 1h) that determines the amount of time between checkpoint writes for the store. A higher value makes checkpoints less frequent but increases the recovery time.
MinimumCheckpointInterval inheritable

A time (value like 10s, 5m, 1h) that is used to determine the minimum period that must elapse between two checkpoints. This parameter defaults to whatever CheckpointInterval has been set to or, if MaxRecoveryTime has been set, to 5 minutes.

Regardless of the value of MinimumCheckpointInterval and CheckpointInterval, a checkpoint will always occur after a new transaction log file is created (see TransactionLogSize setting).

TransactionLogSize inheritable
A size (for example 10m) that determines how big individual transaction log files are allowed to grow. When a transaction log file reaches or exceeds this size, a new transaction log file is created. The maximum is just under 4GB.
TlogSyncMethod
This parameter specifies the synchronized writing method for transaction logs. Three methods are supported: ODIRECT, SYNC, and fsync. The default (if this parameter is unspecified) is ODIRECT, which is the recommended choice on ext3 file systems. For catalogs residing on non-ext3 file systems, the other choices may yield performance benefits. (Any performance degradation is most likely to show up in checkpointing. If checkpoints take longer than expected and you are using a non-ext3 file system, try the other allowable values.)
DesiredTlogFiles

This parameter specifies the number of transaction log files which should be preallocated at database creation time. The default value is 2. Specifying a larger value helps lower the probability of additional transaction log files being created during commits.

Note: The number of tlog files may grow beyond DesiredTlogFiles when a backup runs for a long time, when transaction log archiving is running slowly, or when warm standby replication is running slowly or has stalled. When possible, AllegroGraph will reduce the number of transaction log files back down to DesiredTlogFiles.
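
The interplay between the checkpoint parameters above can be summarized in a short Python sketch. It is only an illustration of the documented defaults, not the server's logic. The function names are hypothetical, only the s, m and h suffixes shown in this document are accepted, and it assumes that the 5 minute default applies whenever MaxRecoveryTime is set, even if CheckpointInterval is set as well.

```python
import re

# Seconds per unit, for the suffixes that appear in this document.
_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_time(value):
    """Convert a time value such as '10s', '5m' or '1h' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", value)
    if match is None:
        raise ValueError(f"unsupported time value {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def minimum_checkpoint_interval(params):
    """Effective MinimumCheckpointInterval in seconds, or None if unknown.

    params maps directive names to raw string values for one catalog,
    with inherited top-level values already merged in.
    """
    if "MinimumCheckpointInterval" in params:
        return parse_time(params["MinimumCheckpointInterval"])
    if "MaxRecoveryTime" in params:
        return 5 * 60       # assumption: takes precedence over CheckpointInterval
    if "CheckpointInterval" in params:
        return parse_time(params["CheckpointInterval"])
    return None             # the built-in CheckpointInterval default is not stated here


if __name__ == "__main__":
    print(minimum_checkpoint_interval({"CheckpointInterval": "1h"}))
```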

Example

The following larger example demonstrates what some of the options can look like.

# Don't allow normal HTTP access, only SSL  
Port 10035  
AllowHTTP no  
SSLPort 10036  
SSLCertificate /var/lib/ag4/server.cert  
 
SettingsDirectory /var/lib/ag4/settings  
 
Backends 5  
# You can actually remove this after the first server run, to  
# reduce the risk of someone finding it here.  
SuperUser test:xyzzy  
 
ExpectedStoreSize 100000  
SessionPorts 8080-8083  
 
<RootCatalog>  
  Main /var/lib/ag4/root  
</RootCatalog>  
 
<Catalog fast>  
  ExpectedStoreSize 2000000  
  CheckpointInterval 1h       
  Main /var/lib/ag4/fast  
  StringTableDir /mnt/disk2/ag4-string-tables  
  TransactionLogDir /mnt/disk3/ag4-transaction-logs  
  FilePlacementRule ^index-spogi-.*[02468]\..idx$ /mnt/disk4/ag4-spogi-even  
  FilePlacementRule ^index-spogi-.*[13579]\..idx$ /mnt/disk5/ag4-spogi-odd  
</Catalog>  
 
<DynamicCatalogs>  
  Main /var/lib/ag4/dynamic  
</DynamicCatalogs> 
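
The two FilePlacementRule patterns in the fast catalog above can be tried out against sample file names before they go into a configuration. The Python sketch below uses Python's re module as a stand-in for the server's own matcher, so it is an approximation. The file names are made up, following the index-[index-type] plus number plus cidx/midx shape described earlier. The first-match-wins order and the fallback to the Main directory are assumptions of this sketch.

```python
import re

# (pattern, directory root) pairs copied from the <Catalog fast> definition.
RULES = [
    (r"^index-spogi-.*[02468]\..idx$", "/mnt/disk4/ag4-spogi-even"),
    (r"^index-spogi-.*[13579]\..idx$", "/mnt/disk5/ag4-spogi-odd"),
]


def placement(filename, rules=RULES, default="/var/lib/ag4/fast"):
    """Return the directory root of the first rule that matches filename.

    Falls back to `default` (here the catalog's Main directory) when no
    rule matches. Both choices are assumptions of this sketch.
    """
    for pattern, root in rules:
        if re.search(pattern, filename):
            return root
    return default


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for name in ("index-spogi-12.cidx", "index-spogi-7.midx", "index-posgi-3.cidx"):
        print(name, "->", placement(name))
```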

Changing database parameters

In some circumstances, it is desirable to modify the settings of an existing database by editing the 'parameters.dat' file in the database main directory. The syntax of this file is similar to that of the server configuration file, but only the parameters that are normally present inside a catalog definition are allowed.

For example, the 'parameters.dat' file for a database 'demo' created with the 'fast' catalog definition above would look like this:

ExpectedStoreSize 2000000  
CheckpointInterval 1h       
Main /var/lib/ag4/fast  
StringTableDir /mnt/disk2/ag4-string-tables 

It might be edited to change the ExpectedStoreSize. It is also possible to add new file placement rules. When modifying any of the parameters related to file placement, care must be taken to make sure that all files that constitute the current database state are still visible to the database. For example, if the StringTableDir directive of the database above were removed, all files in /mnt/disk2/ag4-string-tables/demo/ would need to be moved manually into the main directory of the database, /var/lib/ag4/fast/demo/.

When moving database files around, it is important to know that some of these files are sparse, i.e. they contain holes with unallocated blocks. Many file management utilities (like 'cp' and 'tar') can optionally preserve file sparseness, but care should be taken to make sure that copies of database files don't become unexpectedly large after manual manipulation.
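
One way to check whether a copy has lost its sparseness is to compare a file's apparent size with the space actually allocated for it, before and after the move. The following Python sketch does this with os.stat. It assumes a POSIX system, where st_blocks is reported in 512-byte units, and the function names are hypothetical.

```python
import os


def sparse_report(path):
    """Return (apparent_bytes, allocated_bytes) for the file at path.

    Assumes a POSIX system: st_blocks counts 512-byte units there and is
    not available on every platform.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def is_sparse(path):
    """True when fewer bytes are allocated than the file's apparent size."""
    apparent, allocated = sparse_report(path)
    return allocated < apparent


if __name__ == "__main__":
    import sys
    # Usage: python sparse_check.py FILE [FILE ...]
    for name in sys.argv[1:]:
        apparent, allocated = sparse_report(name)
        print(f"{name}: apparent={apparent} allocated={allocated}")
```

If the allocated size of a copied file is much larger than that of the original while the apparent sizes agree, the copy was probably made without preserving holes.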

In a future release, AllegroGraph will include a utility to make manipulating database parameters and directories safer and easier.