The AllegroGraph server requires a configuration file in order to start up. Usually, this file is specified using the --config command-line argument. A minimal file could look like this:

SettingsDirectory /tmp/ag4/settings
SuperUser test:xyzzy
<RootCatalog>
  Main /tmp/ag4/root
</RootCatalog>
An AllegroGraph configuration file consists of a set of top-level directives and one or more catalog definitions. The syntax is straightforward: a directive (whether at the top level or within a catalog definition) consists of an alphanumeric word, whitespace, and then the value of the directive. Indentation is ignored. A directive can span multiple lines by escaping each newline with a backslash; both the backslash and the newline are then treated as if they were not there. Catalog definitions are delimited by pseudo-XML markers like <RootCatalog>. Lines starting with a # are treated as comments.
The directories named in this file must either already exist and be writable by the user running the server, or that user must be able to create them. You will usually not want to use temporary paths as in the example, of course.
Parameter values must not be quoted. Spaces are not allowed in parameter values.
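The following minimal Python sketch makes the syntax rules above concrete. It is not AllegroGraph's own parser, and the names parse_config, OPEN_TAG and CLOSE_TAG are hypothetical. It assumes two things: each catalog marker sits on a line of its own, and everything after the directive name is the value. Under the second assumption, a two-argument directive such as FilePlacementRule keeps both arguments in one string.

```python
import re

# Sketch of a reader for the configuration syntax (not AllegroGraph's parser).
# Assumptions: catalog markers sit on their own lines; a directive's value is
# everything after its name, so multi-argument directives keep one string.
OPEN_TAG = re.compile(r"^<(RootCatalog|DynamicCatalogs|Catalog)(?:\s+([^>\s]+))?>$")
CLOSE_TAG = re.compile(r"^</(RootCatalog|DynamicCatalogs|Catalog)>$")


def parse_config(text):
    """Return (top_level, catalogs) parsed from configuration file text.

    top_level is a list of (name, value) pairs; catalogs is a list of
    (kind, catalog_name_or_None, [(name, value), ...]) tuples.
    """
    # A backslash directly before a newline removes both characters.
    text = text.replace("\\\n", "")
    top_level, catalogs, current = [], [], None
    for raw in text.splitlines():
        line = raw.strip()  # indentation is ignored
        if not line or line.startswith("#"):
            continue  # blank line or comment
        opened = OPEN_TAG.match(line)
        closed = CLOSE_TAG.match(line)
        if opened:
            if current is not None:
                raise ValueError(f"nested catalog definition: {line}")
            current = (opened.group(1), opened.group(2), [])
        elif closed:
            if current is None or current[0] != closed.group(1):
                raise ValueError(f"unmatched closing marker: {line}")
            catalogs.append(current)
            current = None
        else:
            parts = line.split(None, 1)
            if len(parts) != 2 or not parts[0].isalnum():
                raise ValueError(f"malformed directive: {line}")
            target = top_level if current is None else current[2]
            target.append((parts[0], parts[1]))
    if current is not None:
        raise ValueError("unterminated catalog definition")
    return top_level, catalogs


sample = """\
SettingsDirectory /tmp/ag4/settings
# SuperUser can be removed after the first run
SuperUser test:xyzzy
<RootCatalog>
  Main /tmp/ag4/root
</RootCatalog>
<Catalog temporary>
  Main /tmp/catalog
</Catalog>
"""
top_level, catalogs = parse_config(sample)
```

Returning lists of pairs rather than dictionaries preserves file order. That order matters for BaseDir and for repeatable directives.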
Any relative path in the config file is initially resolved with respect to the directory containing the config file. If you specify another directory using the BaseDir directive, all subsequent relative paths are resolved relative to that directory. BaseDir can be specified as often as you like. It can itself be a relative pathname, resolved against the BaseDir value currently in effect or, if none has been specified yet, against the directory containing the config file. Each new BaseDir value replaces the previous one for all directives that follow it.
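The BaseDir rules amount to a running "current base" that each BaseDir line updates. The Python sketch below assumes directives have already been read into (name, value) pairs in file order. The set of directives treated as pathnames (PATH_DIRECTIVES) is a hypothetical subset chosen for illustration, and resolve_paths is likewise a hypothetical helper.

```python
import os.path

# Hypothetical subset of directives whose values are pathnames.
PATH_DIRECTIVES = {"SettingsDirectory", "Main", "StringTableDir", "TransactionLogDir"}


def resolve_paths(directives, config_path):
    """Resolve relative paths following the BaseDir rules.

    `directives` is a list of (name, value) pairs in file order. BaseDir
    entries update the current base and are not included in the result.
    """
    # Before any BaseDir, paths are relative to the config file's directory.
    base = os.path.dirname(os.path.abspath(config_path))
    resolved = []
    for name, value in directives:
        if name == "BaseDir":
            # BaseDir may itself be relative to the base currently in effect.
            base = os.path.normpath(os.path.join(base, value))
        elif name in PATH_DIRECTIVES:
            # os.path.join returns absolute values unchanged.
            resolved.append((name, os.path.normpath(os.path.join(base, value))))
        else:
            resolved.append((name, value))
    return resolved


example = [
    ("SettingsDirectory", "settings"),
    ("BaseDir", "/var/lib/ag4"),
    ("Main", "root"),
    ("BaseDir", "../ag4-extra"),
    ("StringTableDir", "strings"),
    ("TransactionLogDir", "/mnt/tlogs"),
    ("Backends", "5"),
]
resolved = resolve_paths(example, "/etc/agraph/agraph.cfg")
```

Here SettingsDirectory resolves against /etc/agraph. Main resolves against /var/lib/ag4. The relative BaseDir ../ag4-extra moves the base to /var/lib/ag4-extra for StringTableDir.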
- BaseDir: A directory pathname used to resolve relative pathnames in subsequent directives in the config file. It can be specified multiple times, with new values replacing older ones. See just above for more details.
- SettingsDirectory: Required setting. Specifies the directory in which the server stores persistent information such as user accounts.
- AllowHTTP: A boolean (yes or no) that can be used to turn off HTTP access to the server. Default is yes.
- A boolean (yes or no) that can be used to turn on auditing. Default is no.
- Backends: Specifies the maximum number of processes spawned to handle HTTP requests (note that session processes do not count toward this limit). Default is 10.
- Determines the host on which the HTTP server listens. Omit it to have the server listen on all interfaces; set it to localhost to listen only locally.
- This parameter can be used to route HTTP requests made by the server (for example, when a SPARQL query loads data from an external URL) through a proxy. Valid values look like hostname.net:8888, or, when the proxy requires authentication,
- The number of initial HTTP workers to be started by the AllegroGraph server. The default is 50. The number should be larger than the number of backends (see Backends above) plus the anticipated number of frontend sessions (used, for example, by Webview). Too few workers may cause long-running requests (like opening a triple store) to delay other concurrent requests.
- Specifies the directory where the server log file is written. (The log filename is agraph.log.)
- MemoryCheckWhen: A 'memory release specification'. It can be used multiple times. Each specification must be in the form name:value, where name can be query, transaction, or time. The value must be the number of items to run between checks (for query and transaction) or a delay in seconds (for time).
- MemoryReleaseThreshold: A size specifying the threshold at which a memory release will occur. The size can be specified in gigabytes (e.g. "3g"), megabytes (e.g. "3000m"), kilobytes (e.g. "3000000k") or bytes (e.g. "3000000000").
- A file to which the server writes out its process id.
- Port: If supplied, must be an integer. Used to set the port on which the daemon runs its HTTP server. When not given, this defaults to
- Specifies the query processor used for SPARQL queries. This parameter can be overridden for specific queries (in an API specific manner). See the discussion in the SPARQL documentation for more details on the available engines and on how to choose amongst them.
- If given, must be an integer range (e.g. 13000-13020). When using replicas (see Replication and Warm Standby), the replication primary requires a separate listening port for each replica. The operating system will choose an available port for each replica (when the replica is set up) if no value is given for this option, and that will always work. However, if there is a firewall between the replication primary and (any of) the replicas, the firewall administrator may need to configure the firewall to allow incoming connections from the replicas to the primary. That configuration process can be aided by limiting the range of ports which can be used, and that is what this parameter does. If a range is specified, only those ports will be used by replicas. Any replica which might become a primary should have this parameter also specified in its configuration exactly as it is for the primary. Note that if no port in the range is available when setting up a replica, setting up the replica will fail. Therefore, the size of the range of ports should be at least the maximum expected number of replicas.
- Have the server, if started by root, run as the given user instead (defaults to
- SessionHost: If given, must be the server's host name or IP address for use in the URLs returned upon session creation. This is useful when deploying a load balancer (like Amazon's Elastic Load Balancer): the returned session URL then contains the SessionHost value instead of echoing the load balancer's host name from the client request.
- SessionPorts: If given, must be an integer range like 8000-8020. Defines the ports that will be used for sessions, which is useful when these need to be opened in a firewall or similar. When not specified, random ports are used.
- If given, the HTTP server will use this value as the base-url when parsing SPARQL queries. When not given, the URL of the request is used instead.
- SSLCertificate: When an SSLPort is given, this must point to a file containing a server certificate and private key, PEM-encoded.
- SSLPort: An integer. If given, an SSL HTTP server will be run on this port.
- SuperUser: If given, must be a string in name:password format. The server will ensure, on startup, that a superuser with this name and password exists. Note that this means anyone who can read your configuration file has full access to the server. It is recommended to use the server setup script to create a superuser instead. If you do use this directive, remove it after the first run of the server has created the user.
- Specifies the directory in which AllegroGraph may create temporary files. Defaults to the system's designated temp dir (typically
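The guidance on replica port ranges above (and, similarly, on SessionPorts) boils down to a count check. The sketch assumes a range such as 13000-13020 is inclusive at both ends, which the text does not state explicitly. The helper names parse_port_range and range_is_big_enough are hypothetical.

```python
def parse_port_range(spec):
    """Parse 'LOW-HIGH' (e.g. '13000-13020'); assumed inclusive at both ends."""
    low_text, sep, high_text = spec.partition("-")
    if not sep:
        raise ValueError(f"expected LOW-HIGH, got {spec!r}")
    low, high = int(low_text), int(high_text)
    if not 0 < low <= high <= 65535:
        raise ValueError(f"invalid port range: {spec!r}")
    return range(low, high + 1)


def range_is_big_enough(spec, max_replicas):
    """Each replica needs its own port, so the range must cover all of them."""
    return len(parse_port_range(spec)) >= max_replicas
```

For example, under the inclusive assumption range_is_big_enough("13000-13020", 21) holds, but one more replica would not fit.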
More on controlling memory usage
While processing a query, backend processes may allocate memory from the operating system. When a previously allocated memory area is no longer used, the processes normally do not return it to the operating system, in hopes of reusing it for subsequent queries. However, it may be advantageous to periodically return idle memory to the operating system. The MemoryCheckWhen and MemoryReleaseThreshold configuration parameters allow for this.
Note that while returning memory to the OS makes memory available to other processes, it also incurs the overhead of minor page faults on subsequent allocations in the same process.
Each shared backend and dedicated session tracks its own memory usage. When a check is made, the resident set size (RSS) of the backend or session process is compared to MemoryReleaseThreshold. If the RSS is greater than MemoryReleaseThreshold, the process tries to give back as much memory to the OS as possible.
Since this kind of check is fairly expensive, performing it too often can have a detrimental effect on overall performance. The MemoryCheckWhen directive specifies under what circumstances it should be done. Let's see a couple of examples.
Perform memory check after every 7 queries:

MemoryCheckWhen query:7

Perform memory check after every 2 transactions:

MemoryCheckWhen transaction:2

Perform memory check every 10 seconds:

MemoryCheckWhen time:10
Finally, a complete configuration that would check whether the memory was above the threshold every 10 seconds and after every 2 transactions:
MemoryReleaseThreshold 2g
MemoryCheckWhen time:10
MemoryCheckWhen transaction:2
Note that MemoryReleaseThreshold must be specified whenever MemoryCheckWhen is. If neither is specified, no checks are ever performed.
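The interplay between MemoryCheckWhen and MemoryReleaseThreshold can be modeled as a small state machine. This is a hypothetical Python model of the rules described above, not AllegroGraph's implementation. It assumes that the k, m and g size suffixes are powers of 1024 and that each counter restarts after it triggers a check. The class and function names are illustrative.

```python
import re

# Hypothetical model of MemoryCheckWhen / MemoryReleaseThreshold.
# Assumption: k/m/g suffixes are powers of 1024.
_SIZE_RE = re.compile(r"^(\d+)([kmg]?)$", re.IGNORECASE)
_FACTORS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
_KINDS = ("query", "transaction", "time")


def parse_size(text):
    """Parse sizes such as '3g', '3000m', '3000000k' or '3000000000'."""
    m = _SIZE_RE.match(text.strip())
    if not m:
        raise ValueError(f"bad size: {text!r}")
    return int(m.group(1)) * _FACTORS[m.group(2).lower()]


def parse_check_when(spec):
    """Parse one 'name:value' memory release specification."""
    kind, sep, value = spec.partition(":")
    if not sep or kind not in _KINDS:
        raise ValueError(f"bad MemoryCheckWhen value: {spec!r}")
    return kind, int(value)


class MemoryChecker:
    """Decides when a process would compare its RSS with the threshold."""

    def __init__(self, threshold=None, check_when=(), now=0.0):
        if check_when and threshold is None:
            raise ValueError("MemoryCheckWhen requires MemoryReleaseThreshold")
        self.threshold = parse_size(threshold) if threshold else None
        self.every = dict(parse_check_when(s) for s in check_when)
        self.counts = {"query": 0, "transaction": 0}
        self.last_time_check = now

    def check_due(self, event, now):
        """Record `event` ('query', 'transaction' or None) at time `now`
        (in seconds) and report whether a memory check should run."""
        due = False
        if event in self.counts and event in self.every:
            self.counts[event] += 1
            if self.counts[event] >= self.every[event]:
                self.counts[event] = 0
                due = True
        period = self.every.get("time")
        if period is not None and now - self.last_time_check >= period:
            self.last_time_check = now
            due = True
        return due

    def should_release(self, rss_bytes):
        """True if the RSS exceeds the threshold, i.e. memory would be returned."""
        return self.threshold is not None and rss_bytes > self.threshold


# The complete configuration from the text: 2g threshold, time:10, transaction:2.
checker = MemoryChecker("2g", ["time:10", "transaction:2"])
```

The model also shows why checks are rationed: every call to check_due is cheap, but each True result stands for an expensive RSS comparison and release.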
Catalogs are locations on disk where AllegroGraph keeps its triple-stores. These locations are specified in the configuration file, along with some optional default settings for stores in the catalogs. Most of the time, you will want to specify all catalogs directly in the configuration file, but it is also possible to enable dynamic catalogs, which can be created and deleted through the HTTP interface.
Catalog definitions in the server configuration file serve as templates for creating databases. The parameters defined in the catalog definition are copied to the database when it is created. Changes to the catalog definition do not influence the settings of existing databases. To modify parameters of an existing database, the file 'parameters.dat' in the database's Main directory must be edited and the database restarted.
There are three types of catalog definitions that can occur in an AllegroGraph configuration file: a root catalog, named catalogs, and a dynamic catalog specification. The first was seen in the example above (<RootCatalog> ... </RootCatalog>). It determines where stores live when no catalog is specified for them. Named catalogs look similar:
<Catalog temporary>
  Main /tmp/catalog
</Catalog>
The opening line specifies the catalog's name, which can contain any characters except slashes, backslashes, colons, and tildes. This name can then be used as the catalog name when creating or accessing triple-stores.
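The naming rule can be checked mechanically. This is a sketch; is_valid_catalog_name is a hypothetical helper. Treating the empty string as invalid is an assumption, since the text does not mention it.

```python
# Characters the text says a catalog name must not contain.
FORBIDDEN_CHARS = frozenset("/\\:~")


def is_valid_catalog_name(name):
    """Check a catalog name; rejecting the empty string is an assumption."""
    return bool(name) and not FORBIDDEN_CHARS.intersection(name)
```

For instance, temporary passes, while a/b, c:d and ~x are rejected.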
Dynamic catalogs are enabled with a specification like this:

<DynamicCatalogs>
  Main /tmp/dynamic
</DynamicCatalogs>
The directory given for dynamic catalogs (as well as any other catalog directory, see below) will be extended with a catalog name when such a catalog is created. For example, given the above configuration, a dynamic catalog named scratch would end up in
Some of the directives allowed within a catalog definition (those marked as inheritable) can also be specified at the top-level, where they act as a default value inherited by catalogs which don't explicitly specify that setting.
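Inheritance behaves like a dictionary merge in which catalog values take precedence. The inheritable markers are not reproduced here, so the INHERITABLE set below is a hypothetical example. The sketch also assumes single-valued directives; repeatable ones such as FilePlacementRule would need a list. effective_settings is an illustrative name.

```python
# Hypothetical: which directives are inheritable is an assumption here.
INHERITABLE = {"ExpectedStoreSize", "CheckpointInterval",
               "StringTableDir", "TransactionLogDir"}


def effective_settings(top_level, catalog):
    """Merge top-level defaults into one catalog's settings.

    Both arguments map directive names to values. Values given in the
    catalog win, and only inheritable top-level directives are copied.
    """
    merged = {name: value for name, value in top_level.items()
              if name in INHERITABLE}
    merged.update(catalog)
    return merged


# Values taken from the larger example configuration in this document.
top = {"Port": "10035", "Backends": "5", "ExpectedStoreSize": "100000"}
root = effective_settings(top, {"Main": "/var/lib/ag4/root"})
fast = effective_settings(top, {"Main": "/var/lib/ag4/fast",
                                "ExpectedStoreSize": "2000000"})
```

In this model the root catalog inherits ExpectedStoreSize 100000, while the fast catalog keeps its own 2000000. Server-wide settings such as Port are never copied.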
- Main: Required for every catalog. Specifies the directory in which the triple-stores for the catalog are stored.
- TransactionLogDir: Specifies the directory in which transaction log subdirectories will be created for triple-stores in this catalog. The directory will be extended with the name of a triple-store. For example, if TransactionLogDir is /tmp/tlogs, then transaction logs for triple-store example will be stored in /tmp/tlogs/example. This parameter is optional and defaults to the value supplied for the
See the line TransactionLogDir /mnt/disk3/ag4-transaction-logs in the example below. It says transaction logs should be placed in the /mnt/disk3/ag4-transaction-logs/[triple-store-name]/ directory.
- StringTableDir: Specifies the directory in which string table subdirectories will be created for triple-stores in this catalog. See TransactionLogDir for information on how directory names are constructed. This parameter is optional and defaults to the value supplied for the
- FilePlacementRule: Specifies additional rules that control file placement. Takes two arguments: a regular expression, and a directory root where AllegroGraph puts files whose names match the regular expression. Neither argument may be quoted. This is an optional parameter.
This entry, for example, places files whose names begin with index-posgi in the directory /mnt/disk6/ag4-posgi/:
FilePlacementRule ^index-posgi /mnt/disk6/ag4-posgi
- The example below is more complex. It specifies where to put the various index-spogi files (there are usually several index files, whose names start with index-[index-type] and also contain a number). The first argument is a regular expression. In the simpler example above, the caret (^) restricts the match to the beginning of the name. These lines add a further condition. The last digit of the number in the filename must be 2, 4, 6, 8, or 0 for the first rule, and 1, 3, 5, 7, or 9 for the second. That digit sits just before the . that separates the name from the type, which is either cidx or midx. Files with matching names are put in the indicated directories.

FilePlacementRule ^index-spogi-.*[02468]\..idx$ /mnt/disk4/ag4-spogi-even
FilePlacementRule ^index-spogi-.*[13579]\..idx$ /mnt/disk5/ag4-spogi-odd
- For information on regular expressions, see this Wikipedia entry (which has a link to examples) or the introduction to this Allegro CL document.
- ExpectedStoreSize: An integer. If given, the server uses it to guess suitable values for things like internal table sizes. You should only worry about this when trying to squeeze out more performance. Setting it too big can waste some resources; setting it too small can lead to sub-optimal performance.
- MaxRecoveryTime: A time (value like 1h). This parameter, if set, enables dynamic checkpointing for a database. By default, AllegroGraph writes a checkpoint at regular intervals, as configured by the CheckpointInterval parameter. If dynamic checkpointing is enabled, a checkpoint is written whenever recovery from the transaction log file would exceed the time to which MaxRecoveryTime is set. This is useful on databases with little write activity. Note that MinimumCheckpointInterval is still observed.
- CheckpointInterval: A time (value like 1h) that determines the amount of time between checkpoint writes for the store. A higher value means fewer checkpoint writes but can increase the recovery time.
- MinimumCheckpointInterval: A time (value like 1h) that determines the minimum period that must elapse between two checkpoints. This parameter defaults to whatever CheckpointInterval has been set to or, if MaxRecoveryTime has been set, to 5 minutes.
Regardless of the values of MinimumCheckpointInterval and CheckpointInterval, a checkpoint always occurs after a new transaction log file is created (see the TransactionLogSize setting).
- TransactionLogSize: A size (for example 10m) that determines how big individual transaction log files are allowed to grow. When a transaction log file reaches or exceeds this size, a new transaction log file is created. The maximum is just under 4GB.
- This parameter specifies the synchronized writing method for transaction logs. Three methods are supported: ODIRECT, SYNC, and fsync. The default (if this parameter is unspecified) is ODIRECT, which is the recommended choice on ext3 file systems. For catalogs residing on non-ext3 file systems, the other choices may yield performance benefits. A poorly suited method may show up as degraded checkpointing performance: if checkpoints take longer than expected and you are using a non-ext3 file system, try the other allowable values.
- DesiredTlogFiles: This parameter specifies the number of transaction log files that should be preallocated at database creation time. The default value is 2. Specifying a larger value lowers the probability of additional transaction log files being created during commits.
Note: The number of tlog files may grow larger than DesiredTlogFiles in three situations: a long-running backup, slow transaction log archiving, or warm standby replication that is running slowly or has stalled. When possible, AllegroGraph will reduce the number of transaction log files back down to DesiredTlogFiles.
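The directory rules above can be sketched in Python. Per-store directories are the configured root extended with the store name, and FilePlacementRule patterns are matched against file names. The index-spogi patterns use explicit digit classes that follow the prose description of even and odd final digits. Several details are assumptions for illustration: rules are tried in order with the first match winning, unmatched files fall back to Main, and the sample file names are made up. The helper names are also hypothetical.

```python
import os.path
import re

# (regex, directory root) pairs modeled on the FilePlacementRule examples.
RULES = [
    (r"^index-posgi", "/mnt/disk6/ag4-posgi"),
    (r"^index-spogi-.*[02468]\..idx$", "/mnt/disk4/ag4-spogi-even"),
    (r"^index-spogi-.*[13579]\..idx$", "/mnt/disk5/ag4-spogi-odd"),
]


def per_store_dir(root, store_name):
    """TransactionLogDir and StringTableDir roots are extended with the store name."""
    return os.path.join(root, store_name)


def placement_root(filename, rules, default_root):
    """Return the directory root a file would be placed under.

    Assumption: rules are tried in order and the first match wins; files
    matching no rule fall back to default_root (e.g. the catalog's Main).
    """
    for pattern, root in rules:
        if re.search(pattern, filename):
            return root
    return default_root
```

With these rules, a hypothetical index-spogi-12.cidx goes to the even directory and index-spogi-7.midx to the odd one. Note that the unescaped . before idx matches either c or m.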
The following bigger example demonstrates what some of the options can look like:
# Don't allow normal HTTP access, only SSL
Port 10035
AllowHTTP no
SSLPort 10036
SSLCertificate /var/lib/ag4/server.cert
SettingsDirectory /var/lib/ag4/settings
Backends 5
# You can actually remove this after the first server run, to
# reduce the risk of someone finding it here.
SuperUser test:xyzzy
ExpectedStoreSize 100000
SessionPorts 8080-8083
<RootCatalog>
  Main /var/lib/ag4/root
</RootCatalog>
<Catalog fast>
  ExpectedStoreSize 2000000
  CheckpointInterval 1h
  Main /var/lib/ag4/fast
  StringTableDir /mnt/disk2/ag4-string-tables
  TransactionLogDir /mnt/disk3/ag4-transaction-logs
  FilePlacementRule ^index-spogi-.*[02468]\..idx$ /mnt/disk4/ag4-spogi-even
  FilePlacementRule ^index-spogi-.*[13579]\..idx$ /mnt/disk5/ag4-spogi-odd
</Catalog>
<DynamicCatalogs>
  Main /var/lib/ag4/dynamic
</DynamicCatalogs>
Changing database parameters
In some circumstances, it is desirable to modify the settings of an existing database by editing the 'parameters.dat' file in the database main directory. The syntax of this file is similar to that of the server configuration file, but only the parameters that are normally present inside of a catalog definition are allowed.
For example, the 'parameters.dat' file for a database 'demo' created with the 'fast' catalog definition above would look like this:
ExpectedStoreSize 2000000
CheckpointInterval 1h
Main /var/lib/ag4/fast
StringTableDir /mnt/disk2/ag4/fast
It might be edited to change the ExpectedStoreSize. It is also possible to add new file placement rules. When modifying any file-placement-related parameter of a database, make sure that all files constituting the current database state remain visible to the database. For example, if the StringTableDir setting were removed from the database above, all files in /mnt/disk2/ag4/fast/demo/ would need to be moved manually into the main directory of the database, /var/lib/ag4/fast/demo/.
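Because parameters.dat uses the same one-directive-per-line syntax, a small text transformation is enough to change a single value. set_parameter is a hypothetical helper. It operates on the file's text rather than on the file itself, and the database must still be restarted afterwards as described above. The new ExpectedStoreSize value is illustrative only. For repeatable directives such as FilePlacementRule, you would append a new line rather than replace an existing one.

```python
def set_parameter(text, name, value):
    """Return parameters.dat text with directive `name` set to `value`.

    Replaces the first line whose directive is `name`, or appends a new
    line if the directive is absent. Intended for single-valued directives.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        parts = line.split(None, 1)
        if parts and parts[0] == name:
            lines[i] = f"{name} {value}"
            break
    else:
        # No existing line for this directive: add one at the end.
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


current = """\
ExpectedStoreSize 2000000
CheckpointInterval 1h
Main /var/lib/ag4/fast
StringTableDir /mnt/disk2/ag4/fast
"""
updated = set_parameter(current, "ExpectedStoreSize", "5000000")
```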
When moving database files around, keep in mind that some of these files are sparse, i.e. they contain holes with unallocated blocks. Many file management utilities (like 'cp' and 'tar') can optionally preserve sparseness, but take care that copies of database files don't become unexpectedly large after manual manipulation.
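One way to spot sparseness is to compare a file's logical size with the disk space actually allocated for it. The Python sketch below relies on the st_blocks field of os.stat, which on Linux and most Unix systems counts 512-byte units and is not available on Windows. The helper names are hypothetical, and make_sparse_file only produces a hole on file systems that support them. With GNU tools, cp --sparse=always and tar --sparse are the relevant options for preserving holes.

```python
import os
import tempfile


def allocated_bytes(path):
    """Disk space actually allocated; st_blocks counts 512-byte units."""
    return os.stat(path).st_blocks * 512


def is_sparse(path):
    """True if fewer bytes are allocated than the file's logical size."""
    return allocated_bytes(path) < os.path.getsize(path)


def make_sparse_file(path, size):
    """Create a `size`-byte file that is one hole except for its last byte
    (on file systems that support holes)."""
    with open(path, "wb") as f:
        f.seek(size - 1)
        f.write(b"\0")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "hole.bin")
        make_sparse_file(path, 64 * 1024 * 1024)
        print(os.path.getsize(path), allocated_bytes(path), is_sparse(path))
```

Running the same check on a copied database file shows whether the copy kept its holes.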
In a future release, AllegroGraph will include a utility to make manipulating database parameters and directories safer and easier.