Introduction

The AllegroGraph server is the running program which manages the various AllegroGraph repositories (also occasionally referred to as triple-stores or databases). The server must be set up as described in Server Installation.

This document describes server configuration below and server control further below.

All directives for AllegroGraph 8.3.1 are listed in this document. Most, but not all, can be used in earlier versions. The version in which each directive was added is shown with its description (versions prior to 7.0.0 are not specified exactly). If you are using a version earlier than 8.3.1 with this version of the documentation, directives added after your version cannot be used.

Argument notation

The directives that go in the configuration file generally require an argument. We generally show the type of argument expected. Here are some common argument types:

Some directives (usually those identifying things) can be specified multiple times meaning any of the specified values can be used. The directive description will say whether it can be specified more than once. A few directives can have multiple values on one line, usually comma separated.

Directive values must not be quoted. Spaces are not allowed in directive values.

Server configuration

The AllegroGraph server requires a configuration file in order to start up. Usually, this file is specified using the --config command-line argument. A minimal file could look like this:

SettingsDirectory /tmp/ag4/settings  
SuperUser test:xyzzy  
 
<RootCatalog>  
  Main /tmp/ag4/root  
</RootCatalog> 

An AllegroGraph configuration file consists of a set of top-level directives and one or more catalog definitions. The syntax is straightforward: directives (both top-level and within a catalog definition) consist of an alphanumeric word, whitespace, and then the value of the directive. Indentation is ignored and trailing whitespace is removed. A directive can span multiple lines by ending each line except the last with a backslash, as described below. Catalog definitions are delimited by pseudo-XML markers like <RootCatalog>.

Each line of the configuration file is processed this way:

  1. Remove the comment, if present. A comment begins with # anywhere on the line and extends to the end of the line. However, the hash character can be escaped with a backslash, as in \#, in which case the \# is replaced by # and the hash character is considered part of the directive on the line.
  2. Remove trailing whitespace (spaces and tabs).
  3. If the line now ends in a backslash character, process the next line using these steps and append it to this line with the end-of-line backslash removed.

Some examples of using # and \:

  # full line comment  
 
  24000-30000 # port range  
 
  one two \ # line will continue on next line  
  three four  # comment on second line  
 
 
  this \# is a hash character # shows hash character 

which results in these three directives being processed by the configuration parser:

  24000-30000  
  one two three four  
  this # is a hash character 
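The three processing steps can be sketched in Python as follows. This is only an illustration of the rules described above, not AllegroGraph's own parser; it assumes that blank lines produce no directive and that leading indentation is dropped (indentation is ignored, as stated earlier). Run on the example lines, it yields the same three directives.

```python
def strip_comment(line: str) -> str:
    """Step 1: drop everything from an unescaped '#' onward and turn '\\#' into '#'."""
    out = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] == "#":
            out.append("#")  # an escaped hash becomes part of the directive
            i += 2
        elif ch == "#":
            break  # an unescaped hash starts a comment
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def parse_directives(text: str) -> list[str]:
    directives = []
    pending = ""  # text accumulated from lines that ended in a backslash
    for raw in text.splitlines():
        # Step 2: remove trailing spaces and tabs; leading indentation is ignored as well.
        line = strip_comment(raw).rstrip(" \t").lstrip(" \t")
        # Step 3: a trailing backslash joins this line with the next processed line.
        if line.endswith("\\"):
            pending += line[:-1]
            continue
        line = pending + line
        pending = ""
        if line:  # assumption: blank and comment-only lines produce no directive
            directives.append(line)
    if pending:
        directives.append(pending.rstrip(" \t"))
    return directives


example = r"""
  # full line comment

  24000-30000 # port range

  one two \ # line will continue on next line
  three four  # comment on second line


  this \# is a hash character # shows hash character
"""

for directive in parse_directives(example):
    print(directive)
```

Note the order of the steps: because the comment is removed before the backslash test, a backslash followed by a comment (as in `one two \ # ...`) still continues the line.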

Each directive defining a parameter must be on its own single line (or multiple lines joined by backslashes as described above). Specifying more than one parameter on a single line will result in an error, or, possibly, in the first parameter being defined incorrectly and the remaining ones on the line not being defined at all.

The directories named in this file must either already exist and be writable by the user running the server, or that user must be able to create them. You will usually not want to use temporary paths as in the example, of course.

Note about SuperUser and AuthPolicy

For users of an external authentication service such as LDAP, the agraph.cfg directive

SuperUser test:xyzzy 

will create the test account and give it the internal password xyzzy. However, that password can never be used if you specify AuthPolicy as external-token, token-external, or external.

BaseDir directive

Any relative path in the config file is initially resolved with respect to the directory containing the config file. If you specify another directory using the BaseDir directive, all subsequent relative paths are resolved relative to that directory. BaseDir can be specified as often as you like and can itself be a relative pathname (which is resolved using the BaseDir value currently in effect or, if BaseDir has not previously been specified, with respect to the directory containing the config file). If BaseDir is specified multiple times, relative pathnames in other directives are resolved with respect to the most recent BaseDir value.

BaseDir PATHNAME
A directory pathname which will be used to resolve relative pathnames in subsequent directives in the config file. Can be specified multiple times in the config file, with new values replacing older ones. See just above for more details. (Added in version 7.0.0 or before.)
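The BaseDir resolution rules can be modeled with Python's posixpath module. This is a minimal sketch, assuming POSIX paths and assuming every directive in the list takes a pathname (a real config file also contains non-path directives); the directory /etc/agraph and the other paths are hypothetical.

```python
import posixpath


def resolve_config_paths(config_dir: str, directives: list[tuple[str, str]]) -> dict[str, str]:
    base = config_dir  # relative paths start out relative to the config file's directory
    resolved = {}
    for name, value in directives:
        # join() keeps an absolute value unchanged and resolves a relative one against base
        path = posixpath.normpath(posixpath.join(base, value))
        if name == "BaseDir":
            base = path  # a later BaseDir replaces the previous one
        else:
            resolved[name] = path
    return resolved


directives = [
    ("SettingsDirectory", "settings"),
    ("BaseDir", "/var/lib/agraph"),
    ("LogDir", "log"),
    ("BaseDir", "data"),  # relative, so resolved against /var/lib/agraph
    ("PidFile", "agraph.pid"),
    ("TempDir", "/tmp"),  # absolute paths ignore BaseDir
]
for name, path in resolve_config_paths("/etc/agraph", directives).items():
    print(name, path)
```

Here SettingsDirectory resolves to /etc/agraph/settings, LogDir to /var/lib/agraph/log, and PidFile to /var/lib/agraph/data/agraph.pid, because the second BaseDir is itself resolved against the first.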

Top-level directives

SettingsDirectory PATHNAME
Required setting. Specifies the directory in which the server stores persistent information such as user accounts. (Added in version 7.0.0 or before.)
AccessLogEnabled BOOLEAN
A boolean (yes/no) that can be used to enable logging of successful HTTP(S) requests to a dedicated log file. (Added in version 7.0.0 or before.)
AccessLogDir PATHNAME
Directory in which the HTTP access log files are written. Defaults to the value of LogDir. (Added in version 7.0.0 or before.)
AccessLogFilePattern VALUE
A file name pattern with strftime-style directives, used to set up log rotation for the HTTP access log. The pattern may contain spaces. Default is access-%Y%m%d.log, containing the year, month, and day. (Added in version 7.0.0 or before.)
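To see how a pattern expands, Python's strftime can be used, since it supports the same %Y, %m and %d directives as C's strftime. This sketch assumes the server expands the pattern the same way; the dates are arbitrary examples.

```python
from datetime import datetime

pattern = "access-%Y%m%d.log"  # the documented default AccessLogFilePattern

# %Y, %m and %d change once per calendar day, so each day maps to a new file name.
for moment in (datetime(2024, 3, 5, 23, 59), datetime(2024, 3, 6, 0, 1)):
    print(moment.strftime(pattern))
```

The two timestamps are two minutes apart but fall on different days, so they produce access-20240305.log and access-20240306.log.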
AccessLogEntryFormat VALUE
A log format pattern using Apache-style directives. (Added in version 7.0.0 or before.) See the Apache documentation for a list of possible directives, most of which are supported. The pattern may contain spaces. If a directive is invalid or unsupported, a warning will be logged to agraph.log. The logged value for unsupported directives (e.g. %l) is a dash: -. The default format is the NCSA extended/combined log format:
%h - %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" 

AllowHTTP BOOLEAN
A boolean (yes/no) that can be used to turn off HTTP access to the server. Default is yes. (Added in version 7.0.0 or before.) Clients usually interact with the AllegroGraph server using HTTP, HTTPS, or both. This is true of WebView as well as the Java and Python clients. The agtool program uses the direct Lisp interface if possible; otherwise it uses HTTP or HTTPS. For security reasons you may wish to disable HTTP access and permit only HTTPS access. In that case, specify no for this directive and also specify an SSLPort (see the section Top-level directives for SSL client certificate authentication). However, so that direct Lisp interface clients (such as agtool) can connect to the server, a very limited HTTP webserver is always running inside the server. This webserver serves only two URLs and returns only public information that is also available via ps and netstat, so its use of HTTP is not a security concern.

AllowIP IP-ADDRESSES

(Added in version 7.0.0 or before.) The value of the option must be a string containing one or more comma-separated IP address blocks in CIDR notation. The symbol any can be used instead of 0.0.0.0/0 notation.

Patterns can be either accepting or denying (prefixed with !) and are matched in the specified order, so a less restrictive pattern must come later in the list than any denial it includes. In the second example, addresses in the 127.13.0.0/16 range (other than 127.13.13.13) are denied even though the list ends with any. Examples:

# Accept only loopback connections:  
AllowIP 127.0.0.0/8, !any  
 
# Accept any IP addresses except 127.13.0.0/16 range,  
# unless it's 127.13.13.13:  
AllowIP 127.13.13.13, !127.13.0.0/16, any 
If this option is non-empty, for every incoming connection, AllegroGraph extracts the IP address and attempts to match it against the given patterns in the specified order until the first successful match. If the matched pattern is accepting, the server handles the request, otherwise HTTP status code 403 is returned.
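The first-match rule can be sketched with Python's ipaddress module. This is an illustration of the documented semantics, not AllegroGraph code; it handles IPv4 only (any is mapped to 0.0.0.0/0, as described above), and because the text does not say what happens when no pattern matches, the sketch returns None in that case.

```python
import ipaddress


def parse_allow_ip(value: str):
    """Turn an AllowIP value into an ordered list of (accept, network) rules."""
    rules = []
    for item in value.split(","):
        item = item.strip()
        accept = not item.startswith("!")
        spec = item if accept else item[1:]
        if spec == "any":
            spec = "0.0.0.0/0"
        rules.append((accept, ipaddress.ip_network(spec)))
    return rules


def is_allowed(address: str, rules):
    """Return the decision of the first matching rule, or None if nothing matches."""
    ip = ipaddress.ip_address(address)
    for accept, network in rules:
        if ip in network:
            return accept
    return None  # unmatched addresses: behavior not specified in this document


loopback_only = parse_allow_ip("127.0.0.0/8, !any")
print(is_allowed("127.0.0.1", loopback_only))  # True
print(is_allowed("10.1.2.3", loopback_only))   # False

carve_out = parse_allow_ip("127.13.13.13, !127.13.0.0/16, any")
print(is_allowed("127.13.13.13", carve_out))   # True
print(is_allowed("127.13.7.7", carve_out))     # False
print(is_allowed("192.0.2.10", carve_out))     # True
```

Reordering the second list to `any, !127.13.0.0/16` would accept every address, because `any` would always match first.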
Auditing BOOLEAN
A boolean (yes/no) that can be used to turn on auditing. Default is no. (Added in version 7.0.0 or before.)
Backends INTEGER
Specifies the maximum number of processes spawned to handle HTTP requests (note that session processes do not count toward this limit). Default is 10. (Added in version 7.0.0 or before.)

BackendMaxIdle INTEGER
Specifies the maximum number of seconds a backend can be idle before it is eligible for being killed to recover any resources it has acquired. The default is 600 seconds (10 minutes). (Added in version 7.4.0.)

BriefBacktrace TRUE-OR-FALSE
AllegroGraph errors cause the system to send a backtrace (a reverse ordered list of the function calls that resulted in the error, along with their arguments) to the log file. If this directive is true (default is false), only the function names will be included in the backtrace and argument information will be suppressed. This will make debugging the error more difficult but sensitive data (such as passwords) which might have been passed as arguments will then not appear in the log. (Added in version 7.0.0 or before.)

StaleDNSEntryRetainTime SECONDS
After a DNS entry has exceeded its specified lifetime, AllegroGraph may still return it immediately in response to a lookup while at the same time trying to update the value. This parameter specifies how long AllegroGraph will retain such a stale value. If the DNS mapping from name to IP address may change while the application is running, you'll want to specify a very small value (number of seconds). A value of zero causes the DNS record to be considered valid only for the lifetime specified in the DNS record itself. The default is 1800 (60x30 seconds, or 30 minutes). (Added in version 7.0.0 or before.)

DisableGruff BOOLEAN
If true (the default is false), Gruff cannot be invoked from AGWebView. See also the MaxGruffProcesses directive. (Added in version 7.0.0 or before.)
EvalAllowed BOOLEAN
A switch to turn off Evaluate arbitrary code permissions globally. If it is yes (the default), then the Evaluate arbitrary code permission bits are in effect. If it is no, then arbitrary code evaluation is disabled for all users (including superuser) regardless of the value of a user's actual permissions. (Added in version 7.0.0 or before.)
HostName HOST
Determines the host on which the HTTP server listens. Can be left out to have the server listen on all interfaces. Set to localhost to listen only locally. (Added in version 7.0.0 or before.)
HTTPProxy VALUE
This parameter can be used to make HTTP requests made by the server (for example, when a SPARQL query loads data from an external URL) go through a proxy. The VALUE syntax is [USER:PASSWORD@]HOST[:PORT], where the square brackets indicate optional portions. PORT defaults to 80. A USER and PASSWORD are necessary when the proxy requires authentication. (Added in version 7.0.0 or before.)
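The [USER:PASSWORD@]HOST[:PORT] syntax can be split with Python's urllib.parse.urlsplit by prefixing the value with //, which makes urlsplit treat it as a network location. This sketch only illustrates the syntax and the default port; it is not how AllegroGraph parses the value, and the host and credentials are hypothetical. Passwords containing characters such as @ or / would need extra care.

```python
from urllib.parse import urlsplit


def parse_http_proxy(value: str) -> dict:
    """Split an HTTPProxy value of the form [USER:PASSWORD@]HOST[:PORT]."""
    parts = urlsplit("//" + value)  # the leading // makes value the netloc
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port or 80,  # PORT defaults to 80
    }


print(parse_http_proxy("proxy.example.com"))
print(parse_http_proxy("alice:secret@proxy.example.com:3128"))
```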
HTTPNoProxy VALUE
(Added in version 7.0.0 or before.) When proxying is enabled with the HTTPProxy directive, this parameter can be used to list exceptions. HTTP requests made by the server to domains that match one of the suffixes specified with HTTPNoProxy are never proxied. HTTPNoProxy can be specified multiple times, for example:
HTTPNoProxy mydomain.com  
HTTPNoProxy otherdomain.com 

With the above configuration, requests made by the server for mydomain.com, otherdomain.com, sub.mydomain.com or notmydomain.com will not be proxied.

IP addresses can be specified, and they are subject to the same string suffix matching rule as domain names. Crucially, this means that a request made for a particular domain name will not match a rule that specifies an IP address, even if the domain name resolves to that IP address. It is usually best to specify complete IP addresses, since a partial address is also matched as a plain string suffix:

HTTPNoProxy 192.168.0.1 
Requests to localhost and 127.0.0.1 are never proxied.
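The suffix rule, including its surprising consequences, can be modeled in a few lines of Python. This is a sketch of the behavior described above, not AllegroGraph's implementation; details such as case sensitivity are not specified here, so it assumes lowercase host names.

```python
ALWAYS_DIRECT = {"localhost", "127.0.0.1"}  # never proxied, per the text above


def bypasses_proxy(host: str, no_proxy: list[str]) -> bool:
    """Plain string-suffix matching, as described for HTTPNoProxy."""
    if host in ALWAYS_DIRECT:
        return True
    return any(host.endswith(suffix) for suffix in no_proxy)


no_proxy = ["mydomain.com", "otherdomain.com"]  # two HTTPNoProxy directives
for host in ["mydomain.com", "sub.mydomain.com", "notmydomain.com", "example.org"]:
    print(host, bypasses_proxy(host, no_proxy))

# A partial IP address matches more hosts than intended:
print(bypasses_proxy("10.0.0.1", ["0.1"]))  # True
```

Because the comparison is on strings, notmydomain.com bypasses the proxy, and a rule such as 0.1 would match both 192.168.0.1 and 10.0.0.1.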

HttpTrace FILE-PATHNAME
If the HttpTrace directive is supplied, it is the name of the log file to which HTTP traffic is to be dumped. Relative pathnames are with respect to the log directory specified by the LogDir directive. If this HttpTrace directive is specified and HttpTraceOptions (see just below) is not specified, tracing is done using the default HttpTraceOptions. See also the --http-trace-options command line argument. (Added in version 7.0.0 or before.)

HttpTraceOptions OPTIONS
If the HttpTrace directive is specified, tracing will be done with the information requested by this directive being written to the file specified by HttpTrace. The value of this directive should be a comma-separated list of options. Options starting with the character + turn on the corresponding log category. Those starting with - turn them off (allowing you to enable a general category, like +all, and then disable specific items thus enabled, like -proxy). max-message-size=NUMBER causes truncation of entries after NUMBER characters. The default value of this directive is +xmit,max-message-size=1000. The available log categories are listed in the Debugging section of the AllegroServe documentation.
HTTPWorkers INTEGER
The number of initial HTTP workers to be started by the AllegroGraph server. The default is 50. The number should be larger than the number of backends (see Backends above) plus anticipated frontend sessions (used, for example, by Webview). Too few workers may cause long-running requests (like opening a repository) to delay other concurrent requests. (Added in version 7.0.0 or before.)
HTTPKeepAliveTimeout INTEGER
The number of seconds for the HTTP keep alive timeout. The default is 10. (Added in version 7.0.0 or before.)

LicenseWarnInAdvance NON-NEGATIVE-INTEGER
If NON-NEGATIVE-INTEGER is greater than 0, the server will issue a daily warning when the license will expire within that number of days. The warnings will continue until either the license is updated or the value of this directive is set to 0 (a server restart is required in either case). The default is 30. The value 0 means suppress the warning. (Added in version 7.0.0 or before.)

LogDir PATHNAME
Specifies the directory where the server log files are written. The primary log file is agraph.log, the secondary file is agraph-fallback.log. The secondary is only used when writing to the primary fails. The secondary file is preallocated to a size of 1MiB, so some log messages can be written even if the filesystem is full. On server startup or if writing to the primary becomes possible again, the contents of the secondary log file are automatically appended to the primary and the secondary log file is reinitialized. (Added in version 7.0.0 or before.)

MaxGruffProcesses POSITIVE-INTEGER
Restricts the number of GRUFF processes to POSITIVE-INTEGER (default 8). This prevents users from opening too many GRUFF sessions. See also the DisableGruff directive. (Added in version 7.0.0 or before.)
MemoryCheckWhen VALUE
A 'memory release specification'. It can be used multiple times. Each specification must be in the form name:value, where name is query, transaction, or time, and value is the number of items to run between checks (for query and transaction) or a delay in seconds (for time). (Added in version 7.0.0 or before.)
MemoryReleaseThreshold INTEGER
A size specifying the threshold at which a memory release will occur. The size can be specified in gigabytes (e.g. 3g), megabytes (e.g. 3000m), kilobytes (e.g. 3000000k) or bytes (e.g. 3000000000). (Added in version 7.0.0 or before.)

PidFile FILE-PATHNAME
A file to which the server writes out its process id. (Added in version 7.0.0 or before.)

Port INTEGER
If supplied, must be an integer. Used to set the port on which the daemon runs its HTTP server. When not given, Port defaults to 10035. If you do not want to allow access with HTTP, you must specify a value for SSLPort (see the section Top-level directives for SSL client certificate authentication) and also set AllowHTTP to no. The value of Port can be overridden by the --port argument to agraph-control (and the --port argument to the agraph program). (Added in version 7.0.0 or before.)

QueryResultsLimit NUMBER
A user can be restricted to a maximum number of results to a query independently of a LIMIT clause in the query or other restriction. See Limiting results of a query in SPARQL Reference and Limiting the results a user can see in Managing Users. This directive specifies the limit, which applies to all user/repo limitations. The default value is 1000. (Added in version 7.2.0.)

QueryResultsCacheSize NUMBER
If not 0, this specifies the maximum number of entries in the query results cache. If 0, query results caching is prohibited. See allowCachingResults query option documentation for more details. The default value is 1000. (Added in version 8.0.0.)

QueryResultsCacheStorageSize INTEGER
If not 0, this specifies the maximum on-disk size of the query results cache. If 0, query results caching is prohibited. The value can be specified in gigabytes (e.g. 3g), megabytes (e.g. 3000m), kilobytes (e.g. 3000000k) or bytes (e.g. 3000000000). See allowCachingResults query option documentation for more details. The default value is 1G. (Added in version 8.0.0.)

QueryOption NAME=VALUE
(Added in version 7.0.0 or before.) A query option specification. It can be used multiple times. Each specification is equivalent to a query prefix option. The global configuration directive
QueryOption NAME=VALUE 
is the same as the query prefix
PREFIX franzOption_NAME: <franz:VALUE> 
but will be applied to all queries. See the SPARQL Reference for a list of SPARQL query options. An example is
QueryOption logQuery=yes 
equivalent to this prefix added to each query
PREFIX franzOption_logQuery: <franz:yes> 
Note there are a lot of query options and it would be redundant to list them all here when you can find them in the SPARQL Reference. But here are some more that are commonly used (keys and passwords shown are not valid; go to the SPARQL Reference for a complete description in all cases):
QueryOption openaiApiKey=sk-XXXXXXXXXXXIJKlmnOpQ3RstvVWxyZABcD4eFG5jiJKlmno  
QueryOption serpApiKey=XXXXXXXXXX4b15627d34b5859c76d17038d791c26e38f161e1234567e9  
QueryOption profileQuery=time  
QueryOption profileQuery=space  
QueryOption chunkProcessingAllowed=yes  
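The correspondence between a QueryOption value and the query prefix it stands for is a simple string rewrite, sketched below in Python. The function name is hypothetical; it splits on the first = only, so option values that themselves contain = are kept intact.

```python
def query_option_to_prefix(option: str) -> str:
    """Rewrite a QueryOption NAME=VALUE value as the equivalent SPARQL prefix."""
    name, sep, value = option.partition("=")
    if not sep or not name:
        raise ValueError(f"expected NAME=VALUE, got {option!r}")
    return f"PREFIX franzOption_{name}: <franz:{value}>"


print(query_option_to_prefix("logQuery=yes"))
# PREFIX franzOption_logQuery: <franz:yes>
print(query_option_to_prefix("profileQuery=time"))
# PREFIX franzOption_profileQuery: <franz:time>
```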

ReplicationPorts INTEGER-RANGE
(Added in version 7.0.0 or before.) If given, must be an integer range (e.g. 13000-13020). Replicas (single master -- see Replication and Warm Standby, or multi-master -- see Multi-master Replication) require additional ports that are different from the AllegroGraph HTTP/HTTPS ports. When using single-master replicas, the replication primary requires a separate listening port for each replica. When using multi-master replication, instances listen for TCP connections from other instances on these additional ports. In both cases, if no value is given for this option, the operating system will choose an available port as needed when the replicas are set up, and that will always work. However, if there is a firewall between the replication primary and (any of) the replicas (single-master) or between instances (multi-master), the firewall administrator may need to configure the firewall to allow incoming connections from the replicas to the primary or among the replicas. That configuration process can be aided by limiting the range of ports which can be used, and that is what this parameter does. If a range is specified, only those ports will be used by replicas. For single-master, any replica which might become a primary should have this parameter specified in its configuration exactly as it is for the primary. For multi-master, all servers should have this parameter identically specified in their configurations. Note that if no port in the range is available when setting up a replica, setting up the replica will fail. Therefore, the range should contain at least as many ports as the maximum expected number of replicas.
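Sizing the range is simple arithmetic, sketched here in Python. The sketch assumes both ends of the range are inclusive (so 13000-13020 provides 21 ports), which this document does not state explicitly; the replica count is a hypothetical deployment size.

```python
def parse_port_range(value: str) -> range:
    """Parse an INTEGER-RANGE such as 13000-13020 (both ends assumed inclusive)."""
    low, high = (int(part) for part in value.split("-"))
    if not 0 < low <= high <= 65535:
        raise ValueError(f"invalid port range {value!r}")
    return range(low, high + 1)


ports = parse_port_range("13000-13020")
max_expected_replicas = 25  # hypothetical deployment size

print(len(ports))                            # 21
print(len(ports) >= max_expected_replicas)   # False, so the range should be widened
```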

RunAs USERNAME
Have the server, if started by root, run as the given user instead (defaults to agraph). (Added in version 7.0.0 or before.)

SlowQueryLogThreshold NUMBER
(Added in version 7.0.0 or before.) If a SPARQL query takes longer than NUMBER milliseconds, log information about the query to the log file (or to the file named by SlowQueryLogFile if specified). An example when SlowQueryLogThreshold is 1:
Slow query (1.261700 msec): select ?s { ?s ?p ?o . } limit 100000 
The SPARQL query option slowQueryLogThreshold will set the threshold for a specific query. See Query Options in SPARQL Reference.

SlowQueryLogFile PATHNAME
Slow query log entries (see SlowQueryLogThreshold just above) will be written to PATHNAME rather than the regular log file if this option is specified. Relative pathnames are with respect to the log directory specified by the LogDir directive. (Added in version 7.0.0 or before.)

SMTPHost ID SMTP-CONFIGURATION

(Added in version 7.0.0 or before.) Defines a named SMTP configuration for use by AllegroGraph features that support email notification, such as auditing and the event scheduler. Multiple SMTPHost definitions are allowed.

ID is a name that is used with other configuration options to specify the SMTP host being defined. SMTP-CONFIGURATION associates with ID a server, a login name and other information. For example, the following defines the SMTPHost with ID gmail:

SMTPHost gmail \  
  server="smtp.gmail.com", ssl=true, starttls=false,\  
  from="[email protected]", login="[email protected]", \  
  password="somepassword" 
The following options are supported by SMTPHost:
  • server (string): the hostname or IP address of the server (example: "smtp.gmail.com", or "127.0.0.1"). This is a required parameter.

  • port (integer): defaults to 25 for non-SSL, 465 for SSL (example: port=993)

  • ssl (boolean): defaults to false (example: ssl=true)

  • starttls (boolean): defaults to false (example: starttls=true)

  • from (string): the email address to which the From: header of emails sent via this SMTPHost will be set. This is a required parameter.

  • login (string): the user on the remote server (example: login="[email protected]")

  • password (string): the password corresponding to login.

  • password-command (string): a string suitable to be executed as a shell command. The specified command should output a single line containing the password to stdout. This is intended to avoid storing plaintext passwords in the configuration file.
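Before putting a password-command into agraph.cfg, it can help to check that the command really prints a single line. The Python sketch below runs a command through the shell the way the description suggests and returns that line; it is a local test aid, not AllegroGraph's code, and requiring exactly one line of output is this sketch's own strictness. The echo command is only a stand-in for a real secret-store lookup.

```python
import subprocess


def read_smtp_password(password_command: str) -> str:
    """Run a password-command and return the single line it prints on stdout."""
    result = subprocess.run(
        password_command, shell=True, capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    if len(lines) != 1:
        raise ValueError("password-command must print exactly one line")
    return lines[0]


# "echo" stands in for a real command that fetches the password from a secret store.
print(read_smtp_password("echo somepassword"))  # somepassword
```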

SPARQLBaseURL URL
If given, the HTTP server will use this value as the base-url when parsing SPARQL queries. When not given, the URL of the request is used instead. (Added in version 7.0.0 or before.)

SuperUser NAME:PASSWORD
If given, must be a string in name:password format. The server will ensure, on startup, that a superuser with this name and password exists. Note that this means anyone who can read your configuration file has full access to the server. It is recommended to use the server setup script to create a superuser instead or, if you do use this directive, to remove it after the first run of the server has created the user. (Added in version 7.0.0 or before.)
TempDir PATHNAME
Specifies the directory in which AllegroGraph may create temporary files. Defaults to the system's designated temporary directory (typically /tmp). (Added in version 7.0.0 or before.)

TransactionSemantics VALUE
Either sesame-2.6 (the default) or sesame-2.7. It controls whether a new transaction is started automatically or an explicit begin is necessary. See Transaction handling semantics for more information. (Added in version 7.0.0 or before.)

UseLicensedCores BOOLEAN
(Added in version 7.0.2.) If your AllegroGraph license restricts the number of cores that may be used by AllegroGraph, setting this directive to true (the default is false) ensures that no more than that number of cores will be used. If on startup you get an error message like the following:

Starting server failed: The machine has more cores (4) than the number of licensed cores (1). Specify "UseLicensedCores true" in agraph.cfg in order to have AllegroGraph use only the licensed number of cores. See these links for details: https://franz.com/agraph/support/documentation/server-installation.html#licensekey https://franz.com/agraph/support/documentation/daemon-config.html#uselicensedcores

Setting this directive to true in the agraph.cfg file will ensure no more than the licensed number of cores will be used.

XMLVersion 1.0-or-1.1
Specifies the XML version to use for RDF serialization. Default is 1.1. Specify XMLVersion 1.0 if you wish AllegroGraph to use XML 1.0. See also the Lisp function serialize-rdf/xml and the Lisp variable *serializer-xml-version*. (Added in version 7.2.0.)

Session directives

A session is a user-specific connection to the AllegroGraph server. Because it is controlled by a single user, it can be transactional (changes are not permanently added to the database until committed and rollbacks are supported) and it is suitable for loading user-specific scripts. Sessions can also access several stores in a federation.

Sessions can only be created by users who have permission to start sessions (see Managing Users for information on user permissions).

Those users can start sessions from the New WebView Repository Menu or AGWebView, from the page displayed by Utilities menu | Sessions (in both new and traditional WebView), using the HTTP/REST interface, in Python, and in Java.

The following configuration directives affect sessions:

SessionHost VALUE
If given, must be the server's host name or IP address for use in the URLs returned upon session creation. Useful when deploying a load balancer (like Amazon's Elastic Load Balancer) for sending the SessionHost string in the returned session URL instead of echoing the load balancer's host name from the client request. (Added in version 7.0.0 or before.)

SessionPorts INTEGER-RANGE
If given, must be an integer range like 8000-8020. Defines the ports that will be used for sessions. Useful when these need to be opened in a firewall or similar. When not specified, random ports will be used. (Added in version 7.0.0 or before.)

UseMainPortForSessions BOOLEAN
A boolean (yes/no). If yes then the AllegroGraph process listening on the main port will act as a proxy for all requests for session processes. This helps to avoid firewall problems when using sessions, as only the main port needs to be exposed in this case. (Added in version 7.1.0.)

The next two directives control how long a session can be idle before it is terminated by the system. When starting a session with traditional AGWebView, a timeout cannot be specified, so the value of DefaultSessionTimeout is the idle timeout for any AGWebView session. Starting a session in New WebView (see New WebView Repository Menu) does allow a timeout to be specified when the session is created on the Utilities | Sessions page. Idle timeouts can also be specified for sessions started with the HTTP/REST interface, Python, or Java (in all cases by setting the lifetime parameter/argument). The value specified must be less than or equal to the MaximumSessionTimeout. Neither DefaultSessionTimeout nor MaximumSessionTimeout can be determined programmatically, so users should ask the database administrator for those values if needed.

DefaultSessionTimeout INTEGER
Sets the idle timeout (aka lifetime) to use for sessions which are created without specifying one. The default value is 300 seconds (5 minutes). This value is used by all sessions created in traditional AGWebView because AGWebView does not permit specifying a different value. (New WebView does allow specifying a timeout when the session is created on the Utilities | Sessions page.) (Added in version 7.0.0 or before.)
MaximumSessionTimeout INTEGER
Sets the maximum idle timeout (aka lifetime) that may be specified when creating a session. Any attempt to create a session with a timeout larger than this value will fail. The default value is 21600 seconds (6 hours). (Added in version 7.0.0 or before.)
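The interaction of the two session timeout directives can be summarized in a small Python model. The constants are the documented defaults; the function, its exception type, and its message are hypothetical, since the server reports a rejected lifetime in its own way.

```python
DEFAULT_SESSION_TIMEOUT = 300    # DefaultSessionTimeout default, in seconds
MAXIMUM_SESSION_TIMEOUT = 21600  # MaximumSessionTimeout default, in seconds


def effective_session_timeout(requested=None,
                              default=DEFAULT_SESSION_TIMEOUT,
                              maximum=MAXIMUM_SESSION_TIMEOUT) -> int:
    """Idle timeout a new session gets under the rules described above."""
    if requested is None:
        return default  # no lifetime given, as with traditional AGWebView sessions
    if requested > maximum:
        raise ValueError(
            f"lifetime {requested}s exceeds MaximumSessionTimeout ({maximum}s)"
        )
    return requested


print(effective_session_timeout())      # 300
print(effective_session_timeout(3600))  # 3600
```

Requesting a lifetime of 30000 seconds under the defaults would fail, because it exceeds 21600.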

Top-level directives for SSL client certificate authentication

In addition to authenticating remote users via HTTP Basic authentication, users can also be authenticated using SSL certificates. See the comments about authenticating users in the introduction of the Security Implementation document.

The following directives are used to enable SSL client authentication. See the SSL/TLS Quick Start document for details on specifying an SSL/TLS connection to the AllegroGraph server.

SSLCertificate FILE-PATHNAME
This must be the path of a file containing a server certificate and private key, PEM-encoded. This parameter is required when SSLPort is set. (Added in version 7.0.0 or before.)

SSLPort INTEGER
An integer specifying a port number. If given, an SSL HTTP server will be run on this port. The value of SSLPort can be overridden by the --sslport argument to agraph-control (and the --sslport argument to the agraph program). (Added in version 7.0.0 or before.)
SSLCAFile FILE-PATHNAME
This must point to a file containing one or more PEM-encoded certificates of trusted certificate authorities (CAs). A client certificate will be trusted if it has been signed by a CA within this file. This setting is required to enable certificate-based client authentication. (Added in version 7.0.0 or before.)
SSLCRLFile FILE-PATHNAME
If supplied, this must point to a file containing a PEM-encoded certificate revocation list (CRL). If a client certificate is received which has a serial number matching one in the CRL, the certificate (and therefore the entire SSL connection) will be rejected. This setting is optional. (Added in version 7.0.0 or before.)
SSLClientAuthUsernameField VALUE
(Added in version 7.0.0 or before.) The "Subject" field of a client certificate supplies the identity of the client. The subject is typically composed of several parts, for example:
Subject: C=US, ST=California, L=Oakland, O=Franz Inc., OU=Developers,  
     CN=Joe Smith, [email protected] 
This setting specifies which part of the Subject field of the client certificate should be used to identify the user to AllegroGraph. The setting may be CN (the default) or emailAddress. The value of the specified part will be used to perform a lookup in the user database (e.g., Joe Smith or [email protected] depending on the SSLClientAuthUsernameField setting).
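To make the field selection concrete, here is a Python sketch that picks the CN or emailAddress component out of a Subject string like the one shown above. It is illustrative only: the email address is a hypothetical placeholder, and the naive split on commas would break on escaped commas inside values; a server reads these fields from the parsed certificate rather than from a display string.

```python
def username_from_subject(subject: str, field: str = "CN") -> str:
    """Pick the Subject component named by SSLClientAuthUsernameField."""
    parts = {}
    for component in subject.split(","):
        key, _, value = component.strip().partition("=")
        parts[key] = value
    return parts[field]


subject = ("C=US, ST=California, L=Oakland, O=Franz Inc., OU=Developers, "
           "CN=Joe Smith, emailAddress=joe@example.com")  # hypothetical address
print(username_from_subject(subject))                  # Joe Smith
print(username_from_subject(subject, "emailAddress"))  # joe@example.com
```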
SSLClientAuthRequired BOOLEAN
This setting determines if client certificate validation is required or optional. If yes, all SSL requests must contain a valid client certificate. If no (the default), then SSL requests without a client certificate are allowed. In this case, AllegroGraph falls back to HTTP Basic authentication. (Added in version 7.0.0 or before.)
SSLProtocol VALUE
(Added in version 7.0.0 or before.) This setting can have the following values:
The values are case-insensitive. When setting more than one, spaces or commas can be used as separators. The default is tlsv1+.
SSLCipherSuite STRING

(Added in version 7.0.0 or before.) A string as described in https://www.openssl.org/docs/man1.0.2/apps/ciphers.html. This string specifies the list of ciphers that the AllegroGraph server is willing to use for incoming SSL connections.

The default value, taken from this guide: https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/, is

"ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS" 
RedirectHTTPOnHTTPS BOOLEAN
If yes (default is no) HTTP requests sent to the AllegroGraph SSLPort will result in a 301 Moved Permanently response with the location set to the HTTPS version of the URL. If SSLPort is not set, this directive has no effect. (Added in version 8.2.0.)
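To see which ciphers the default SSLCipherSuite string selects, you can hand it to an OpenSSL build through Python's ssl module, as sketched below. The result reflects the OpenSSL library that Python is linked against on your machine, which may differ from the one AllegroGraph uses, so treat it as an approximation. The listing also includes TLS 1.3 suites, which this cipher string does not control.

```python
import ssl

DEFAULT_CIPHERS = ("ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:"
                   "DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS")

context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.set_ciphers(DEFAULT_CIPHERS)  # raises ssl.SSLError if no cipher matches

for cipher in context.get_ciphers():
    print(cipher["protocol"], cipher["name"])
```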

Top-level directives for LLM (Large Language Models)

These directives are used to specify site-specific information about available LLMs. LLMs are described in the Large Language Models and Vector Databases document. Much LLM information is already hardwired in AllegroGraph and need not be configured in the agraph.cfg file, but Ollama details (see the Ollama support document) cannot be hardwired in, as the location of the Ollama server will be different for each AllegroGraph installation.

AllegroGraph 8.3.0 and later can work with OpenAI and Ollama LLMs. Support for more LLMs will be added in the future and, if need be, the configuration directives available in agraph.cfg will change.

The LLM configuration option uses a different format from most configuration options: it is an XML-like specification similar to catalog definitions, rather than the CONFIG-DIRECTIVE-NAME value format used for most directives. Our example, which you can adapt to your needs, shows what information is required.

This directive was added in release 8.3.0. At this time the only LLM configuration directive you'll need to add is for Ollama. Here is an example of such a directive:

<llm ollama>  
 chatModel llama3.1  
 chatModel mistral  
 chatModel qwen2  
 embedderModel llama3.1  
 embedderModel mistral  
 embedderModel qwen2  
 scheme http  
 host  demo1  
 port   11434  
 embedder ollama  
</llm> 

The Ollama server (running on the specified host, demo1 in our example) supports only the models installed on it, so those models must be listed. There are two types of models, which may have different names (in our example they have the same names). A chatModel is used when you send a prompt string to the LLM and it replies with a response string. See A quick example in the Ollama document for an example. That document also links to descriptions of scheme, host, and port.

An embedderModel is used to compute an embedding vector from a text string. Vector triple stores use embedder models as described in the Vector Storage Features in AllegroGraph document. When you fill out the WebView dialog for specifying a vector database (see Creating a vector database), ollama is the Embedder and one of the specified embedderModels is the Model.
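Because the Ollama server supports only the models installed on it, it can be worth checking that every chatModel and embedderModel named in agraph.cfg is actually installed. The Python sketch below assumes Ollama's GET /api/tags endpoint, which returns a JSON object whose models array carries a name for each installed model (for example llama3.1:latest). The helper names are hypothetical, and only fetch_tags needs network access.

```python
import json
import urllib.request

# Values mirroring the <llm ollama> example above.
SCHEME, HOST, PORT = "http", "demo1", 11434
CONFIGURED_MODELS = {"llama3.1", "mistral", "qwen2"}  # chatModel/embedderModel values


def fetch_tags(scheme: str, host: str, port: int) -> dict:
    """Fetch the installed-model list from an Ollama server (needs network access)."""
    url = f"{scheme}://{host}:{port}/api/tags"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def installed_models(tags: dict) -> set[str]:
    """Model names from an /api/tags response, with and without the ":tag" suffix."""
    names = {model["name"] for model in tags.get("models", [])}
    return names | {name.split(":", 1)[0] for name in names}


def missing_models(configured: set[str], tags: dict) -> set[str]:
    """Configured model names that the Ollama server does not report."""
    return configured - installed_models(tags)


# Usage against a live server:
#   print(missing_models(CONFIGURED_MODELS, fetch_tags(SCHEME, HOST, PORT)))
```

Keeping both the full and the untagged names lets a configured value such as llama3.1 match an installed llama3.1:latest.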

Top-level directives for external authentication

External authentication uses an external service to verify user credentials. Note that all user data (permissions, etc.) is stored locally, so in order to authenticate a user externally, a user with the same name must already exist on the AllegroGraph server.

The only external authentication protocols currently supported are LDAP and OAuth/OIDC.

AuthPolicy AUTH-POLICY
(Added in version 7.0.0 or before.) This is used to either enable or disable external authentication and/or specify the order of authentication attempts. Note that the policy only applies to basic (username and password) authorization and does not affect certificate or token authorization. The value must be one of the following:
  • internal (default) - use only internal authentication mechanisms (AG password, token etc);
  • external, external-token and token-external - use only external authentication mechanisms (LDAP or OIDC) and tokens; a password set for a user in the internal AllegroGraph database cannot be used for authentication; external and external-token are identical and mean that external authentication is tried first, and token afterwards; token-external reverses the order; note that there is no way to disable tokens as they are used for WebView and Gruff authentication to avoid having to pass around external, usually long-lived credentials;
  • internal-external - for a given username and password, attempt internal authentication first, and if it fails, attempt external authentication;
  • external-internal - for a given username and password, attempt external authentication first and if it fails, attempt internal authentication.
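The policies differ only in which mechanisms are tried and in what order. The Python sketch below models that description as a lookup table plus a first-match loop. It illustrates the documented behavior and is not AllegroGraph's implementation; the mechanism names and the checkers callables are hypothetical stand-ins for the real password, LDAP/OIDC, and token checks.

```python
from typing import Callable, Optional

# Order in which mechanisms are tried for basic (username/password) credentials
# under each AuthPolicy value. "internal" stands for AllegroGraph's own
# password/token checks, "external" for LDAP or OIDC, "token" for token checks.
AUTH_ORDER: dict[str, tuple[str, ...]] = {
    "internal": ("internal",),
    "external": ("external", "token"),
    "external-token": ("external", "token"),
    "token-external": ("token", "external"),
    "internal-external": ("internal", "external"),
    "external-internal": ("external", "internal"),
}

Checker = Callable[[str, str], bool]


def authenticate(policy: str, username: str, secret: str,
                 checkers: dict[str, Checker]) -> Optional[str]:
    """Return the first mechanism that accepts the credentials, or None."""
    try:
        order = AUTH_ORDER[policy]
    except KeyError:
        raise ValueError(f"unknown AuthPolicy value: {policy!r}") from None
    for mechanism in order:
        if checkers[mechanism](username, secret):
            return mechanism
    return None
```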

LDAPAuth LDAP-CONFIGURATION

(Added in version 7.0.0 or before.) Defines an LDAP configuration which will be used to connect to an external LDAP server to perform user authentication. Note that LDAP authentication can only be used if the AuthPolicy value contains the word external (such as external, external-internal, token-external, etc.).

The LDAP configuration requires a host and port (optionally with SSL support) and a username template used to construct the username for the bind operation.

Here is an example LDAP configuration:

LDAPAuth \  
  server="ldap://ldap.example.com:389", \  
  username-template="cn={},dc=example,dc=com" 
The following options are supported by LDAPAuth:
  • server (string): the URI of the LDAP server. The URI scheme must be either ldap for regular connection or ldaps for SSL connection. Port is optional and defaults to 389 if the scheme is ldap or 636 if the scheme is ldaps. Examples: ldap://ldap.example.com, ldaps://ad.example.com:10636. This is a required parameter.

  • username-template (string): the template for constructing the LDAP username from the AG username. Must contain a username marker "{}" which will be replaced with AG username during an authentication attempt. An example LDAPAuth value for an Active Directory service would be:

    LDAPAuth \
       server="ldap://ad.example.com", \
       username-template="{}@ad.example.com"
  • connection-timeout (integer): timeout in seconds for LDAP connection. Defaults to 10.

  • cache-timeout (integer): caching period in seconds of the LDAP user's password hash to avoid constant authentication requests to the LDAP server. Defaults to 1800 (30 minutes). Only successful authentication attempts are cached.

LDAP authentication works by performing a bind operation using the password provided in the request and a username constructed from the username template by replacing the username marker with the username provided in the request. If the bind succeeds, AllegroGraph considers the user authenticated; otherwise the user is unauthenticated. Any connection-related errors that occur during the bind attempt are propagated to the user.
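The string handling described above (choosing the port from the URI scheme and filling in the username template) can be sketched in Python as follows. The function names are hypothetical and the bind itself, which would need an LDAP client library, is not shown. A production implementation would probably also escape characters that are special in LDAP DNs before substituting the username; the text above does not say whether AllegroGraph does so.

```python
from urllib.parse import urlsplit

# Default ports by URI scheme, as documented for the server option.
DEFAULT_LDAP_PORTS = {"ldap": 389, "ldaps": 636}


def ldap_endpoint(server: str) -> tuple[str, int, bool]:
    """Return (host, port, use_ssl) for an LDAPAuth server URI."""
    parts = urlsplit(server)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_LDAP_PORTS:
        raise ValueError(f"URI scheme must be ldap or ldaps, not {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError(f"no host in {server!r}")
    port = parts.port or DEFAULT_LDAP_PORTS[scheme]
    return parts.hostname, port, scheme == "ldaps"


def bind_username(template: str, username: str) -> str:
    """Replace the {} marker in username-template with the AllegroGraph username."""
    if "{}" not in template:
        raise ValueError("username-template must contain the {} marker")
    return template.replace("{}", username)
```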

OAuth OAUTH-CONFIGURATION

This directive was added in release 8.3.0. Defines an OAuth/OIDC configuration which will be used to execute the supported OAuth flows. Currently only the Authorization Code flow is supported. In addition to this option, in order to enable OAuth/OIDC authentication AuthPolicy must contain the word external (such as external, external-internal, token-external etc.)

This directive supports Single Sign-on and Sign-off.

Here is an example of OAuth configuration:

OAuth server="http://auth.example.com:4444", \  
  authorize-endpoint="/oauth2/auth", \  
  token-endpoint="/oauth2/token", \  
  jwks-uri="http://auth.example.com:4444/.well-known/keys.json", \  
  jws-algorithms="RS256, RS512", \  
  client-id="cc8b0524-ad95-47e8-977d-f3a70e5279a1", \  
  client-secret="J-k64I.gIC9xD9OM7d9bczKDJ5", \  
  scope="openid", \  
  username-claim="sub", \  
  username-regexp="^(.*)@.*", \  
  username-substitution="\\1", \  
  pkce=yes, \  
  front-channel-logout=yes, \  
  back-channel-logout=yes 
The following options are supported by the OAuth directive:
  • server (string): the base URI of the OAuth server. This value must match the iss (issuer) claim in an ID token. Use *-endpoint options to specify required paths. If endpoints cannot be expressed as paths relative to the server URI, absolute URIs can be used instead. This is a required parameter.

  • authorize-endpoint (string): the URI of the endpoint on the server where the user should be redirected to initiate the authentication process. If it starts with /, it will be treated as a path relative to the server URI and will be concatenated with it to get the absolute URI of the endpoint. Otherwise it will be treated as an absolute URI of the endpoint. This is a required parameter.

  • token-endpoint (string): the URI of the endpoint on the server used to exchange the authorization code for the token in the OAuth Authorization Code flow. If it starts with /, it will be treated as a path relative to the server URI and will be concatenated with it to get the absolute URI of the endpoint. Otherwise it will be treated as an absolute URI of the endpoint. This is a required parameter.

  • jwks-uri (string): the URI of the JWKS file. AllegroGraph performs offline validation of ID tokens and expects to find keys for JWT signature verification at this URI. If it starts with /, it will be treated as a path relative to the server URI and will be concatenated with it to get the absolute URI of the endpoint. Otherwise it will be treated as an absolute URI of the endpoint. This is a required parameter.

  • jws-algorithms (string): comma-separated list of JWS signature algorithms allowed in JWT tokens. Tokens signed with unsupported or disallowed algorithms are considered invalid. Supported algorithms are HS256, HS384, HS512, RS256, RS384, RS512 and none. HMAC-SHA2 algorithms (HS256, HS384 and HS512) use the value of client-secret as the key. The default value is "RS256, RS384, RS512".

  • client-id (string) and client-secret (string): client credentials obtained when registering AllegroGraph server as a client for the authentication service.

  • scope (string): a string containing space-separated scopes that will be requested. The default value is "openid". Whatever value is supplied, openid must be one of the listed scopes.

  • username-claim (string): the claim in the OIDC ID Token returned by the token-endpoint which will be used to compute the username of the internal AllegroGraph user. Defaults to sub.

  • username-regexp (string): the regular expression for matching groups in the username-claim string; the groups are then used in username-substitution to construct the actual username. In the example above, username-regexp captures the part of the sub claim that comes before the @ character in an email address, and username-substitution takes this captured group without modification. Note that the \ character is used as an escape character in option values, so the common notation for the first group (\1) must be written as \\1. By default both options are unspecified, meaning that the value of username-claim is used as is.

  • pkce (boolean, yes/no): if true, PKCE will be used. Default is no.

  • front-channel-logout (boolean, yes/no): if true, OIDC Front-Channel Logout is enabled. Default is yes. Note that front-channel logout requires HTTPS. Also note that front-channel logout relies on a user agent (typically a browser) to provide information about the login session to AllegroGraph in the form of a cookie, so if this option is enabled, AllegroGraph sets the authentication cookie with SameSite=None. Finally, note that, to our knowledge, front-channel logout does not work out of the box on Firefox and requires changing a preference value (search for "firefox cookie behavior" in a web search engine). AllegroGraph's front-channel logout HTTP endpoint is GET /oauth/logout.

  • back-channel-logout (boolean, yes/no): if true, OIDC Back-Channel Logout is enabled. Default is yes. Note that back-channel logout relies on direct HTTP communication between the authorization service and the AllegroGraph server, so the two must be able to reach each other; in particular, the authorization service must be able to reach AllegroGraph's back-channel logout endpoint. Note that AllegroGraph's back-channel logout expects the session to be identified via the sid (session ID) claim in the ID Token and in the corresponding Logout Token. AllegroGraph's back-channel logout HTTP endpoint is POST /oauth/logout.
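The resolution rule shared by authorize-endpoint, token-endpoint and jwks-uri, together with the list-valued jws-algorithms and scope options, can be sketched in Python as below. The helper names are hypothetical. Plain string concatenation is used because that is what the option descriptions say; whether AllegroGraph normalizes a trailing slash on server is not stated, so the example server value has none. Matching algorithm names case-sensitively is also an assumption.

```python
SUPPORTED_JWS_ALGORITHMS = {"HS256", "HS384", "HS512", "RS256", "RS384", "RS512", "none"}


def resolve_endpoint(server: str, value: str) -> str:
    """Values starting with "/" are appended to server; others are absolute URIs."""
    return server + value if value.startswith("/") else value


def parse_jws_algorithms(value: str = "RS256, RS384, RS512") -> list[str]:
    """Split the comma-separated jws-algorithms value and reject unknown names."""
    algorithms = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in algorithms if name not in SUPPORTED_JWS_ALGORITHMS]
    if unknown:
        raise ValueError(f"unsupported JWS algorithms: {unknown}")
    return algorithms


def parse_scope(value: str = "openid") -> list[str]:
    """Split the space-separated scope value; openid is mandatory."""
    scopes = value.split()
    if "openid" not in scopes:
        raise ValueError("scope must include openid")
    return scopes
```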

AllegroGraph expects the authorization service to respond with JSON.

The username can currently only be extracted from the OIDC ID Token, hence the required openid scope.

A user record with the given username must already exist on the AllegroGraph server for the OIDC authentication flow to work.
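The mapping from ID token claims to an AllegroGraph username, using the example values username-claim="sub", username-regexp="^(.*)@.*" and username-substitution="\\1", can be sketched in Python as follows. Python's own group reference is written here as the raw string r"\1"; the doubled backslash in agraph.cfg is only the configuration file's escaping. The function name, the use of a search rather than a full match, and raising an error when the regexp does not match are assumptions.

```python
import re
from typing import Optional


def username_from_claims(claims: dict, username_claim: str = "sub",
                         username_regexp: Optional[str] = None,
                         username_substitution: Optional[str] = None) -> str:
    """Compute the AllegroGraph username from decoded ID token claims."""
    value = claims[username_claim]
    if username_regexp is None or username_substitution is None:
        return value  # default: use the claim value as is
    match = re.search(username_regexp, value)
    if match is None:
        raise ValueError(f"{username_claim} claim {value!r} does not match {username_regexp!r}")
    # Expand group references such as \1 in the substitution template.
    return match.expand(username_substitution)
```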

ID tokens generated by the authorization server that the configuration points to can be used directly to authenticate API calls. How ID tokens are obtained depends on your authorization system. Use the token value where <ID-token> appears in the examples just below.

An ID token can be sent in either of these two forms of the Authorization HTTP header:

Authorization: Bearer <ID-token> 
or
Authorization: Basic base64(<username>:<ID-token>) 
Note that the Bearer authorization scheme is preferable because it does not require the username to be specified, but the Basic form is still useful for compatibility: it allows ID tokens to be used in place of a regular password in tools that automatically turn credentials in URIs into Basic HTTP authorization headers. This may be convenient in, e.g., curl to avoid adding the authorization header manually:
curl http://<user>:<ID-token>@host:10035/repositories/repo/... 
but it is especially useful with agtool, where the user has no control over the underlying HTTP authorization header:
agtool load <user>:<ID-token>@host:10035/repo data.ttl  
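Both header forms are straightforward to build by hand. The Python sketch below constructs them with the standard base64 module; the function names are hypothetical and the token is a placeholder for a real ID token.

```python
import base64


def bearer_authorization(id_token: str) -> dict[str, str]:
    """Authorization header using the Bearer scheme (no username needed)."""
    return {"Authorization": f"Bearer {id_token}"}


def basic_authorization(username: str, id_token: str) -> dict[str, str]:
    """Authorization header using Basic with the ID token in place of a password."""
    credentials = f"{username}:{id_token}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(credentials).decode("ascii")}
```

Either dictionary can be passed as the headers argument of urllib.request.Request when calling the REST API from Python.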
 

Top-level directives for account management

AuditEventsToEmail VALUE

(Added in version 7.0.0 or before.) A directive that instructs the system to send notification emails to a specified address when various audit events occur. This option can be specified multiple times to cause emails to be sent to multiple addresses.

The format for this directive is

AuditEventsToEmail to="email", [smtphost="smtp-host-name"], \  
  events="comma-separated events" 

Where smtp-host-name is the name of the SMTPHost definition to be used and email is an email address. If only one SMTPHost is defined, this option can be left unspecified. Events can be any audit events (see Audit event types).

For example, here is a valid specification, assuming an SMTPHost named gmail has been defined:

AuditEventsToEmail to="admin@example.com", smtphost="gmail", \
  events="expirePassword,addUser,deleteUser"
If there is only one SMTPHost defined, smtphost can be left unspecified:
AuditEventsToEmail to="admin@example.com", \
  events="expirePassword,addUser,deleteUser"
See Auditing email notifications for more information.
AccountExpiry
The time since the last authenticated activity of a user after which the account is permanently deleted. This option does not affect users with superuser permission. The default is that accounts do not expire. (Added in version 7.0.0 or before.)

AccountUnsuspendTimeout
The time after which suspended accounts are unsuspended automatically. See MaxFailedLogins. (Added in version 7.0.0 or before.)
LoginTimeout
A time (value like 10s, 5m, 1h). If set, AGWebView login sessions are timed out after this amount of idle time. The default is no timeout. (Added in version 7.0.0 or before.)

MaxFailedLogins
The number of failed logins in a row after which the account is suspended. Suspended accounts can be unsuspended explicitly by superuser or automatically if AccountUnsuspendTimeout is set. (Added in version 7.0.0 or before.)

PasswordChangeAllowed
A boolean (yes/no) that can be used to control whether users can change their own password. The default is yes. If no, then only superuser can change passwords. (Added in version 7.0.0 or before.)
PasswordExpiry
The time since the last password change after which the password expires. An expired password cannot be used to log in; it can only be used to change the password. The new password must differ from the expired password. (Added in version 7.0.0 or before.)
PasswordExpiryGrace
The time since password expiry after which the account is disabled. It is not possible to log in or change the password with a disabled account. Only the administrator can re-enable accounts. This option does not affect users with superuser permission. (Added in version 7.0.0 or before.)
PasswordMinLength
The minimum number of characters all new passwords must have. The default is 0. (Added in version 7.0.0 or before.)
PasswordMinUppercaseChars
The minimum number of uppercase characters all new passwords must have. The default is 0. (Added in version 7.0.0 or before.)
PasswordMinDigitChars
The minimum number of digit characters all new passwords must have. The default is 0. (Added in version 7.0.0 or before.)
PasswordMinSpecialChars
The minimum number of non-alphanumeric characters all new passwords must have. The default is 0. (Added in version 7.0.0 or before.)
SuperUserCanAccessAllData
A boolean (yes/no) that controls whether superuser bypasses normal permission checks for triples data. If it is on (the default), then superuser will have read/write access to all repositories. If it is turned off, then superuser needs to be granted access to repositories. This is most useful when auditing is enabled and any change to user permissions is logged. (Added in version 7.0.0 or before.)
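The four PasswordMin* directives are simple character-count rules. The Python sketch below shows one way to check a candidate password against them. The class is hypothetical, and it relies on Python's Unicode-aware str.isupper, str.isdigit and str.isalnum, which may classify non-ASCII characters differently from AllegroGraph.

```python
from dataclasses import dataclass


@dataclass
class PasswordPolicy:
    min_length: int = 0     # PasswordMinLength
    min_uppercase: int = 0  # PasswordMinUppercaseChars
    min_digits: int = 0     # PasswordMinDigitChars
    min_special: int = 0    # PasswordMinSpecialChars (non-alphanumeric)

    def violations(self, password: str) -> list[str]:
        """Return a description of each rule the password breaks (empty if none)."""
        problems = []
        if len(password) < self.min_length:
            problems.append(f"needs at least {self.min_length} characters")
        if sum(c.isupper() for c in password) < self.min_uppercase:
            problems.append(f"needs at least {self.min_uppercase} uppercase characters")
        if sum(c.isdigit() for c in password) < self.min_digits:
            problems.append(f"needs at least {self.min_digits} digits")
        if sum(not c.isalnum() for c in password) < self.min_special:
            problems.append(f"needs at least {self.min_special} non-alphanumeric characters")
        return problems
```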

Top-level directives for the event scheduler

The event scheduler, described in the Event Scheduler document, allows users to schedule times when a script should be run. Scripts can be run once or repeatedly at regular intervals. The following two configuration options should be set for the scheduler to work. The first must be set so that script files can be found. The second must be set if you want emails sent reporting that a script ran or failed to run. Emails sent when a script runs successfully also contain the script's output.

SchedulerDir DIRECTORY
The directory in which scheduler scripts must be found. It is permitted to put subdirectories in this directory and put scripts in subdirectories. This value must be set in the configuration file if it is desired to schedule event scripts. There is no default value for this directive. (Added in version 7.0.0 or before.)
SchedulerSMTPConfig SMTP-ID
See the description of SMTPHost above. Its first argument is an ID which can be specified as the value of this directive. If it is, that SMTPHost is used to send scheduler notification emails. This directive must be specified for such emails to be sent, even if only one SMTPHost is defined. There is no default value for this directive. (Added in version 7.0.0 or before.)

Top-level directives for multi-master replication clusters

These top-level directives affect multi-master replication clusters, described in the Multi-master Replication document. The first, MaximumBackupAge, specifies how old backups of the controlling instance are allowed to be.

MaximumBackupAge
The value must be an integer (meaning a number of seconds) or an integer followed by s (for seconds), m (for minutes), h (for hours), or d (for days). The default is 3600 (i.e., one hour). When asked to grow a cluster, AllegroGraph uses an existing backup of the controlling instance unless it is older than MaximumBackupAge, in which case a new backup is made. See the section Controlling instance backups and MaximumBackupAge for more information on this directive. (Added in version 7.0.0 or before.)
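The value format and the backup-reuse rule fit in a few lines of Python. This is a sketch of the rule as described, not the server's code; the function names are hypothetical and the unit suffixes are assumed to be lowercase.

```python
import re

_SECONDS_PER_UNIT = {"": 1, "s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_backup_age(value: str) -> int:
    """Convert a MaximumBackupAge value such as "3600", "90m" or "1d" to seconds."""
    match = re.fullmatch(r"(\d+)([smhd]?)", value.strip())
    if match is None:
        raise ValueError(f"invalid MaximumBackupAge value: {value!r}")
    number, unit = match.groups()
    return int(number) * _SECONDS_PER_UNIT[unit]


def can_reuse_backup(backup_age_seconds: float, maximum_backup_age: str = "3600") -> bool:
    """An existing backup is reused unless it is older than MaximumBackupAge."""
    return backup_age_seconds <= parse_backup_age(maximum_backup_age)
```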

The other settings affect how replication cluster instances are kept in sync. These directives specify the default values of those settings. The values may be overridden by the commands that create a replication cluster, and the settings can also be changed once a replication cluster exists. The meaning and effect of each setting is described in the Instance Settings section of the Multi-master Replication document, which also describes how to change the settings after a cluster has been created. The directives are:

durability
See the description of the Durability setting for information on this directive. (Added in version 7.0.0 or before.)
distributedTransactionTimeout
See the description of the Distributed Transaction Timeout setting for information on this directive. (Added in version 7.0.0 or before.)
transactionLatencyCount
See the description of the Transaction Latency Count setting for information on this directive. (Added in version 7.0.0 or before.)
transactionLatencyTimeout
See the description of the Transaction Latency Timeout setting for information on this directive. (Added in version 7.0.0 or before.)

More on controlling memory usage

While processing a query, backend processes may allocate memory from the operating system. When a previously allocated memory area is no longer used, the processes normally do not return it to the operating system, in hopes of reusing it for subsequent queries. However, it may be advantageous to periodically return idle memory to the operating system. The MemoryCheckWhen and MemoryReleaseThreshold configuration parameters allow for this.

Note that while returning memory to the OS makes memory available to other processes, it also incurs the overhead of minor page faults on subsequent allocations in the same process.

Each shared backend and dedicated session tracks its own memory usage. When a check is made, the resident set size (RSS) of the backend or session process is compared to MemoryReleaseThreshold. If the RSS is greater than MemoryReleaseThreshold, an effort is made to return as much memory to the OS as possible.

Since this kind of check is fairly expensive, performing it too often can have a detrimental effect on overall performance. The MemoryCheckWhen directive specifies under what circumstances it should be done. Let's see a couple of examples.

Perform memory check after every 7 queries:

MemoryCheckWhen query:7 

Perform memory check after every 2 transactions:

MemoryCheckWhen transaction:2 

Perform memory check every 10 seconds:

MemoryCheckWhen time:10 

Finally, a complete configuration that would check whether the memory was above the threshold every 10 seconds and after every 2 transactions:

MemoryReleaseThreshold 2g  
MemoryCheckWhen time:10  
MemoryCheckWhen transaction:2 

Note that MemoryReleaseThreshold must be specified whenever MemoryCheckWhen is. If neither is specified, no checks are ever performed.
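The interplay of the two directives can be modeled in Python as below. This is a hypothetical sketch, not how AllegroGraph schedules checks internally: the class and method names are invented, the time-based check is polled rather than driven by a timer, and k, m and g in MemoryReleaseThreshold are assumed to be powers of 1024.

```python
import time
from typing import Callable, Optional

_BYTES_PER_UNIT = {"": 1, "k": 2**10, "m": 2**20, "g": 2**30}


def parse_size(value: str) -> int:
    """Convert a size such as "2g" or "512m" to bytes."""
    text = value.strip().lower()
    unit = text[-1] if text[-1:] in ("k", "m", "g") else ""
    return int(text[: len(text) - len(unit)]) * _BYTES_PER_UNIT[unit]


class MemoryCheckSchedule:
    """Decide when to compare a process's RSS with MemoryReleaseThreshold."""

    def __init__(self, threshold: Optional[str], when: list[str],
                 clock: Callable[[], float] = time.monotonic):
        if when and threshold is None:
            raise ValueError("MemoryReleaseThreshold must be set when MemoryCheckWhen is")
        self.threshold = parse_size(threshold) if threshold else None
        self.every: dict[str, int] = {}
        for entry in when:  # e.g. "query:7", "transaction:2", "time:10"
            kind, count = entry.split(":")
            if kind not in ("query", "transaction", "time"):
                raise ValueError(f"unknown MemoryCheckWhen entry: {entry!r}")
            self.every[kind] = int(count)
        self.counts = {"query": 0, "transaction": 0}
        self.clock = clock
        self.last_check = clock()

    def time_due(self) -> bool:
        """True if the time:N interval (seconds) has elapsed since the last timed check."""
        interval = self.every.get("time")
        if interval is not None and self.clock() - self.last_check >= interval:
            self.last_check = self.clock()
            return True
        return False

    def record(self, kind: str) -> bool:
        """Record a finished query or transaction; return True if a check is due."""
        if kind in self.every and kind in self.counts:
            self.counts[kind] += 1
            if self.counts[kind] >= self.every[kind]:
                self.counts[kind] = 0
                return True
        return self.time_due()

    def should_release(self, rss_bytes: int) -> bool:
        """Release memory only when RSS is greater than the threshold."""
        return self.threshold is not None and rss_bytes > self.threshold
```

Using the complete configuration shown earlier, MemoryCheckSchedule("2g", ["time:10", "transaction:2"]) reports a check after every second transaction and whenever ten seconds have passed since the last timed check.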

Memory locking

You can lock indexes and/or the string table into memory if desired. This improves performance but can lead to problems if memory is tight. This can be done with the agtool memory-lock (and memory-unlock) commands. Run

% agtool memory-lock --help

for further information.

CORS directives

CORS (Cross-Origin Resource Sharing), if enabled, allows scripts run on a web page from one server to make HTTP requests to the (different) server where AllegroGraph is running. CORS is not enabled by default because, if not configured properly, it can introduce security holes. The following directives enable CORS, restricted in the ways their options specify. A general tutorial on CORS is available at http://www.html5rocks.com/en/tutorials/cors/. See the REST/HTTP interface document for more information.

You may want to use CORS to communicate with the AllegroGraph server if you are writing a web application that will be accessing AllegroGraph but will not be served from the same domain that the server uses. This image shows a possible configuration:

A potential use of CORS and AllegroGraph

CORS support is enabled if the configuration file contains at least one of the following directives: CorsAllowAll, CorsAllowOrigin, CorsAllowRegex.

The following configuration file directives are used to configure CORS:

CorsAllowAll BOOLEAN
If set to 'yes' then requests from all origins will be accepted. When yes, values for CorsAllowOrigin and CorsAllowRegex are ignored. The default is no. (Added in version 7.0.0 or before.)
CorsAllowOrigin DOMAIN
Allow the specified origin (i.e. domain) to issue cross-site requests. Only one domain can be specified per entry but this directive can be specified multiple times. Domain names are case-insensitive. (Added in version 7.0.0 or before.)
CorsAllowRegex REGEX
Allow all origins that match the given regular expression to issue cross-site requests. Only one regular expression is allowed per entry, but this directive can be specified multiple times. Regular expression syntax is described here. Regular expressions can be written to make case-insensitive matches. (Added in version 7.0.0 or before.)
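How the three origin directives combine can be sketched in Python as below. The function name is hypothetical. Comparing origins case-insensitively follows the CorsAllowOrigin description; using re.search (an unanchored match) for CorsAllowRegex is an assumption, so anchoring patterns with ^ and $ is the safer habit either way.

```python
import re
from collections.abc import Sequence


def origin_allowed(origin: str, allow_all: bool = False,
                   origins: Sequence[str] = (), regexes: Sequence[str] = ()) -> bool:
    """Return True if a request from origin may make cross-site requests."""
    if allow_all:  # CorsAllowAll yes: the other two directives are ignored
        return True
    if origin.lower() in {o.lower() for o in origins}:  # CorsAllowOrigin
        return True
    return any(re.search(rx, origin) for rx in regexes)  # CorsAllowRegex
```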

The following directives affect how CORS requests are handled when they are allowed.

CorsUrlRegex REGEX
Only enable CORS for target URLs that match the given regular expression. If this option is not specified, all URLs are allowed. If it is specified, the associated regular expression is compared to URLs being requested, and they are allowed only if they match. This directive can be specified multiple times, and if it is, CORS will be enabled for URLs that match at least one of the supplied regexes. Regular expression syntax is described here. Regular expressions can be written to make case-insensitive matches. (Added in version 7.0.0 or before.)
CorsAllowMethods METHOD-LIST
A space or comma separated list of allowed methods for cross-origin requests. List values are case-insensitive. The default is: DELETE GET OPTIONS PATCH POST PUT. (Added in version 7.0.0 or before.)
CorsAllowMethod METHOD
A single HTTP method to be added to CorsAllowMethods (defined above). The CorsAllowMethod directive can be specified multiple times. (Added in version 7.0.0 or before.)
CorsAllowHeaders HEADERS-LIST
(Added in version 7.0.0 or before.) A space or comma separated list of headers allowed in cross-site requests. Headers are case-insensitive. The default list is:

Accept Accept-encoding Authorization Content-type Dnt Origin User-agent

CorsAllowHeader HEADER
A single HTTP header to be added to CorsAllowHeaders (defined above). This directive can be specified multiple times. (Added in version 7.0.0 or before.)
CorsExposeHeaders HEADER-LIST
A space or comma separated list of custom response headers that should be readable by cross-site requests. The default is empty. (Added in version 7.0.0 or before.)
CorsExposeHeader HEADER
A single HTTP header to be added to CorsExposeHeaders (defined above). This directive can be specified multiple times. (Added in version 7.0.0 or before.)
CorsPreflightMaxAge INTEGER
Specifies how long (in seconds) a response to a preflight request should be cached by the browser. The default is 86400 (24 hours). If set to zero the corresponding HTTP header Access-Control-Max-Age will not be sent. (Added in version 7.0.0 or before.)
CorsAllowCredentials BOOLEAN
If set to 'yes' then cross-origin requests will be allowed to contain authentication info, such as cookies and auth headers. The default is 'no'. (Added in version 7.0.0 or before.)
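The directives above correspond to the standard CORS response headers defined by the Fetch standard. The Python sketch below shows one plausible mapping for a preflight (OPTIONS) response; the function names are hypothetical and the exact set and formatting of headers AllegroGraph sends may differ. It does reflect two documented details: list values may be separated by spaces or commas, and a CorsPreflightMaxAge of zero suppresses Access-Control-Max-Age.

```python
import re

DEFAULT_METHODS = "DELETE GET OPTIONS PATCH POST PUT"
DEFAULT_HEADERS = "Accept Accept-encoding Authorization Content-type Dnt Origin User-agent"


def split_list(value: str) -> list[str]:
    """Split a space- or comma-separated directive value."""
    return [item for item in re.split(r"[\s,]+", value) if item]


def preflight_headers(origin: str, methods: str = DEFAULT_METHODS,
                      headers: str = DEFAULT_HEADERS, max_age: int = 86400,
                      allow_credentials: bool = False) -> dict[str, str]:
    """Build response headers for an allowed preflight request from origin."""
    response = {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": ", ".join(m.upper() for m in split_list(methods)),
        "Access-Control-Allow-Headers": ", ".join(split_list(headers)),
    }
    if max_age > 0:  # CorsPreflightMaxAge 0: the header is not sent
        response["Access-Control-Max-Age"] = str(max_age)
    if allow_credentials:  # CorsAllowCredentials yes
        response["Access-Control-Allow-Credentials"] = "true"
    return response
```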

Catalog definitions

Catalogs are locations on disk where AllegroGraph keeps its repositories. These locations are specified in the configuration file, along with some optional default settings for stores in the catalogs. Most of the time, you will want to specify all catalogs directly in the configuration file, but it is also possible to enable dynamic catalogs, which can be created and deleted either using the agtool catalogs tool or through the HTTP interface (as described in HTTP Protocol - SPARQL Endpoint).

Catalog definitions in the server configuration files serve as templates for creating databases. The parameters defined in the catalog definition are copied to the database when it is created. Changes to the catalog definition do not influence the settings of existing databases. To modify parameters of existing databases, the file 'parameters.dat' in the database 'Main' directory must be edited and the database restarted.

There are three types of catalog definitions that can occur in an AllegroGraph configuration file: special catalogs (root, system and fedshard catalogs), named catalogs, and a dynamic catalog specification. The first was seen in the example above (<RootCatalog> ... </RootCatalog>), and is used to determine where stores live that do not have a catalog specified. Named catalog specifications look similar:

<Catalog temporary>  
  Main /tmp/catalog  
</Catalog> 

The first entry specifies the catalog name, temporary in the example. Catalog names can contain any characters except slashes, backslashes, colons, and tildes.

The catalog name can then be used to specify the catalog when creating or accessing repositories.

The names root, system and fedshard are reserved and may not be used in a Catalog directive. These three catalogs must be defined in every server and if not found in the configuration file they will be created automatically by the server.

The catalog directives of the root catalog are placed between <RootCatalog> and </RootCatalog> in the configuration file.

The system catalog is where AllegroGraph records information about the operation of the server. It can be given catalog directives in the configuration file between the directives <SystemCatalog> and </SystemCatalog>.

The fedshard catalog is reserved for storing metadata for fedshard repositories and not the data itself so very little disk space is required. The only catalog directive recognized in the configuration of the fedshard catalog is the Main catalog directive specifying the directory to be used to store the directories holding metadata for each fedshard repository. This Main directive for the fedshard repositories can be specified between the <FedshardCatalog> and </FedshardCatalog> directives in the configuration file.

Finally, a dynamic catalog definition is used to provide the settings for catalogs created over HTTP. If no dynamic catalog is defined, this feature is disabled.

<DynamicCatalogs>  
  Main /tmp/dynamic  
</DynamicCatalogs> 

The directory (as well as any other catalog directories, see below) given for dynamic catalogs will be extended with a catalog name when such a catalog is created. For example, given the above configuration, a dynamic catalog named scratch would end up in /tmp/dynamic/scratch.

Catalog directives

Some of the directives allowed within a catalog definition (those marked as inheritable) can also be specified at the top-level, where they act as a default value inherited by catalogs which don't explicitly specify that setting.

Main PATHNAME
Required for every catalog. Specifies the directory in which the repositories for the catalog are stored. (Added in version 7.0.0 or before.)

TransactionLogDir PATHNAME

(Added in version 7.0.0 or before.) Specifies the directory in which transaction log subdirectories will be created for repositories in this catalog. The directory will be extended with the name of a repository. For example, if TransactionLogDir is /tmp/tlogs, then transaction logs for repository example will be stored in /tmp/tlogs/example. This parameter is optional and defaults to the value supplied for the Main parameter.

See the line in the example below

 TransactionLogDir /mnt/disk3/ag4-transaction-logs 

which says transaction logs should be placed in the /mnt/disk3/ag4-transaction-logs/[repository-name]/ directory.

The value of this directive can affect performance. See the discussion in the Performance Tuning document.
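The directory-naming rule (a configured base directory extended with a repository or catalog name) can be sketched with Python's pathlib. The function names are hypothetical; the example paths in the usage below are the ones used in this section.

```python
from pathlib import PurePosixPath
from typing import Optional


def subdirectory(base: str, name: str) -> str:
    """Extend a configured directory with a repository or catalog name."""
    return str(PurePosixPath(base) / name)


def transaction_log_dir(main: str, repository: str, tlog_dir: Optional[str] = None) -> str:
    """TransactionLogDir, if unset, defaults to the catalog's Main directory."""
    return subdirectory(tlog_dir or main, repository)
```

For example, transaction_log_dir("/tmp/catalog", "example", "/tmp/tlogs") gives /tmp/tlogs/example, and subdirectory("/tmp/dynamic", "scratch") gives the /tmp/dynamic/scratch path used for dynamic catalogs.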

StringTableDir PATHNAME
Specifies the directory in which string table subdirectories will be created for repositories in this catalog. See TransactionLogDir for information on how directory names are constructed. This parameter is optional and defaults to the value supplied for the Main parameter. The value of this directive can affect performance. See the discussion in the Performance Tuning document. (Added in version 7.0.0 or before.)

StringTableSize INTEGER
The value must be an integer, optionally followed by a multiplier (k=2^10 or m=2^20). The value determines the minimum number of slots to use for the hash table used to map UPIs to their corresponding strings. The actual number of slots configured is the supplied value rounded up to the nearest power of two, with a minimum of 1M (1,048,576). The default number of slots is 16,777,216 (16M). The maximum possible number of slots is 536,870,912 (512M). Increasing the number of slots may result in better insert and lookup performance for repositories with many unique strings. Each slot takes 4 bytes of memory. Checkpoints take longer the more slots there are, because the information stored in the slots is recorded in the transaction log during checkpoints. As noted, the value of this directive can affect performance. See the discussion of directives that affect performance in this section of the Performance Tuning document. (Added in version 7.0.0 or before.)
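The sizing rules above (multiplier suffixes, rounding up to a power of two, the 1M minimum, and 4 bytes per slot) can be worked through in Python. The function names are hypothetical, and rejecting values above the 512M maximum is an assumption, since the text does not say what happens to larger requests. For the default of 16M slots, the slot array alone takes 64 MiB.

```python
_MULTIPLIERS = {"": 1, "k": 2**10, "m": 2**20}
MIN_SLOTS = 2**20      # 1,048,576 (1M)
MAX_SLOTS = 2**29      # 536,870,912 (512M)
BYTES_PER_SLOT = 4


def parse_string_table_size(value: str) -> int:
    """Convert a StringTableSize value such as "16m" or "100000" to a slot count."""
    text = value.strip().lower()
    unit = text[-1] if text[-1:] in ("k", "m") else ""
    return int(text[: len(text) - len(unit)]) * _MULTIPLIERS[unit]


def effective_slots(requested: int) -> int:
    """Round up to a power of two, with a minimum of 1M slots."""
    if requested > MAX_SLOTS:
        # Assumption: treat requests above the documented maximum as errors.
        raise ValueError(f"{requested} exceeds the maximum of {MAX_SLOTS} slots")
    power_of_two = 1 << (max(requested, 1) - 1).bit_length()
    return max(power_of_two, MIN_SLOTS)


def slot_memory_bytes(slots: int) -> int:
    """Each slot takes 4 bytes of memory."""
    return slots * BYTES_PER_SLOT
```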

StringTableCompression VALUE inheritable
If given, the value must be one of none (the default), lzo (same as lzo999), lzo1, lzo999 (same as lzo), or zlib. lzo999 compresses more than lzo1 but takes more time. The string table compression method can only be set when a repository is created. See the discussion of this directive here in the Performance Tuning document. (Added in version 7.0.0 or before.)

ExpectedStoreSize INTEGER inheritable
This is the number of triples you expect to have in the store. The server uses it to select suitable values for things like internal table sizes. Usually you need to adjust it only when trying to squeeze out more performance. Setting it too high can waste some resources, and setting it too low can result in sub-optimal performance. Setting it much too low (much less than the maximum effective value and less than one twenty-fifth of the real size) can cause enormous index management overhead and an extreme loss of performance on a continuously evolving store. The maximum effective value is one billion triples. Stores can be much bigger, of course, but values larger than one billion do not affect the initial internals. As noted, the value of this directive can affect performance. See the discussion of directives that affect performance in the Performance Tuning document. (Added in version 7.0.0 or before.)

CheckpointInterval TIME inheritable
A time (with a value like 10s, 5m, 1h) that specifies how often checkpoints will be performed. The default value is 5m. The value of this directive can affect performance. See the discussion in the Performance Tuning document. (Added in version 7.0.0 or before.)
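To make the TIME notation concrete, here is a minimal sketch that converts the forms shown above (a number followed by s, m or h) into seconds. The parse_time helper is hypothetical and deliberately handles only these three suffixes; it makes no claim about any other forms AllegroGraph may accept.

```python
# Hypothetical helper: convert a TIME value such as 10s, 5m or 1h into seconds.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_time(value: str) -> int:
    """Return the number of seconds denoted by a TIME directive value."""
    number, unit = value[:-1], value[-1]
    if unit not in UNIT_SECONDS or not number.isdigit():
        # Only the s/m/h forms documented above are handled in this sketch.
        raise ValueError(f"unsupported TIME value: {value!r}")
    return int(number) * UNIT_SECONDS[unit]


print(parse_time("5m"))   # 300, the default CheckpointInterval
print(parse_time("1h"))   # 3600
```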
MaxRecoveryTime TIME inheritable
A time (with a value like 10s, 5m, 1h). AllegroGraph normally performs checkpoints at regular intervals, as configured by the CheckpointInterval directive. If MaxRecoveryTime is specified, AllegroGraph will maintain an estimate of how long recovery would take if a crash occurred at a given moment. When this estimated recovery time exceeds MaxRecoveryTime, a checkpoint will be performed. (Added in version 7.0.0 or before.)
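The interaction between CheckpointInterval and MaxRecoveryTime can be summarized as a simple decision rule. This is an illustrative model only: the checkpoint_due function is hypothetical, all times are in seconds, and the recovery estimate is taken as an input because how AllegroGraph computes it is not described here.

```python
from typing import Optional


def checkpoint_due(seconds_since_checkpoint: float,
                   checkpoint_interval: float,
                   estimated_recovery: float,
                   max_recovery_time: Optional[float] = None) -> bool:
    """Decide whether a checkpoint should run now (illustrative model only)."""
    # Regular schedule: the CheckpointInterval has elapsed.
    if seconds_since_checkpoint >= checkpoint_interval:
        return True
    # Optional trigger: the estimated recovery time exceeds MaxRecoveryTime.
    if max_recovery_time is not None and estimated_recovery > max_recovery_time:
        return True
    return False


# 2 minutes into a 5-minute interval, with an estimated recovery of 45 s:
print(checkpoint_due(120, 300, 45))                         # False
print(checkpoint_due(120, 300, 45, max_recovery_time=30))   # True
```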
TransactionLogSize INTEGER inheritable
A size (an integer, optionally followed by 'k' for kilobytes, 'm' for megabytes, or 'g' for gigabytes; for example, 10m for ten megabytes) that determines how large individual transaction log files may grow. When a transaction log file reaches or exceeds this size, a new transaction log file is created. The maximum is just under 4g. (Added in version 7.0.0 or before.)
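The following sketch illustrates the size notation and the rollover rule. It is not AllegroGraph code: the helper names are hypothetical, the k/m/g multipliers are assumed to be binary (powers of 1024), and "just under 4g" is approximated as strictly less than 4 GiB.

```python
# Hypothetical helpers for TransactionLogSize values.
KIB = 1024
SIZE_MULTIPLIERS = {"k": KIB, "m": KIB ** 2, "g": KIB ** 3}
# Assumption: "just under 4g" is modeled as strictly less than 4 GiB.
MAX_TLOG_SIZE = 4 * KIB ** 3


def parse_size(value: str) -> int:
    """Parse an integer with an optional k, m or g suffix (assumed binary)."""
    suffix = value[-1].lower()
    if suffix in SIZE_MULTIPLIERS:
        return int(value[:-1]) * SIZE_MULTIPLIERS[suffix]
    return int(value)


def validate_tlog_size(value: str) -> int:
    size = parse_size(value)
    if not 0 < size < MAX_TLOG_SIZE:
        raise ValueError(f"TransactionLogSize {value} must be positive and under 4g")
    return size


def needs_new_file(current_file_size: int, limit: int) -> bool:
    """A new log file is started once the current one meets or exceeds the limit."""
    return current_file_size >= limit


limit = validate_tlog_size("10m")
print(limit)                          # 10485760
print(needs_new_file(limit, limit))   # True
```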
TlogSyncMethod VALUE
This parameter specifies the synchronized writing method for transaction logs. Three methods are supported: ODIRECT, SYNC, and fsync. The default (if this parameter is unspecified) is ODIRECT, which is the recommended choice on ext3 file systems. For catalogs residing on other file systems, the other choices may yield performance benefits. (On a non-ext3 file system the default may degrade checkpointing performance; if checkpoints take longer than expected, try the other allowable values.) (Added in version 7.0.0 or before.)
DesiredTlogFiles INTEGER
This parameter specifies the number of transaction log files which should be preallocated at database creation time. The default value is 2. Specifying a larger value helps lower the probability of additional transaction log files being created during commits. (Added in version 7.0.0 or before.)

Note: The number of tlog files may grow beyond DesiredTlogFiles if a backup runs for a long time, if transaction log archiving is running slowly, or if replication is running slowly or has stalled. When possible, AllegroGraph reduces the number of transaction log files back down to DesiredTlogFiles.

InstanceTimeout TIME inheritable
The time (a value like 10s, 5m, 1h) that a database instance stays open without being accessed. The default is one hour. Starting a database instance can be time-consuming, so keeping idle instances open trades memory for lower worst-case latency on database access. Note that this value is advisory: AllegroGraph checks for idle database instances intermittently, so a given instance may linger longer than the InstanceTimeout. (Added in version 7.0.0 or before.)
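Why an instance can outlive its InstanceTimeout is easiest to see with a periodic sweep. In this minimal sketch the close_time helper and the 10-minute sweep period are hypothetical; the actual interval at which AllegroGraph checks for idle instances is not documented here. An instance becomes eligible for closing at last_access + timeout, but it is closed only at the first sweep at or after that moment.

```python
def close_time(last_access: int, timeout: int, sweep_period: int) -> int:
    """Return when an idle instance would be closed under a periodic sweep."""
    eligible = last_access + timeout
    # First sweep (a multiple of sweep_period) at or after the eligible time.
    sweeps = -(-eligible // sweep_period)   # ceiling division
    return sweeps * sweep_period


# Default one-hour timeout with a hypothetical 10-minute sweep (times in seconds).
print(close_time(0, 3600, 600))     # 3600
print(close_time(100, 3600, 600))   # 4200, i.e. 500 s past the timeout
```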

TransactionLogArchive PATHNAME
This directive specifies a directory for storing archived transaction log files. See Transaction Log Archiving for more details. (Added in version 7.0.0 or before.)
TransactionLogRetain
This directive is no longer used and a warning will be signaled if it appears in a configuration file. See Transaction Log Archiving for more details, including how to achieve what this directive used to do. (Added in version 7.0.0 or before.)
TransactionLogReplicationJobname
This directive is no longer used and a warning will be signaled if it appears in a configuration file. See Transaction Log Archiving for more details, including how to achieve what this directive used to do. (Added in version 7.0.0 or before.)

Example Configuration

The following, more complete example demonstrates the various configuration options in more detail.

# Don't allow normal HTTP access, only SSL  
Port 10035  
AllowHTTP no  
SSLPort 10036  
SSLCertificate /var/lib/ag4/server.cert  
 
SettingsDirectory /var/lib/ag4/settings  
 
Backends 5  
# You can actually remove this after the first server run, to  
# reduce the risk of someone finding it here.  
SuperUser test:xyzzy  
 
ExpectedStoreSize 100000  
SessionPorts 8080-8083  
 
<RootCatalog>  
  Main /var/lib/ag4/root  
</RootCatalog>  
 
<Catalog fast>  
  ExpectedStoreSize 2000000  
  CheckpointInterval 1h       
  Main /var/lib/ag4/fast  
  StringTableDir /mnt/disk2/ag4-string-tables  
  TransactionLogDir /mnt/disk3/ag4-transaction-logs  
</Catalog>  
 
<DynamicCatalogs>  
  Main /var/lib/ag4/dynamic  
</DynamicCatalogs> 

Changing database parameters

In some circumstances, it is desirable to modify the settings of an existing database by editing the 'parameters.dat' file in the database main directory. The syntax of this file is similar to that of the server configuration file, but only the parameters that are normally present inside of a catalog definition are allowed.

For example, the 'parameters.dat' file for a database 'demo' created with the 'fast' catalog definition above would look like this:

CheckpointInterval 1h  
Main /var/lib/ag4/fast  
StringTableDir /mnt/disk2/ag4/fast 

It might be edited to change the CheckpointInterval. It is also possible to add new file placement rules. When modifying any of the file placement parameters of a database, take care that all files that make up the current database state remain visible to the database. For example, if the StringTableDir parameter were removed from the database above, all files in /mnt/disk2/ag4/fast/demo/ would need to be moved manually into the main directory of the database, /var/lib/ag4/fast/demo/.

Note that changing some parameters in 'parameters.dat' has no effect. In particular, changing ExpectedStoreSize there does nothing; the only way to change it is to set the option in the configuration file and recreate the database.
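A scripted edit of 'parameters.dat' might look like the following minimal sketch. The set_parameter helper is hypothetical. It assumes one 'Name value' directive per line, as in the example file above, does not handle directives that span multiple lines, and appends the directive if it is absent. It only rewrites text: it cannot make a parameter such as ExpectedStoreSize take effect.

```python
def set_parameter(text: str, name: str, value: str) -> str:
    """Return parameters.dat text with directive `name` set to `value`."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        parts = line.split(None, 1)
        if parts and parts[0] == name:
            lines[i] = f"{name} {value}"
            break
    else:
        # Directive not present: append it (e.g. a new placement rule).
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


current = """CheckpointInterval 1h
Main /var/lib/ag4/fast
StringTableDir /mnt/disk2/ag4/fast
"""
print(set_parameter(current, "CheckpointInterval", "30m"), end="")
```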

When moving database files, be aware that some of them are sparse, i.e. they contain holes (unallocated blocks). Many file management utilities (like 'cp' and 'tar') can optionally preserve sparseness, but take care that copies of database files do not become unexpectedly large after manual manipulation.
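With GNU tools, sparseness can be preserved with cp --sparse=always or tar --sparse (-S). To check whether a copy kept its holes, you can compare the space allocated on disk with the apparent file size. The following is a minimal sketch with hypothetical helper names. It relies on st_blocks counting 512-byte units, which holds on Linux.

```python
import os


def allocated_bytes(path: str) -> int:
    """Bytes actually allocated on disk (st_blocks is in 512-byte units on Linux)."""
    return os.stat(path).st_blocks * 512


def looks_sparse(path: str) -> bool:
    """True if the file's apparent size exceeds the space allocated for it."""
    st = os.stat(path)
    return st.st_blocks * 512 < st.st_size
```

Running allocated_bytes on an original database file and on its copy shows whether the copy has grown on disk even though both report the same apparent size.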

Server control: starting and stopping the server

The method used to start and stop the AllegroGraph server depends on the type of install: an RPM install or installation from a tar.gz file (see Server Installation). The RPM install places files in specific locations: the configuration file agraph.cfg is placed in /etc/agraph/, and you can use /sbin/service to start and stop AllegroGraph:

You can start AllegroGraph by running:  
/sbin/service agraph start  
 
You can stop AllegroGraph by running:  
/sbin/service agraph stop 

In addition, chkconfig can be used to make AllegroGraph start when the system boots. For example:

chkconfig agraph on 

You can also use agraph-control with an RPM install.

The tar.gz installation is more flexible, and you choose the AllegroGraph directory as part of the installation process (again, see Server Installation). The typical way to start and stop AllegroGraph installed from a tar.gz file is to use agraph-control.

Starting multiple servers

You can run multiple servers on the same machine if desired. Separate instances must have different settings directories and must use different ports. Because these are specified in the configuration file, each server instance needs its own configuration file. If you try to run two servers with conflicting settings, the second will fail to start with a message similar to:

Daemonizing...  
Starting server failed:  
 
  There appears to already be an AllegroGraph server running (pid 12803).  
  If it is your intention to run another AllegroGraph server  
  simultaneously,  
  please make a separate configuration file which has different values  
  for the following parameters:  
 
  Port               61111  
  SettingsDirectory  /disk1/allegrograph/settings/  
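Before starting a second server, you could compare the two configuration files for the values named in this message. This minimal sketch uses hypothetical helpers. It reads only top-level directives, skips comments and catalog blocks, and does not handle directives spanning several lines. It also compares only values written explicitly in both files, so colliding defaults are not detected.

```python
def top_level_directives(config_text: str) -> dict:
    """Collect top-level 'Directive value' pairs, skipping comments and catalogs."""
    directives = {}
    depth = 0
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("</"):      # end of a catalog definition
            depth -= 1
            continue
        if line.startswith("<"):       # start of a catalog definition
            depth += 1
            continue
        if depth == 0:
            parts = line.split(None, 1)
            if len(parts) == 2:
                directives[parts[0]] = parts[1]
    return directives


# The two parameters reported in the error message above.
MUST_DIFFER = ("Port", "SettingsDirectory")


def conflicts(config_a: str, config_b: str) -> list:
    """Return the directives that have identical explicit values in both files."""
    a, b = top_level_directives(config_a), top_level_directives(config_b)
    return [name for name in MUST_DIFFER if name in a and a.get(name) == b.get(name)]
```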

agraph-control

agraph-control is a script that can be used to start and stop AllegroGraph. It can also process other commands, as described below. agraph-control is located in the bin/ subdirectory of the AllegroGraph directory. The calling template is

agraph-control [OPTIONS] <command> 

When the command is start or restart, agraph-control also accepts command-line options for the agraph program, so the calling sequence is

agraph-control [CONTROL-OPTIONS] <start | restart> [-- COMMAND-LINE-OPTIONS] 

The command-line-options are the arguments accepted by the agraph program (which is called by agraph-control) and are listed below.
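The calling sequence above can be assembled programmatically, for example from a deployment script. This minimal sketch is hypothetical: the agraph_control_argv helper and the /opt/agraph install directory are illustrative only. The resulting list could be passed to subprocess.run to launch the server.

```python
import shlex
from pathlib import Path
from typing import Optional, Sequence


def agraph_control_argv(agraph_dir: str, command: str,
                        config: Optional[str] = None,
                        agraph_options: Sequence[str] = ()) -> list:
    """Build: agraph-control [CONTROL-OPTIONS] <command> [-- COMMAND-LINE-OPTIONS]."""
    argv = [str(Path(agraph_dir) / "bin" / "agraph-control")]
    if config is not None:
        argv += ["--config", config]
    argv.append(command)
    if agraph_options:
        # Options for the agraph program are accepted only with start or restart.
        if command not in ("start", "restart"):
            raise ValueError("agraph options are accepted only with start or restart")
        argv += ["--", *agraph_options]
    return argv


argv = agraph_control_argv("/opt/agraph", "start",
                           config="/opt/agraph/lib/agraph.cfg",
                           agraph_options=["--log-level", "debug"])
print(shlex.join(argv))
```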

Control options

The control options are --config, --cluster-config, --cluster, --port, and --sslport.

--config

The value of --config should be the path of the configuration file. The usual location of that file in a tar.gz install is the lib/ subdirectory of the AllegroGraph directory. The usual location in an RPM install is /etc/agraph/. The default name is agraph.cfg.

Thus, with a tar.gz install, you can start the AllegroGraph server with

[Agraph dir]/bin/agraph-control --config [Agraph dir]/lib/agraph.cfg start 

If --config is not specified, the behavior is as follows:
  • For an RPM install when not running as root, there is no default and --config must have a value.

  • For an RPM install when running as root, the default is /etc/agraph/agraph.cfg.

  • For a tar.gz install, the default is agraph.cfg in the lib/ subdirectory of the AllegroGraph directory.

If the file specified as the value of --config is not found, the AllegroGraph server is not started and a message like the following is printed:

Cannot locate configuration file (tried <supplied path>).

If --config is unspecified, and either the agraph.cfg file is not found in the default location or you are running an RPM install as a user other than root, the AllegroGraph server is not started and the following message is printed:

Cannot determine location of configuration file.  Please use --config

--cluster-config
If you are running distributed AllegroGraph (that is, one or more of your repositories is a distributed repository; see Distributed Repositories Setup), then the system must find the cluster configuration file. This file is typically named agcluster.cfg and is typically located in the lib/ subdirectory of the AllegroGraph directory. This argument allows you to specify the location of that file. This is necessary if the file has a name other than the default (agcluster.cfg), is in a location other than the lib/ subdirectory, or if agraph-control is run from a location other than where it was installed.
--cluster
If you are running distributed AllegroGraph (that is, one or more of your repositories is a distributed repository; see Distributed Repositories Setup), the agraph-control program is in the bin/ subdirectory of the install directory of one of the distributed servers, and the agcluster.cfg file has the default name and location (in the lib/ subdirectory), then agraph-control can locate the configuration file itself, so the shorter --cluster argument suffices. If there are problems, use --cluster-config and --config instead.

--port N
Specifies that the AllegroGraph server will communicate on port number N, thus overriding the value specified by the Port directive in the configuration file (see above). This argument is convenient when the number in the configuration file happens to be in use.
--sslport N
Use this SSL port number rather than the one specified by the SSLPort directive in the configuration file (see above). This argument is convenient when the number in the configuration file happens to be in use.

Control commands

The commands to agraph-control are:

status
Writes "up" to stdout if the server is running and "down" if not.
start
Start the AllegroGraph server. This has no effect if the server is already running.
stop
Stop the AllegroGraph server. This is the normal stop command and it attempts to perform a clean shutdown of all open databases.
force-stop
Stop the AllegroGraph server. This is the emergency stop command and open databases may not be cleanly closed.
restart
Requests that the server shut itself down, if running, and then start back up again.

AllegroGraph service daemon signal handling

The signals used by the AllegroGraph service daemon are:

SIGTERM
for normal stopping, used by the stop command.
SIGQUIT
for emergency stopping, used by the force-stop command.

Exit Status

For the status command, a 0 exit status is returned if the server is up, non-zero if the server is down.

For all other commands, the exit status is 0 if the command was executed successfully, and non-zero if an error is reported during command execution.
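Because the status command reports through its exit status, a monitoring script can rely on the return code instead of parsing "up" or "down". A minimal sketch, assuming a hypothetical server_is_up helper that receives the paths to agraph-control and to the configuration file:

```python
import subprocess


def server_is_up(agraph_control: str, config: str) -> bool:
    """Run 'agraph-control --config CONFIG status' and interpret its exit status."""
    result = subprocess.run(
        [agraph_control, "--config", config, "status"],
        capture_output=True, text=True,   # keep "up"/"down" off the console
    )
    # Exit status 0 means the server is up; non-zero means it is down.
    return result.returncode == 0
```

For a tar.gz install, a call might look like server_is_up("[Agraph dir]/bin/agraph-control", "[Agraph dir]/lib/agraph.cfg").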

The agraph program

agraph-control is a script which launches the actual program, named agraph. While agraph-control is recommended for starting the server, you can also run agraph directly, particularly when you need options not available through agraph-control. agraph accepts the following command-line arguments:

--config file
The location of the configuration file. Defaults to agraph.cfg in the executable's directory or, failing that, /etc/agraph/agraph.cfg. If the configuration file cannot be found, AllegroGraph does not start and prints the message "No configuration file found."
--log-dir directory
Specify where the server log files are written. Overrides the LogDir directive.
--debug
Start the server in debug mode, which means logging will be more verbose.
--log-level level
Set an explicit log-level (debug, info, warn, or error), or specify log-levels per category, for example: debug,daemon:info,storage:warn.
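The per-category form can be read as a default level plus overrides. The following sketch parses a --log-level value under that interpretation. The parse_log_levels helper is hypothetical, and treating an entry without a colon as the default level is an assumption based on the example above.

```python
LEVELS = ("debug", "info", "warn", "error")


def parse_log_levels(spec: str):
    """Split a --log-level value into a default level and per-category levels.

    Assumption: an entry without a colon sets the default level, and
    'category:level' entries override it for that category.
    """
    default = None
    per_category = {}
    for entry in spec.split(","):
        category, sep, level = entry.partition(":")
        if not sep:
            category, level = None, entry
        if level not in LEVELS:
            raise ValueError(f"unknown log level: {level!r}")
        if category is None:
            default = level
        else:
            per_category[category] = level
    return default, per_category


print(parse_log_levels("debug,daemon:info,storage:warn"))
# ('debug', {'daemon': 'info', 'storage': 'warn'})
```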

--port N
Use this port number rather than the one specified by the Port directive in the configuration file (see above). Like the --port argument to agraph-control (see above), this argument is convenient when the number in the configuration file happens to be in use.
--sslport N
Use this SSL port number rather than the one specified by the SSLPort directive in the configuration file (see above). Like the --sslport argument to agraph-control (see above), this argument is convenient when the number in the configuration file happens to be in use.

--http-trace file-pathname
Write a log of all HTTP traffic to the file specified. If specified, overrides the HttpTrace configuration directive. If the pathname is relative, the location is with respect to the log directory.
--http-trace-options options
If specified, overrides the HttpTraceOptions configuration directive. The value is a comma-separated list of options. Options starting with the character + turn on the corresponding log category; those starting with - turn it off (allowing you to enable a general category and turn off specific items). Use the max-message-size= option to truncate overly long messages. The default is +xmit,max-message-size=1000. The available log categories are listed in the Debugging section of the AllegroServe documentation.
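The structure of an options value can be illustrated by splitting it into the categories turned on, those turned off, and the message size limit. This is a minimal sketch with a hypothetical parse_trace_options helper. It does not validate category names against the AllegroServe list, and any name other than xmit used with it is purely illustrative.

```python
def parse_trace_options(options: str):
    """Split an --http-trace-options value into (enabled, disabled, max_message_size)."""
    enabled, disabled = set(), set()
    max_message_size = None
    for item in options.split(","):
        if item.startswith("max-message-size="):
            max_message_size = int(item.split("=", 1)[1])
        elif item.startswith("+"):
            enabled.add(item[1:])
        elif item.startswith("-"):
            disabled.add(item[1:])
        else:
            raise ValueError(f"unrecognized option: {item!r}")
    return enabled, disabled, max_message_size


print(parse_trace_options("+xmit,max-message-size=1000"))   # the default value
```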
--pid-file file
Determines where the process id of the server is written. Overrides the PidFile directive.
--run-as user
If started as root, run AllegroGraph as the specified user. Overrides the RunAs directive.
--no-daemonize
If specified, then the service daemon will run in the foreground.
--stop-server
Stop the AllegroGraph server. Either the --pid-file or --config parameter must also be specified to identify the server instance that should be stopped.
--stop-timeout seconds
Specifies how long to wait before giving up on a --stop-server request. Must be an integer greater than 0. If --force has also been supplied, then the default value is 10; otherwise the default value is 60.
--version
Print version information (such as the version number and build date).
--short-version
Print just the version number.
--help
Print information about these arguments.

Troubleshooting

Shared memory size and permission to use /dev/shm

AllegroGraph uses POSIX shared memory for inter-process communication.

Each AllegroGraph instance requires a certain amount of shared memory (depending on the ExpectedStoreSize setting). The actual size is reported in agraph.log when an instance is started.

On Linux, the shared memory comes from tmpfs, which is typically mounted on /dev/shm. The default size is half of RAM. To resize it, issue a command like the following as the root user:

mount -o remount,size=<size> <shm-device-file> 

For example:

mount -o remount,size=8G /dev/shm 

To make the change permanent, /etc/fstab needs to be updated or the above command must be run from a startup script such as /etc/rc.local.
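To check whether /dev/shm has room for an instance before starting it, you can compare its free space with the size reported in agraph.log. A minimal sketch with hypothetical helper names; the required size must be supplied by you from the log.

```python
import shutil


def shm_free_bytes(path: str = "/dev/shm") -> int:
    """Free space, in bytes, on the file system mounted at `path`."""
    return shutil.disk_usage(path).free


def shm_has_room(required_bytes: int, path: str = "/dev/shm") -> bool:
    """True if `path` currently has at least `required_bytes` free."""
    return shm_free_bytes(path) >= required_bytes
```

For example, shm_has_room(8 * 2**30) tests for 8 GiB of free shared memory on /dev/shm.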

/dev/shm is usually mounted with permissions that allow any process to use it, for example:

$ ls -ld /dev/shm  
drwxrwxrwt 2 root root 40 Oct 17 16:31 /dev/shm/ 

However, sites with strict security policies may have /dev/shm mounted with tight permissions (for example, to only allow root to use shared memory). For AllegroGraph to operate, /dev/shm must have permissions which allow at least the RunAs user to read and write to it. Consult with your systems administrator if AllegroGraph fails to start up due to the permissions on /dev/shm.
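A quick way to test the permissions is to run a check as the RunAs user, for example via sudo -u. The following minimal sketch uses a hypothetical shm_usable helper. Note that os.access checks the real user ID of the calling process, and that creating files in a directory also requires execute (search) permission on it.

```python
import os


def shm_usable(path: str = "/dev/shm") -> bool:
    """True if the current user can read, write and create files in `path`."""
    # Creating files in a directory needs write and execute (search) permission.
    return os.path.isdir(path) and os.access(path, os.R_OK | os.W_OK | os.X_OK)
```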

If a previously working instance does not start because of a shared memory problem, a lingering process may still hold a handle on a shared memory segment. The following steps can help find and clear it:

$ <stop-allegrograph>  
# Check total size, available and used size.  
$ df -h /dev/shm  
# If the 'Used' column shows a non-trivial amount,  
# look for processes that use /dev/shm.  
$ lsof /dev/shm  
# Maybe kill offending processes  
$ kill -9 <pid1> <pid2> ...  
$ <start-allegrograph>