The AllegroGraph server is the running program which manages the various AllegroGraph triple-stores (databases). The server must be set up as described in Server Installation.
The AllegroGraph server requires a configuration file in order to start up. Usually, this file is specified using the --config command-line argument. A minimal file could look like this:
SettingsDirectory /tmp/ag4/settings
SuperUser test:xyzzy
<RootCatalog>
  Main /tmp/ag4/root
</RootCatalog>
An AllegroGraph configuration file consists of a set of top-level directives and one or more catalog definitions. The syntax is straightforward: a directive (both at top level and within a catalog definition) consists of an alphanumeric word, whitespace, and then the value of the directive. Indentation is ignored. A directive can span multiple lines by escaping newlines with a backslash; both the newline and the backslash are then treated as if they were not there. Catalog definitions are delimited by pseudo-XML markers like <RootCatalog>. Lines starting with a # are treated as comments. Each directive defining a parameter must be on its own line (or on multiple lines joined by backslashes, as described above). Specifying more than one parameter on a single line will result in an error or, possibly, in the first parameter being defined incorrectly and the remaining ones on the line not being defined at all.
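The syntax rules above can be illustrated with a minimal Python sketch. It is not AllegroGraph's actual parser. It removes backslash-newline pairs, skips comments, ignores indentation, and collects directives either at top level or under the catalog block they appear in. The function name `parse_config` and the returned layout are hypothetical. The sketch also assumes catalog blocks do not nest, and it does not validate directive names or enforce the one-parameter-per-line rule.

```python
import re

# <Kind>, <Kind name> or </Kind>; kind is e.g. RootCatalog, Catalog, DynamicCatalogs.
_BLOCK = re.compile(r"<(/?)([A-Za-z]+)(?:\s+([^>]*?))?\s*>")
# An alphanumeric directive name, whitespace, then the rest of the line as its value.
_DIRECTIVE = re.compile(r"([A-Za-z0-9]+)\s+(.*)")


def parse_config(text):
    """Split agraph.cfg-style text into top-level and per-catalog directives."""
    # A backslash-newline pair is removed entirely, joining continued lines.
    text = text.replace("\\\n", "")
    top = []
    catalogs = {}
    current = None  # key of the open catalog block, e.g. ("Catalog", "fast")
    for raw in text.splitlines():
        line = raw.strip()  # indentation is ignored
        if not line or line.startswith("#"):
            continue  # blank line or comment
        block = _BLOCK.fullmatch(line)
        if block:
            closing, kind, name = block.groups()
            if closing:
                if current is None or current[0] != kind:
                    raise ValueError(f"unexpected </{kind}>")
                current = None
            elif current is not None:
                raise ValueError("this sketch assumes catalog blocks do not nest")
            else:
                current = (kind, name or None)
                catalogs[current] = []
            continue
        directive = _DIRECTIVE.fullmatch(line)
        if directive is None:
            raise ValueError(f"malformed directive: {line!r}")
        # Directives may repeat (BaseDir, QueryOption, ...), so keep an ordered list.
        target = top if current is None else catalogs[current]
        target.append((directive.group(1), directive.group(2)))
    if current is not None:
        raise ValueError(f"unclosed <{current[0]}>")
    return {"top": top, "catalogs": catalogs}
```

Directives are kept as ordered `(name, value)` pairs rather than a dictionary, because some directives (such as BaseDir) may legitimately appear more than once.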
The directories named in this file must either already exist and be writeable by the user running the server, or that user must be able to create them. You will usually not want to use temporary paths as in the example, of course.
Parameter values must not be quoted. Spaces are not allowed in parameter values.
Any relative path in the config file is initially resolved with respect to the directory containing the config file. If you specify another directory using the BaseDir directive, all subsequent relative paths are resolved relative to that directory. BaseDir can be specified as often as you like and can itself be a relative pathname (which will be resolved using the BaseDir value in use or, if none was previously specified, with respect to the directory containing the config file). If BaseDir is specified multiple times, relative pathnames in other directives are resolved with respect to the most recent BaseDir value.
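The BaseDir rules can be expressed as a short Python sketch. The function name `resolve_paths` is hypothetical. The set of directives whose values are paths is passed in by the caller rather than hard-coded, because this sketch makes no claim about which directives the server actually treats as paths.

```python
import posixpath


def resolve_paths(config_dir, directives, path_directives):
    """Resolve relative path values the way BaseDir is described above.

    config_dir: directory containing the config file (the initial base).
    directives: ordered (name, value) pairs as they appear in the file.
    path_directives: names whose values are paths (supplied by the caller).
    """
    base = config_dir
    resolved = []
    for name, value in directives:
        if name == "BaseDir":
            # BaseDir may itself be relative; it is resolved against the current base.
            base = posixpath.normpath(posixpath.join(base, value))
            continue
        if name in path_directives:
            # join() leaves absolute values untouched.
            value = posixpath.normpath(posixpath.join(base, value))
        resolved.append((name, value))
    return resolved
```

For example, with the config file in /etc/agraph, a relative SettingsDirectory that appears before any BaseDir stays under /etc/agraph. A later `BaseDir ../other` that follows `BaseDir /var/lib/ag` moves the base to /var/lib/other.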
- A directory pathname which will be used to resolve relative pathnames in subsequent directives in the config file. Can be specified multiple times in the config file, with new values replacing older ones. See just above for more details.
- Required setting. Specifies the directory in which the server stores persistent information such as user accounts.
- A boolean (yes or no) that can be used to turn off HTTP access to the server. Default is yes.
- A boolean (yes or no) that can be used to turn on auditing. Default is no.
- Specifies the maximum number of processes spawned to handle HTTP requests (note that session processes do not count toward this limit). Default is 10.
- Can be yes or no. If yes, every new store will be (by default) a Callimachus store. See Callimachus.
- Where to find the auxiliary files Callimachus uses to configure its Soundexing support. Generally speaking, you should not need to set this value. The default is the lib/soundex-data/ directory of your server installation. See Callimachus.
- A switch to turn off Evaluate arbitrary code permissions globally. If it is yes (the default), then the Evaluate arbitrary code permission bits are in effect. If it is no, then arbitrary code evaluation is disabled for all users (including superuser), regardless of the value of a user's actual permissions.
- Determines the host on which the HTTP server listens. Can be left out to have the server listen on all interfaces. Set to localhost to listen only locally.
- This parameter can be used to make HTTP requests made by the server (for example, when a SPARQL query loads data from an external URL) go through a proxy. Valid values are
hostname.net:8888, or, when the proxy requires authentication,
- The number of initial HTTP workers to be started by the AllegroGraph server. The default is 50. The number should be larger than the number of backends (see Backends above) plus the anticipated number of frontend sessions (used, for example, by Webview). Too few workers may cause long-running requests (like opening a triple-store) to delay other concurrent requests.
- Specifies the directory where the server log files are written. The primary log file is agraph.log, the secondary file is agraph-fallback.log. The secondary is only used when writing to the primary fails. The secondary file is preallocated to a size of 1MiB, so some log messages can be written even if the filesystem is full. On server startup or if writing to the primary becomes possible again, the contents of the secondary log file are automatically appended to the primary and the secondary log file is reinitialized.
- A 'memory release specification'. It can be used multiple times. Each specification must be in the form name:value, where name can be query, transaction, or time, and value must be the number of items to run between checks (for query and transaction) or a delay in seconds (for time).
- A size specifying the threshold at which a memory release will occur. The size can be specified in gigabytes (e.g. "3g"), megabytes (e.g. "3000m"), kilobytes (e.g. "3000000k") or bytes (e.g. "3000000000").
- A file to which the server writes out its process id.
- If supplied, must be an integer. Used to set the port on which the daemon runs its HTTP server. When not given, this defaults to
- Specifies the query processor used for SPARQL queries. This parameter can be overridden for specific queries (in an API specific manner). See the discussion in the SPARQL documentation for more details on the available engines and on how to choose amongst them.
- A query option specification. It can be used multiple times. Each specification is equivalent to a query prefix option. The global configuration directive "QueryOption NAME=VALUE" is the same as the query prefix option "PREFIX franzOption_NAME:
". See here in the SPARQL Reference for a list of SPARQL query options. An example is
- If given, must be an integer range (e.g. 13000-13020). When using replicas (see Replication and Warm Standby), the replication primary requires a separate listening port for each replica. The operating system will choose an available port for each replica (when the replica is set up) if no value is given for this option, and that will always work. However, if there is a firewall between the replication primary and (any of) the replicas, the firewall administrator may need to configure the firewall to allow incoming connections from the replicas to the primary. That configuration process can be aided by limiting the range of ports which can be used, and that is what this parameter does. If a range is specified, only those ports will be used by replicas. Any replica which might become a primary should have this parameter also specified in its configuration exactly as it is for the primary. Note that if no port in the range is available when setting up a replica, setting up the replica will fail. Therefore, the size of the range of ports should be at least the maximum expected number of replicas.
- Have the server, if started by root, run as the given user instead (defaults to
- If given, it is the name of the log file to which HTTP traffic is to be dumped.
- If given, it is a comma separated list of options. Options starting with the character + turn on the corresponding log category. Those starting with - turn them off. Use the max-message-size= option to truncate overly long messages. The default is +xmit,max-message-size=1000. The available log categories are listed in the AllegroServe documentation.
- If given, must be the server's host name or IP address for use in the URLs returned upon session creation. Useful when deploying a load balancer (like Amazon's Elastic Load Balancer) for sending the SessionHost string in the returned session URL instead of echoing the load-balancer's host name from the client request.
- If given, must be an integer range like 8000-8020. Defines the ports that will be used for sessions. Useful when these need to be opened in a firewall or similar. When not specified, random ports will be used.
Defines a named SMTP configuration which can be used to send emails. Multiple SMTPHost definitions are allowed. An SMTPHost definition associates a login name, password, port, etc. with a server. The following example defines the gmail SMTP host:
SMTPHost gmail \
  server="smtp.gmail.com", ssl=true, starttls=false, \
  from="email@example.com", login="firstname.lastname@example.org", \
  password="somepassword"
- The following options are supported by SMTPHost:
server (string): the hostname or IP address of the server (example: "127.0.0.1"). This is a required parameter.
port (integer): defaults to 25 for non-SSL, 465 for SSL.
ssl (boolean): defaults to false.
starttls (boolean): defaults to false.
from (string): the email address to which the From: header of emails sent via this SMTPHost will be set. This is a required parameter.
login (string): the user on the remote server.
password (string): the password corresponding to login.
password-command (string): a string suitable to be executed as a shell command. The specified command should output a single line containing the password to stdout. This is intended to avoid storing plaintext passwords in the configuration file.
- If given, the HTTP server will use this value as the base-url when parsing SPARQL queries. When not given, the URL of the request is used instead.
- When an SSLPort is given, this must point to a file containing a server certificate and private key, PEM-encoded.
- An integer. If given, an SSL HTTP server will be run on this port.
- If given, must be a string in name:password format. The server will ensure, on startup, that a superuser with this name and password exists. Note that this means anyone who can read your configuration file has full access to the server. It is recommended to use the server setup script to create a superuser instead or, if you do use this directive, to remove it after the first run of the server has created the user.
- Specifies the directory in which AllegroGraph may create temporary files. Defaults to the system's designated temp dir (typically
- These options control the transaction log archiver (described in Transaction Log Archiving). See this section on the transaction log configuration parameters for possible values and further details.
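The SMTPHost option syntax described above can be read with a small Python sketch. It assumes values are either double-quoted strings without embedded quotes, or bare `true`/`false`/integer tokens. It applies the documented defaults for ssl, starttls, and port, and runs password-command through the shell, keeping the first line of output. The function names `parse_smtp_host` and `smtp_password` are hypothetical and not part of AllegroGraph.

```python
import re
import subprocess

# key=value pairs; values are either "quoted strings" or bare tokens (true, false, 587).
_OPTION = re.compile(r'([a-z-]+)\s*=\s*("[^"]*"|[^,\s]+)')


def parse_smtp_host(value):
    """Parse the value of an SMTPHost directive into (name, options)."""
    parts = value.split(None, 1)
    name, rest = parts[0], (parts[1] if len(parts) > 1 else "")
    opts = {}
    for key, raw in _OPTION.findall(rest):
        if raw.startswith('"'):
            opts[key] = raw[1:-1]        # string option
        elif raw in ("true", "false"):
            opts[key] = raw == "true"    # boolean option
        else:
            opts[key] = int(raw)         # integer option (port)
    for required in ("server", "from"):
        if required not in opts:
            raise ValueError(f"SMTPHost {name}: missing required option {required}")
    opts.setdefault("ssl", False)
    opts.setdefault("starttls", False)
    opts.setdefault("port", 465 if opts["ssl"] else 25)
    return name, opts


def smtp_password(opts):
    """Return the password, running password-command if one is configured."""
    command = opts.get("password-command")
    if command is None:
        return opts.get("password")
    out = subprocess.run(command, shell=True, check=True,
                         capture_output=True, text=True).stdout
    lines = out.splitlines()
    return lines[0] if lines else ""
```

With the gmail example, the port defaults to 465 because `ssl=true`. A definition without ssl would default to port 25.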
Top-level directives for account management
A directive that instructs the system to send notification emails to a specified address when various audit events occur. This option can be specified multiple times to cause emails to be sent to multiple addresses.
The format for this directive is
AuditEventsToEmail to="email", [smtphost="smtp-host-name"], events="comma-separated events"
where smtp-host-name is the name of the SMTPHost definition to be used. If only one SMTPHost is defined, this argument can be left out.
For example, here is a valid specification, assuming an SMTPHost named gmail has been defined:
AuditEventsToEmail to="email@example.com", smtphost="gmail", events="expirePassword,addUser,deleteUser"
- If there is only one SMTPHost defined, smtphost can be left out:
AuditEventsToEmail to="firstname.lastname@example.org", events="expirePassword,addUser,deleteUser"
- See Audit email notifications in Auditing for more information.
- No longer supported. Use AuditEventsToEmail, described just above. See this section of the Auditing document for more information.
- No longer supported. Use AuditEventsToEmail, described just above. See this section of the Auditing document for more information.
- The time since the last authenticated activity of a user after which the account is permanently deleted. This option does not affect users with superuser permission. The default is that accounts do not expire.
- The time after which suspended accounts are unsuspended automatically. See also the failed-login limit below.
- A time (value like 1h). If set, AGWebView login sessions are timed out after this amount of idle time. The default is no timeout.
- The number of failed logins in a row after which the account is suspended. Suspended accounts can be unsuspended explicitly by superuser, or automatically if an automatic unsuspension time is configured (see above).
- A boolean (yes or no) that controls whether users can change their own password. The default is yes. If it is no, then only superuser can change passwords.
- The time since the last password change after which the password will be expired. One cannot log in with an expired password; it can only be used to change the password.
- The time since password expiry after which the account is disabled. It is not possible to log in or change the password with a disabled account. Only the administrator can re-enable accounts. This option does not affect users with superuser permission.
- The minimum number of characters all new passwords must have. The default is 0.
- The minimum number of uppercase characters all new passwords must have. The default is 0.
- The minimum number of digit characters all new passwords must have. The default is 0.
- The minimum number of non-alphanumeric characters all new passwords must have. The default is 0.
- A boolean (yes or no) that controls whether superuser bypasses normal permission checks for triples data. If it is on (the default), superuser has read/write access to all repositories. If it is turned off, superuser needs to be granted access to repositories. This is most useful when auditing is enabled and any change to user permissions is logged.
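The four minimum-count password rules above combine as shown in this Python sketch. The function name `password_problems` and its keyword arguments are hypothetical stand-ins for the four settings. As in the text, each minimum defaults to 0, which imposes no constraint. Character classes are judged with Python's `str.isupper`, `str.isdigit`, and `str.isalnum`, which may differ from the server's own definitions for non-ASCII input.

```python
def password_problems(password, min_length=0, min_upper=0, min_digits=0, min_nonalnum=0):
    """List the minimum-count rules a new password fails (an empty list means it passes).

    The keyword arguments mirror the four minimum settings described above;
    each defaults to 0, which imposes no constraint.
    """
    checks = [
        ("length", len(password), min_length),
        ("uppercase", sum(c.isupper() for c in password), min_upper),
        ("digits", sum(c.isdigit() for c in password), min_digits),
        ("non-alphanumeric", sum(not c.isalnum() for c in password), min_nonalnum),
    ]
    return [f"{what}: need {need}, have {have}" for what, have, need in checks if have < need]
```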
More on controlling memory usage
While processing a query, backend processes may allocate memory from the operating system. When a previously allocated memory area is no longer used, the processes normally do not return it to the operating system, in hopes of reusing it for subsequent queries. However, it may be advantageous to periodically return idle memory to the operating system. The MemoryCheckWhen and MemoryReleaseThreshold configuration parameters allow for this.
Note that while returning memory to the OS makes memory available to other processes, it also incurs the overhead of minor page faults on subsequent allocations in the same process.
Each shared backend and dedicated session tracks its own memory usage. When a check is made, the resident set size (RSS) of the backend or session process is compared to MemoryReleaseThreshold. If the RSS is greater than MemoryReleaseThreshold, an effort is made to give back as much memory to the OS as possible.
Since this kind of check is fairly expensive, performing it too often can have a detrimental effect on overall performance. The MemoryCheckWhen directive specifies under what circumstances it should be done. Let's see a couple of examples.
Perform memory check after every 7 queries:
MemoryCheckWhen query:7
Perform memory check after every 2 transactions:
MemoryCheckWhen transaction:2
Perform memory check every 10 seconds:
MemoryCheckWhen time:10
Finally, a complete configuration that would check whether the memory was above the threshold every 10 seconds and after every 2 transactions:
MemoryReleaseThreshold 2g
MemoryCheckWhen time:10
MemoryCheckWhen transaction:2
Note that MemoryReleaseThreshold must be specified whenever MemoryCheckWhen is. If neither of the two is specified, no checks are ever performed.
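The interplay between MemoryCheckWhen and MemoryReleaseThreshold can be modeled with a Python sketch of per-process bookkeeping. It is not the server's implementation, and the class and function names are hypothetical. One assumption is that size suffixes are decimal (k = 10^3, m = 10^6, g = 10^9), chosen so that the documented examples "3g", "3000m", "3000000k", and "3000000000" all denote the same size. The text does not state the exact multipliers.

```python
import re

# Assumption: decimal multipliers, so the documented size examples are equal.
_MULTIPLIER = {"": 1, "k": 10**3, "m": 10**6, "g": 10**9}


def parse_size(text):
    m = re.fullmatch(r"(\d+)([kmg]?)", text.strip().lower())
    if m is None:
        raise ValueError(f"bad size: {text!r}")
    return int(m.group(1)) * _MULTIPLIER[m.group(2)]


def parse_check_when(spec):
    """Parse a MemoryCheckWhen value such as "query:7", "transaction:2" or "time:10"."""
    name, _, value = spec.partition(":")
    if name not in ("query", "transaction", "time") or not value.isdigit():
        raise ValueError(f"bad MemoryCheckWhen value: {spec!r}")
    return name, int(value)


class MemoryChecker:
    """Hypothetical per-process bookkeeping for when to compare RSS with the threshold."""

    def __init__(self, threshold, check_when, now=0.0):
        if check_when and threshold is None:
            raise ValueError("MemoryReleaseThreshold is required with MemoryCheckWhen")
        self.threshold = None if threshold is None else parse_size(threshold)
        self.every = dict(parse_check_when(s) for s in check_when)
        self.counts = {"query": 0, "transaction": 0}
        self.last_timed_check = now

    def check_due(self, event, now):
        """Record a "query"/"transaction" event (or None for a timer tick)."""
        due = False
        if event in self.counts:
            self.counts[event] += 1
            n = self.every.get(event)
            if n and self.counts[event] % n == 0:
                due = True
        interval = self.every.get("time")
        if interval and now - self.last_timed_check >= interval:
            self.last_timed_check = now
            due = True
        return due

    def should_release(self, rss_bytes):
        # Release memory to the OS only when RSS is above the threshold.
        return self.threshold is not None and rss_bytes > self.threshold
```

With the complete configuration above (`2g`, `time:10`, `transaction:2`), every second transaction makes a check due, as does any moment at least 10 seconds after the previous timed check.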
Catalogs are locations on disk where AllegroGraph keeps its triple-stores. These locations are specified in the configuration file, along with some optional default settings for stores in the catalogs. Most of the time, you will want to specify all catalogs directly in the configuration file, but it is also possible to enable dynamic catalogs, which can be created and deleted through the HTTP interface (as described in HTTP Protocol - SPARQL Endpoint).
Catalog definitions in the server configuration file serve as templates for creating databases. The parameters defined in the catalog definition are copied to the database when it is created. Changes to the catalog definition do not influence the settings of existing databases. To modify parameters of an existing database, the file 'parameters.dat' in the database's Main directory must be edited and the database restarted.
There are three types of catalog definitions that can occur in an AllegroGraph configuration file: a root catalog, named catalogs, and a dynamic catalog specification. The first was seen in the example above (<RootCatalog> ... </RootCatalog>) and is used to determine where stores live that do not have a catalog specified. Named catalogs look similar:
<Catalog temporary>
  Main /tmp/catalog
</Catalog>
Their opening line specifies their name, which can contain any characters except slashes, backslashes, colons, and tildes. This name can then be used as catalog name when creating or accessing triple-stores.
A dynamic catalog specification looks like this:
<DynamicCatalogs>
  Main /tmp/dynamic
</DynamicCatalogs>
The directory given for dynamic catalogs (as well as any other catalog directories, see below) will be extended with a catalog name when such a catalog is created. For example, given the above configuration, a dynamic catalog named scratch would end up in /tmp/dynamic/scratch.
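The naming and directory rules for catalogs are collected in this Python sketch. The forbidden characters come from the description of named catalogs. Rejecting an empty name is an extra assumption. `store_dirs` also assumes that TransactionLogDir and StringTableDir fall back to the catalog's Main directory when they are not given. All function names are hypothetical.

```python
import posixpath

# Characters the text above says a catalog name cannot contain.
_FORBIDDEN = set("/\\:~")


def valid_catalog_name(name):
    # Assumption: an empty name is also rejected.
    return bool(name) and not (_FORBIDDEN & set(name))


def dynamic_catalog_dir(dynamic_main, name):
    """Directory for a dynamic catalog: the DynamicCatalogs Main plus the catalog name."""
    if not valid_catalog_name(name):
        raise ValueError(f"invalid catalog name: {name!r}")
    return posixpath.join(dynamic_main, name)


def store_dirs(catalog, store):
    """Per-store directories, each extended with the triple-store name.

    Assumes TransactionLogDir and StringTableDir fall back to the catalog's Main.
    """
    main = catalog["Main"]
    return {
        key: posixpath.join(catalog.get(key, main), store)
        for key in ("Main", "TransactionLogDir", "StringTableDir")
    }
```

For the fast catalog in the larger example, the sketch places the transaction logs of a store named demo in /mnt/disk3/ag4-transaction-logs/demo.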
Some of the directives allowed within a catalog definition (those marked as inheritable) can also be specified at the top-level, where they act as a default value inherited by catalogs which don't explicitly specify that setting.
- Required for every catalog. Specifies the directory in which the triple-stores for the catalog are stored.
Specifies the directory in which transaction log subdirectories will be created for triple-stores in this catalog. The directory will be extended with the name of a triple-store. For example, if TransactionLogDir is /tmp/tlogs, then transaction logs for triple-store example will be stored in /tmp/tlogs/example. This parameter is optional and defaults to the value supplied for the Main directive.
See this line in the example below:
TransactionLogDir /mnt/disk3/ag4-transaction-logs
which says transaction logs should be placed in the /mnt/disk3/ag4-transaction-logs/[triple-store-name]/ directory.
- Specifies the directory in which string table subdirectories will be created for triple-stores in this catalog. See TransactionLogDir for information on how directory names are constructed. This parameter is optional and defaults to the value supplied for the Main directive.
- An integer. This is the number of triples one expects to have in the store. The server uses it to select suitable values for things like internal table sizes. Most of the time, you only need to worry about this when trying to squeeze out more performance. Setting it too high can waste some resources; setting it too low can result in sub-optimal performance; and setting it much too low (much less than the maximum effective value and less than one twenty-fifth of the real size) can cause enormous index management overhead and an extreme loss of performance on a continuously evolving store. The maximum effective value is 1000000000 (one billion triples). Stores can be much bigger, of course, but values larger than one billion do not affect the initial internals.
- A time (value like 1h). This parameter, if set, enables dynamic checkpointing for a database. By default, AllegroGraph writes a checkpoint at regular intervals, as configured by the CheckpointInterval parameter. If dynamic checkpointing is enabled, a checkpoint will be written whenever recovery from the transaction log file would exceed the time to which MaxRecoveryTime is set. This is useful on databases with little write activity. Note that the MinimumCheckpointInterval is still observed.
- A time (value like 1h) that determines the amount of time between checkpoint writes for the store. A higher value can make checkpoints happen less frequently, but increases the recovery time.
A time (value like 1h) that is used to determine the minimum period that must elapse between two checkpoints. This parameter defaults to whatever CheckpointInterval has been set to or, if MaxRecoveryTime has been set, to 5 minutes.
Regardless of the value of MinimumCheckpointInterval and CheckpointInterval, a checkpoint will always occur after a new transaction log file is created (see TransactionLogSize setting).
- A size (for example 10m) that determines how big individual transaction log files are allowed to grow. When a transaction log file reaches or exceeds this size, a new transaction log file is created. The maximum is just under 4GB.
- This parameter specifies the synchronized writing method for transaction logs. Three methods are supported: ODIRECT, SYNC, and fsync. The default (if this parameter is unspecified) is ODIRECT and that is the recommended choice on ext3 file systems. For catalogs residing on non-ext3 file systems, the other choices may yield performance benefits. (You will potentially see performance degradation in checkpointing. If that takes longer than expected and you are using a non-ext3 filesystem, try the other allowable values.)
This parameter specifies the number of transaction log files which should be preallocated at database creation time. The default value is 2. Specifying a larger value helps lower the probability of additional transaction log files being created during commits.
Note: The number of tlog files may grow larger than DesiredTlogFiles if there is a long-running backup, if transaction log archiving is running slowly, or if warm standby replication is running slowly or stalled. When possible, AllegroGraph will reduce the number of transaction log files back down to DesiredTlogFiles.
- The time (a value like 1h) a database instance will stay open without being accessed. The default is one hour. Starting a database instance can be time consuming; by keeping idle instances around, this directive allows trading off memory for lower worst-case latency on database access. Note that this value is advisory: AllegroGraph checks for idle database instances intermittently, so a given instance may linger longer than the configured time.
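The checkpoint rules above (CheckpointInterval, MaxRecoveryTime, MinimumCheckpointInterval, and the forced checkpoint after a new transaction log file) combine as shown in this Python sketch. It is a model of the described behaviour, not AllegroGraph code. The function names are hypothetical. Accepting s, m, and d suffixes in addition to the documented h is an assumption. The sketch also assumes that, with dynamic checkpointing enabled, the MaxRecoveryTime test replaces the fixed interval test.

```python
import re

# Assumption: s/m/h/d suffixes; the text above only shows values like "1h".
_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_time(text):
    m = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if m is None:
        raise ValueError(f"bad time value: {text!r}")
    return int(m.group(1)) * _SECONDS[m.group(2)]


def should_checkpoint(since_last, est_recovery, new_tlog_file,
                      checkpoint_interval, max_recovery_time=None,
                      minimum_interval=None):
    """Decide whether to write a checkpoint now (all durations in seconds).

    since_last: time since the previous checkpoint.
    est_recovery: estimated time to recover from the transaction log.
    new_tlog_file: True if a new transaction log file was just created.
    """
    if new_tlog_file:
        return True  # always checkpoint after a new transaction log file
    if minimum_interval is None:
        # Defaults: 5 minutes with dynamic checkpointing, else CheckpointInterval.
        minimum_interval = 300 if max_recovery_time is not None else checkpoint_interval
    if since_last < minimum_interval:
        return False
    if max_recovery_time is not None:
        # Dynamic checkpointing: checkpoint once recovery would exceed MaxRecoveryTime.
        return est_recovery > max_recovery_time
    return since_last >= checkpoint_interval
```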
A bigger example to demonstrate what some of the options can look like.
# Don't allow normal HTTP access, only SSL
Port 10035
AllowHTTP no
SSLPort 10036
SSLCertificate /var/lib/ag4/server.cert
SettingsDirectory /var/lib/ag4/settings
Backends 5
# You can actually remove this after the first server run, to
# reduce the risk of someone finding it here.
SuperUser test:xyzzy
ExpectedStoreSize 100000
SessionPorts 8080-8083
<RootCatalog>
  Main /var/lib/ag4/root
</RootCatalog>
<Catalog fast>
  ExpectedStoreSize 2000000
  CheckpointInterval 1h
  Main /var/lib/ag4/fast
  StringTableDir /mnt/disk2/ag4-string-tables
  TransactionLogDir /mnt/disk3/ag4-transaction-logs
  FilePlacementRule ^index-spogi-.*\..idx$ /mnt/disk4/ag4-spogi-even
  FilePlacementRule ^index-spogi-.*\..idx$ /mnt/disk5/ag4-spogi-odd
</Catalog>
<DynamicCatalogs>
  Main /var/lib/ag4/dynamic
</DynamicCatalogs>
Changing database parameters
In some circumstances, it is desirable to modify the settings of an existing database by editing the 'parameters.dat' file in the database main directory. The syntax of this file is similar to that of the server configuration file, but only the parameters that are normally present inside of a catalog definition are allowed.
For example, the 'parameters.dat' file for a database 'demo' created with the 'fast' catalog definition above would look like this:
CheckpointInterval 1h
Main /var/lib/ag4/fast
StringTableDir /mnt/disk2/ag4/fast
It might be edited to change the CheckpointInterval. It is also possible to add new file placement rules. When modifying any of the file-placement-related parameters of a database, care must be taken to make sure that all files that constitute the current database state are still visible to the database. For example, if the StringTableDir directory of the database above is to be removed, all files in /mnt/disk2/ag4/fast/demo/ would need to be moved manually into the main directory of the database, /var/lib/ag4/fast/demo/.
Note that resetting some parameters in 'parameters.dat' has no effect. In particular, changing ExpectedStoreSize in parameters.dat does nothing. The only way to change that is to set the option in the configuration file and recreate the database.
When moving around database files, it is important to know that some of these files are sparse, i.e. they contain holes (unallocated blocks). Many file management utilities (like 'cp' and 'tar') can optionally preserve file sparseness, but care should be taken to make sure that copies of database files don't become unexpectedly large after a manual manipulation.
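If a copying tool that preserves sparseness is not at hand, one common technique is to skip over all-zero blocks instead of writing them, and then set the final file length explicitly. The Python sketch below does this. The function name `copy_sparse` and the 64 KiB block size are arbitrary choices. Whether a skipped region actually becomes a hole depends on the filesystem. The copy is byte-for-byte identical to the source either way, but it may contain holes where the source had explicitly written zeros.

```python
import os


def copy_sparse(src, dst, block_size=64 * 1024):
    """Copy src to dst, leaving all-zero blocks as holes instead of writing them."""
    zero = bytes(block_size)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block_size)
            if not chunk:
                break
            if chunk == zero[:len(chunk)]:
                # Skip ahead without writing; the filesystem can leave a hole here.
                fout.seek(len(chunk), os.SEEK_CUR)
            else:
                fout.write(chunk)
        # A trailing run of zeros was only skipped, so set the final length explicitly.
        fout.truncate(fin.tell())
```

Comparing the output of `du` (allocated blocks) with `ls -l` (apparent size) for the source and the copy is a quick way to confirm that sparseness survived.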
The method used to start and stop the AllegroGraph server depends on the type of install: an RPM install or installation from a tar.gz file (see Server Installation). The RPM install places files in specific locations. The configuration file agraph.cfg is placed in /etc/agraph/, and you can use /sbin/service to start and stop AllegroGraph:
You can start AllegroGraph by running:
/sbin/service agraph start
You can stop AllegroGraph by running:
/sbin/service agraph stop
In addition, chkconfig can be used to make AllegroGraph start when the system boots. For example:
chkconfig agraph on
You can also use agraph-control with an RPM install.
The tar.gz installation is more flexible, and you choose the AllegroGraph directory as part of the installation process (again, see Server Installation). The typical way to start and stop AllegroGraph installed from a tar.gz file is to use agraph-control.
agraph-control is a script that can be used to start and stop AllegroGraph. It also can process other commands, as described below. agraph-control is located in the bin/ subdirectory of the AllegroGraph directory. The calling template is
agraph-control [options] <command>
The one option to agraph-control is --config. Its value should be the path of the configuration file. The usual location of that file in a tar.gz install is the lib/ subdirectory of the AllegroGraph directory. The usual location in an RPM install is /etc/agraph/. The default name is agraph.cfg.
Thus, with a tar.gz install, you can start the AllegroGraph server with
[Agraph dir]/bin/agraph-control --config [Agraph dir]/lib/agraph.cfg start
If --config is not specified, the behavior is as follows:
For an RPM install when not running as root, there is no default and --config must have a value.
For an RPM install when running as root, the default is /etc/agraph/agraph.cfg.
For a tar.gz install, the default is agraph.cfg in the lib/ subdirectory of the AllegroGraph directory.
If the file specified as the value of --config is not found, the AllegroGraph server is not started and a message like the following is printed:
Cannot locate configuration file (tried <supplied path>).
If --config is unspecified and the agraph.cfg file is not found in the default location, or you are not running as root with an RPM install, the AllegroGraph server is not started and the following message is printed:
Cannot determine location of configuration file. Please use --config
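The lookup rules for agraph-control's --config option can be summarized in this Python sketch. The function name `resolve_config` and its arguments are hypothetical, and the sketch only mirrors the rules and messages listed above. It does not reflect how the real script is written. It returns a `(path, None)` pair on success or `(None, message)` with the message agraph-control prints.

```python
import os


def resolve_config(config_arg, install, is_root, agraph_dir=None):
    """Pick agraph-control's configuration file following the rules listed above.

    install: "rpm" or "tar.gz"; agraph_dir: the AllegroGraph directory (tar.gz only).
    Returns (path, None) on success or (None, message) on failure.
    """
    if config_arg is not None:
        if os.path.exists(config_arg):
            return config_arg, None
        return None, f"Cannot locate configuration file (tried {config_arg})."
    if install == "rpm":
        # RPM installs only have a default when running as root.
        default = "/etc/agraph/agraph.cfg" if is_root else None
    else:
        default = os.path.join(agraph_dir, "lib", "agraph.cfg")
    if default is not None and os.path.exists(default):
        return default, None
    return None, "Cannot determine location of configuration file. Please use --config"
```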
The commands to agraph-control are:
- start: Start the AllegroGraph server. This has no effect if the server is already running.
- stop: Stop the AllegroGraph server. This is the normal stop command, and it attempts to perform a clean shutdown of all open databases.
- force-stop: Stop the AllegroGraph server. This is the emergency stop command, and open databases may not be cleanly closed.
- reload: Request that the server reload the configuration file.
AllegroGraph service daemon signal handling
The signals used by the AllegroGraph service daemon are:
- for normal stopping, used by the stop command.
- for emergency stopping, used by the force-stop command.
- for reloading the config file, used by the reload command.
The agraph program
agraph-control is a script which launches the actual program, named agraph. While agraph-control is recommended when starting the server, you can use agraph, particularly when you wish to invoke options not available to agraph-control. agraph accepts the following command-line arguments:
- The location of the configuration file. Defaults to agraph.cfg in the executable's directory or, failing that, /etc/agraph/agraph.cfg. (If the configuration file cannot be found, AllegroGraph does not start and prints the message No configuration file found.)
- Specify where the server log files are written. Overrides the log directory directive in the configuration file.
- Start the server in debug mode, which means logging will be more verbose.
- Set an explicit log-level (debug, info, warn, or error), or specify log-levels per category, for example:
- Write a log of all HTTP traffic to the file specified.
- A comma separated list of options. Options starting with the character + turn on the corresponding log category. Those starting with - turn them off. Use the max-message-size= option to truncate overly long messages. The default is +xmit,max-message-size=1000. The available log categories are listed in the AllegroServe documentation.
- Determines where the process id of the server is written. Overrides the process id file directive in the configuration file.
- If started as root, run AllegroGraph as the specified user. Overrides the corresponding run-as-user directive in the configuration file.
- Print information about these arguments.