Welcome to AllegroGraph 4.0
Welcome to the new AllegroGraph v4.0. This latest version of AllegroGraph is designed to take full advantage of symmetric multi-processing hardware and brings enterprise database features to the world of Semantics and RDF. AllegroGraph now incorporates:
- Full ACID database compliance.
- Full Recoverability.
- Online backup and restore.
- Automatic Indexing.
- HTTP-based client/server libraries supporting: Java, Sesame, Jena and Python.
This release of AllegroGraph v4.0 is focused on enhancements to database backup and restore, Java Jena client library compliance, and overall product stability improvements.
AllegroGraph Server: General Changes
Bug18679 - Made part->value function more consistent
The part->value function now returns type-codes for both UPIs and future-parts. Previously, it returned a type-code for UPIs and a keyword (like :literal or :anon) for future-parts. [Lisp API].
Bug18679 - Removed the deprecated part-value
The deprecated part-value function has been removed. Use part->value in its place. [Lisp API]
Bug18738 - Make query-planner work on remote-triple-stores
Previously, select queries that needed to re-order their clauses would instead signal an error on remote-triple-stores. This patch corrects this problem and lets remote-triple-stores work correctly with the query-planner. [Lisp API]
Bug18840, Bug19003 - Inconsistent state after duplicate purging
Total triple counts were wrong after duplicate purging. By default, AllegroGraph purges duplicate triples from the index during merge operations. Previously, when duplicate triples were purged, the total count of triples in the store was not updated. Also, in most cases, one of the seven triple indexes would not be purged of duplicates, resulting in wasted storage and potentially inconsistent query results. These problems have been corrected.
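The relationship between purging and the triple count can be pictured with a small sketch. This is not AllegroGraph's merge code; it is a minimal Python illustration, assuming triples are plain tuples held in sorted chunks, and the names `merge_and_purge` and `total_count` are hypothetical.

```python
import heapq

def merge_and_purge(chunk_a, chunk_b):
    """Merge two sorted chunks of triples, dropping exact duplicates.

    Returns the merged chunk and the number of duplicates removed so that
    the caller can keep a store-wide triple count in step with the index.
    """
    merged = []
    removed = 0
    for triple in heapq.merge(chunk_a, chunk_b):
        if merged and merged[-1] == triple:
            removed += 1          # same triple as the previous one: purge it
        else:
            merged.append(triple)
    return merged, removed

# Illustrative data only.
chunk_a = [("alice", "knows", "bob"), ("bob", "knows", "carol")]
chunk_b = [("alice", "knows", "bob"), ("carol", "knows", "dave")]

total_count = len(chunk_a) + len(chunk_b)
merged, removed = merge_and_purge(chunk_a, chunk_b)
total_count -= removed            # the step that was missing before the fix
```

The point of the sketch is the last line: a purge that does not report how many triples it removed leaves the count stale.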
Bug18879 - Corrected add-triple so that it could accept large strings
Unlike the bulk loaders, the add-triple function was unable to accept strings longer than 8092 characters and would signal an error. Now add-triple can accept strings of arbitrary size.
Bug18800 - Unbound Prolog variables could pollute select query results
Queries like
(select (?s ?t)
(q ?s !<http://www.franz.com/sna#marriage>
!<http://www.franz.com/sna#peruzzi>)
(optional (q ?s !rdf:type ?t)))
could return unbound expressions like `{Unbound 14a170c9}` instead of `nil`. Now these queries correctly return `nil`.
Bug18880 - Corrected serialization of RDF/XML
The serializer sometimes added extra angle brackets around typed literals. Now it correctly outputs them as, e.g.,
"42"^^<http://www.w3.org/2001/XMLSchema#int>
Bug18947 - Make service daemon responsible for deleting stores
delete-triple-store used to only check whether the store was open in the current image, and thus deleted stores even if another process had them open. This moves the responsibility for deleting stores to the service-daemon, which actually knows whether anyone has the store open.
Bug18949 - Fix geospatial type mapping in federated stores
AllegroGraph was not handling geospatial type-mappings in federated-triple-stores and could fail to return all of the triples from every leaf store. This is corrected.
Bug18972 - Correct printing of times with fractional seconds
When a time or dateTime with fractional seconds had leading zeros in the fractional seconds part, AllegroGraph could print it incorrectly. For example, a time like 10:31:55.028192 might print as 10:31:55.281900. This printing problem has been corrected.
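Why leading zeros matter can be shown with a short Python sketch. It is only an illustration of zero-padded formatting, not the server's printer, and `format_time` is a hypothetical helper.

```python
def format_time(hour, minute, second, microsecond):
    """Print a time with a six-digit, zero-padded fractional part."""
    # Without the 06d padding, 28192 microseconds would print as ".28192",
    # which reads as a different (larger) fraction of a second.
    return f"{hour:02d}:{minute:02d}:{second:02d}.{microsecond:06d}"

printed = format_time(10, 31, 55, 28192)
```

Here `printed` keeps the leading zero of the fraction, so the value reads back as the same instant.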
Bug19010 - Fix 'store in use' bug when accessing store over HTTP.
Fixes a bug where a store accessed over HTTP could not be deleted or replaced through the direct Lisp API until its handle had timed out.
Bug19018 - Make sure instance processes are cleaned up when breaking
Fixes a few cases where unexpected errors would cause the daemon to get confused about the stores it had open (instance crash, client dying while opening or creating).
Bug19024 - Show catalog name in store-status error messages
Errors like 'store exists' and 'store not found' used to contain only the store name; now they also include the catalog. Trying to perform an operation on a non-existent catalog now raises an error that clearly indicates the catalog doesn't exist.
Bug19045 - Use . as the fractional seconds separator in times and dateTimes
Previously AllegroGraph used the comma (,) as the separator in its printed representation of XSD times and dateTimes. This is acceptable ISO 8601 notation but not acceptable XSD notation. With this fix, fractional seconds are separated from whole seconds by a period (.).
Bug19055: Fix potential hang when committing
Under certain circumstances, commits could hang indefinitely while waiting for chunk merging to complete, but chunk merging would never commence due to a counter having an invalid value. This problem has been corrected.
Bug19061: UPI->String table race condition fix
Previously, under high load conditions, errors could occur during string table operations. This problem has been fixed.
Rfe7381 - Add a db keyword argument to several API commands
Added the db argument to the following commands:
- part->string
- part->value
- part->ntriples
- part->long
- part->concise
- part->terse
- print-triple
[Lisp API]
Rfe8765 - Improved Prolog query performance
Many Prolog queries execute more than 30% more quickly.
Rfe8860 - Geospatial functions now accept strings
The geospatial functions in AllegroGraph previously required a subtype object. They now accept strings which name a subtype object. This makes them easier to use. [Lisp API]
Rfe9146 - Allow 'merging' of store metadata on concurrent change.
Before, two sessions both adding a type-mapping and then committing would cause a conflict error. The metadata system is now clever enough to merge such changes.
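The release note does not describe how the merge is done. As a rough picture of the idea, the following Python sketch applies a three-way merge to type-mapping tables, assuming they can be modelled as dictionaries; `merge_mappings`, `MergeConflict`, and the sample keys are all hypothetical.

```python
class MergeConflict(Exception):
    """Raised when two sessions changed the same entry in different ways."""

def merge_mappings(base, ours, theirs):
    """Three-way merge of {predicate-or-datatype: type} dictionaries.

    base is the state both sessions started from; ours and theirs are the
    states each session wants to commit.
    """
    merged = dict(base)
    for key in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            value = o            # both sessions agree
        elif o == b:
            value = t            # only the other session changed this key
        elif t == b:
            value = o            # only this session changed this key
        else:
            raise MergeConflict(key)
        if value is None:
            merged.pop(key, None)
        else:
            merged[key] = value
    return merged

base = {}
session_a = {"ex:age": "xsd:int"}
session_b = {"ex:born": "xsd:date"}
merged = merge_mappings(base, session_a, session_b)
```

Two sessions adding different mappings merge cleanly; only a genuinely contradictory change to the same key would still be a conflict in this model.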
Rfe9209 - Don't log process termination messages before cleanup.
Both instance processes and their children used to output '... terminated' before they cleaned up, which in some cases led to them hanging or crashing after they claimed to have terminated.
Rfe9212 - Periodically clean up deleted triple data structures
Previously, the data structures which track deleted triples could grow without bounds. Now, they are periodically examined and obsolete portions are purged.
Rfe9291 - Use XSD instead of internal type-codes for /mappings
The /mapping protocol now returns lists of
mapping-kind datatype/predicate XSD-type
rather than
mapping-kind datatype/predicate numeric-type-code
This makes both debugging and general use of the mappings much simpler for clients.
Rfe9294 - Added SNA path functors to bind only the first matching path
Added breadth-first-search-path and bidirectional-search-path functors to correspond with the existing depth-first-search-path functor. These all run SNA path finding algorithms and stop after finding the first matching path.
Rfe9315 - Do early probe to see if file system supports >4GB file offsets
AllegroGraph requires that the underlying file systems support offsets longer than 32 bits. Some file systems (e.g., NFSv2) do not support long offsets, which caused file-system-related failures in varying situations. AllegroGraph now probes each of the file systems specified in a configuration to make sure that long file offsets are supported.
Rfe9322 - Changes in encoded date and time UPIs.
Corrected a problem where the fractional seconds stored in an encoded UPI could be lost. The update fixes the stored representation and invalidates existing triple-stores that have any time, date or date-time encoded UPIs.
Rfe9329 - Improved loading of a required shared library.
Previously, AllegroGraph required loading of a particular shared library (librt) that did not always exist by the same name on all Linux distributions. This problem has been corrected.
Rfe9330: Set up selinux security context on shared library during installation
Previously, on systems with SELinux enabled, the agraph server would not start up properly unless the operator issued the following command first:
chcon -t textrel_shlib_t <agraph lib dir>/libacl*.so
Now, this command is automatically performed during RPM package install or when using the install-agraph script that comes with the tarball package.
Rfe9340 - Disallow overlapping catalog directories.
Having, for example, one catalog in /tmp/foo, and another in /tmp/foo/bar, or two catalogs in the exact same directory, is no longer allowed, since this is bound to cause stores to be corrupted.
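The overlap rule can be expressed as a simple path check. The sketch below is a Python illustration under the assumption that catalog directories are absolute POSIX paths; it is not the server's validation code, and `overlaps` and `find_overlaps` are hypothetical names.

```python
from pathlib import PurePosixPath

def overlaps(dir_a, dir_b):
    """True if the two directories are identical or one contains the other."""
    a, b = PurePosixPath(dir_a), PurePosixPath(dir_b)
    # Comparing path components avoids treating /tmp/foobar as inside /tmp/foo.
    # Symbolic links and ".." segments are not resolved in this sketch.
    return a == b or a in b.parents or b in a.parents

def find_overlaps(directories):
    """Return every pair of directories that would be rejected."""
    return [
        (a, b)
        for i, a in enumerate(directories)
        for b in directories[i + 1:]
        if overlaps(a, b)
    ]

conflicts = find_overlaps(["/tmp/foo", "/tmp/foo/bar", "/tmp/foobar"])
```

Only the `/tmp/foo` and `/tmp/foo/bar` pair is reported; `/tmp/foobar` merely shares a name prefix.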
Rfe9362: Output a log message when deleting a triple-store.
Deleting a store wasn't visible in the log before. It now notes which store was deleted.
Spr36556 - Reduce number of duplicate rdf:type inferences
AllegroGraph's type inference was returning a triple for each possible inference path rather than filtering out the duplicates. For example, if `a rdf:type B` could be inferred two ways, then the reasoner would return two triples. This no longer happens.
Spr36565 - Standardize the printing of encoded types
AllegroGraph would sometimes print typed literals non-syntactically. This no longer occurs.
Add a Lisp function to list the catalogs on a server.
The function #'catalogs, exported from the db.agraph package, can now be used to find out which catalogs a server provides.
Added ego-group-layers function
ego-group-layers returns a node's ego group organized by the depth at which each node was found. [Lisp API]
Documentation: ego-group-layers (node depth generator)
Return a list of lists of the nodes in node's ego group. Each element of the result is a list of the nodes discovered at that list's depth. These are the nodes in the graph that can be reached by following paths of at most length depth starting at node. The paths are determined using the generator to select neighbors of each node in the ego-group. See the description of SNA generators for additional details.
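The layering described for ego-group-layers can be pictured as a breadth-first walk that records one list per depth. The following Python sketch is an illustration only: the generator is modelled as a function from a node to its neighbors, and whether the Lisp function includes the starting node in its result is not stated here, so this sketch leaves it out.

```python
def ego_group_layers(node, depth, generator):
    """Return [[nodes at depth 1], [nodes at depth 2], ...] up to depth."""
    seen = {node}
    frontier = [node]
    layers = []
    for _ in range(depth):
        next_frontier = []
        for current in frontier:
            for neighbor in generator(current):
                if neighbor not in seen:      # record each node at its first depth
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        if not next_frontier:
            break
        layers.append(next_frontier)
        frontier = next_frontier
    return layers

# Hypothetical graph used only for illustration.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
layers = ego_group_layers("a", 2, lambda n: graph[n])
```

With a depth of 2, node "e" is not reached, and "d" appears once even though two paths lead to it.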
Added support for auto-commit to the bulk load functions
The commit parameter to load-ntriples, load-rdf/xml and load-trix can be one of
- nil - do not commit
- commit at the end of the load
- a number N - commit every N triples
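The three commit behaviors can be sketched in Python. This is not the Lisp loader: `FakeStore` and `bulk_load` are hypothetical, and the modes are modelled here as `None`, the string `"end"`, and a positive integer, which are choices made for this sketch rather than the values the Lisp API accepts. Committing a final partial batch in the numeric mode is also an assumption.

```python
class FakeStore:
    """Stand-in for a triple store that counts its commits."""
    def __init__(self):
        self.pending = []
        self.committed = []
        self.commit_calls = 0

    def add(self, triple):
        self.pending.append(triple)

    def commit(self):
        self.committed.extend(self.pending)
        self.pending.clear()
        self.commit_calls += 1

def bulk_load(store, triples, commit=None):
    """Load triples, committing never, once at the end, or every N triples."""
    every_n = commit if isinstance(commit, int) and commit > 0 else None
    for count, triple in enumerate(triples, start=1):
        store.add(triple)
        if every_n and count % every_n == 0:
            store.commit()            # periodic commit bounds the pending work
    if commit is not None and store.pending:
        store.commit()                # commit whatever is left at the end

store = FakeStore()
bulk_load(store, [("s", "p", str(i)) for i in range(10)], commit=4)
```

Loading ten triples with a batch size of four commits after the fourth and eighth triples and once more for the remaining two.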
Changed behavior of breadth-first-path-search when maximum-depth is given.
Previously, when the breadth-first-path-search functions were given a maximum depth, AllegroGraph would return the shortest path or paths found that were no longer than that depth. Now, AllegroGraph returns all paths found that are no longer than maximum depth.
This can be very expensive for graphs with high connectivity. To return only the shortest path, use a maximum depth of nil. [Lisp API].
Since the Prolog SNA functors use the underlying SNA functions, their behavior has also changed.
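The difference between the old and new behavior can be illustrated with a small Python sketch. It is not the SNA implementation; `paths_up_to`, `shortest_only`, and the sample graph are hypothetical, and paths are restricted to simple paths (no repeated nodes) as an assumption.

```python
from collections import deque

def paths_up_to(start, goal, neighbors, max_depth):
    """All simple paths from start to goal with at most max_depth edges."""
    found = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            found.append(path)
            continue
        if len(path) - 1 >= max_depth:
            continue                      # extending would exceed the depth limit
        for nxt in neighbors(path[-1]):
            if nxt not in path:           # keep paths simple
                queue.append(path + [nxt])
    return found

def shortest_only(paths):
    """The earlier behavior: keep only the shortest of the paths found."""
    best = min((len(p) for p in paths), default=0)
    return [p for p in paths if len(p) == best]

graph = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "e": ["d"], "d": []}
all_paths = paths_up_to("a", "d", lambda n: graph[n], max_depth=3)
old_style = shortest_only(all_paths)
```

Because every partial path is kept on the queue, the number of candidates can grow very quickly in a highly connected graph, which is the cost the note above warns about.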
Change the keyword value used to identify typed literals
Previously, AllegroGraph used :typed-literal to specify a typed literal. To make things more consistent with :literal and :literal-language, this has been changed to :literal-typed. [Lisp API]
Define an error type to signal failures to connect to the AG server.
Any operation (opening, creating, deleting, probing) that has to talk to the server will now raise a condition of type db.agraph:could-not-connect-to-server-error if it fails to find a server on the specified port.
Fix mistake in instance-subprocess communication protocol.
This would cause the instance to go into an infinite loop and become unresponsive when one of its children died.
Make delete-duplicates-on-merge setting persistent.
The flag was not properly restored when opening a triple store.
AllegroGraph Server: HTTP Interface
Bug19002 - Fix bug in HTTP protocol namespace caching.
Fix a bug that caused problems with namespaces after deleting and re-creating (over HTTP) a store that had namespaces defined.
Bug19005 - Corrected problem in client lookups of individual datatype mappings.
Previously, a request for an individual datatype mapping would return nil regardless of how the mapping was defined. Now it returns the correct value.
Rfe9131 - Add defaultGraphName parameter to SPARQL query interface
This is used to provide a way to refer to the default graph inside a SPARQL query using a URI.
Rfe8999 - Use a unified format for serializing blank nodes.
Before, each component had its own format. This moves the HTTP server to the unified format.
Rfe9183 - Silence AServe's socket-reset warnings
The log is no longer polluted with spurious connection-reset messages.
Rfe9185 - Support delete-duplicates-during-merge remotely
Remote-triple-stores now support the :delete-duplicates-during-merge initarg, and the db-delete-duplicates-during-merge accessor.
Rfe9217 - Add more help text to the WebView interface.
Potentially confusing input fields in WebView (such as those where resources or literals are entered) now have a question-mark icon that pops up an explanation.
Rfe9236 - Support SPARQL/Update over HTTP.
POST requests to a repository can now issue a SPARQL/Update query, if the user has write access to the store.
Rfe9247 - Expose server processes in HTTP interface
It is now possible to list, inspect, and kill the server's processes through the HTTP interface. Additionally, one can start up telnet servers in the processes, in order to directly debug them.
Rfe9247(2) - Allow process control from WebView
Superusers can now view the server processes, and open telnet servers in them, through the WebView UI. (See the 'Processes' link in the navigation bar at the top-level page.)
Rfe9295 - Add a commit argument to PUT/POST /statements.
This supports auto-committing every X triples, to make uploading huge files viable.
Rfe9310 - The PUT /repository/[name] now overwrites by default.
The PUT /repository/[name] service in the HTTP protocol now overwrites existing repositories by that name unless override=false is passed.
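A client that wants the earlier, non-overwriting behavior would have to pass override=false explicitly. The Python sketch below only builds such a request and does not send it; the base URL, the repository name, and the choice to pass the parameter in the query string are assumptions made for illustration, and the path follows the form given above.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

def create_repository_request(base_url, name, overwrite=True):
    """Build (but do not send) a PUT request that creates a repository."""
    url = f"{base_url}/repository/{quote(name, safe='')}"
    if not overwrite:
        # Assumption: the override parameter travels in the query string.
        url += "?" + urlencode({"override": "false"})
    return Request(url, method="PUT")

# Hypothetical server address and repository name.
request = create_repository_request("http://localhost:10035", "scratch", overwrite=False)
```

Scripts written against earlier builds that relied on PUT refusing to replace an existing repository may want to add the parameter before upgrading.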
Allow nested lists in JSON-shaped Prolog-query output.
application/json results for Prolog queries can now contain nested arrays of results, to make it easier to use some of our SNA functionality.
Allow the creation of sessions with federated, graph-filtered, or reasoning stores.
It is now possible to spawn HTTP sessions on a store built up by applying an arbitrary combination of reasoning, federation, and graph-filtering to concrete triples.
Cause output of --http-trace switch to actually get flushed to the file.
Before, it was buffered, and would only be visible after a large amount was written, or the server was shut down.
Clean up namespaces when a repository is deleted.
Before, deleting and re-creating a repository would leave the old namespaces intact.
Enable streaming of XML bodies when loading statements.
A problem in our SAX parser made this difficult before, but that has since been fixed. This makes it possible to load large files with less overhead.
Fix a bug that prevented session permissions from showing up in the WebView user-control UI.
(No further text.)
Fix a bug with namespace caching.
Sometimes, back-ends wouldn't notice a user's namespaces had changed when they were updated twice within the same second.
Fix bug in remote geospatial queries.
There was a problem that broke remote-triple-stores when more than one geospatial subtype occurred in a query response. This is now fixed.
Fix user-access bug in HTTP protocol.
Fix a bug that prevented users with access to a store in the root catalog, but not the whole catalog, from accessing this store over HTTP.
Make user-access-configuration interface in WebView more obvious
It was easy to mistake the widget for adding access for an indication that access had already been granted.
Support changing user passwords through the HTTP interface
There is now /users/[NAME]/password, to which one can POST a new password.
Support resetting namespaces to the defaults.
There is now a reset argument to the namespace clearing interface, which, when true, will cause the default namespaces to be restored.
Documentation
Rfe9322 - Added client tutorials to the documentation
The documentation now contains tutorials on how to use the Python and Java clients. Please see the links off of the main web pages.
Server Installation Page Updated to be Ubuntu-Friendly
The server-installation instructions have been updated to make them more friendly to people who don't know how to install Python, python-pycurl, and python-cjson packages. This included some specific suggestions for Ubuntu users for RPM installation and Python client installation.
Java Client
Rfe8812 - Jena compliance improvements
Added compliance tests for Jena GraphMaker and Model interfaces and corrected several non-compliances.
Properly handle the case of getting a namespace uri for a prefix that does not exist, and add support for expanding a QName.
Add GraphMaker support for creating anonymous graphs, listing graphs, checking existence of a graph, and returning existing graph objects rather than creating new ones when possible.
Throw AlreadyExistsException when attempting to create an existing graph in strict mode, and throw a DoesNotExist exception when attempting to remove or open a non-existent graph in strict mode.
Rfe8812 - Jena Graph compliance improvements
Added compliance tests for the Jena Graph interface and corrected several non-compliances.
AGBulkUpdateHandler's removeAll method properly clears a graph.
Added a custom AGCapabilities class declaring handlesLiteralTyping to be false, as AGGraph does not currently support D-entailment.
Add support for the executeInTransaction method in AGTransactionHandler, though using begin/commit/rollback explicitly is preferred.
AGTripleIterators are now tied to AGGraphs; calling remove on the iterator will delete from the graph.
Rfe9359 - Jena Prolog support.
Added support for Prolog select queries over a Jena model. Queries must be written to select for variables that are bound to RDF parts, rather than to arbitrary Lisp objects.
Expanded Coverage of Literal Values in Java Tutorial.
The Java tutorial has been revised and expanded to document the behavior of typed literal values (in example5()). This includes the default behavior of untyped literals and their use in getStatements queries and in filtered and direct SPARQL queries. The section covers strings, floats, ints, untyped values, dates and times.
Java Tutorial Describes Managing User Accounts
The Java tutorial has a new section on "Creating AllegroGraph Users with WebView." It describes how to create and manage new AllegroGraph user accounts.
Python Client
Bug19001 - Changes to RepositoryConnection.clearNamespaces API
RepositoryConnection.clearNamespaces now supports a reset parameter (which defaults to True). clearNamespaces deletes all namespaces in the repository for the current user. If the reset argument is True, the user's namespaces are reset to the default set of namespaces; otherwise all namespaces are cleared.
Bug19004 - Make sure Python clients ping their session.
Previously, the thread to keep a dedicated session alive was created but not started, making it possible for Python clients to have their session time out on them.
Rfe9295 - Changes addStatements and addFile
addStatements and addFile can now auto-commit triples based on a count for large loads of triples. For size-based commits, set the add commit size to a positive integer by calling setAddCommitSize or by setting the addcommitsize property on the RepositoryConnection. Set it to None or 0 for regular commit semantics on the add methods.
Using this feature can decrease the server's memory usage during loading.
Rfe9316 - RepositoryConnection.setRuleLanguage removed
RepositoryConnection.setRuleLanguage has been removed. Both addRules and loadRules already take a language parameter, which defaults to QueryLanguage.PROLOG, the only language currently supported for rules.
Removed Common Logic
Removed the experimental and undocumented client side Common Logic query language. Please contact [email protected] for information if you were using this language.
Removed JDBC result sets
Removed the ability to get JDBC result sets. Please contact [email protected] if you need ways to get the result data without instantiating the corresponding Literal, URI, or BNode objects.
Namespace implementation is now server-side
Removed the client-side namespace implementation in favor of a server-side namespace management.
"Duplicate Triples" added to Python Tutorial.
The Python Tutorial has a new section on "Duplicate Triples", based on example23(). It describes the sources of duplicate triples and repetitive query results, and prescribes techniques for bringing them under control.
Python Tutorial Describes Managing User Accounts.
The Python tutorial has a new section on "Creating AllegroGraph Users with WebView." It describes how to create and manage new AllegroGraph user accounts.
Expanded Coverage of Literal Values in Python Tutorial.
The Python tutorial has been revised and expanded to document the behavior of typed literal values (in example5()). This includes the default behavior of untyped literals and their use in getStatements queries and in filtered and direct SPARQL queries. The section covers strings, floats, ints, untyped values, dates and times.
Python Examples from Command Line
The Python tutorial example file (tutorial_examples_40.py) can now be run from the command line.
$ python tutorial_examples_40.py            (runs all tests)
$ python tutorial_examples_40.py all        (runs all tests)
$ python tutorial_examples_40.py 1 5 22     (runs tests 1, 5, and 22)
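The argument handling described above (no arguments or "all" runs everything, numbers select individual tests) could be implemented along these lines. This is a hypothetical sketch, not the code in the tutorial file; `select_examples` and the range of available example numbers are assumptions.

```python
def select_examples(argv, available):
    """Map command-line arguments to the example numbers to run."""
    args = argv[1:]                       # skip the script name
    if not args or args == ["all"]:
        return sorted(available)
    chosen = [int(arg) for arg in args]
    unknown = [n for n in chosen if n not in available]
    if unknown:
        raise SystemExit(f"unknown example number(s): {unknown}")
    return chosen

# Hypothetical set of example numbers.
available = set(range(1, 24))
selected = select_examples(["tutorial_examples_40.py", "1", "5", "22"], available)
```

In a real script the first argument would come from `sys.argv`, and each selected number would be dispatched to the matching example function.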
Changes from m1 to m2
General
The following document outlines the changes in AllegroGraph v4 for the Milestone 2a release.
The M2a release includes the following major functionality:
Full database backups and restores.
Text Indexer Enhancement for databases with more than 1 Billion text elements and strings with international characters.
Infiniband testing and performance document.
Deleted Triples Data structure improvement that removes the previous hard-coded limit on the number of triples that can be deleted per transaction.
Cluster Manager Design Document.
Jena.
Federation of AllegroGraph 4 databases
Support for Social Network Analysis
Support for two dimensional geospatial analysis.
Support for 48-bit triple IDs.
User definable placement of the transaction log.
There were many minor enhancements and bug fixes.
Note that AllegroGraph databases created with release m1 or m2 will have to be converted to a new format before use with version m2a. See the Upgrade Guide.
Performance and Robustness
Improve logging by making some messages more concise and adding additional messages to improve diagnostics.
Improve and make string tables more robust
better consistency checks
compress large strings (>= 256 chars) in the string table
Added conversion tools for moving m1 triple-stores to m2
Support SIGUSR1-based reload in agraph-control script
After merging UPI tables, schedule a checkpoint to delete files which are no longer needed.
Safer file deletion for UPI tables to prevent problems with untuned databases.
Optimize UPI table merging, do not merge table portions that can't have common entries.
Improve client handling of instance process death
Improve upgrade checkpoint handling
Improved database versioning framework
Reference count indexes so that they do not disappear out from under cursors
Improve transaction log upgrade handling
Writes to transaction log files are now aligned to the filesystem block size to allow for optimal performance when used with certain storage subsystems.
Control placement of transaction log (with TransactionLogDir)
The maximum number of triples that can be added to a triple store is now 2^48 (formerly 2^32).
Added TransactionLogDir config file parameter. This specifies where transaction log files should go. If not specified, defaults to the value of Main.
The agraph server process now responds to SIGUSR1 by reloading the server configuration file. The agraph-control script and /etc/init.d/agraph scripts now accept the 'reload' command and will send SIGUSR1 to the agraph server process.
Enhanced in-the-field debugging support. When requested by Franz tech support, you can send SIGUSR2 to an AllegroGraph subprocess to get a backtrace for all threads.
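The SIGUSR1 reload convention mentioned above is a common Unix pattern. The Python sketch below shows the general shape only and is not how the agraph server is implemented; `ReloadableServer` and its configuration loader are hypothetical, and the example requires a Unix-like system.

```python
import os
import signal

class ReloadableServer:
    """Minimal process that re-reads its configuration on SIGUSR1."""
    def __init__(self, load_config):
        self._load_config = load_config
        self.config = load_config()
        self._reload_requested = False

    def handle_sigusr1(self, signum, frame):
        # Keep the handler tiny: set a flag and let the main loop do the work.
        self._reload_requested = True

    def poll(self):
        """Called from the main loop; reloads if a signal has arrived."""
        if self._reload_requested:
            self._reload_requested = False
            self.config = self._load_config()
            return True
        return False

# Stand-in for reading a configuration file twice with different contents.
versions = iter(["v1", "v2"])
server = ReloadableServer(lambda: next(versions))
signal.signal(signal.SIGUSR1, server.handle_sigusr1)

os.kill(os.getpid(), signal.SIGUSR1)      # what a 'reload' command would send
reloaded = server.poll()
```

Deferring the reload to the main loop avoids doing file I/O inside a signal handler.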
Other Enhancements
configure-agraph now turns off terminal echo before prompting for the super user password. Also, since the password is no longer echoed, configure-agraph prompts for the password twice to make sure it has been entered properly.
support fractional seconds in time and dateTime
improve support for duplicate deletion during merge operations
improve and unify the printing of blank nodes
make the opening and re-opening of federated-triple-stores more robust
document group centrality
add API to auto-commit (on a per store basis)
add depth-first-search-path functor
add :plain encoding
improve handling of polygons in the geospatial sub-system
cleanup part printing
Remove line breaks in server log files to make automated processing and filtering easier.
Crash instance process if one of its helper processes crashes.
Detect death of instance process in client. Generate clear error message instead of hanging the client.
Add database versioning framework. Implement stepwise upgrade paths for databases that have been created with different software versions.
Fixed bugs and other corrections
Fix for bug18831: Agraph.cfg "overwrite" option appends to previous file instead of overwriting it as it should.
Fixed bug18774: Running "agraph-control start" twice would overwrite a valid pid file. This has been fixed.
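One common way to keep a second start from clobbering a pid file is to create the file exclusively. The Python sketch below illustrates that technique; it is not taken from the agraph-control script, `write_pid_file` and the file name are hypothetical, and handling of stale pid files left by a crashed process is out of scope.

```python
import os
import tempfile

def write_pid_file(path, pid):
    """Create the pid file, failing if it already exists."""
    # O_EXCL makes the open fail instead of truncating an existing file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    with os.fdopen(fd, "w") as handle:
        handle.write(f"{pid}\n")

with tempfile.TemporaryDirectory() as directory:
    pid_path = os.path.join(directory, "agraph.pid")   # hypothetical file name
    write_pid_file(pid_path, 1234)                     # first start
    try:
        write_pid_file(pid_path, 5678)                 # second start
        second_start_refused = False
    except FileExistsError:
        second_start_refused = True
    with open(pid_path) as handle:
        surviving_pid = handle.read().strip()
```

The second call is refused and the pid recorded by the first start survives.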
SPARQL
Significant SPARQL performance improvements throughout, particularly in Left Joins (OPTIONAL), DISTINCT and REDUCED, and certain numeric FILTERs.
Massive performance increase for queries with both ORDER BY and a small LIMIT.
Imposed bindings (:with-variables) now apply to the output portion of SPARQL CONSTRUCT and DESCRIBE queries.
Some queries that use the SPOGI cache as an optimization could return duplicates. Fixed.
Similarly, the free-text index now returns distinct results in all cases, which avoids some unnecessary duplicate results.
Add the ability to use a graph URI, on both input and output, as a signifier for the default graph UPI of the current store.
Complete xsd:string/simple literal entailment. This allows you to seamlessly switch between using xsd:strings and plain literals without query changes. Plain literals are more efficient for the storage layer, and encouraged in all circumstances.
Improvements (in both efficiency and correctness) for SPARQL comparisons and XQuery functions when handling encoded values.
RDF/XML serializer can now handle on-the-fly namespace abbreviation, at the cost of more verbose output.
SPARQL extensions (e.g., geospatial forms) are now permitted by default.
URIs can now be used to specify geospatial subtypes, as well as the more verbose and less portable UUID form.
Java Client
Added Social Network Analysis capability.
Updated the Java Tutorial and tutorial examples to include Social Network Analysis.
Added Geospatial analysis capability.
Updated the Java Tutorial and tutorial examples to include Geospatial Analysis.
Established a Jena semantic framework API.
Created the Jena Tutorial, and a file of Jena tutorial examples.
Python Client
Added back support for SPOGI cache enabling/disabling
Enabled returning SPARQL-results+json or SPARQL-results+xml result from a query rather than a binding set
Introduced a boolean argument to openSession and session API for whether or not to load the initfile in the process handling the session
Federation support.
Introduced several options and better command-line option parsing for the load stress script
Minor updates to Python Tutorial and examples.