AllegroGraph Triple Indices
AllegroGraph uses a set of sorted indices to quickly identify a contiguous block of triples that are likely to match a specific query pattern.
These indices are identified by names that describe their organization. The default indices are named spogi, posgi, ospgi, gspoi, gposi, gospi, and i, where:
- S stands for the subject URI.
- P stands for the predicate URI.
- O stands for the object URI or literal.
- G stands for the graph URI.
- I stands for the triple identifier (its unique id number within the triple store).
The order of the letters denotes how the index has been organized. For instance, the spogi index contains all of the triples in the store, sorted first by subject, then by predicate, then by object, and finally by graph. The triple id number is present as a fifth column in the index.
When a SPARQL or Prolog triple pattern begins with a known value for the subject, the spogi index lets us find all of the triples that have that subject value as a block of entries in the index. This block is further sorted by predicate, so if the predicate value is known we can immediately narrow the potential matches to a small range of triples. In fact, it is often just one triple.
The spogi index provides extremely rapid lookup for any pattern where s is known; or s and p are known; or s, p and o are known; or s, p, o and g are known.
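The prefix lookup described above can be sketched with a toy, in-memory model. This is only an illustration of how a sorted index yields a contiguous block, not AllegroGraph's actual storage format; the sample triples and the helper name prefix_range are made up for the example.

```python
from bisect import bisect_left

# Toy triples as (s, p, o, g, i) tuples. All values are invented.
TRIPLES = [
    ("alice", "knows", "bob", "g1", 1),
    ("alice", "knows", "carol", "g1", 2),
    ("alice", "age", "30", "g1", 3),
    ("bob", "knows", "alice", "g2", 4),
    ("carol", "age", "25", "g2", 5),
]

# Model the spogi index as the rows sorted by subject, predicate,
# object, graph, and finally id.
SPOGI = sorted(TRIPLES)


def prefix_range(index, prefix):
    """Return the contiguous block of rows whose leading columns equal prefix."""
    n = len(prefix)
    # Binary search finds the first row that is >= the prefix.
    start = bisect_left(index, prefix)
    # Matching rows are adjacent, so scan forward until the prefix changes.
    end = start
    while end < len(index) and index[end][:n] == prefix:
        end += 1
    return index[start:end]


# Each additional known value narrows the block further.
for prefix in [("alice",), ("alice", "knows"), ("alice", "knows", "bob")]:
    print(prefix, "->", len(prefix_range(SPOGI, prefix)), "row(s)")
```

The same helper works for any column order, which is why the choice of index matters: the known values must form the leading columns for the block to be contiguous.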
To find unknown subjects that have a specific predicate and object value, AllegroGraph uses the posgi index. All triples that have the same predicate are located together in that index, and are then sorted by object value. This makes it very easy to locate the correct predicate/object triples, or even ranges of such triples.
If only the object value is known, we can use the ospgi index. It is organized by object, and from it we can rapidly retrieve the corresponding subject and predicate.
The graph indices, gspoi, gposi, and gospi, are used when the triple store is divided into subgraphs. If we know the subgraph, we can immediately isolate all of its triples and then index by subject, predicate or object as needed.
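One way to picture how the index names relate to query patterns is to count how many leading columns of each index are bound by the pattern. The sketch below is a simplified model under that assumption; it is not AllegroGraph's query planner, and the function names are hypothetical.

```python
DEFAULT_INDICES = ["spogi", "posgi", "ospgi", "gspoi", "gposi", "gospi", "i"]


def usable_prefix(index_name, bound):
    """Count the leading columns of index_name that the pattern binds."""
    count = 0
    for column in index_name:
        if column not in bound:
            break
        count += 1
    return count


def choose_index(bound, indices=DEFAULT_INDICES):
    """Pick the index with the longest bound prefix (first one wins ties).

    The tie-breaking rule is an arbitrary choice made for this sketch.
    """
    return max(indices, key=lambda name: usable_prefix(name, bound))


# bound holds the letters of the pattern positions with known values.
print(choose_index({"s"}))        # subject known
print(choose_index({"p", "o"}))   # predicate and object known
print(choose_index({"o"}))        # only object known
print(choose_index({"g", "p"}))   # graph and predicate known
```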
The i index is special. It is simply a list of all triples organized by id number. This is useful in triple stores that use reification, but its primary purpose is to make deletion fast when triples are deleted by id. The id alone is sufficient to look up the subject, predicate, object and graph values that let AllegroGraph delete the same triple from the other indices. Without the i index, AllegroGraph has to scan the other indices entry by entry to find the matching id numbers, which is very slow.
Triples can also be deleted by pattern-matching instead of id. That type of deletion is not influenced by the i index.
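The difference between deleting by id with and without the i index can be illustrated with a small model. The ToyStore class below is hypothetical and keeps only two structures (a sorted spogi list and an id lookup table); it is meant to show the shape of the work involved, not how AllegroGraph implements deletion.

```python
from bisect import bisect_left

# Invented sample rows in (s, p, o, g, i) form.
TRIPLES = [
    ("alice", "knows", "bob", "g1", 1),
    ("alice", "age", "30", "g1", 2),
    ("bob", "knows", "alice", "g2", 3),
]


class ToyStore:
    def __init__(self, triples):
        # spogi: rows sorted by subject, predicate, object, graph, id.
        self.spogi = sorted(triples)
        # i index: id -> (s, p, o, g), so an id resolves without a scan.
        self.by_id = {row[4]: row[:4] for row in triples}

    def delete_with_i_index(self, triple_id):
        """Resolve the id first, then binary-search spogi for the exact row."""
        row = self.by_id.pop(triple_id) + (triple_id,)
        pos = bisect_left(self.spogi, row)
        del self.spogi[pos]
        return row

    def delete_by_scanning(self, triple_id):
        """Without an i index, every spogi row must be checked for the id."""
        for pos, row in enumerate(self.spogi):
            if row[4] == triple_id:
                del self.spogi[pos]
                self.by_id.pop(triple_id, None)
                return row
        raise KeyError(triple_id)


store = ToyStore(TRIPLES)
print(store.delete_with_i_index(3))
print(store.delete_by_scanning(1))
```

In the first method the cost is one table lookup plus one binary search per index; in the second it grows with the size of the store.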
The standard seven indices are enabled when you create a triple store. You can customize this set, however, both by eliminating indices that your application will not use, and by requesting custom indices that match your more unusual triple patterns.
For instance, if your application does not use subgraphs, the indices beginning with "g" will never be used. You can speed up indexing dramatically by eliminating these indices from the system.
As a second example, if you have a pattern that asks which graphs contain resources that are blue, you might request a pogsi index, which the system does not provide by default. If you know that the predicate is "color" and the object is "blue," then the g column of the pogsi index will contain a block of graph URIs that can be returned in one operation to match this triple pattern.
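The pogsi example can be modeled by reordering the columns of each row and sorting. The data and the helper names build_index and graphs_for below are invented for illustration; the sketch shows only why the graph values for a known predicate and object end up adjacent.

```python
from bisect import bisect_left

# Invented sample rows in (s, p, o, g, i) form.
TRIPLES = [
    ("sky", "color", "blue", "g1", 1),
    ("sea", "color", "blue", "g2", 2),
    ("rose", "color", "red", "g1", 3),
    ("jay", "color", "blue", "g1", 4),
]

COLUMN = {"s": 0, "p": 1, "o": 2, "g": 3, "i": 4}


def build_index(name, triples):
    """Reorder each row into the column order spelled by name, then sort."""
    order = [COLUMN[letter] for letter in name]
    return sorted(tuple(row[k] for k in order) for row in triples)


def graphs_for(pogsi, predicate, obj):
    """Collect the distinct graphs in the block for (predicate, obj)."""
    prefix = (predicate, obj)
    pos = bisect_left(pogsi, prefix)
    graphs = []
    while pos < len(pogsi) and pogsi[pos][:2] == prefix:
        graph = pogsi[pos][2]  # g is the third column of a pogsi row
        # Rows are sorted by graph within the block, so equal graphs are adjacent.
        if not graphs or graphs[-1] != graph:
            graphs.append(graph)
        pos += 1
    return graphs


POGSI = build_index("pogsi", TRIPLES)
print(graphs_for(POGSI, "color", "blue"))
```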
What if you delete a triple index, thinking it unnecessary, and then AllegroGraph encounters a query that would optimally use that index? In this case, AllegroGraph automatically uses the next-best index. AllegroGraph can use any index to find the required matches. Note, however, that doing so may require a full scan of the triple store, which can be painfully slow.
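To get a feel for the cost of falling back to a less suitable index, the following sketch counts how many rows a lookup would have to visit under a simple model: only the leading index columns bound by the pattern narrow the search, and everything inside that block must be examined. The function rows_examined and the data are hypothetical, and real costs depend on details this model ignores.

```python
COLUMN = {"s": 0, "p": 1, "o": 2, "g": 3, "i": 4}

# Invented sample rows in (s, p, o, g, i) form.
TRIPLES = [
    ("sky", "color", "blue", "g1", 1),
    ("sea", "color", "blue", "g1", 2),
    ("rose", "color", "red", "g1", 3),
    ("sky", "size", "big", "g1", 4),
    ("rose", "size", "small", "g1", 5),
    ("joe", "mood", "blue", "g1", 6),
]


def rows_examined(index_name, pattern, triples):
    """Count the rows in the block selected by the bound leading columns.

    pattern maps column letters to known values, e.g. {"p": "color"}.
    With no bound leading column the block is the whole store (a full scan).
    """
    leading = []
    for letter in index_name:
        if letter not in pattern:
            break
        leading.append(letter)
    return sum(
        1
        for row in triples
        if all(row[COLUMN[letter]] == pattern[letter] for letter in leading)
    )


pattern = {"p": "color", "o": "blue"}
for name in ["posgi", "ospgi", "spogi"]:
    print(name, rows_examined(name, pattern, TRIPLES))
```

In this model posgi visits only the matching rows, ospgi visits every row whose object is "blue", and spogi has no usable prefix and visits the whole store.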
Note that you can delete any combination of indices from the triple store, but AllegroGraph will not let you delete all of the indices. There must always be at least one index.
As triples are committed to an AllegroGraph database, they are indexed for fast retrieval. As the indices grow, they become less efficient. Left unchecked, this degradation in efficiency would quickly result in unsatisfactory query performance.
To overcome this, AllegroGraph Server automatically initiates background index optimization operations when index efficiency drops below internal thresholds. An index optimization involves rewriting a portion of an index. Background index optimization operations are repeatedly performed until index efficiency rises above the internal threshold.
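As a rough mental model only, the sketch below assumes an index is held as several independently sorted chunks, so a lookup must probe each chunk, and that optimization rewrites some of them into one larger sorted chunk. The text above does not describe AllegroGraph's actual index layout, so the chunk structure, the function names, and the partial-merge policy are all assumptions made for illustration.

```python
import heapq
from bisect import bisect_left


def lookup(chunks, key):
    """Probe every sorted chunk for key; return (found, chunks_probed)."""
    found = False
    for chunk in chunks:
        pos = bisect_left(chunk, key)
        if pos < len(chunk) and chunk[pos] == key:
            found = True
    return found, len(chunks)


def optimize(chunks, how_many):
    """Rewrite the first how_many chunks as a single sorted chunk.

    Only part of the index is rewritten, mirroring the idea that an
    optimization pass touches a portion of the index at a time.
    """
    merged = list(heapq.merge(*chunks[:how_many]))
    return [merged] + chunks[how_many:]


# Integers stand in for index rows to keep the example short.
chunks = [[1, 4, 9], [2, 3], [7], [5, 8]]
print(lookup(chunks, 7))

chunks = optimize(chunks, 3)
print(chunks)
print(lookup(chunks, 7))
```

Under this model, fewer chunks means fewer probes per lookup, and lookups remain possible before, during, and after a merge because each chunk is always sorted.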
Removing duplicate triples also improves index performance. Duplicate triples are removed when you call delete-duplicate-triples followed by a commit. Duplicate triple removal is not done automatically or in the background.
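What counts as a duplicate can be pictured with a short sketch. It assumes, for illustration, that two triples are duplicates when their subject, predicate, object and graph all match and only their ids differ, and that the copy with the lowest id is kept; consult the delete-duplicate-triples documentation for the behavior of the real function. The helper find_duplicate_ids is hypothetical.

```python
def find_duplicate_ids(triples):
    """Return ids of rows whose (s, p, o, g) repeats an earlier row.

    Rows are visited in id order, so the lowest id of each group is kept.
    Both the comparison key and the keep-lowest rule are assumptions.
    """
    seen = set()
    duplicates = []
    for s, p, o, g, i in sorted(triples, key=lambda row: row[4]):
        key = (s, p, o, g)
        if key in seen:
            duplicates.append(i)
        else:
            seen.add(key)
    return duplicates


# Invented sample rows in (s, p, o, g, i) form.
TRIPLES = [
    ("sky", "color", "blue", "g1", 1),
    ("sky", "color", "blue", "g1", 2),  # same s, p, o, g as id 1
    ("sky", "color", "blue", "g2", 3),  # different graph, not a duplicate here
    ("sky", "color", "blue", "g1", 4),  # another copy of id 1
]

print(find_duplicate_ids(TRIPLES))
```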
An index can be used for queries even while it is being optimized.
Forced index optimization
In order to balance query performance with the cost of doing the operations, background index optimization does not necessarily bring indices to maximum efficiency. However, through the various client APIs, a user may request more aggressive index optimization.
For example, this is done in the Lisp Client API with the function optimize-indices:
(optimize-indices &key level wait-p (db *db*))
See the function documentation for details. In short, if wait-p is true, the function does not return until the requested operation has completed. The level argument specifies how aggressive the optimization will be: more aggressive settings optimize more thoroughly but take more time. The value must be a positive integer or nil (nil means the same as not specifying a value, so the default is used).
You can use AGWebView to add or remove triple indices in a repository (also see add-index and drop-index in the Lisp Client API). There is nothing difficult about adding or removing indices, so feel free to experiment. (If the repository contains a large amount of data, there may be a delay while AllegroGraph builds the new index.)
You can also ask for a specific set of indices when you create a repository through agload, the Lisp Client, the Java Sesame client API, or the Python client API.