AllegroGraph Triple Indices
AllegroGraph uses a set of sorted indices to quickly identify a contiguous block of triples that are likely to match a specific query pattern.
These indices are identified by names that describe their organization. The default set of indices consists of spogi, posgi, ospgi, gspoi, gposi, gospi, and i, where:
- S stands for the subject URI.
- P stands for the predicate URI.
- O stands for the object URI or literal.
- G stands for the graph URI.
- I stands for the triple identifier (its unique id number within the triple store).
SPOGI Index
The order of the letters denotes how the index has been organized. For instance, the spogi index contains all of the triples in the store, sorted first by subject, then by predicate, then by object, and finally by graph. The triple id number is present as a fifth column in the index.
When a SPARQL or Prolog triple pattern begins with a known value for the subject, the spogi index lets us find all of the triples that have that subject value as a block of entries in the index. This block is further sorted by predicate, so if the predicate value is known we can immediately narrow the potential matches to a small range of triples. In fact, it is often just one triple.
The spogi index provides extremely rapid lookup for any pattern where s is known; or s and p are known; or s, p and o are known; or s, p, o and g are known.
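To make the sort-then-narrow idea concrete, here is a minimal Python sketch that models an index as an in-memory sorted list of tuples. The sample data, the `ex:` names, and the helpers `build_index` and `prefix_range` are hypothetical illustrations, not AllegroGraph APIs; the real indices are on-disk structures.

```python
from bisect import bisect_left

# Toy quads: (subject, predicate, object, graph, id). Plain strings stand in
# for URIs and literals; this models the idea, not AllegroGraph's storage.
QUADS = [
    ("ex:alice", "ex:knows", "ex:bob", "ex:g1", 1),
    ("ex:alice", "ex:color", "blue", "ex:g1", 2),
    ("ex:bob", "ex:knows", "ex:carol", "ex:g2", 3),
    ("ex:alice", "ex:knows", "ex:carol", "ex:g2", 4),
    ("ex:carol", "ex:color", "red", "ex:g1", 5),
]
POS = {"s": 0, "p": 1, "o": 2, "g": 3, "i": 4}


def build_index(name, quads):
    """Sort rows whose columns follow the letters of `name` (e.g. "spogi")."""
    order = [POS[letter] for letter in name]
    return sorted(tuple(q[k] for k in order) for q in quads)


def prefix_range(index, prefix):
    """Return the contiguous block of rows whose leading columns equal `prefix`."""
    prefix = tuple(prefix)
    start = bisect_left(index, prefix)  # binary search for the first candidate row
    end = start
    while end < len(index) and index[end][: len(prefix)] == prefix:
        end += 1  # matching rows are adjacent because the index is sorted
    return index[start:end]


spogi = build_index("spogi", QUADS)
print(prefix_range(spogi, ["ex:alice", "ex:knows"]))
```

Knowing only the subject selects all three `ex:alice` rows; adding the predicate narrows the same block to the two rows with ids 1 and 4, which is what the last line prints.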
POSGI Index
To find unknown subjects that have a specific predicate and object value, AllegroGraph uses the posgi index. All triples that have the same predicate are located together in that index, and are then sorted by object value. This makes it very easy to locate the correct predicate/object triples, or even ranges of such triples.
OSPGI Index
If only the object value is known, we can use the ospgi index. It is organized by object, and from it we can rapidly retrieve the corresponding subject and predicate.
Graph Indices
The graph indices, gspoi, gposi, and gospi, are used when the triple store is divided into subgraphs. If we know the subgraph, we can immediately isolate all of its triples and then index by subject, predicate or object as needed.
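The rule running through these sections can be summarized as "use the index whose leading letters are all known." The sketch below, a hypothetical helper and not AllegroGraph's query planner, scores each default index by how many of its leading columns are bound in a pattern and picks the best. The real planner may weigh other factors.

```python
# The default index set named in this document.
DEFAULT_INDICES = ["spogi", "posgi", "ospgi", "gspoi", "gposi", "gospi", "i"]


def usable_prefix(index_name, bound):
    """Count the leading columns of `index_name` whose values are known."""
    count = 0
    for letter in index_name:
        if letter not in bound:
            break
        count += 1
    return count


def choose_index(bound, indices=DEFAULT_INDICES):
    """Pick the index whose sort order puts the most known values first.

    `bound` is a set of letters such as {"p", "o"}; ties go to the index
    listed first, because max() returns the first maximal item.
    """
    return max(indices, key=lambda name: usable_prefix(name, bound))


print(choose_index({"p", "o"}))  # posgi
```

A known subject maps to spogi, a known predicate and object to posgi, a known object alone to ospgi, and a known graph plus predicate to gposi.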
I Index
The i index is special. It is simply a list of all triples organized by id number. This is useful in triple stores that use reification, but its primary mission is to make triple deletion fast when triples are deleted by id. The id alone is sufficient to identify the subject, predicate, object and graph values that let AllegroGraph delete the same triple from the other indices. Without the i index, AllegroGraph has to scan the other indices line-by-line to find the matching id numbers. This is very slow.
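The following Python sketch, a hypothetical in-memory model rather than AllegroGraph code, contrasts the two deletion paths. With an i index (modeled here as a dict from id to quad) one lookup recovers s, p, o and g, and each other index can then be searched directly; without it, an index has to be walked row by row until the id turns up. The class name and methods are illustrative assumptions.

```python
from bisect import bisect_left

POS = {"s": 0, "p": 1, "o": 2, "g": 3, "i": 4}


def row(name, quad):
    """Reorder an (s, p, o, g, id) quad into the column order of index `name`."""
    return tuple(quad[POS[c]] for c in name)


class ToyStore:
    """Hypothetical model: sorted lists for spogi/posgi/ospgi plus an i index."""

    def __init__(self, quads, names=("spogi", "posgi", "ospgi")):
        self.indices = {n: sorted(row(n, q) for q in quads) for n in names}
        self.i_index = {q[4]: q for q in quads}  # triple id -> (s, p, o, g, id)

    def _remove_everywhere(self, quad):
        for name, rows in self.indices.items():
            # Knowing every column, the row is found by binary search.
            del rows[bisect_left(rows, row(name, quad))]

    def delete_by_id(self, triple_id):
        """With the i index: one lookup recovers s, p, o and g."""
        quad = self.i_index.pop(triple_id)
        self._remove_everywhere(quad)

    def delete_by_id_scan(self, triple_id):
        """Without the i index: scan the spogi rows one by one for the id."""
        for r in self.indices["spogi"]:
            if r[4] == triple_id:  # spogi rows are already (s, p, o, g, id)
                self.i_index.pop(triple_id, None)
                self._remove_everywhere(r)
                return
        raise KeyError(triple_id)
```

Both methods leave the same result; the difference is that the scan touches rows in proportion to the size of the store, which is why deletion by id is slow without the i index.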
Triples can also be deleted by pattern-matching instead of id. That type of deletion is not influenced by the i index.
Index styles
Indices can be created in two different styles. The styles affect the internal organization of the index. The styles are called style 1 and style 2. (Style 2 is new in release 6.0. All indices in earlier releases were style 1 indices, although the style was not named.)
From the point of view of the user, here are the differences:
Style 1 indices are faster to construct and optimize. They are better for queries which end up scanning many triples and which typically produce results with many triples.
Style 2 indices take longer to construct and optimize. They are better for point queries where all of s/p/o or s/p/o/g are fixed and where the result is typically a single triple.
You can specify which indices should be style 2 with the Style2Indices catalog configuration directive, described in the Server Configuration and Control document. Every index not named by that directive will be style 1.
You can also specify the style when creating an index, either in the Lisp API with add-index or through the REST interface with the PUT /catalogs/[NAME]/repositories/[name]/indices/[type] request, such as:
PUT /catalogs/MYCAT/repositories/MYREPO/indices/spogi?style=1
In both cases, the style can be specified as 0, 1, or 2. A value of 0 means use the default for that type of index: style 2 if the Style2Indices catalog configuration directive says the index being created should be style 2, and style 1 otherwise. A value of 1 or 2 makes the index use that style regardless of the configuration directive.
You must commit the triple store for the new index to be created.
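The resolution rule for the style argument can be written as a small function. This is a sketch: `resolve_style` is a hypothetical name, and modeling the Style2Indices directive as a Python set of index names is an assumption about its content, not its syntax.

```python
def resolve_style(index_name, requested, style2_indices):
    """Return the effective style (1 or 2) for a newly created index.

    requested: 0 means "use the default for this index type", i.e. style 2
    if index_name is named by Style2Indices, otherwise style 1; 1 or 2
    forces that style.
    style2_indices: hypothetical stand-in for the Style2Indices directive,
    modeled as a set of index names.
    """
    if requested == 0:
        return 2 if index_name in style2_indices else 1
    if requested in (1, 2):
        return requested
    raise ValueError(f"style must be 0, 1 or 2, got {requested!r}")


print(resolve_style("spogi", 0, {"spogi"}))  # 2
```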
Customizing Indices
The standard seven indices are enabled when you create a triple store. You can customize this set, however, both by eliminating indices that your application will not use, and by requesting custom indices that match your more unusual triple patterns.
For instance, if your application does not use subgraphs, the indices beginning with "g" will never be used. You can speed up indexing dramatically by eliminating these indices from the system.
As a second example, if you have a pattern that asks what graphs contain resources that are blue, you might request a pogsi index which the system does not normally provide. If you know that the predicate is "color" and the object is "blue," then the g column of the pogsi index will contain a block of graph URIs that can be returned in one operation to match this triple pattern.
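A minimal sketch of why pogsi helps here, assuming a toy in-memory index (the data and the helper `graphs_with` are hypothetical): after sorting by predicate, object, then graph, every graph for a given color/blue pair sits in one contiguous block, so the distinct graphs can be read off in a single pass over that block.

```python
from bisect import bisect_left

QUADS = [
    ("ex:car1", "ex:color", "blue", "ex:garage", 1),
    ("ex:sky", "ex:color", "blue", "ex:nature", 2),
    ("ex:rose", "ex:color", "red", "ex:nature", 3),
    ("ex:boat", "ex:color", "blue", "ex:garage", 4),
]
POS = {"s": 0, "p": 1, "o": 2, "g": 3, "i": 4}

# Reorder each quad into pogsi columns: (p, o, g, s, id), then sort.
pogsi = sorted(tuple(q[POS[c]] for c in "pogsi") for q in QUADS)


def graphs_with(index, predicate, obj):
    """Distinct graphs holding some subject with this predicate/object pair."""
    key = (predicate, obj)
    graphs = []
    i = bisect_left(index, key)  # first row of the (p, o) block
    while i < len(index) and index[i][:2] == key:
        g = index[i][2]
        if not graphs or graphs[-1] != g:  # rows in the block are grouped by g
            graphs.append(g)
        i += 1
    return graphs


print(graphs_with(pogsi, "ex:color", "blue"))  # ['ex:garage', 'ex:nature']
```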
Index Substitution
What if you delete a triple index, thinking it unnecessary, and then AllegroGraph encounters a query that would optimally use that index? In this case, AllegroGraph automatically uses the next-best index. AllegroGraph can use any index to find the required matches. Note, however, that doing so may require full scans of the triple-store and this can be painfully slow.
Note that you can delete any combination of indices from the triple store, but AllegroGraph will not let you delete all of the indices. There must always be at least one index.
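The following sketch models both rules from this section under stated assumptions: `drop_index` refuses to remove the last index, and `plan` picks the remaining index with the longest known prefix, falling back to a full scan when no index starts with a known value. Both helpers are hypothetical; how AllegroGraph actually chooses a substitute index is not specified here.

```python
def drop_index(indices, name):
    """Return the index list without `name`, refusing to drop the last index."""
    remaining = [n for n in indices if n != name]
    if not remaining:
        raise ValueError("a triple store must keep at least one index")
    return remaining


def plan(bound, indices):
    """Choose how to answer a pattern whose known positions are in `bound`."""
    best, covered = indices[0], 0
    for name in indices:
        n = 0
        for letter in name:  # count leading columns with known values
            if letter not in bound:
                break
            n += 1
        if n > covered:
            best, covered = name, n
    if covered == 0:
        # No remaining index starts with a known value: every row must be read.
        return ("full scan", best)
    return ("range lookup", best)


remaining = drop_index(["spogi", "posgi", "gspoi"], "posgi")
print(plan({"p", "o"}, remaining))  # ('full scan', 'spogi')
```

With posgi gone, a predicate/object pattern still gets answered, but only by reading all of spogi.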
Index Replacement
You may wish to replace an index of one style (say style 1) with a similar index of the other style (say style 2). That is the only reason to replace an index. Index styles are discussed in the Index styles section above.
You can replace an index by calling for its creation with a different style.
Using the Lisp API (see Lisp Reference) you would evaluate the following forms:
(add-index :posgi :style 2)   ; request a style 2 posgi index (use 1 for style 1)
(commit-triple-store)          ; the new index is built when the store is committed
The functions called are add-index and commit-triple-store.
Using the REST API (see HTTP reference), if the current spogi index is a style 2 index, the following requests would change it to style 1 (assuming the catalog is MYCAT and the repository is MYREPO):
PUT /catalogs/MYCAT/repositories/MYREPO/indices/spogi?style=1
PUT /catalogs/MYCAT/repositories/MYREPO/commit
The system will remove the current index of the particular type and style and replace it with a rebuilt index of the same type and the new style.
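For scripting, the two requests can be prepared with Python's standard library. This sketch only builds the requests; the base URL `http://localhost:10035` is an assumed server address, the helper name `index_style_requests` is hypothetical, and authentication, which a real server normally requires, is omitted. The paths, methods, and style query parameter mirror the requests shown above.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def index_style_requests(base_url, catalog, repo, index, style):
    """Build (but do not send) the PUT requests that replace an index's style."""
    repo_path = f"{base_url}/catalogs/{quote(catalog)}/repositories/{quote(repo)}"
    add = Request(
        f"{repo_path}/indices/{quote(index)}?{urlencode({'style': style})}",
        method="PUT",
    )
    # The new index is not created until the store is committed.
    commit = Request(f"{repo_path}/commit", method="PUT")
    return add, commit


add, commit = index_style_requests("http://localhost:10035", "MYCAT", "MYREPO", "spogi", 1)
print(add.get_method(), add.full_url)
```

Passing each request in turn to `urllib.request.urlopen` would send it, once credentials have been added.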
Optimizing indices
As triples are committed to an AllegroGraph database, they are indexed for fast retrieval. As the indices grow, they become less efficient. Left unchecked, this degradation in efficiency would quickly result in unsatisfactory query performance.
To overcome this, AllegroGraph Server automatically initiates background index optimization operations when index efficiency drops below internal thresholds. An index optimization involves rewriting a portion of an index. Background index optimization operations are repeatedly performed until index efficiency rises above the internal threshold.
Removing duplicate triples also improves index performance. Duplicate triples are removed when you call delete-duplicate-triples followed by a commit. Duplicate triple removal is not done automatically or in the background.
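A rough sketch of what removing duplicates means, under an assumption: two triples count as duplicates when their subject, predicate, object and graph all match. Which copy AllegroGraph keeps is not stated here; the hypothetical `delete_duplicates` below keeps the lowest id purely so the result is deterministic.

```python
def delete_duplicates(quads):
    """Keep one copy of each (s, p, o, g); return kept quads and removed ids."""
    seen = set()
    kept, removed = [], []
    for q in sorted(quads, key=lambda q: q[4]):  # lowest id first (arbitrary choice)
        key = q[:4]  # (s, p, o, g) without the id
        if key in seen:
            removed.append(q[4])
        else:
            seen.add(key)
            kept.append(q)
    return kept, removed


kept, removed = delete_duplicates([
    ("ex:a", "ex:p", "ex:b", "ex:g", 3),
    ("ex:a", "ex:p", "ex:b", "ex:g", 1),
    ("ex:a", "ex:p", "ex:c", "ex:g", 2),
])
print(removed)  # [3]
```

Every removed copy is one fewer row in each index, which is where the performance gain comes from.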
An index can be used for queries even while it is being optimized.
Forced index optimization
In order to balance query performance with the cost of doing the operations, background index optimization does not necessarily bring indices to maximum efficiency. However, through the various client APIs, a user may request more aggressive index optimization.
For example, this is done in the Lisp Client API with the function optimize-indices:
(optimize-indices &key level wait-p (db *db*))
See the function documentation for details. In short: if wait-p is true, the function does not return until the requested operation has completed; level specifies how aggressive the optimization is (more aggressive means more optimization but takes more time). level must be a positive integer or nil (nil is the same as not specifying a value, meaning use the default).
Managing Indices
You can use AGWebView to add or remove triple indices in a repository (also see add-index and drop-index in the Lisp Client API). There is nothing difficult about adding or removing indices, so feel free to experiment. (If the repository contains a large amount of data, there may be a delay while AllegroGraph builds the new index.)
You can also ask for a specific set of indices when you create a repository through agload, the Lisp Client, the Java Sesame client API, or the Python client API.