Query Analysis | AllegroGraph 7.0.0

Basic query analysis

Overview

AllegroGraph 7.0.0 implements a simple static query analyzer, which reports which indices a query will use, and a dynamic query analyzer, which processes the query and determines which indices are actually used. The dynamic analyzer provides much better information, but it also takes much longer because the query is processed during the analysis.

The analyzer is turned on by adding an additional parameter to the HTTP query string: analyzeIndicesUsed=true. The type of analysis is determined by the queryAnalysisTechnique parameter, which can have the values static, for static analysis, and executed, for dynamic analysis. There is also a parameter queryAnalysisTimeout, which specifies a maximum number of seconds for the analysis; the analysis is abandoned if the time taken exceeds that amount (the default, when unspecified, is no limit). The timeout only applies to dynamic analysis.

For example, when I encode this query into an HTTP request:

select ?s ?p ?o  
where {  
  ?s ex:profession ex:ballPlayer .  
  ?s ?p ?o  
}

it looks like (where line breaks have been added to make the request (somewhat) readable):

/repositories/temp?query=  
select%20%3Fs%20%3Fp%20%3Fo%0Awhere%20%7B%0A%20%20%3Fs  
%20ex%3Aprofession%20ex%3AballPlayer%20.%0A%20%20%3Fs%20  
%3Fp%20%3Fo%0A%7D%0A%0A&queryLn=SPARQL&limit=500  
&infer=false

If I want to analyze the query instead of running it, I add &analyzeIndicesUsed=true and &queryAnalysisTechnique=static and get

/repositories/temp?query=  
select%20%3Fs%20%3Fp%20%3Fo%0Awhere%20%7B%0A%20%20%3Fs  
%20ex%3Aprofession%20ex%3AballPlayer%20.%0A%20%20%3Fs%20  
%3Fp%20%3Fo%0A%7D%0A%0A&queryLn=SPARQL&limit=500  
&infer=false  
&analyzeIndicesUsed=true  
&queryAnalysisTechnique=static
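The request above can also be assembled programmatically rather than encoded by hand. The following is a minimal Python sketch using only the standard library. The helper name `build_analysis_url` and its arguments are illustrative assumptions, not part of AllegroGraph; only the parameter names (`query`, `queryLn`, `limit`, `infer`, `analyzeIndicesUsed`, `queryAnalysisTechnique`, `queryAnalysisTimeout`) come from the text.

```python
from urllib.parse import quote, urlencode

SPARQL = """select ?s ?p ?o
where {
  ?s ex:profession ex:ballPlayer .
  ?s ?p ?o
}"""


def build_analysis_url(repository, query, technique="static", timeout=None):
    """Return the path and query string for an analysis request.

    Hypothetical helper: it only builds the URL and does not send it.
    """
    if technique not in ("static", "executed"):
        raise ValueError("technique must be 'static' or 'executed'")
    params = {
        "query": query,
        "queryLn": "SPARQL",
        "limit": 500,
        "infer": "false",
        "analyzeIndicesUsed": "true",
        "queryAnalysisTechnique": technique,
    }
    # The text says the timeout only applies to dynamic (executed) analysis.
    if timeout is not None:
        params["queryAnalysisTimeout"] = timeout
    # quote_via=quote encodes spaces as %20, matching the request shown above.
    encoded = urlencode(params, quote_via=quote)
    return f"/repositories/{quote(repository)}?{encoded}"


url = build_analysis_url("temp", SPARQL)
print(url)
```

Passing `technique="executed"` together with a `timeout` value produces the dynamic-analysis form of the same request.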

The results of query analysis are returned in the HTTP response as plain text and also added to the server log. In both cases, the results look like

(desired spogi optimal 1 actual spogi optimal 1)  
(desired posgi optimal 6 actual ospgi suboptimal 4)

or, in the log file,

[2010-09-17T12:48:04 p4426 http] query analysis of "prefix owl: <http://www.w3.org/2002/07/owl#>\nselect *\nwhere { ?s a owl:Restriction .\n ?s ?p ?v . }" gives ((:desired (:spogi :optimal 1) :actual (:spogi :optimal 1)) (:desired (:posgi :optimal 6) :actual (:posgi :optimal 6)))  
[2010-09-17T12:48:21 p4426 http] query analysis of "prefix owl: <http://www.w3.org/2002/07/owl#>\nselect *\nwhere { ?s a owl:Restriction .\n ?s ?p ?v . }" gives ((:desired (:spogi :optimal 1) :actual (:spogi :optimal 1)) (:desired (:posgi :optimal 6) :actual (:ospgi :suboptimal 4)))

Each line refers to one of the indices required by the query. It indicates which index flavor the query really wants and which index the query used, based on what is available to the store. So the line

(desired posgi optimal 6 actual ospgi suboptimal 4)

means that the query wanted posgi (which would have been optimal) but got ospgi (which was suboptimal). An index is rated with one of three values: optimal, suboptimal, or full (a full scan). The numbers indicate whether or not additional filtering was required and should be ignored for now.
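When the analysis is consumed by a script, the plain-text response can be parsed line by line. The sketch below assumes that every result line has exactly the shape shown above (index name, rating, number, for both the desired and the actual index); the `IndexUsage` type and the function names are hypothetical and not part of any AllegroGraph client.

```python
import re
from typing import NamedTuple


class IndexUsage(NamedTuple):
    desired_index: str
    desired_rating: str
    desired_number: int
    actual_index: str
    actual_rating: str
    actual_number: int


# Assumed line shape: (desired <index> <rating> <n> actual <index> <rating> <n>)
LINE_RE = re.compile(
    r"\(desired (\w+) (optimal|suboptimal|full) (\d+) "
    r"actual (\w+) (optimal|suboptimal|full) (\d+)\)"
)


def parse_analysis(text):
    """Parse the plain-text analysis into a list of IndexUsage records."""
    usages = []
    for line in text.splitlines():
        match = LINE_RE.fullmatch(line.strip())
        if match is None:
            continue  # skip blank or unrecognized lines
        d_idx, d_rating, d_num, a_idx, a_rating, a_num = match.groups()
        usages.append(
            IndexUsage(d_idx, d_rating, int(d_num), a_idx, a_rating, int(a_num))
        )
    return usages


def non_optimal(usages):
    """Return the records whose actual index was not rated optimal."""
    return [u for u in usages if u.actual_rating != "optimal"]


response = """(desired spogi optimal 1 actual spogi optimal 1)
(desired posgi optimal 6 actual ospgi suboptimal 4)"""

for usage in non_optimal(parse_analysis(response)):
    print(f"wanted {usage.desired_index}, got {usage.actual_index}")
```

A report like this makes it easy to spot which index flavors are missing from the store.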

If &queryAnalysisTechnique=executed is added instead of &queryAnalysisTechnique=static, or if that parameter is left unspecified, the query is processed and the indices actually looked for and used are reported.

Lisp API

The Lisp function db.agraph:analyze-query-index-usage will also do an analysis.

Caveats

The current static analyzer works only with SPARQL (a Prolog version is underway).
There is no AGWebView interface at this time.
The static analyzer uses the SPARQL planner but does not know about some of the tricks used by the SPARQL executor. In particular, when SPARQL sees a query like
```
SELECT *  
WHERE { ?s ex:age ?o  
        FILTER ( ?o > 15 && 25 > ?o )  
} 
```

it will know that the FILTER can be brought directly into the triple pattern retrieval, which converts the query from

get-triples :p ex:age

into:

get-triples :p ex:age :o 15 :o-end 25

The analyzer will see the first get-triples and think that the query would be best served by POSGI. In execution, however, and depending on the actual contents of the store, the best index might well be OSPGI because it would allow for very quick filtering on the range.
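The benefit of folding a range FILTER into the retrieval can be illustrated with a toy model. This is not how AllegroGraph stores or scans its indices; it is a sketch in Python that assumes a list of (object, subject) rows sorted by object value, and compares filtering every row against a binary-search range scan. All names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_object_index(pairs):
    """Build a toy object-ordered index from (subject, age) pairs.

    Returns (keys, rows): rows are (age, subject) sorted by age, and keys
    holds the ages alone so they can be searched with bisect.
    """
    rows = sorted((age, subject) for subject, age in pairs)
    keys = [age for age, _ in rows]
    return keys, rows


def filter_scan(rows, low, high):
    """Visit every row and apply FILTER (?o > low && high > ?o) afterwards."""
    return [(o, s) for o, s in rows if low < o < high]


def range_scan(keys, rows, low, high):
    """Jump straight to the rows inside the open interval (low, high)."""
    start = bisect_right(keys, low)  # first row with o > low
    end = bisect_left(keys, high)    # first row with o >= high
    return rows[start:end]


people = [("alice", 12), ("bob", 16), ("carol", 25), ("dave", 24), ("erin", 15)]
keys, rows = build_object_index(people)
print(range_scan(keys, rows, 15, 25))
```

Both functions return the same rows, but the range scan only touches the matching slice, which is the kind of saving an object-ordered index can offer for the query above.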

Similarly, queries with optionals may confuse the static analyzer because it will not know for certain whether or not the variables in the optional clause have become bound.

Future work

We will continue to extend the static plan analyzer so that it works correctly with the examples shown above.

Feedback and ideas are very welcome as we continue to enhance AllegroGraph's query engine.
