Basic query analysis

Overview

AllegroGraph 6.0 implements two query analyzers: a simple static analyzer, which reports which indices a query will use, and a dynamic analyzer, which processes the query and determines which indices are actually used. The dynamic analyzer provides much better information, but it also takes much longer because the query is processed during the analysis.

The analyzer is turned on by adding a parameter to the HTTP query string: analyzeIndicesUsed=true. The type of analysis is determined by the queryAnalysisTechnique parameter, which can be static, for static analysis, or executed, for dynamic analysis. The queryAnalysisTimeout parameter specifies a maximum number of seconds for the analysis, which is abandoned if it takes longer than that amount (the default, when unspecified, is no limit). The timeout applies only to dynamic analysis.

For example, when I encode this query into an HTTP request:

select ?s ?p ?o  
where {  
  ?s ex:profession ex:ballPlayer .  
  ?s ?p ?o  
} 

it looks like this (line breaks have been added to make the request somewhat readable):

/repositories/temp?query=  
select%20%3Fs%20%3Fp%20%3Fo%0Awhere%20%7B%0A%20%20%3Fs  
%20ex%3Aprofession%20ex%3AballPlayer%20.%0A%20%20%3Fs%20  
%3Fp%20%3Fo%0A%7D%0A%0A&queryLn=SPARQL&limit=500  
&infer=false 

If I want to analyze the query instead of running it, I add &analyzeIndicesUsed=true and &queryAnalysisTechnique=static and get

/repositories/temp?query=  
select%20%3Fs%20%3Fp%20%3Fo%0Awhere%20%7B%0A%20%20%3Fs  
%20ex%3Aprofession%20ex%3AballPlayer%20.%0A%20%20%3Fs%20  
%3Fp%20%3Fo%0A%7D%0A%0A&queryLn=SPARQL&limit=500  
&infer=false  
&analyzeIndicesUsed=true  
&queryAnalysisTechnique=static 
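
The request above can also be assembled programmatically. The following is a minimal Python sketch, not part of AllegroGraph itself: the function name build_analysis_url and its arguments are hypothetical, and only the parameter names (analyzeIndicesUsed, queryAnalysisTechnique, queryAnalysisTimeout) and the other query-string values come from the example above. It uses the standard library to percent-encode the query so that spaces become %20, as in the hand-written request.

```python
from urllib.parse import urlencode, quote


def build_analysis_url(repository_path, sparql, technique="static",
                       timeout_seconds=None):
    """Return a request path that asks for index analysis of `sparql`.

    Hypothetical helper for illustration; the parameter names are the
    ones described in this document.
    """
    if technique not in ("static", "executed"):
        raise ValueError("technique must be 'static' or 'executed'")

    params = {
        "query": sparql,
        "queryLn": "SPARQL",
        "limit": "500",
        "infer": "false",
        "analyzeIndicesUsed": "true",
        "queryAnalysisTechnique": technique,
    }
    # The timeout is optional; per the text it only affects dynamic analysis.
    if timeout_seconds is not None:
        params["queryAnalysisTimeout"] = str(timeout_seconds)

    # quote_via=quote encodes spaces as %20 rather than '+'.
    return repository_path + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    query = (
        "select ?s ?p ?o\n"
        "where {\n"
        "  ?s ex:profession ex:ballPlayer .\n"
        "  ?s ?p ?o\n"
        "}\n"
    )
    print(build_analysis_url("/repositories/temp", query))
```

Letting a library do the encoding avoids the kind of hand-editing slips (a dropped %20, a misspelled predicate) that are easy to make in a long percent-encoded string.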

The results of query analysis are returned in the HTTP response as plain text and are also added to the server log. In the HTTP response, the results look like

(desired spogi optimal 1 actual spogi optimal 1)  
(desired posgi optimal 6 actual ospgi suboptimal 4) 

or, in the log file,

[2010-09-17T12:48:04 p4426 http] query analysis of "prefix owl: <http://www.w3.org/2002/07/owl#>\nselect *\nwhere { ?s a owl:Restriction .\n ?s ?p ?v . }" gives ((:desired (:spogi :optimal 1) :actual (:spogi :optimal 1)) (:desired (:posgi :optimal 6) :actual (:posgi :optimal 6)))  
[2010-09-17T12:48:21 p4426 http] query analysis of "prefix owl: <http://www.w3.org/2002/07/owl#>\nselect *\nwhere { ?s a owl:Restriction .\n ?s ?p ?v . }" gives ((:desired (:spogi :optimal 1) :actual (:spogi :optimal 1)) (:desired (:posgi :optimal 6) :actual (:ospgi :suboptimal 4)))

Each line refers to one of the indices required by the query. It indicates which index flavor the query really wants and which index the query used, based on what is available to the store. So the line

(desired posgi optimal 6 actual ospgi suboptimal 4) 

means that the query wanted posgi (which would have been optimal) but got ospgi (which was suboptimal). The three possible ratings are optimal, suboptimal and full (a full scan). The numbers indicate whether or not additional filtering was required and should be ignored for now.
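
To act on these results in a script, the plain-text lines can be parsed into records. The sketch below is illustrative only: it assumes the response consists of lines in exactly the form shown above, and the names IndexUsage, parse_analysis and indices_worth_adding are hypothetical. It reports every desired index for which the store had to fall back to something that was not optimal.

```python
import re
from typing import NamedTuple


class IndexUsage(NamedTuple):
    desired_index: str
    desired_rating: str
    desired_number: int
    actual_index: str
    actual_rating: str
    actual_number: int


# Matches one line such as:
#   (desired posgi optimal 6 actual ospgi suboptimal 4)
LINE_PATTERN = re.compile(
    r"\(desired (\w+) (\w+) (\d+) actual (\w+) (\w+) (\d+)\)"
)


def parse_analysis(text):
    """Parse the plain-text analysis output into IndexUsage records."""
    usages = []
    for match in LINE_PATTERN.finditer(text):
        d_index, d_rating, d_num, a_index, a_rating, a_num = match.groups()
        usages.append(
            IndexUsage(d_index, d_rating, int(d_num),
                       a_index, a_rating, int(a_num))
        )
    return usages


def indices_worth_adding(usages):
    """Return desired indices whose actual substitute was not optimal."""
    return [u.desired_index for u in usages if u.actual_rating != "optimal"]


if __name__ == "__main__":
    sample = (
        "(desired spogi optimal 1 actual spogi optimal 1)\n"
        "(desired posgi optimal 6 actual ospgi suboptimal 4)\n"
    )
    print(indices_worth_adding(parse_analysis(sample)))
```

The numeric fields are carried along unchanged, since their meaning is not something a client should rely on yet.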

If &queryAnalysisTechnique=executed is added instead of &queryAnalysisTechnique=static, or if that parameter is left unspecified, the query is processed and the analyzer reports the indices that were actually looked for and used.

Lisp API

The Lisp function db.agraph:analyze-query-index-usage will also do an analysis.

Caveats

Future work

We will continue to extend the static plan analyzer so that it works correctly with the examples shown above.

Feedback and ideas are very welcome as we continue to enhance AllegroGraph's query engine.