Basic query analysis
Overview
AllegroGraph 7.0.0 implements a simple static query analyzer, which reports which indices a query will use, and a dynamic query analyzer, which processes the query and determines which indices are actually used. While the dynamic analyzer provides much better information, it also takes much longer, since the query is processed during the analysis.
The analyzer is turned on by adding an additional parameter to the HTTP query string: analyzeIndicesUsed=true. The type of analysis is determined by the queryAnalysisTechnique parameter, which can have the values static, for static analysis, and executed, for dynamic analysis. There is also a queryAnalysisTimeout parameter, which specifies a maximum number of seconds for the analysis; the analysis is abandoned if it takes longer than that amount (when unspecified, there is no limit). The timeout applies only to dynamic analysis.
For example, when I encode this query into an HTTP request:
select ?s ?p ?o
where {
?s ex:profession ex:ballPlayer .
?s ?p ?o
}
it looks like this (line breaks have been added to make the request somewhat readable):
/repositories/temp?query=
select%20%3Fs%20%3Fp%20%3Fo%0Awhere%20%7B%0A%20%20%3Fs
%20ex%3Aprofession%20ex%3AballPlayer%20.%0A%20%20%3Fs%20
%3Fp%20%3Fo%0A%7D%0A%0A&queryLn=SPARQL&limit=500
&infer=false
If I want to analyze the query instead of running it, I add &analyzeIndicesUsed=true and &queryAnalysisTechnique=static and get:
/repositories/temp?query=
select%20%3Fs%20%3Fp%20%3Fo%0Awhere%20%7B%0A%20%20%3Fs
%20ex%3Aprofession%20ex%3AballPlayer%20.%0A%20%20%3Fs%20
%3Fp%20%3Fo%0A%7D%0A%0A&queryLn=SPARQL&limit=500
&infer=false
&analyzeIndicesUsed=true
&queryAnalysisTechnique=static
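The request above can also be assembled programmatically rather than by hand. The following is a minimal Python sketch, not part of AllegroGraph: the function name build_analysis_url is hypothetical, and only the parameter names (analyzeIndicesUsed, queryAnalysisTechnique, queryAnalysisTimeout, queryLn) come from the text above. It uses the standard library to percent-encode the query so that spaces become %20.

```python
from urllib.parse import quote, urlencode


def build_analysis_url(repository, sparql, technique="static", timeout=None):
    """Build a request path asking the server to analyze `sparql`.

    `build_analysis_url` is a hypothetical helper for illustration only.
    """
    if technique not in ("static", "executed"):
        raise ValueError("technique must be 'static' or 'executed'")

    params = [
        ("query", sparql),
        ("queryLn", "SPARQL"),
        ("analyzeIndicesUsed", "true"),
        ("queryAnalysisTechnique", technique),
    ]
    # Per the text above, the timeout only has an effect on dynamic analysis.
    if timeout is not None:
        params.append(("queryAnalysisTimeout", str(int(timeout))))

    # quote_via=quote encodes spaces as %20 (the default would use '+').
    query_string = urlencode(params, quote_via=quote)
    return "/repositories/" + quote(repository, safe="") + "?" + query_string


if __name__ == "__main__":
    query = (
        "select ?s ?p ?o\n"
        "where {\n"
        "  ?s ex:profession ex:ballPlayer .\n"
        "  ?s ?p ?o\n"
        "}"
    )
    print(build_analysis_url("temp", query))
    print(build_analysis_url("temp", query, technique="executed", timeout=30))
```

Passing technique="executed" together with a timeout requests the dynamic analysis with an upper bound on how long it may run.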
The results of query analysis are returned in the HTTP response as plain text and also added to the server log. In the HTTP response, the results look like
(desired spogi optimal 1 actual spogi optimal 1)
(desired posgi optimal 6 actual ospgi suboptimal 4)
or, in the log file,
[2010-09-17T12:48:04 p4426 http] query analysis of "prefix owl: <http://www.w3.org/2002/07/owl#>\nselect *\nwhere { ?s a owl:Restriction .\n ?s ?p ?v . }" gives ((:desired (:spogi :optimal 1) :actual (:spogi :optimal 1)) (:desired (:posgi :optimal 6) :actual (:posgi :optimal 6)))
[2010-09-17T12:48:21 p4426 http] query analysis of "prefix owl: <http://www.w3.org/2002/07/owl#>\nselect *\nwhere { ?s a owl:Restriction .\n ?s ?p ?v . }" gives ((:desired (:spogi :optimal 1) :actual (:spogi :optimal 1)) (:desired (:posgi :optimal 6) :actual (:ospgi :suboptimal 4)))
Each line refers to one of the indices required by the query. The line indicates which index flavor the query really wants and which index the query used, based on what is available to the store. So the line
(desired posgi optimal 6 actual ospgi suboptimal 4)
means that the query wanted posgi (which would have been optimal) but it got ospgi (which was suboptimal). The three possible values are optimal, suboptimal and full (a full scan). The numbers indicate whether or not additional filtering was required and should be ignored for now.
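The plain-text result format described above is regular enough to parse with a small script. The following Python sketch is an illustration only: the names IndexUsage and parse_analysis are hypothetical, and it assumes every line has exactly the shape of the two sample lines shown earlier (an index name, a quality word and a number for both the desired and the actual index).

```python
import re
from typing import NamedTuple


class IndexUsage(NamedTuple):
    """One line of analysis output (hypothetical container type)."""
    desired_index: str
    desired_quality: str
    desired_number: int
    actual_index: str
    actual_quality: str
    actual_number: int

    @property
    def got_desired(self):
        # True when the store could supply the index the query wanted.
        return self.desired_index == self.actual_index


# Assumed line shape: (desired <index> <quality> <n> actual <index> <quality> <n>)
_LINE = re.compile(r"^\(desired (\w+) (\w+) (\d+) actual (\w+) (\w+) (\d+)\)$")


def parse_analysis(text):
    """Parse the plain-text analysis response into IndexUsage records."""
    usages = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = _LINE.match(line)
        if match is None:
            raise ValueError("unrecognized analysis line: " + line)
        d_idx, d_q, d_n, a_idx, a_q, a_n = match.groups()
        usages.append(IndexUsage(d_idx, d_q, int(d_n), a_idx, a_q, int(a_n)))
    return usages


if __name__ == "__main__":
    sample = (
        "(desired spogi optimal 1 actual spogi optimal 1)\n"
        "(desired posgi optimal 6 actual ospgi suboptimal 4)\n"
    )
    for usage in parse_analysis(sample):
        if not usage.got_desired:
            print("wanted", usage.desired_index, "but got", usage.actual_index)
```

A report like this makes it easy to spot which index flavors would be worth adding to the store.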
If &queryAnalysisTechnique=executed had been added (instead of &queryAnalysisTechnique=static), or if that parameter had been left unspecified, the query would have been processed, and the indices actually looked for and used would have been reported.
Lisp API
The Lisp function db.agraph:analyze-query-index-usage will also do an analysis.
Caveats
The current static analyzer works only with SPARQL (a Prolog version is underway).
There is no AGWebView interface at this time.
The static analyzer uses the SPARQL planner but does not know about some of the tricks used by the SPARQL executor. In particular, when SPARQL sees a query like
SELECT * WHERE { ?s ex:age ?o FILTER ( ?o > 15 && 25 > ?o ) }
it will know that the FILTER can be brought into the triple pattern retrieval directly, which converts the query from
get-triples :p ex:age
into:
get-triples :p ex:age :o 15 :o-end 25
The analyzer will see the first get-triples and think that the query would be best served by POSGI. In execution, however (and depending on the actual contents of the store), the best index might well be OSPGI because it would allow for very quick filtering on the range.
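To make the FILTER rewrite concrete, here is a small Python sketch of the general idea of folding simple comparisons on one variable into a lower and an upper bound. It is not AllegroGraph's implementation: the function range_bounds and the tuple representation of comparisons are assumptions made for illustration, and the sketch says nothing about whether the bounds are treated as inclusive or exclusive.

```python
def range_bounds(variable, comparisons):
    """Fold '<' and '>' comparisons on `variable` into (lower, upper).

    Each comparison is a (left, operator, right) tuple in which one side is
    the variable name and the other is a number. None means "unbounded".
    This is a hypothetical helper, not part of any AllegroGraph API.
    """
    lower = upper = None
    for left, op, right in comparisons:
        if op not in ("<", ">"):
            raise ValueError("unsupported operator: " + op)
        if left == variable and isinstance(right, (int, float)):
            # ?o > 15 gives a lower bound; ?o < 25 gives an upper bound.
            value, is_lower = right, op == ">"
        elif right == variable and isinstance(left, (int, float)):
            # 15 < ?o gives a lower bound; 25 > ?o gives an upper bound.
            value, is_lower = left, op == "<"
        else:
            raise ValueError("comparison does not constrain " + variable)
        if is_lower:
            lower = value if lower is None else max(lower, value)
        else:
            upper = value if upper is None else min(upper, value)
    return lower, upper


if __name__ == "__main__":
    # FILTER ( ?o > 15 && 25 > ?o )
    low, high = range_bounds("?o", [("?o", ">", 15), (25, ">", "?o")])
    print("get-triples :p ex:age :o", low, ":o-end", high)
```

Once the bounds are known at retrieval time, the choice of index depends on how well it supports a range scan over the object position, which is information the static analyzer does not have.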
Future work
We will continue to extend the static plan analyzer so that it works correctly with examples like the one shown above.
Feedback and ideas are very welcome as we continue to enhance AllegroGraph's query engine.