The Purpose of Keyword Syntax
Keyword syntax offers a concise way to write queries with AllegroGraph magic predicates that take a large, unwieldy number of variables, while remaining compatible with standard SPARQL syntax.
Currently only three LLM magic predicates accept keyword syntax: llm:nearestNeighbor, llm:askMyDocuments and llm:chatState.
For example, the only syntax previously provided by the magic predicate llm:nearestNeighbor was
(?uri ?score ?originalText ?pred ?type) llm:nearestNeighbor (?text ?vectorRepoSpec ?topN ?minScore ?selector)
In this form, the variables are called positional because the values must appear in the specified position, e.g. ?uri in the first position, ?score in the second, and so on. Any optional variable may be omitted, but supplying any one of them requires specifying all the variables to its left.
We know from the [documentation](nearestNeighbor) that only two of the input variables (on the right) and one of the output variables (on the left) are required; the rest are optional. The minimal form for this predicate is:
?uri llm:nearestNeighbor (?text ?vectorDatabase)
This form is fine if we only care about the required variables. However, suppose we also care about setting the optional input ?selector and reading the optional output ?originalText. The positional syntax requires a query to use an expression like
(?uri ?score ?originalText) llm:nearestNeighbor (?text ?vectorRepoSpec ?topN ?minScore ?selector)
where all the positional arguments, up to the ones we care about, must be specified in order, even if we are willing to leave those values (?score in output, and ?topN and ?minScore in input) unspecified.
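To make the cost of positional syntax concrete, here is a small Python sketch that lists which positional slots have to be written in order to reach the variables a query cares about. It is not part of AllegroGraph: the function name `positional_slots_needed` is hypothetical, and the variable orders are simply copied from the positional form shown above.

```python
from typing import List, Sequence, Set

# Positional orders copied from the llm:nearestNeighbor form shown above.
OUTPUTS = ["uri", "score", "originalText", "pred", "type"]
INPUTS = ["text", "vectorRepoSpec", "topN", "minScore", "selector"]


def positional_slots_needed(order: Sequence[str], wanted: Set[str]) -> List[str]:
    """Return every variable that must be written to reach the rightmost wanted one."""
    if not wanted:
        return []
    # Everything to the left of the rightmost wanted variable must also be given.
    last = max(order.index(name) for name in wanted)
    return list(order[: last + 1])


# Reading ?originalText forces ?score to be written as well.
print(positional_slots_needed(OUTPUTS, {"uri", "originalText"}))
# ['uri', 'score', 'originalText']

# Setting ?selector forces ?topN and ?minScore to be written as well.
print(positional_slots_needed(INPUTS, {"text", "vectorRepoSpec", "selector"}))
# ['text', 'vectorRepoSpec', 'topN', 'minScore', 'selector']
```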
The new keyword syntax allows us to write
(?uri :originalText ?originalText) llm:nearestNeighbor (?text :selector ?selector :vectorRepoSpec ?vectorRepoSpec)
thus ignoring the output variable ?score and accepting the default values of ?topN and ?minScore. Note also that the keyword syntax permits the arguments to appear in any order.
How Keyword Syntax Works
When AllegroGraph evaluates a magic predicate within a SPARQL query, the input variables are evaluated and the output variables are bound.
Whether bound or evaluated, the keyword-enabled magic predicates process variables in the same order: from left to right, treating each as a positional variable until a keyword is found. Once the left-to-right process finds a keyword, the remaining variables are expected to be given in keyword syntax. Thus, to continue with the nearestNeighbor example, in the expression
(?uri :originalText ?originalText) llm:nearestNeighbor (?text :vectorRepoSpec ?vectorRepoSpec :selector ?selector)
the process detects ?uri and ?text as positional syntax, and ?originalText, ?vectorRepoSpec and ?selector as keyword syntax.
However the following form is invalid:
(?uri :originalText ?originalText) llm:nearestNeighbor (?text :vectorRepoSpec ?vectorRepoSpec ?topN ?minScore ?selector)
because after the process identifies ?vectorRepoSpec as keyword syntax, it expects the remaining variables on the right to be in keyword syntax.
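The left-to-right rule can be modeled in a few lines of Python. This is only an illustrative sketch, not AllegroGraph's implementation: `resolve_arguments` and `looks_like_keyword` are hypothetical names, tokens are plain strings, and any token starting with `:` is assumed to be a keyword.

```python
from typing import Dict, Sequence

# Positional order of the llm:nearestNeighbor input variables, as listed earlier.
NEAREST_NEIGHBOR_INPUTS = ["text", "vectorRepoSpec", "topN", "minScore", "selector"]


def looks_like_keyword(token: str) -> bool:
    """Simplifying assumption: a token beginning with ':' is a keyword."""
    return token.startswith(":")


def resolve_arguments(tokens: Sequence[str], positional_names: Sequence[str]) -> Dict[str, str]:
    """Map an argument list to {name: value}, scanning from left to right."""
    resolved: Dict[str, str] = {}
    i = 0
    # Phase 1: tokens are positional until the first keyword appears.
    while i < len(tokens) and not looks_like_keyword(tokens[i]):
        if i >= len(positional_names):
            raise ValueError(f"too many positional arguments: {tokens[i]}")
        resolved[positional_names[i]] = tokens[i]
        i += 1
    # Phase 2: everything that remains must be keyword/value pairs.
    while i < len(tokens):
        key = tokens[i]
        if not looks_like_keyword(key):
            raise ValueError(f"expected a keyword but found {key}")
        if i + 1 >= len(tokens) or looks_like_keyword(tokens[i + 1]):
            raise ValueError(f"keyword {key} has no value")
        resolved[key[1:]] = tokens[i + 1]
        i += 2
    return resolved


valid = ["?text", ":vectorRepoSpec", "?vectorRepoSpec", ":selector", "?selector"]
print(resolve_arguments(valid, NEAREST_NEIGHBOR_INPUTS))

invalid = ["?text", ":vectorRepoSpec", "?vectorRepoSpec", "?topN", "?minScore", "?selector"]
try:
    resolve_arguments(invalid, NEAREST_NEIGHBOR_INPUTS)
except ValueError as err:
    print("rejected:", err)
```

In this model the invalid form is rejected at ?topN, the first positional token that follows a keyword, which mirrors the explanation given above.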
A Note on Namespaces
One detail we've glossed over so far is the way SPARQL interprets a keyword like :key. To make the token :key compatible with SPARQL syntax, we have to bind the empty namespace : to a special keyword namespace, as in the SPARQL statement
PREFIX : <http://franz.com/ns/keyword#>
In AllegroGraph, the default value of the empty namespace is <http://franz.com/ns/keyword#>, so in most cases a keyword can be abbreviated like :key. But if the environment binds the empty namespace to something else (which is not uncommon), then users wanting to use the keyword syntax should bind a different prefix, such as kw, as follows:
PREFIX kw: <http://franz.com/ns/keyword#>
and then you can write the example above as:
(?uri kw:originalText ?originalText) llm:nearestNeighbor (?text kw:vectorRepoSpec ?vectorRepoSpec kw:selector ?selector)
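The effect of the prefix binding can be illustrated with a short Python sketch of prefixed-name expansion. It assumes, as the text suggests, that a token counts as a keyword only when it expands into the keyword namespace. The helper names `expand` and `is_keyword_name` are hypothetical, and `http://example.org/` is a made-up namespace standing in for whatever the environment binds to the empty prefix.

```python
from typing import Dict

KEYWORD_NAMESPACE = "http://franz.com/ns/keyword#"


def expand(prefixed_name: str, prefixes: Dict[str, str]) -> str:
    """Expand a prefixed name such as ':selector' or 'kw:selector' to a full IRI."""
    prefix, _, local = prefixed_name.partition(":")
    return prefixes[prefix] + local


def is_keyword_name(prefixed_name: str, prefixes: Dict[str, str]) -> bool:
    """A token is treated as a keyword only if it expands into the keyword namespace."""
    return expand(prefixed_name, prefixes).startswith(KEYWORD_NAMESPACE)


# Default environment: the empty prefix is bound to the keyword namespace.
default_env = {"": KEYWORD_NAMESPACE}
# Environment where the empty prefix means something else and kw: is bound instead.
custom_env = {"": "http://example.org/", "kw": KEYWORD_NAMESPACE}

print(is_keyword_name(":selector", default_env))    # True
print(is_keyword_name(":selector", custom_env))     # False
print(is_keyword_name("kw:selector", custom_env))   # True
```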
Examples Using Keyword Syntax
Some examples may help explain the keyword syntax.
In these examples, assume we have a vector store historicalVec containing indexed embeddings of the names of famous historical figures.
Suppose we are interested in the matching original text, but not the score:
SELECT * {
(?uri :originalText ?originalText) llm:nearestNeighbor ("Abraham Lincoln" "historicalVec" 10 0.0)
}
Here we set the minimum matching score, but accept the default value for the top N matches:
SELECT * {
(?uri :originalText ?originalText) llm:nearestNeighbor ("Abraham Lincoln" "historicalVec" :minScore 0.8)
}
In this case we specify all the optional input variables using keyword syntax:
SELECT * {
(?uri ?score ?originalText) llm:nearestNeighbor ("Abraham Lincoln" "historicalVec" :topN 10 :minScore 0.0 :selector "{?id rdf:type <http://franz.com/vdb/gen/Object>}")
}
It is even possible to place required positional arguments out of order using keyword syntax, as in this example where we reverse the order of the output arguments:
SELECT * {
(:originalText ?originalText :score ?score :uri ?uri) llm:nearestNeighbor ("Abraham Lincoln" "historicalVec")
}
For completeness, this last example illustrates using keyword syntax with every variable:
PREFIX llm: <http://franz.com/ns/allegrograph/8.0.0/llm/>
SELECT * {
(:uri ?uri :score ?score :originalText ?text :pred ?pred :type ?type)
llm:nearestNeighbor
(:text "Abraham Lincoln" :vectorRepoSpec "historicalVec" :topN 10 :minScore 0.8 :selector "{?id rdf:type <http://franz.com/vdb/gen/Object>.}")
}
Predicates Accepting Keyword Syntax
In addition to llm:nearestNeighbor, two other predicates accept keyword syntax: llm:askMyDocuments and llm:chatState.
The following table summarizes all the keyword arguments for all three predicates.
| Variable Role | nearestNeighbor Keyword | Optional Variable? | nearestNeighbor Variable Description | askMyDocuments Keyword | Optional Variable? | askMyDocuments Variable Description | chatState Keyword | Optional Variable? | chatState Variable Description |
| Output Variables: | | | | :response | No | LLM response text | :response | No | LLM response text |
| | :score | Yes | Match score | :score | No | Match score | :score | No | Match score |
| | :uri | No | ID of matching text | :citationId | Yes | ID of matching text | :citationId | Yes | ID of matching text |
| | :originalText | Yes | Matching text | :citedText | Yes | Matching text | :citedText | Yes | Matching text |
| | :pred | Yes | Predicate associated with embedded object | | | | | | |
| | :type | Yes | Type of subject associated with embedded object | | | | | | |
| | | | | | | | :citationSource | Yes | Source vector store of match |
| | | | | | | | :feed | Yes | Recent dialog exchanges |
| | | | | | | | :story | Yes | Statement about feed |
| | | | | | | | :expertiseMatchString | Yes | Match string for Expertise repo |
| | | | | | | | :expertiseMatches | Yes | Matching Expertise embeddings and IDs |
| | | | | | | | :expertiseHints | Yes | Statement about matching expertise |
| | | | | | | | :historyMatchString | Yes | Match string for History repo |
| | | | | | | | :historyMatches | Yes | Matching History embeddings and IDs |
| | | | | | | | :historyHints | Yes | Statement about matching history |
| | | | | | | | :prompt | Yes | Prompt for LLM |
| Input Variables: | :text | No | Input text string | :text | No | Input text string | :text | No | Input text string |
| | :vectorRepoSpec | No | Vector store repo spec | :vectorRepoSpec | No | Vector store repo spec | :expertiseRepoSpec | No | Expertise vector repo spec |
| | :topN | Yes | Maximum number of matches | :topN | Yes | Maximum number of matches | :expertiseTopN | Yes | Maximum number of Expertise matches |
| | :minScore | Yes | Minimum score for matches | :minScore | Yes | Minimum score for matches | :expertiseMinScore | Yes | Minimum score for Expertise matches |
| | :selector | Yes | Selects subsets of vector repo | :selector | Yes | Selects subsets of vector repo | | | |
| | | | | | | | :historyTopN | Yes | Maximum number of History matches |
| | | | | | | | :historyMinScore | Yes | Minimum score for History matches |
| | | | | | | | :botId | Yes | ID of bot |
| | | | | | | | :historyRepoSpec | Yes | History vector repo spec |