Keyword Syntax for Magic Predicates

The Purpose of Keyword Syntax

Keyword syntax offers a concise way to write queries with AllegroGraph magic predicates that take a large, unwieldy number of variables, while remaining compatible with standard SPARQL syntax.

Currently, three LLM magic predicates accept keyword syntax: llm:nearestNeighbor, llm:askMyDocuments, and llm:chatState.

The SNA magic predicate sna:communityLeiden accepts keywords as well.

Here is how keywords work with magic predicates. The positional syntax for the magic predicate llm:nearestNeighbor is

(?uri ?score ?originalText ?pred ?type) llm:nearestNeighbor (?text ?vectorRepoSpec ?topN ?minScore ?selector ?useClustering)

In this form, the variables are called positional because each value must appear in its specified position: ?uri in the first position, ?score in the second, and so on. Any optional variable may be omitted, but supplying any one of them requires specifying all the variables to its left. (Note: the ?selector and ?useClustering arguments reduce the search space and so speed up queries. ?selector is documented here. ?useClustering uses clusters within the repo if they have been created. See the description of creating clusters in a repo here.)

We know from the documentation that only two of the input variables (on the right) and one of the output variables (on the left) are required; the rest are optional. The minimal form for this predicate is:

?uri llm:nearestNeighbor (?text ?vectorRepoSpec)

This form is fine if we only care about the required variables. However, suppose we only care about setting the optional input ?selector and reading the optional output ?originalText. The positional syntax requires the query to use an expression like

(?uri ?score ?originalText) llm:nearestNeighbor (?text ?vectorRepoSpec ?topN ?minScore ?selector)

where all the positional arguments, up to the ones we care about, must be specified in order, even if we are willing to leave those values (?score in output and ?topN and ?minScore in input) unspecified.

The new keyword syntax allows us to write

(?uri :originalText ?originalText) llm:nearestNeighbor (?text :selector ?selector :vectorRepoSpec ?vectorRepoSpec)

thus ignoring the output variable ?score and accepting the default values of ?topN and ?minScore. Note also that the keyword syntax permits the arguments to appear in any order.
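
The saving can be counted directly. The following Python sketch is only an illustration and is not part of AllegroGraph: the helper names positional_slots_needed and keyword_slots_needed are hypothetical, and the argument orders are copied from the llm:nearestNeighbor signature shown above.

```python
# Positional order of llm:nearestNeighbor arguments, as given in the signature above.
INPUT_ORDER = ["text", "vectorRepoSpec", "topN", "minScore", "selector", "useClustering"]
OUTPUT_ORDER = ["uri", "score", "originalText", "pred", "type"]

# Required arguments according to the minimal form: ?uri llm:nearestNeighbor (?text ?vectorRepoSpec)
REQUIRED_INPUTS = {"text", "vectorRepoSpec"}
REQUIRED_OUTPUTS = {"uri"}


def positional_slots_needed(wanted, order):
    """Positional syntax: every slot up to the right-most wanted name must be filled."""
    return max(order.index(name) for name in wanted) + 1


def keyword_slots_needed(wanted, required):
    """Keyword syntax: only the required names plus the wanted names are written."""
    return len(set(wanted) | set(required))


# Reaching ?selector positionally takes five inputs; with keywords it takes three.
print(positional_slots_needed({"selector"}, INPUT_ORDER))   # 5
print(keyword_slots_needed({"selector"}, REQUIRED_INPUTS))  # 3
```

The same count applies on the output side: reading ?originalText takes three positional variables but only two with keywords.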

How Keyword Syntax Works

When AllegroGraph evaluates a magic predicate within a SPARQL query, the input variables are evaluated and the output variables are bound.

Whether bound or evaluated, the keyword-enabled magic predicates process variables in the same order: from left to right, treating each as a positional variable until a keyword is found. Once the left-to-right process finds a keyword, the remaining variables are expected to be given in keyword syntax. Thus, to continue with the nearestNeighbor example, in the expression

(?uri :originalText ?originalText) llm:nearestNeighbor (?text :vectorRepoSpec ?vectorRepoSpec :selector ?selector)

the process detects ?uri and ?text as positional syntax, and ?originalText, ?vectorRepoSpec and ?selector as keyword syntax.

However the following form is invalid:

(?uri :originalText ?originalText) llm:nearestNeighbor (?text :vectorRepoSpec ?vectorRepoSpec ?topN ?minScore ?selector)

because once the process identifies ?vectorRepoSpec as keyword syntax, it expects the remaining variables on the right to be in keyword syntax.
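
The left-to-right rule can be modelled in a few lines of Python. This is a sketch of the rule as described in this section, not AllegroGraph's actual implementation: the function split_arguments is hypothetical, it assumes the argument list has already been split into tokens, and it treats any token starting with ":" as a keyword.

```python
# Positional order of the llm:nearestNeighbor input variables, from the signature above.
NEAREST_NEIGHBOR_INPUTS = ["text", "vectorRepoSpec", "topN", "minScore", "selector", "useClustering"]


def split_arguments(tokens, positional_names):
    """Map argument names to values, applying the positional-then-keyword rule."""
    bindings = {}
    i = 0
    # Phase 1: read positional values left to right until the first keyword.
    while i < len(tokens) and not tokens[i].startswith(":"):
        if i >= len(positional_names):
            raise ValueError(f"too many positional arguments: {tokens[i]}")
        bindings[positional_names[i]] = tokens[i]
        i += 1
    # Phase 2: after the first keyword, only keyword/value pairs are accepted.
    while i < len(tokens):
        key = tokens[i]
        if not key.startswith(":"):
            raise ValueError(f"positional argument {key} after a keyword")
        if i + 1 >= len(tokens):
            raise ValueError(f"keyword {key} has no value")
        name = key[1:]
        if name in bindings:
            raise ValueError(f"argument {name} given twice")
        bindings[name] = tokens[i + 1]
        i += 2
    return bindings


# The valid form: one positional input followed by two keyword inputs.
valid = ["?text", ":vectorRepoSpec", "?vectorRepoSpec", ":selector", "?selector"]
print(split_arguments(valid, NEAREST_NEIGHBOR_INPUTS))

# The invalid form: positional variables after a keyword are rejected.
invalid = ["?text", ":vectorRepoSpec", "?vectorRepoSpec", "?topN", "?minScore", "?selector"]
try:
    split_arguments(invalid, NEAREST_NEIGHBOR_INPUTS)
except ValueError as err:
    print("rejected:", err)
```

The two-phase loop mirrors the description: once the first keyword is seen, the sketch never returns to positional handling, which is why ?topN in the invalid form raises an error.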

A Note on Namespaces

One detail we've glossed over so far is the way SPARQL interprets a keyword like :key. To make the token :key compatible with SPARQL syntax, we have to bind the empty namespace : to a special keyword namespace, as in the SPARQL statement

PREFIX : <http://franz.com/ns/keyword#>

In AllegroGraph, the default value of the empty namespace is http://franz.com/ns/keyword#, so in most cases a keyword can be written in the short form :key. But if the environment binds the empty namespace to something else (which is not uncommon), users wanting to use the keyword syntax should bind a different prefix, such as kw, as follows:

PREFIX kw: <http://franz.com/ns/keyword#>

and then you can write the example above as:

(?uri kw:originalText ?originalText) llm:nearestNeighbor (?text kw:vectorRepoSpec ?vectorRepoSpec kw:selector ?selector)
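
When queries are generated by a program, the prefix choice can be kept in one place. The Python sketch below is a hypothetical helper, not an AllegroGraph API: the names keyword_prefix_declaration and render_arguments are invented for illustration, and only the namespace IRI comes from this document.

```python
# The keyword namespace given in this section.
KEYWORD_NS = "http://franz.com/ns/keyword#"


def keyword_prefix_declaration(prefix=""):
    """Return the PREFIX line binding `prefix` (empty by default) to the keyword namespace."""
    return f"PREFIX {prefix}: <{KEYWORD_NS}>"


def render_arguments(positional, keywords, prefix=""):
    """Render one side of a magic predicate: positional values first, then keyword pairs."""
    parts = list(positional)
    for name, value in keywords.items():
        parts.append(f"{prefix}:{name}")
        parts.append(value)
    return "(" + " ".join(parts) + ")"


# Use the kw prefix because the empty prefix is assumed to be bound elsewhere.
outputs = render_arguments(["?uri"], {"originalText": "?originalText"}, prefix="kw")
inputs = render_arguments(
    ["?text"],
    {"vectorRepoSpec": "?vectorRepoSpec", "selector": "?selector"},
    prefix="kw",
)
print(keyword_prefix_declaration("kw"))
print(f"{outputs} llm:nearestNeighbor {inputs}")
```

Calling the same helpers with the default empty prefix yields the :key form used in the rest of this document.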

Examples Using Keyword Syntax

Some examples may help explain the keyword syntax.

In these examples, assume we have a vector store historicalVec containing indexed embeddings of the names of famous historical figures.

Suppose we are interested in the matching original text, but not the score:

SELECT * {  
  (?uri :originalText ?originalText) llm:nearestNeighbor ("Abraham Lincoln" "historicalVec" 10 0.0)  
}

Here we set the minimum matching score, but accept the default value for the top N matches:

SELECT * {  
  (?uri :originalText ?originalText) llm:nearestNeighbor ("Abraham Lincoln" "historicalVec" :minScore 0.8)  
}

In this case we specify all the optional input variables using keyword syntax:

SELECT * {  
  (?uri ?score ?originalText) llm:nearestNeighbor ("Abraham Lincoln" "historicalVec" :topN 10 :minScore 0.0 :selector "{?id rdf:type <http://franz.com/vdb/gen/Object>}")  
}

It is even possible to place required positional arguments out-of-order using keyword syntax, as in this example where we reverse the order of the output arguments:

SELECT * {  
  (:originalText ?originalText :score ?score :uri ?uri) llm:nearestNeighbor ("Abraham Lincoln" "historicalVec")  
}

For completeness, this last example illustrates using keyword syntax with every variable:

PREFIX llm: <http://franz.com/ns/allegrograph/8.0.0/llm/>  
SELECT * {  
  (:uri ?uri :score ?score :originalText ?text :pred ?pred :type ?type)  
  llm:nearestNeighbor  
  (:text "Abraham Lincoln" :vectorRepoSpec "historicalVec" :topN 10 :minScore 0.8 :selector "{?id rdf:type <http://franz.com/vdb/gen/Object>.}")  
}

LLM Predicates Accepting Keyword Syntax

The LLM magic predicates that accept keyword syntax are llm:nearestNeighbor, llm:askMyDocuments, and llm:chatState. (sna:communityLeiden also accepts keywords but is not discussed in this section.)

The following tables summarize all the keyword arguments for the three predicates.

For an explanation of the :selector argument to llm:nearestNeighbor and llm:askMyDocuments, see the Documentation for selector.

For information on creating clusters to use with the :useClustering argument, see this discussion of creating clusters.

First llm:nearestNeighbor:

Variable Role	nearestNeighbor Keyword	Optional Variable?	nearestNeighbor Variable Description
Output Variables:
Output	`:uri`	No	ID of matching text
Output	`:score`	Yes	Match score
Output	`:originalText`	Yes	Matching text
Output	`:pred`	Yes	Predicate associated with embedded object
Output	`:type`	Yes	Type of subject associated with embedded object
Input Variables:
Input	`:text`	No	Input text string
Input	`:vectorRepoSpec`	No	Vector store repo spec
Input	`:topN`	Yes	Maximum number of matches
Input	`:minScore`	Yes	Minimum score for matches
Input	`:selector`	Yes	Selects subsets of vector repo
Input	`:useClustering`	Yes	Causes a clustering created by agtool to be used, see above for more information.
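
A table like this can be turned into a simple pre-flight check before a query is sent. The Python sketch below is illustrative only: check_input_keywords is a hypothetical helper, and the dictionary simply transcribes the input rows of the llm:nearestNeighbor table above (True means the variable is optional).

```python
# Input keywords of llm:nearestNeighbor, transcribed from the table above.
# The value records whether the variable is optional.
NEAREST_NEIGHBOR_INPUT_KEYWORDS = {
    "text": False,
    "vectorRepoSpec": False,
    "topN": True,
    "minScore": True,
    "selector": True,
    "useClustering": True,
}


def check_input_keywords(supplied, table=NEAREST_NEIGHBOR_INPUT_KEYWORDS):
    """Return (unknown, missing): names not in the table, and required names not supplied."""
    supplied = set(supplied)
    unknown = sorted(supplied - set(table))
    missing = sorted(
        name for name, optional in table.items()
        if not optional and name not in supplied
    )
    return unknown, missing


# A well-formed call supplies both required inputs plus any optional ones.
print(check_input_keywords({"text", "vectorRepoSpec", "minScore"}))  # ([], [])

# "limit" is not a documented keyword, and vectorRepoSpec is required.
print(check_input_keywords({"text", "limit"}))  # (['limit'], ['vectorRepoSpec'])
```

The check treats a name as supplied whether it was given positionally or by keyword, so it can be applied after the argument list has been resolved to names.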

Second, keyword arguments to llm:askMyDocuments:

Variable Role	askMyDocuments Keyword	Optional Variable?	askMyDocuments Variable Description
Output Variables:
Output	`:response`	No	LLM Response text
Output	`:score`	No	Match score
Output	`:citationId`	Yes	ID of matching text
Output	`:citedText`	Yes	Matching text
Input Variables:
Input	`:text`	No	Input text string
Input	`:vectorRepoSpec`	No	Vector store repo spec
Input	`:topN`	Yes	Maximum number of matches
Input	`:minScore`	Yes	Minimum score for matches
Input	`:selector`	Yes	Selects subsets of vector repo
Input	`:useClustering`	Yes	Causes a clustering created by agtool to be used, see above for more information.

Finally, keyword arguments to llm:chatState:

Variable Role	chatState Keyword	Optional Variable?	chatState Variable Description
Output Variables:
Output	`:response`	No	LLM Response text
Output	`:score`	No	Match score
Output	`:citationId`	Yes	ID of matching text
Output	`:citedText`	Yes	Matching text
Output	`:citationSource`	Yes	Source vector store of match
Output	`:feed`	Yes	Recent dialog exchanges
Output	`:story`	Yes	Statement about feed
Output	`:expertiseMatchString`	Yes	Match string for Expertise repo
Output	`:expertiseMatches`	Yes	Matching Expertise embeddings and IDs
Output	`:expertiseHints`	Yes	Statement about matching expertise
Output	`:historyMatchString`	Yes	Match string for History repo
Output	`:historyMatches`	Yes	Matching history embeddings and IDs
Output	`:historyHints`	Yes	Statement about matching history
Output	`:prompt`	Yes	Prompt for LLM
Input Variables:
Input	`:text`	No	Input text string
Input	`:expertiseRepoSpec`	No	Expertise vector repo spec
Input	`:expertiseTopN`	Yes	Maximum number of Expertise matches
Input	`:expertiseMinScore`	Yes	Minimum score for Expertise matches
Input	`:historyTopN`	Yes	Maximum number of History matches
Input	`:historyMinScore`	Yes	Minimum score for History matches
Input	`:botId`	Yes	ID of bot
Input	`:historyRepoSpec`	Yes	History vector repo spec
Input	`:useClustering`	Yes	Causes a clustering created by agtool to be used, see above for more information.

AllegroGraph 8.4.1 Keyword Syntax for Magic Predicates
