http://franz.com/ns/allegrograph/8.0.0/llm/nearestNeighbor

Namespace:

PREFIX llm: <http://franz.com/ns/allegrograph/8.0.0/llm/> 

General forms:

(?uri ?score ?originalText) llm:nearestNeighbor (?text ?vectorRepoSpec ?topN ?minScore ?selector ?useClustering)  
(?uri ?score) llm:nearestNeighbor (?text ?vectorRepoSpec ?topN ?minScore ?selector)  
?uri llm:nearestNeighbor (?text ?vectorRepoSpec ?topN ?minScore ?selector) 

For example, the pattern

?uri llm:nearestNeighbor ("Famous Scientist" "historicalFigures" 10 0.8) 

will bind ?uri to each of up to 10 subject nodes in the vector database historicalFigures for which the match score between the embedding vector of "Famous Scientist" and the embedding of the original text in the database is at least 0.8.

When the subject is written as a list, the predicate binds the optional second element, ?score, to the match score and the optional third element, ?originalText, to the original text stored in the database.
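
As a minimal sketch of the three-element form, the query below returns the score and the original text alongside each match. It reuses the arguments from the example above and assumes that a vector database named historicalFigures exists and that an API key has already been configured.

```sparql
PREFIX llm: <http://franz.com/ns/allegrograph/8.0.0/llm/>

# Assumes the vector database historicalFigures exists
# and an OpenAI API key is configured for the server or query.
SELECT ?uri ?score ?originalText WHERE {
  (?uri ?score ?originalText) llm:nearestNeighbor ("Famous Scientist" "historicalFigures" 10 0.8) .
}
ORDER BY DESC(?score)
```

The ORDER BY clause is ordinary SPARQL added for readability; it is not required by the predicate.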

The ?useClustering argument is optional. If it is given any value, the predicate runs a different algorithm that quickly returns an approximation of the nearest neighbors. The first time the approximation algorithm runs on a vector repo, it builds an index inside the repo, which can take some time; subsequent invocations return an answer very quickly. Using this algorithm only makes sense when the full nearest-neighbor search is too slow because it has to check a very large number of objects in the vector database.
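
The following sketch shows one way the approximate search might be requested. It assumes that the optional arguments are positional, so the default selector is written out explicitly in order to reach the sixth position; the value true is arbitrary, since any value enables the approximation.

```sparql
PREFIX llm: <http://franz.com/ns/allegrograph/8.0.0/llm/>

# Assumption: arguments are supplied positionally, so the default
# selector is repeated here to place a value in the ?useClustering slot.
SELECT ?uri ?score ?originalText WHERE {
  (?uri ?score ?originalText) llm:nearestNeighbor
      ("Famous Scientist" "historicalFigures" 10 0.8 "{?id rdf:type vdb:Object}" true) .
}
```

Expect the first run against a given vector repo to be slow while the index is built.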

The ?selector argument is optional. If given, it should be the body of a SPARQL query whose results are bindings for ?id; each binding must be a resource in the vector database that has an rdf:type of vdb:Object. The default value for ?selector is

"{?id rdf:type vdb:Object}" 

In the SPARQL expression, the namespaces vdb and vdbprop are defined:

prefix vdb: <http://franz.com/vdb/gen/>  
prefix vdbprop: <http://franz.com/vdb/prop/>  
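
A custom selector narrows the set of objects that are scored. The sketch below is illustrative only: the property <http://example.org/topic> and the value <http://example.org/science> are hypothetical and stand in for whatever triples your vector database actually holds about its objects. Full IRIs are used because only vdb and vdbprop are stated to be defined inside the selector.

```sparql
PREFIX llm: <http://franz.com/ns/allegrograph/8.0.0/llm/>

# Hypothetical selector: consider only vdb:Object resources tagged
# with an example.org topic. Replace the example.org IRIs with
# properties that exist in your own vector database.
SELECT ?uri ?score WHERE {
  (?uri ?score) llm:nearestNeighbor
      ("Famous Scientist"
       "historicalFigures"
       10
       0.8
       "{?id rdf:type vdb:Object . ?id <http://example.org/topic> <http://example.org/science>}") .
}
```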
 

API Key

If you are using OpenAI, you need an API key to use this predicate. (Ollama does not require a key but does require other setup, as described in the Ollama documentation.) See https://platform.openai.com/overview for instructions on obtaining a key (start with the Quickstart Tutorial and follow the links there to get a key). There are three ways to configure your API key: as a query option prefix, in the AllegroGraph configuration file, or in the file data/settings/default-query-options.

As a query option prefix, write:

PREFIX franzOption_openaiApiKey: <franz:sk-U01ABc2defGHIJKlmnOpQ3RstvVWxyZABcD4eFG5jiJKlmno> 
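
In a complete query the option prefix sits alongside the other PREFIX declarations. The sketch below combines it with the earlier example pattern; the key shown is the same placeholder used above, not a working key, and the historicalFigures database is assumed to exist.

```sparql
# Placeholder key: substitute your own OpenAI API key after "franz:".
PREFIX franzOption_openaiApiKey: <franz:sk-U01ABc2defGHIJKlmnOpQ3RstvVWxyZABcD4eFG5jiJKlmno>
PREFIX llm: <http://franz.com/ns/allegrograph/8.0.0/llm/>

SELECT ?uri WHERE {
  ?uri llm:nearestNeighbor ("Famous Scientist" "historicalFigures" 10 0.8) .
}
```

Setting the key per query this way keeps it out of the server configuration, at the cost of embedding it in every query text.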

Syntax for config file:

QueryOption openaiApiKey=<franz:sk-U01ABc2defGHIJKlmnOpQ3RstvVWxyZABcD4eFG5jiJKlmno> 

In the file data/settings/default-query-options:

(("franzOption_openaiApiKey" "<franz:sk-U01ABc2defGHIJKlmnOpQ3RstvVWxyZABcD4eFG5jiJKlmno>")) 

API Options

The proprietary OpenAI API exposes many options and parameters for interaction with their LLM models. Currently the AllegroGraph magic predicates and functions take an opinionated approach and hide most of these options behind the scenes. Specifically, we set

API endpoint: https://api.openai.com/v1/chat/completions 

Endpoint parameters:

min-score: gpt:*openai-default-min-score*  
model: "text-embedding-ada-002"  
top-n: gpt:*openai-default-top-n*  
verbose: nil

Note that llm:nearestNeighbor may utilize keyword syntax.

Additionally we impose an API timeout of 10 seconds.

Finally, if the OpenAI API times out or returns an error, if the magic predicate implementation fails to parse the response, or if any other error occurs, the magic predicate displays an informative message in Webview. Please contact AllegroGraph Support if you require different API options or customization.

Notes

The following namespace abbreviations are used:

The SPARQL magic properties reference has additional information on using AllegroGraph magic properties and functions.