http://franz.com/ns/allegrograph/8.0.0/llm/askMyDocuments
Collect background information from a vector database to build a prompt, and return the LLM's response to that prompt along with the URIs of the matching citations (citationIds) and their matching scores.
Namespace:
PREFIX llm: <http://franz.com/ns/allegrograph/8.0.0/llm/>
General forms:
(?response ?score ?citationId) llm:askMyDocuments (?text ?vectorRepoSpec ?topN ?minScore)
(?response ?score ?citationId) llm:askMyDocuments (?text ?vectorRepoSpec ?topN)
(?response ?score ?citationId) llm:askMyDocuments (?text ?vectorRepoSpec)
(?response ?score ?citationId ?originalText) llm:askMyDocuments (?text ?vectorRepoSpec ?topN ?minScore)
(?response ?score ?citationId ?originalText) llm:askMyDocuments (?text ?vectorRepoSpec ?topN)
(?response ?score ?citationId ?originalText) llm:askMyDocuments (?text ?vectorRepoSpec)
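A minimal sketch of the shortest form, relying on the default ?topN and ?minScore. It assumes a vector repository named `my-vectors` (hypothetical) and that ?vectorRepoSpec can be given as a plain repository-name string. Check the vector database documentation for the exact spec format your server expects.

```sparql
# Sketch only: "my-vectors" is a hypothetical vector repository name.
PREFIX llm: <http://franz.com/ns/allegrograph/8.0.0/llm/>
SELECT ?response ?score ?citationId WHERE {
  BIND("What topics does this collection of documents cover?" AS ?text)
  # No ?topN or ?minScore given, so the documented defaults (5 and 0.8) apply.
  (?response ?score ?citationId) llm:askMyDocuments (?text "my-vectors") .
}
```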
This predicate implements Retrieval Augmented Generation (RAG) by collecting background information through embedding-based matching. It begins by searching ?vectorRepoSpec for the ?topN best matches to ?text that score above the minimum matching score ?minScore. It then combines the question, the matching citationIds, and the background information into a single large prompt for the LLM. This gives the LLM a source of truth to answer from and reduces the chance of hallucination.
The combined prompt looks roughly like the following sketch:
Here is a list of citation IDs and content related to the query <query>
with these <citations>. Respond to the query as though you wrote the
content. Be brief. You only have 20 seconds to reply.
Place your response to the query in the response field.
Insert the list of citations whose content informed the
response into the citation_ids array.
The prompt also instructs the LLM to return only those citationIds whose content informed its response.
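Because each result binds one ?citationId, a standard SPARQL 1.1 aggregate can collect the citations the LLM reported for a response into one row. This is a sketch under the assumption that rows for the same answer share the same ?response value; `my-vectors` is a hypothetical repository name.

```sparql
# Sketch only: "my-vectors" is a hypothetical vector repository name.
PREFIX llm: <http://franz.com/ns/allegrograph/8.0.0/llm/>
SELECT ?response (GROUP_CONCAT(STR(?citationId); separator=", ") AS ?citations) WHERE {
  (?response ?score ?citationId) llm:askMyDocuments ("What are the main arguments?" "my-vectors") .
}
# One row per distinct response, listing every citation that informed it.
GROUP BY ?response
```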
The optional object parameter ?topN defaults to 5 if not specified.
The optional object parameter ?minScore defaults to 0.8 if not specified.
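A sketch of the three-argument form, where only ?topN is supplied and ?minScore falls back to its default. Given the defaults above, this pattern should behave like passing `(?text "my-vectors" 3 0.8)`. The repository name `my-vectors` is hypothetical.

```sparql
# Sketch only: "my-vectors" is a hypothetical vector repository name.
PREFIX llm: <http://franz.com/ns/allegrograph/8.0.0/llm/>
SELECT ?response ?score ?citationId WHERE {
  BIND("Summarize the key ideas." AS ?text)
  # ?topN = 3 is given explicitly; ?minScore defaults to 0.8.
  (?response ?score ?citationId) llm:askMyDocuments (?text "my-vectors" 3) .
}
```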
The predicate returns the LLM's response together with the matching score, the citationId URI, and (in the forms that include ?originalText) the source content from the vector database.
Note that llm:askMyDocuments can be used with keyword syntax.
You can use agtool to build a vector database from text literals stored in an AllegroGraph repository (see the documentation on using agtool for LLM embedding).
There is a fully worked-out example in the LLM Examples, using data about Noam Chomsky.
Notes
The following namespace abbreviations are used:
- fti - <http://franz.com/ns/allegrograph/2.2/textindex/>
- geo - <http://franz.com/ns/allegrograph/3.0/geospatial/>
- geofn - <http://franz.com/ns/allegrograph/3.0/geospatial/fn/>
- nd - <http://franz.com/ns/allegrograph/5.0/geo/nd#>
- ndfn - <http://franz.com/ns/allegrograph/5.0/geo/nd/fn#>
- sna - <http://franz.com/ns/allegrograph/4.11/sna/>
The SPARQL magic properties reference has additional information on using AllegroGraph magic properties and functions.