Note: this is a pre-release feature
The feature described in this document, natural language SPARQL queries, is still in development. This is a pre-release feature. It has been tested extensively in house, but natural language questions can be phrased in many different ways, and testing by just a few people may bias a tool toward their way of speaking and writing. We are hoping for extensive feedback from users to tell us what works well and what needs improvement.
Introduction
The example in this document uses, as noted below, the kennedy repository supplied with the AllegroGraph distribution and discussed in the AllegroGraph Quick Start document. There is a second example of this feature in the AllegroGraph.cloud document using the olympics repository which is preloaded in the AllegroGraph.cloud server. (The kennedy repo is not preloaded.)
The Natural Language Query (NLQ) to SPARQL feature in AllegroGraph leverages the power of vector databases (VDBs) to map human-readable natural language queries into precise SPARQL queries. This feature makes it easier for users to interact with data in a semantic graph without needing extensive knowledge of SPARQL, opening up possibilities for non-technical users to query the system using everyday language.
AllegroGraph already supports natural language querying, but in connection with chatStreams/chatBots (see Natural Language query to Knowledge Graph).
This document introduces a new feature: typing a natural language query directly into the SPARQL query page in WebView.
Doing this, however, requires some setup. You will need to create an NLQ (Natural Language Query) VDB (Vector DataBase). The NLQ VDB is a specialized vector database that stores pairs of natural language queries and their corresponding SPARQL queries. This VDB is directly associated with the AllegroGraph triple store and acts as a repository of mappings between how users might ask a question in natural language and how that query should be expressed in SPARQL. As more mappings are stored, the NLQ VDB becomes increasingly adept at translating user queries into accurate SPARQL queries.
AllegroGraph provides tools that assist in creating the NLQ VDB. See the worked-out example below, which shows how far you can get using just the default tools.
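The idea of looking up a stored question/SPARQL pair by similarity can be illustrated with a toy sketch. This is not AllegroGraph's implementation: the bag-of-words `embed` function stands in for a real embedding model, and the names `STORED_PAIRS`, `embed`, `cosine`, and `nearest_pair` are hypothetical and exist only for this illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stored pairs: (natural language question, SPARQL text).
# The first SPARQL string is the query used later in this document;
# the second is a placeholder, not real system output.
STORED_PAIRS = [
    (
        "Who was president?",
        "SELECT ?person WHERE { "
        "?person rdf:type <http://www.franz.com/simple#person> . "
        "?person <http://www.franz.com/simple#profession> "
        "<http://www.franz.com/simple#president> . }",
    ),
    ("Tell me about Joseph Kennedy.", "# SPARQL for this question (omitted)"),
]


def embed(text):
    """Toy embedding: word counts over lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_pair(question, pairs):
    """Return the stored pair whose question is most similar to `question`."""
    q = embed(question)
    return max(pairs, key=lambda pair: cosine(q, embed(pair[0])))


if __name__ == "__main__":
    match = nearest_pair("Which person was president?", STORED_PAIRS)
    print(match[0])
```

In this sketch a new phrasing is matched to the closest stored question, whose SPARQL can then serve as a template. Adding more pairs gives more phrasings a close match, which is the intuition behind expanding the NLQ VDB.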
Using SHACL in NLQs
SHACL (the Shapes Constraint Language, see the SHACL document) is used to define the structure and constraints of the data within a graph database. AllegroGraph uses SHACL shapes as a guide for generating correct SPARQL queries based on natural language input. (A standard use of SHACL is database validation: verifying, for example, that every instance of employee has at least one phone-number and exactly one salary. But SHACL has many more uses, as this document shows.)
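To make the validation example concrete, here is a minimal pure-Python sketch of the cardinality check it describes. In SHACL itself such rules are written as property shapes using `sh:minCount` and `sh:maxCount`; the sketch only mimics that check and does not use SHACL or AllegroGraph. The data, the `EMPLOYEE_CONSTRAINTS` table, and the `validate` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: (subject, property, value) triples for two employees.
TRIPLES = [
    ("emp1", "phone-number", "555-0100"),
    ("emp1", "salary", "50000"),
    ("emp2", "salary", "60000"),
    ("emp2", "salary", "65000"),
]

# property -> (minimum count, maximum count or None for unbounded),
# mirroring "at least one phone-number and exactly one salary".
EMPLOYEE_CONSTRAINTS = {
    "phone-number": (1, None),
    "salary": (1, 1),
}


def validate(subjects, triples, constraints):
    """Return (subject, property, count) for every cardinality violation."""
    counts = defaultdict(int)
    for subject, prop, _value in triples:
        counts[(subject, prop)] += 1

    violations = []
    for subject in subjects:
        for prop, (min_count, max_count) in constraints.items():
            n = counts[(subject, prop)]
            too_few = n < min_count
            too_many = max_count is not None and n > max_count
            if too_few or too_many:
                violations.append((subject, prop, n))
    return violations


if __name__ == "__main__":
    for violation in validate(["emp1", "emp2"], TRIPLES, EMPLOYEE_CONSTRAINTS):
        print(violation)
```

Here `emp1` satisfies both constraints, while `emp2` is reported twice: once for having no phone-number and once for having two salaries.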
Why SHACL Matters: SHACL shapes ensure that the generated SPARQL queries conform to the structure of the data in the triple store. If SHACL constraints are not accurately defined, the system may generate incorrect SPARQL queries, leading to suboptimal results or query failures.
Best Practices: Users should ensure that the SHACL shapes accurately reflect the structure and rules of the data in the triple store. Once things are set up, they should periodically review and update the SHACL shapes to maintain high accuracy in the query generation process.
Improving the Quality of Results: Users can improve the quality of the natural language to SPARQL translation by:
Expanding the NLQ VDB: Continuously store more pairs of natural language queries and their corresponding SPARQL queries that are specific to the dataset and schema of the triple store. This helps the system learn more mappings and improves the precision of query generation.
Refining SHACL Shapes: Regularly update the SHACL shapes to reflect changes in the data structure, ensuring that the system generates SPARQL queries that are both accurate and efficient.
Implementing Feedback Mechanisms: Incorporating user feedback on the accuracy of the generated SPARQL queries can help refine the system’s learning process.
By following these guidelines, users can ensure that AllegroGraph’s natural language to SPARQL feature generates accurate and contextually appropriate queries.
A simple example
The AllegroGraph Quick Start introduces the kennedy database (of members of the family of President John F. Kennedy) as a simple starting example. We will use that example here to show how quickly you can start using Natural Language queries.
The ntriples file kennedy.ntriples is available in the tutorial directory located in this subdirectory of the Franzinc allegrograph-examples GitHub site. Create a kennedy repo and load kennedy.ntriples into it (follow the instructions in AllegroGraph Quick Start if necessary).
Here is what WebView looks like with the kennedy repo loaded and open.
Display the New Query menu and choose Natural Language (NL) to SPARQL.
Since there is no NLQ Vector DataBase (VDB) associated with the kennedy repo, one will have to be created. This can be done automatically. We have selected openai as the Embedder, chosen a model, supplied our OpenAI key, and also selected the two options about data agnostic and taxonomy queries.
Click CREATE NLQ VDB & SHACL SHAPES. The database is created (named kennedy-nl-BcdNj, shown under NLQ VDB to use in the upper right).
Enter your NL query at the prompt labeled Enter a Natural Language Query here. When you do, the Run NL Query button on the right becomes active. We entered "Tell me about Joseph Kennedy." and here is the result:
There are several persons in the database named "Joseph Kennedy" and all are listed along with what is known about them. Joseph Patrick Kennedy, the family patriarch, born in 1888, has three lines because he had three professions (producer, banker, and ambassador). Note that "Tell me about Joseph Kennedy." is not, strictly speaking, a question, but the system interpreted it correctly and responded as desired.
Now we ask "Which person was president?" and we do not get any answers:
The problem is that the generated SPARQL uses President (capitalized) as the profession, while in the database it is lowercase. We edit the generated SPARQL to use president, click Run SPARQL, and get the expected response:
Note that the behavior is not deterministic, so you may not see exactly what we report here even if you follow the same steps.
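The capitalization problem above comes down to exact matching: the capitalized and lowercase spellings are different resource names, so a query using one will not match data stored under the other. The following sketch shows the effect on a tiny in-memory list of triples. It is illustrative only: the subject names and the banker IRI are hypothetical, and `subjects_with` and `case_variants` are helper names invented for this example, not AllegroGraph functions.

```python
SIMPLE = "http://www.franz.com/simple#"

# Hypothetical miniature of the data: the profession IRIs are lowercase.
TRIPLES = [
    ("ex:person1", SIMPLE + "profession", SIMPLE + "president"),
    ("ex:person2", SIMPLE + "profession", SIMPLE + "banker"),
]


def subjects_with(triples, predicate, obj):
    """Return subjects having exactly this predicate and object (case-sensitive)."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def case_variants(triples, predicate, obj):
    """Return stored objects that differ from `obj` only by letter case."""
    return sorted(
        {
            o
            for _s, p, o in triples
            if p == predicate and o != obj and o.lower() == obj.lower()
        }
    )


if __name__ == "__main__":
    profession = SIMPLE + "profession"
    # Capitalized form: no match, mirroring the empty result above.
    print(subjects_with(TRIPLES, profession, SIMPLE + "President"))
    # A case-insensitive scan suggests the spelling actually in the data.
    print(case_variants(TRIPLES, profession, SIMPLE + "President"))
    # Lowercase form: matches.
    print(subjects_with(TRIPLES, profession, SIMPLE + "president"))
```

When a generated query returns nothing, checking how the value is actually spelled in the repository, as `case_variants` does here, is often the quickest way to find the mismatch.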
Updating the NLQ VDB
We are happy with the SPARQL query that uses the lowercase president, so we want to save the NL query together with that SPARQL so it can be used as a template for later queries.
To the right are buttons SAVE TO NLQ VDB, EDIT NLQ VDB, and EDIT SHACL. Click on SAVE TO NLQ VDB so the "Which person was president?" NL query is saved along with the (corrected) SPARQL query. Then click on EDIT NLQ VDB and all the saved NL queries and their associated SPARQL queries are displayed. There are a lot generated by the system (note the scroll bar in the image). We have scrolled down to the one we just added:
Now let us add one template ourselves. On the Edit kennedy-nl-BcdNj NLQ VDB page we want to add the NL query "Who was president?" and associate it with this SPARQL:
SELECT ?person
WHERE {
  ?person rdf:type <http://www.franz.com/simple#person> .
  ?person <http://www.franz.com/simple#profession> <http://www.franz.com/simple#president> .
}
Click on NEW EXAMPLE, enter the NL query and the SPARQL query, and click UPDATE (at the bottom). Now when we ask "Who was president?" that is the SPARQL we get.
Updating SHACL
The system generates SHACL shapes based on characteristics of the repository. To view or edit these shapes, or to add new ones, click the EDIT SHACL button.
Please do send comments, examples, and suggestions
As said at the beginning, converting from an NL query to a SPARQL query is in a pre-release state. Feedback on what works and what does not is needed to polish the product. Please do send comments and suggestions to [email protected].