Example 16: Federated repositories
AllegroGraph lets you split up your triples among repositories on multiple servers and then search them all in parallel. To do this we query a single “federated” repository that automatically distributes the queries to the secondary repositories and combines the results. From the point of view of your Python code, it looks like you are working with a single repository.
To illustrate this, let us first create two repositories and import some data. The data will represent the positive integers from 1 to 15. The first repository will contain all Fibonacci numbers in that range, while the second one will contain all the other numbers.
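The split described above can be checked in plain Python before any data is loaded. This is only an illustrative sketch: the helper `fibonacci_up_to` and the variable names are hypothetical and are not part of the AllegroGraph client.

```python
def fibonacci_up_to(limit):
    """Return the set of distinct Fibonacci numbers in the range 1..limit."""
    fibs, a, b = set(), 1, 2
    while a <= limit:
        fibs.add(a)
        a, b = b, a + b
    return fibs


LIMIT = 15

# Values destined for the first repository (Fibonacci numbers).
fib_values = sorted(fibonacci_up_to(LIMIT))

# Values destined for the second repository (everything else in 1..LIMIT).
boring_values = [n for n in range(1, LIMIT + 1) if n not in fib_values]

print(fib_values)     # [1, 2, 3, 5, 8, 13]
print(boring_values)  # [4, 6, 7, 9, 10, 11, 12, 14, 15]
```

The import code that follows loads these same two sets of values into the two repositories.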
from franz.openrdf.connect import ag_connect

with ag_connect('python_fib', catalog=AGRAPH_CATALOG,
                host=AGRAPH_HOST, port=AGRAPH_PORT,
                user=AGRAPH_USER, password=AGRAPH_PASSWORD,
                create=True, clear=True) as conn:
    conn.addData("""
        @prefix : <ex://> .
        :one :value 1 .
        :two :value 2 .
        :three :value 3 .
        :five :value 5 .
        :eight :value 8 .
        :thirteen :value 13 .
    """)

with ag_connect('python_boring', catalog=AGRAPH_CATALOG,
                host=AGRAPH_HOST, port=AGRAPH_PORT,
                user=AGRAPH_USER, password=AGRAPH_PASSWORD,
                create=True, clear=True) as conn:
    conn.addData("""
        @prefix : <ex://> .
        :four :value 4 .
        :six :value 6 .
        :seven :value 7 .
        :nine :value 9 .
        :ten :value 10 .
        :eleven :value 11 .
        :twelve :value 12 .
        :fourteen :value 14 .
        :fifteen :value 15 .
    """)
To create a federated repository, we first have to connect to the server that will be used to aggregate results. We do this by creating an AllegroGraphServer instance.
from franz.openrdf.sail.allegrographserver import AllegroGraphServer

server = AllegroGraphServer(
    AGRAPH_HOST, AGRAPH_PORT, AGRAPH_USER, AGRAPH_PASSWORD)
We are using the server address and credentials configured in the “Setting the environment for the tutorial” section.
The next step is to use the openFederated() method to create a federated session. We will pass the list of repositories to federate as an argument. Elements of this list can be:
- Repository objects
- RepositoryConnection objects
- strings (naming a store in the root catalog, or the URL of a store)
- (storename, catalogname) tuples.
We’ll use the last option:
conn = server.openFederated([('python_fib', AGRAPH_CATALOG),
                             ('python_boring', AGRAPH_CATALOG)])
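When many stores live in the same catalog, the list of (storename, catalogname) tuples can be built programmatically. The following is a minimal sketch: `open_federation` is a hypothetical helper, not part of the client library, and it assumes that `server` is an AllegroGraphServer instance and that all named stores share one catalog.

```python
def open_federation(server, store_names, catalog):
    """Open a federated session over stores that share one catalog.

    Hypothetical helper: it only builds the list of
    (storename, catalogname) tuples and passes it to
    server.openFederated().
    """
    specs = [(name, catalog) for name in store_names]
    return server.openFederated(specs)
```

With the objects defined earlier, the call `open_federation(server, ['python_fib', 'python_boring'], AGRAPH_CATALOG)` would be equivalent to the openFederated() call shown above.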
Now we can query the combined repository.
query = conn.prepareTupleQuery(query="""
select (avg(?v) as ?avg)
(min(?v) as ?min)
(max(?v) as ?max) where {
?number <ex://value> ?v .
}""")
query.evaluate(output=True)
As we can see, data from both repositories has been returned and aggregates have been correctly computed over the whole dataset.
-------------------
| avg | min | max |
===================
| 8.0 | 1 | 15 |
-------------------
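The aggregates can be cross-checked locally without a server. The sketch below simply repeats the values loaded into the two repositories and recomputes the same statistics in plain Python; the variable names are illustrative only.

```python
# Values stored in the 'python_fib' and 'python_boring' repositories.
fib = [1, 2, 3, 5, 8, 13]
boring = [4, 6, 7, 9, 10, 11, 12, 14, 15]

# The federated query sees the union of both repositories.
combined = fib + boring

avg = sum(combined) / len(combined)
print(avg, min(combined), max(combined))  # 8.0 1 15

# Neither repository alone produces the federated average.
fib_avg = sum(fib) / len(fib)
boring_avg = sum(boring) / len(boring)
```

The average of 8.0 is only obtained when both repositories are queried together, which matches the result returned by the federated session.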
Another example of using federated repositories, this time with multiple server machines, can be found in Running AG on AWS EC2.