Introduction
Virtual Graphs are a trending paradigm for low-impact data access when building enterprise-wide knowledge graphs. The approach is also known in the literature as Ontology-based Data Access. Typically, virtual graph systems expose the content of arbitrary relational databases or data warehouses in order to enrich existing knowledge graph systems. The exposed graphs can be virtual, which means all data remains in the data sources, or materialized, which means the data is converted into RDF triples (or quads) and stored directly in AllegroGraph.
AllegroGraph provides an integration with Ontop, a leading provider of virtual graph tools, so users can easily virtualize data as part of their AllegroGraph Knowledge Graph solution. The integration of AllegroGraph with Ontop allows a user to integrate with essentially any data source with a supported JDBC driver. Here are some examples:
DATABASES: Apache Cassandra, Apache Hive, AWS Athena, AWS Aurora, AWS Redshift, CosmosDB, DataStax, Derby, Elasticsearch, Exasol, Google BigQuery, H2, IBM DB2, Apache Impala, MariaDB, Microsoft SQL Server, MongoDB, MySQL, Odata, Oracle Database, PostgreSQL, REST, SAP Business One DI, SAP HANA, Sybase ASE, Teradata.
BI TOOLS: Apache Superset, cumul.io, IBM Cognos, Metabase, Microsoft PowerBI, RapidMiner, Siren, Tableau.
CRM: Dynamics 365 Sales, Dynamics CRM, Netsuite, Odoo, Salesforce Einstein, Salesforce, SAP ByDesign, SAP Netweaver Gateway, SugarCRM, Veeva CRM.
DATA ANALYTICS: Adobe Analytics.
CLOUD SERVICES: Active Directory, AWS Management, Azure Management, Facebook, Hubspot, Instagram, Jira, LDAP, LinkedIn, Marketo, Microsoft Teams, Oracle Eloqua, Oracle SalesCloud, Salesforce Chatter, Salesforce Marketing, Salesforce Pardot, SAP SuccessFactors, SAP, ServiceNow, Slack, Splunk, Twilio, Veeva, Zendesk.
FILES/UNSTRUCTURED DATA: Box, CSV, Dropbox, Email, Excel Online, Excel Services, Excel, Gmail, Google Calendar, Google Contacts, Google Drive, Google Sheets, JSON, Microsoft CDS, Microsoft Exchange, Microsoft OneDrive, Microsoft OneNote, Microsoft Planner, Microsoft Project, Office365, Parquet, PDF, Sharepoint.
For a tutorial on how to build virtual graphs with AllegroGraph and Ontop, please visit our GitHub Example page.
Using agtool to materialize (ETL) data
The AllegroGraph agtool vload command can use Ontop's materialize feature (see Ontop materialize) to convert a relational database to triples and then load those triples. This requires specifying how you want the triples to be materialized.
Note that AllegroGraph provides an interface to Ontop. Some questions or issues may be best directed to Ontop's support team. Please also contact us at [email protected] and we will help as best we can.
The agtool vload command creates and loads triples
agtool is AllegroGraph's general command utility. The first argument to agtool specifies what it does. The command to load triples from relational databases is:
agtool vload [REQUIRED-ARGUMENTS] [MAPPING-ARGUMENT] [OPTIONS] REPO
The REQUIRED-ARGUMENTS are:
- --ontop-home PATH
- the path of the Ontop home directory.
- --properties FILE
- the properties file. This file provides information needed to run Ontop, like the database name, username, password and the like. See https://ontop-vkg.org/properties/basic.properties for a sample properties file.
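As a rough illustration of what such a properties file contains, the sketch below writes a minimal one from the shell. The `jdbc.*` keys follow Ontop's JDBC connection settings. The PostgreSQL URL, database name, user, and password are placeholders chosen for this example, and the file name `example.properties` is borrowed from the examples at the end of this page.

```shell
#!/bin/sh
# Sketch: write a minimal Ontop properties file.
# Every value below is a placeholder; substitute your own connection details.
cat > example.properties <<'EOF'
jdbc.url=jdbc:postgresql://localhost:5432/exampledb
jdbc.driver=org.postgresql.Driver
jdbc.user=example_user
jdbc.password=example_password
EOF

# Restrict permissions, since the file holds a database password.
chmod 600 example.properties
```

The matching JDBC driver must also be available to Ontop; consult the Ontop documentation for where to place it.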
The MAPPING-ARGUMENT can be one of the following two arguments. See Ontop materialize documentation for more information. Specify either
- --mapping FILE, -m FILE
- the mapping file, which must be in R2RML (.ttl) format or in Ontop native format (.obda).
or
- --base-iri URI
- the URI that will be used as the base for creating triples.
The OPTIONS arguments:
- --disable-reasoning
- Tell Ontop to disable OWL reasoning. Defaults to false when unspecified.
- --error-strategy STRATEGY, -e STRATEGY
- The error handling strategy. STRATEGY can be
ignore -- ignore and continue;
save -- write errors to agraph log file and continue;
cancel (the default) -- cancel loading on first error.
- --workspace PATH
- the path of a directory that will be used for temporary files. The default is /tmp. For a large database, the temporary files can be large, and the conversion and load will fail if temporary space is exhausted, so be sure the space in PATH (or in /tmp if this argument is not given) is sufficient.
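Because a load fails when temporary space runs out, it can help to check the free space in the workspace directory beforehand. The following is a minimal sketch, not part of agtool: the variable names `WORKSPACE` and `MIN_FREE_KB` and the 1 GiB threshold are assumptions made for illustration, and the right threshold depends on the size of your source database.

```shell
#!/bin/sh
# Sketch: report free space in the directory intended for --workspace.
# WORKSPACE and MIN_FREE_KB are illustrative names, not agtool settings.
WORKSPACE="${WORKSPACE:-/tmp}"
MIN_FREE_KB="${MIN_FREE_KB:-1048576}"   # 1 GiB; an arbitrary example threshold

# df -P prints one portable-format line per filesystem;
# with -k, column 4 is the available space in kilobytes.
free_kb=$(df -Pk "$WORKSPACE" | awk 'NR==2 {print $4}')

if [ "$free_kb" -lt "$MIN_FREE_KB" ]; then
    echo "warning: only ${free_kb} KB free in $WORKSPACE" >&2
    workspace_ok=no
else
    echo "ok: ${free_kb} KB free in $WORKSPACE"
    workspace_ok=yes
fi
```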
The following load options are also accepted. See Data Import for descriptions of these arguments: --quiet, --verbose, --duplicates, --fti, --optimize, --overwrite, --supersede, --parameter, --with-indices. Some of these arguments have one-letter abbreviations.
The REPO argument: the repository into which the triples will be loaded. See the Repository Specification document for how to specify a repository (but http[s]://username:password@host:port/[catalogs/cat-name/]repositories/repo-name will always work). Leave out catalogs/cat-name/ when using the root catalog. Triples will be added to the repository unless --overwrite or --supersede is specified, in which case the repository will be deleted (all existing triples deleted) and created afresh.
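The full URL form can be assembled mechanically. The sketch below shows one way to do that in the shell, with and without a catalog. The function name `repo_url` is hypothetical, and the user, password, host, and catalog values are placeholders.

```shell
#!/bin/sh
# Sketch: build a full repository URL of the form
#   http[s]://username:password@host:port/[catalogs/cat-name/]repositories/repo-name
# repo_url is a hypothetical helper; all argument values are placeholders.
repo_url() {
    # usage: repo_url SCHEME USER PASSWORD HOST PORT REPO [CATALOG]
    scheme=$1; user=$2; password=$3; host=$4; port=$5; repo=$6; catalog=${7:-}
    if [ -n "$catalog" ]; then
        path="catalogs/$catalog/repositories/$repo"
    else
        # Root catalog: leave out catalogs/cat-name/
        path="repositories/$repo"
    fi
    printf '%s://%s:%s@%s:%s/%s\n' "$scheme" "$user" "$password" "$host" "$port" "$path"
}

# Root catalog
repo_url http test xyzzy localhost 10035 example-repo
# Named catalog
repo_url https test xyzzy localhost 10035 example-repo my-catalog
```

The helper does no escaping, so a username or password containing characters such as `@`, `:`, or `/` would need to be percent-encoded before being passed in.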
Examples
The following are examples of calls:
agtool vload --ontop-home /path/to/ontop --properties example.properties --mapping example.obda 10035/example-repo
agtool vload --overwrite --ontop-home /path/to/ontop --properties example.properties example-repo
agtool vload --with-indices posgi,gospi,i --ontop-home /path/to/ontop --properties example.properties 10015/example-repo
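The options described earlier can be combined in a single call. As a sketch, the script below assembles a command that uses a mapping file, logs errors instead of cancelling, and points temporary files at a dedicated directory, then prints the command for review rather than running it. All paths are placeholders, and the variable names are chosen only for this example.

```shell
#!/bin/sh
# Sketch: assemble an agtool vload call from the options documented above
# and print it for review rather than executing it.
ONTOP_HOME=/path/to/ontop          # placeholder
PROPERTIES=example.properties      # placeholder
MAPPING=example.obda               # placeholder
WORKSPACE=/data/vload-tmp          # placeholder
REPO=10035/example-repo

# Store the command in the positional parameters so each argument
# stays a separate word.
set -- agtool vload \
    --ontop-home "$ONTOP_HOME" \
    --properties "$PROPERTIES" \
    --mapping "$MAPPING" \
    --error-strategy save \
    --workspace "$WORKSPACE" \
    "$REPO"

# Print the command; replace the echo with "$@" to run it.
vload_cmd="$*"
echo "$vload_cmd"
```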