Integrating with Hadoop

For AllegroGraph users interested in making Hadoop part of their application, we suggest the following pipeline:

  1. Data is stored in Hadoop or HBase.

  2. Data is retrieved via Hive, Pig, or straight MapReduce and returned to AllegroGraph as CSV tables or JSON files.

  3. We apply a mapping to these tables or JSON files to produce the nodes and edges, or triples, that go into AllegroGraph (a sketch of steps 2 and 3 follows this list).
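
As an illustration of steps 2 and 3, here is a minimal Python sketch that runs a Hive query through the hive command-line client and maps each result row to N-Triples. The predicate URIs and the example.com namespace are hypothetical, and the sketch assumes the hive CLI is on the PATH; a production pipeline would drive the mapping from a mapping file, as in the Prolog example further below.

    import subprocess

    # Step 2 (sketch): run a Hive query and capture the result.
    # The hive CLI prints tab-separated rows to stdout by default.
    query = "select id, dob, dod from person"
    result = subprocess.run(["hive", "-e", query],
                            capture_output=True, text=True, check=True)

    # Step 3 (sketch): map each row to N-Triples. The URIs below are
    # illustrative stand-ins for a real mapping definition.
    with open("person.nt", "w") as out:
        for line in result.stdout.splitlines():
            pid, dob, dod = line.split("\t")
            subj = f"<http://example.com/person/{pid}>"
            out.write(f'{subj} <http://example.com/dob> "{dob}" .\n')
            out.write(f'{subj} <http://example.com/dod> "{dod}" .\n')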

At the integration level, we can call SQL or Pig directly from AllegroGraph via Prolog. Here is a typical, simplified Prolog example:

(hive "select id, dob, dod from person" csv "person.csv")  
(triplemap "c:/tmp/person.map" "person.csv" "person.nt")  
(load-ntriples "person.nt") 
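
When the pipeline is scripted outside of Prolog, the final load step can also be performed with the agraph-python client. The following is a minimal sketch, assuming a local AllegroGraph server on port 10035 with development credentials; adjust the connection parameters for your deployment.

    from franz.openrdf.connect import ag_connect
    from franz.openrdf.rio.rdfformat import RDFFormat

    # Host, port, and credentials here are assumptions for this sketch.
    with ag_connect("person", host="localhost", port=10035,
                    user="test", password="xyzzy", create=True) as conn:
        # Load the N-Triples file produced by the mapping step.
        conn.addFile("person.nt", format=RDFFormat.NTRIPLES)
        print(conn.size(), "triples loaded")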

An interesting customer example that leveraged the capabilities of AllegroGraph and Hadoop comes from Los Alamos National Laboratory. For more information about their process, please review the white paper available at http://franz.com/agraph/cresources/white_papers/.

There may be additional options for working with Hadoop via Intel's GraphBuilder. For additional information, please see this link:

http://www.intel.com/content/dam/www/public/us/en/documents/articles/intel-dsd-graph-builder-collateral-faq.pdf