Jans
Aasman, John Foderaro
Franz
Inc.
The Common Lisp community doesn’t really have a
common story when it comes to object persistence as an orthogonal feature to
the language. We don’t even have a standard way to do object serialization or
industrial-grade B+trees as a native data structure.
Franz decided to explore a new object-oriented
database design, AllegroCache, for Lisp application developers. Several people
have pointed out to us that the majority of the IT industry is moving in the
direction of Object Relational Mappings on top of relational databases, and
that our object-oriented database might be the wrong thing at the wrong time.
While those concerns are not without merit, they
overlook some critical developments in the IT industry especially regarding the
Lisp community.
First of all, we see more and more of our customers
trying to solve extremely complex problems with very large data sets. Some of
these problems naturally fit the relational data model and their needs can be
easily satisfied with our direct interfaces to relational databases like MySql
and Oracle. However, some problems cannot easily fit into a relational data
model. These are problems where the data is best described as a complex graph
or pointer space, and where the graphs are many times larger than memory. Furthermore, the data may be very
heterogeneous and the data schema may need to evolve frequently. We believe that such situations will
increasingly be more prevalent in the commercial sectors, and a better database
choice is clearly needed.
Secondly, we believe that the Lisp community will not win the
inter-language competition by addressing problems that other languages
and relational databases can handle adequately. Complex problems and complex data
models are where we can carve out a niche with a modest amount of
effort in an object-oriented database.
In this paper we will first describe AllegroCache
from the perspective of the Lisp programmers.
We then discuss AllegroCache from the perspective of a modern
database. Next we will talk about the
implementation of AllegroCache, particularly our experience with the Btrees
AllegroCache uses. We will conclude with some applications and prototypes that
have already benefited from this new object-oriented database.
The best way for a Lisp programmer to get a feel
for what AllegroCache can do is to try out the demo program in the
appendix. (You can get an evaluation
copy of AllegroCache by emailing us at [email protected].) The following describes the key
functionality of AllegroCache.
1.
A new persistent-class metaclass
We added a new CLOS
metaclass called “persistent-class”.
Instances of a CLOS class of persistent-class are automatically stored
on disk whenever a “commit” is called.
New keywords introduced for object slots include :index.
(defclass person ()
((name :index :any)
(soc-security :index
:any-unique)
(sex)
(dob)
(children)
(city))
(:metaclass persistent-class))
2.
Commit/Rollback
Once a program opens one
(or more) database with
(open-file-database “database name” :if-does-not-exist
:create :if-exists :supersede)
The Lisp session will be
in a transaction until the database is closed.
It is up to the program to do a (commit) or a (rollback). A commit will make all changes to the data
permanent on disk. A rollback on the
other hand will clear all new objects and changes to objects since the last commit.
3.
Unique object identifiers
Every instance created
for a persistent-class will have an object identifier (oid) that is guaranteed
to be unique and will never change during the lifetime of the database.
Programs will have access to this identifier so that any object can be
retrieved or deleted based on its oid.
4.
Referential Integrity
In many cases, the
application data model resembles a complex graph or pointer network, in which
instances point to other instances.
When an object in this space is deleted, AllegroCache will silently and
automatically turn all references to the deleted-object to nil.
5.
Maps and Indexes
AllegroCache introduces
the general concept of a persistent Map. A Map is a set of key-value pairs where
the keys and values take almost any Lisp type.
Maps are always cached in memory and stored persistently on the
disk. A Map can be 1-1, many-1, 1-many,
or many-many. A Map where the value is
the oid of a persistent object in AllegroCache is called an Index. When a class
is defined, the program can specify which slots should have Indexes maintained
for them.
6.
Sets
In some cases the value
of a slot can be a list of lisp objects. When the number of objects is low,
either a list or a vector can be used to store the oid’s. When the number of
objects becomes large, then it will be more efficient to store them in a set.
7.
Redefining Classes
One of the unique
features of AllegroCache is that you can redefine classes within your program
on the fly. When the class definitions
are redefined at run-time, AllegroCache will lazily update existing instances to
the new class definition whenever you access them.
8.
Retrieving objects
There are basically two
ways to retrieve objects from the database. If there is an index for the
object, you can then use
(retrieve-from-index
‘person ‘name “Jans Aasman”).
You can also loop through
all object instances to retrieve the one desired.
(doclass (e ‘person)
(when
(equal (name e) “Jans Aasman”)
(whatever e)))
9.
Prolog as retrieval language
The methods described in (8) above are adequate for
retrieving objects from disk or from the cache directly. However, when
working with deeply nested structures, and when it becomes difficult
to remember which slots were indexed, a more declarative way would be
easier to express search or graph-matching algorithms. AllegroCache provides Prolog as an
alternative retrieval language, which appears to be the most stable
and easiest way to formulate complex queries based on our
experience.
The previous section describes AllegroCache purely
from a Lisp programmer’s perspective to enable persistent CLOS objects. In reality, AllegroCache is also a
full-fledged, production database. As
such, it entails all essential properties of a commercial database, similar to
the typical Relational Database Management System (RDBMS)
1.
Multi-user environments
AllegroCache 1.0 can run
as a standalone database or as a server that serves multiple clients or
processes.
In the standalone version
the Btrees, database-manager and the application all reside in the same Lisp
process. In fact, the application can
have more than one database open in the same Lisp process simultaneously. Normally when working with one database, all
AllegroCache database manipulation functions will apply to that database. In the case of opening multiple databases in
the same Lisp process, special wrapping of the database manipulation functions
is required.
2.
ACID
AllegroCache complies
with the essential ACID requirements for a reliable database. Specifically, a production-grade database
management system must achieve four goals: Atomicity, Consistency, Isolation
and Durability (ACID). Databases that fail to meet any of these four goals can
not be considered reliable.
·
Atomicity states that database modifications must
follow an all or nothing rule. A transaction is said to be atomic, if one part
of the transaction fails, the entire transaction fails.
·
Consistency states that only valid data will be
written to the database. If a
transaction violates the databases consistency rules, the entire transaction
will be rolled back and the database will be restored to a state consistent
with those rules.
·
Isolation requires that multiple transactions
occurring at the same time not affect each other’s execution. For example, if one application issues a
transaction against a database at the same time that another application issues
a different transaction against the same database, both transactions should
operate on the database in an isolated manner.
Note that the isolation property does not dictate which transaction will
execute first, merely that they will not interfere with each other.
·
Durability requires that any transaction committed
to the database will not be lost. Durability is ensured through the use of
database backups and transaction logs to restore all committed transactions
prior to any software or hardware failures.
3.
Transactional Model
AllegroCache supports a
full transactional model. It can do
long and short transactions, commits and rollbacks, and the database supports
recovery to a consistent state should the machine fail during the commit.
AllegroCache can be
viewed as a shared data store and a set of one or more clients each with a
private data store. The shared data
store is only changed when a successful commit is done by one of the clients.
The shared data store is thus advanced from one state to another continuously
by the commit function. The state of the shared data store is denoted by a
number called the transaction number which increments on every successful
commit. When a client first starts up and connects to the shared data store, it
is given the current transaction number.
All references that client makes to the shared data store will be with
respect to that transaction number. Until the client does a commit or rollback
the client sees the shared data store in the same state it was in when it first
connected to the shared data store, even if other clients commit new data to
it. Therefore, the shared data store
holds not only the current state of the database but also previous states as
well. This architecture allows a very
robust transaction model suitable for any mission-critical application.
4.
Concurrency
In general AllegroCache
works best with the optimistic model for concurrency. This means that the program checks for conflict only when
committing the data. AllegroCache users
can implement more pessimistic advisory locks if so desired.
At the heart of every large database is a set of
Btrees (actually they are B+trees but we will call it simply Btrees in this
paper). The 1st prototype of AllegroCache
used the Btrees from the Berkeley DB library (www.sleepycat.com). Berkeley DB is a highly efficient C-based
Btree system that is available on almost every platform. Initially we used the
full Berkeley DB Transactional Data Store that provides for full ACID
transactions and recovery. However,
with multiple clients, performance degrades rapidly. Therefore, the AllegroCache prototype did all database management
in Lisp, and used only the plain Berkeley DB Data Store. But even with that
interface, AllegroCache encountered some technical issues. In order to ensure isolation, the Berkeley
DB code employs aggressive locking and uses low level OS locking primitives. The result is that the Lisp application just
froze when the process was waiting for access to data. This does not fit well
with Lisp's own multithreading.
So Franz decided to have a completely Lisp-based
Btree system, loosely taking after Knuth’s original algorithm. The Lisp-based
Btree provides all the functionality AllegroCache needs, with no loss of speed
and scalability (in fact, it runs faster than the C-based Btrees). The Lisp Btree handles literally billions of
key/value pairs with ease, and is incorporated into the current 2nd
Alpha release.
Besides resolving technical difficulties, a full
Lisp implementation of the Btrees offers the following advantages:
1. More
flexible caching of Btree pages. Lisp programmers now
have full access to the main statistics of the Btree performance (cache hits,
cache misses, access to overflow area’s, density of nodes and leaves, etc) so
the size of the buffer for the Btree pages can be managed on the fly for best
performance. One could even imagine a
rule-based caching system that dynamically adapts to demand.
2. Less
copying and consing. With an external Btree such as Berkeley
DB, AC has to copy data from C space into Lisp space. With Lisp Btrees, AC can read the keys and values directly from
the block buffers of the Btree code.
Therefore, it can do many more manipulations with keys and values
without having to copy any data.
3. Easier
comparison functions. Btrees keep keys in the leaves in sorted
order. To do so, it requires comparison functions. If you need a separate binary encoding scheme for keys in
Berkeley DB, you had to write your own comparison functions in C. With the Lisp Btrees, you can easily specify
both the encoding schemes for the keys and the comparison functions.
4. Easier
and better locking. Berkeley DB used C style locking, which
doesn’t fit well with Lisp multithreading. With Lisp Btree, AC now has full
control over its locking behavior.
5. Better
cursor behavior. Looping over leaves in a Btree is done
with cursors. In Berkeley DB you have to close the cursors explicitly or they
would stay in memory forever. Worst, if you forget to close a cursor and you
close the database, it would be impossible to open the database again
sometimes. With the Lisp Btrees, the
cursors are automatically garbage collected when they are no longer
needed. AC also implements floating
cursors. So even if other processes add new keys/values to, say the left in the
leave chain, the cursor would stay at the same logical spot in the leaf.
6. Easier
to distribute. Franz can now distribute AllegroCache on all the
20 supported platforms easily. With Berkeley DB, we had to create library
distributions for every supported platform.
This is also advantageous for customers who build applications with
AllegroCache. They too don’t have to
worry about the deployment platform their client requires.
7. Another
feather on the Lisp cap.
The final advantage is that we now have another demonstration of a real
world product that is just as fast as its C counterpart.
So far, we have worked with a number of partners on
applications and prototypes of AC. For
example, Pepite from Belgium runs its Common Lisp based data-mining package on
AllegroCache, Tellme is experimenting with a personal directory service for
mobile phones on AllegroCache. An Indian Lisp company, KIDO Software, is using
AllegroCache for a fraud detection system over call detail records for a
commercial telecomm company. The BioLingua project at Stanford is experimenting
with AllegroCache to store its frame-based biology knowledge.
The following example
code shows the main properties of AllegroCache. (You can get an
evaluation copy of AllegroCache by emailing us at [email protected].) The
following program will be included in the evaluation version.
; cd to the directory
where you store this tutorial and evaluate every
; expression step by
step (don't load the program in at once)
; :cd
c:/franznetwork/lisp/allegrocache/v8/tutorials/
(load
"exampleload.cl") ; load allegrocache and prolog and some utilities
(in-package :example)
(open-file-database
"/s/ja/temp/test3" ; please adapt to your filesystem
:if-does-not-exist :create
:if-exists :supersede)
; defclass* is a
macro that writes the full defclass
expression..
(defclass* person ()
((id :index :any-unique)
(first-name :index :any)
(last-name :index :any)
(city :index :any)
birth-date
children
sex
relationship))
#| this is what the
macro expands into
(defclass person ()
((id :index :any-unique :accessor id
:initform nil :initarg :id)
(first-name :index :any :accessor
first-name :initform nil :initarg :first-name)
(last-name :index :any :accessor last-name
:initform nil :initarg :last-name)
(city :index :any :accessor city :initform
nil :initarg :city)
(birth-date :accessor birth-date :initform
nil :initarg :birth-date)
(children :accessor children :initform nil
:initarg :children)
(sex :accessor sex :initform nil :initarg
:sex)
(relationship :accessor relationship
:initform nil :initarg :relationship))
(:metaclass
persistent-class)) ; note the
metaclass specification
|#
(defclass*
relationship ()
(p1 p2 start-date end-date
relationship-type))
(defclass* city ()
((name :index :any)
inhabitants state))
; a macro that
defines a nicer way to print the objects..
(defprinter person
first-name last-name)
(defprinter city
name)
(defprinter
relationship p1 p2)
;=====
; here is a list that
defines a family. So person 1 is Kathy
; Smith, she lives in
Oakland, born in 1932-4-16, female and
; had a child, person
#5 who is John Smith, etc...
(setf family '((1
Kathy Reagan Oakland 1932-4-16 female (5))
(2 Bob Smith Oakland 1934-7-10
male (5))
(3 Jane Becker Boston 1928-6-13
female (6 9 10))
(4 Mike Sears Boston 1926-5-12 male (6 9 10))
(5 John Smith Berkeley
1958-7-29 male (7 8))
(6 Debra Sears Berkeley
1956-12-18 female (7 8))
(7 Christopher Smith Berkeley
1992-9-1 male nil)
(8 Steve Smith Berkeley
1994-10-11 male nil)
(9 Ernestine Sears Chicago 1958-7-26 female nil)
(10 Johannes Sears Seattle
1962-1-1 male (11))
(11 Suzie Sears Seattle
2004-1-1 female nil)))
; so persons 1 and 2
from family are married.
(setf relationship
'((1 1 2 married 1958-4-1)
(2 3 4 married 1955-5-1)
(3 5 6 married 1994-4-4)))
; and some
information about the cities.
(setf cities '((1
Oakland 700000 California)
(2 Boston 500000 Massachusetts)
(3 Berkeley 200000 California)
(4 Chicago 2000000 Illinois)
(5 Seattle 600000 Washington)))
; now fill in the
instances..
(dolist (e family)
(make-instance 'person :id (nth 0 e)
:first-name (nth 1 e) :last-name (nth 2 e)
:city (nth 3 e)
:birth-date (nth 4 e) :sex (nth 5 e)
:children (nth 6 e)))
; the child slot
contains numbers, this form will replace the numbers
; with the actual
objects
(doclass (e 'person)
(setf (children e)
(mapcar #'(lambda (x) (retrieve 'person
'id x))
(children e))))
(dolist (r
relationship)
(make-instance 'relationship
:p1 (retrieve 'person 'id (nth 1 r))
:p2 (retrieve 'person 'id (nth 2 r))
:relationship-type (nth 3 r) :start-date
(nth 4 r)))
(dolist (c cities)
(make-instance 'city :name (nth 1 c)
:inhabitants (nth 2 c)
:state (nth 3 c)))
; do a commit, everything would work the same if you didn't do a
; commit..
(commit); the
following reads as follows:
; this demonstrates
how you can change a class definition
(defclass* city ()
((name :index :any)
inhabitants state
mayor)) ; <-- add the mayor slot..
; create a new
instance...
(make-instance 'city
:name ‘SanFrancisco :inhabitants 400000
:mayor 'GavinNewsom)
(commit)
(mayor (retrieve
'city 'name ‘SanFrancisco)) ; will return GavinNewsom
(mayor (retrieve
'city 'name ‘Boston)) ; will obviously
return nil
; low level retrieve
(doclass (e 'person)
(print e))
(retrieve 'person
'first-name 'Bob) ; works if you have an index
; if you provide the
:oid keyword you only get the oid back.
(setf oid (retrieve
'person 'first-name 'Bob :oid t))
(oid-to-object
(find-class 'person) oid)
; prolog as one
retrieval language
#|
you need to know a little bit of prolog to
understand something like
(<-- (father ?x ?y)
(db person ?x sex male children ?c)
(member ?y ?c))
?x is the father of
?y if
in the db there is an
object ?x of class person that is male and has a
list of children ?c and ?y is a member of ?c.
the result of running
(?- (father ?x ?y)) would be
example(26): (?-
(father ?x ?y))
?x = #<person Bob
Smith>
?y = #<person John
Smith>
?x = #<person Mike
Sears>
?y = #<person
Debra Sears>
…
…
|#
(<-- (father ?x
?y)
(db person ?x sex male children ?c)
(member ?y ?c))
(<-- (mother ?x
?y)
(db person ?x sex female children ?c)
(member ?y ?c))
(<-- (parent ?x
?y)
(father ?x ?y))
(<- (parent ?x ?y)
(mother ?x ?y))
(<-- (grandparent
?x ?y)
(parent ?x ?z)
(parent ?z ?y))
(<-- (grandchild
?x ?y)
(grandparent ?y ?x))
(<-- (ancestor ?x
?y)
(parent ?x ?y))
(<- (ancestor ?x ?y)
(parent ?x ?z)
(ancestor ?z ?y))
(<-- (descendent
?x ?y)
(ancestor ?y ?x))
(<--
(parent-child-in-different-state ?x ?y)
(db person ?x city ?n1)
(parent ?x ?y)
(db city ?c1 name ?n1 state ?state1)
(db person ?y city ?n2)
(db city ?c2 name ?n2 state ?state2)
(not (= ?state1 ?state2)))
(<-- (female ?x)
(db person ?x sex female))
(<-- (male ?x)
(db person ?x sex male))
(<-- (aunt ?x ?y)
(father ?z ?x)
(female ?x)
(father ?z ?w)
(not (= ?x ?w))
(parent ?w ?y))
(<-- (uncle ?x ?y)
(father ?z ?x)
(male ?x)
(father ?z ?w)
(not (= ?x ?w))
(parent ?w ?y))
(<-- (nephew ?x
?y)
(aunt ?y ?x)
(male ?x))
(<- (nephew ?x ?y)
(uncle ?y ?x)
(male ?x))
(<-- (niece ?x ?y)
(aunt ?y ?x)
(female ?x))
(<- (niece ?x ?y)
(uncle ?y ?x)
(female ?x))
; using names instead
of instances..
(<-- (father2 ?x
?y)
(db person ?x1 first-name ?x sex male
children ?c)
(member ?y1 ?c)
(db person ?y1 first-name ?y))
;; some fun
metaprogramming...
(load
"metarelations.cl")
(clear-relations)
(register-relations
'father 'mother 'parent 'grandparent
'grandchild
'ancestor 'descendent
'parent-child-in-different-state
'aunt 'uncle 'nephew 'niece)
#|
register-relations
turns all binary relations into ternary relations,
so now you can ask
for all the relations that exist between ?x and ?y
(<-- (relation ?x
?y father) (father ?x ?y))
(<- (relation ?x
?y mother) (mother ?x ?y))
(<- (relation ?x
?y parent) (parent ?x ?y))
(<- (relation ?x
?y grandparent) (grandparent ?x ?y))
(<- (relation ?x
?y grandchild) (grandchild ?x ?y))
(<- (relation ?x
?y ancestor) (ancestor ?x ?y))
(<- (relation ?x
?y descendent) (descendent ?x ?y))
(<- (relation ?x
?y parent-child-in-different-state) (parent-child-in-different-state ?x ?y))
(<- (relation ?x
?y aunt) (aunt ?x ?y))
(<- (relation ?x
?y uncle) (uncle ?x ?y))
(<- (relation ?x
?y nephew) (nephew ?x ?y))
(<- (relation ?x
?y niece) (niece ?x ?y))
|#
(<-- (person ?x)
(db person ?x))
; prologgers will
know how to read this
; note how we start
prolog within lisp and then within prolog call
; lisp again.
(defun
show-all-relations ()
(prolog (person ?x)
(person ?y)
(not (= ?x ?y))
(bagof ?r (relation ?x ?y ?r) ?bag)
(lisp (print (list ?x ?y ?bag)))))
(show-all-relations)
;; maps
(setf m1 (open-map
"square" :if-does-not-exist :create
:if-exists :open))
(setf (map-value m1
'eurolisp2005) 'Amsterdam)
(map-value m1
'eurolisp2005)
(setf foo (list
1.001
"a relatively short
string"
'(1 2 3)
'horse
#(1 2 3)
(retrieve 'person 'first-name
'John)
m1
))
; everything that can
be stored in allegrocache can be used as a key for a map. So
; in the following
example we take every item in foo both as a key and as a value..
(dolist (e foo)
(setf (map-value m1 e) e))
(dolist (e foo)
(print (map-value m1 e)))
(map-map #'(lambda (k
v)
(print (list k v)))
m1)
(commit)
;; sets
(setf set
(make-instance 'ac-set))
(push set foo)
(dolist (e foo)
(add-to-set set e))
(commit)
(doset (var set)
(print var))
(remove-from-set set
'(1 2 3))
(doset (var set)
(print var))
;a typical usages of
sets as slots of objects..
(setf p1
(make-instance 'person
:first-name 'fritz :last-name 'kunze
:children (make-instance 'ac-set)))
(add-to-set (children
p1)
(make-instance 'person :first-name
'michael
:last-name 'kunze))
(add-to-set (children
p1)
(make-instance 'person :first-name
'lauren
:last-name 'kunze))
(doset (ch (children
p1))
(print (first-name ch)))
(commit)
;(close-database
*allegrocache*)
;;; frames
; the autor
implemented a very simple example of how to implement a
; frame system on top
of allegrocache. It is forbidden to use it for any
; serious work :-) it
is just an example of how to do frames on top of objects.
(load
"../frames/funframes.fasl")
(start-frame-system)
(defframe 'animal
:isa 'living-thing)
(defframe 'mammal
:isa 'animal :legs 4 :eyes 2)
(defframe 'dog :isa
'mammal)
(defframe 'fido :isa
'dog)
(fsv! 'fido :owner
'jans) ; add a new slot to fido
(fsv 'fido :legs) ;
use the isa hierarchy to derive that fido has 4 legs.
;;; xml
; The function
objectify-xml is a function that takes an arbitrary XML
; document, analyses
it and turns it into CLOS objects that are stored
; in the database.
(objectify-xml :file
#P"../xml-demo/sersamp2005.xml"
:persistent t)
; this shows all the
classes that are built from the xml file
(dump-schema)
; this shows that
class-definitions are first class objects in the
database.
(doclass (e
'ac-class)
(print (ac-class-name e)))
; and this uses some
of the classes read in.
(doclass (e
's::DateCreated)
(print (s::Year e)))