Allegro Btrees

copyright (c) 2012 Franz Inc

This is the API for the Btree module. The symbols mentioned in this document are exported from the db.btree package.

Background
Opening and Closing
  open-btree
  create-btree
  create-memory-btree
  close-btree
  sync-btree
Storing and Retrieving data
  get-btree
  get-btree-ext
  set-btree-ext
Cursors
  create-cursor
  position-cursor
  position-cursor-ext
  cursor-get
  cursor-next
  cursor-previous
  cursor-get-ext
  cursor-next-ext
  cursor-previous-ext
  cursor-delete
  unbind-cursor
Miscellaneous
  copy-btree
  merge-btrees
  analyze-btree
  btree-user-data
  dump-btree
Replace Hook
Encoding
Examples
Advanced Examples
  Opening a Btree
  Accessing data
  Large Btrees and Merging
Index

Background

A btree is a data structure with the following properties

Btrees are well suited for storage on disk and thus they are the basis for most database systems.

The format of a btree on disk is portable across all ACL platforms.

The Allegro implementation of btrees has these properties

In theory it's possible or more than once process to read the same btree. In practice it will only work on Unix since on the Windows operating system forbids multiple processes open the same file for read/write (and we don't yet suppport opening a btree file read-only).

If a process is modifying a btree then no other process should read it. On Unix there's nothing to prevent multiple writers or readers of the same file so it's the programmers responsibility to ensure that this does not occur.

Opening and Closing

(open-btree filename &key (if-exists :open)
                          (if-does-not-exist :error)
                          (unique-keys nil)
                          cache-size
                          split
                          compare-function
                          buffer-pool
                          large-block-allocator
                          readahead
                          partial-writes
                          read-only)

Opens an existing btree or creates a new btree.

if-exists determines what open-btree does if the btree named already exists

if-does-not-exist determines what is done if the btree named does not exist yet:

Note that a btree can be opened by at most one process at a time. The btree code does not do the file locking necessary to allow more than one reader or writer.

If unique-keys is true then at most one value is associated with a given key. Attempting to store a second value for a given key will cause the first value to be deleted. You can only specify the unique-keys behavior when you create a btree. If you open an existing btree then the unique-keys argument is ignored.

If cache-size is given it should be an the integer number of bytes of buffers that the btree code may maintain for this btree. The actual number of bytes of buffers at any given time may be larger than this maximum. This number is used as a general guide to the size of the buffer cache. If cache-size is not given then the btree code will maintain all blocks it encounters in the buffer cache. This is undesirable if the btree is large.

When data is inserted into a btree it is placed inside the appropriate block. Eventually the block gets filled up and must be split into two blocks. If split is given it should be a floating point number between 0 and 1 and it specifies how much of a split block is left in the original block (the rest being moved to the new block). A value of 0.5 would split the data evenly. The default is 0.7. You would choose a split value closer to 1.0 if you felt that it's likely that the next keys to be inserted will be larger than the previous keys inserted.

If compare-function is given it should be a function of six values

        arr1 start1 end1  arr2 start2 end2
where arr1 and arr2 are (unsigned-byte 8) vectors and the start and end values denote the range of the array to consider.

This should compare the values and return

        1  if arr2 > arr1
        0  if arr2 = arr1
       -1  if arr2 < arr1

The default comparison function is a simple lexicographic comparison function. With that comparison function

The btree code caches disk blocks in buffers created from lisp vectors allocated in old space. When the cache of buffers get too large the btree code frees buffer to reduce the cache size. Normally freeing a buffer means eliminating pointers to it so it can be garbage collected. However since the buffers are allocated in old space they only get garbage collected when a global gc is done. Also after freeing a buffer the btree code will likely need to allocate a new buffer at some time in the future. Thus we have the buffer-pool object. A buffer pool is a free list of buffers. When the btree code wants to free a buffer from the cache and the btree has an associated buffer-pool, the buffer is added to the buffer pool rather than being left to be garbage collected. When the btree code needs a buffer it first checks the buffer pool and reuses a buffer from there instead. A buffer-pool can be shared by multiple btrees thus ensuring that freed buffers soon find a new use.

The value of buffer-pool can be a buffer pool object. A buffer pool object is created by evaling (make-btree-buffer-pool). That buffer pool will by used by this btree. You share a buffer pool among btrees by passing in the same buffer pool object when the btrees are opened.

If buffer-pool is any other non-nil value then a buffer pool object will be created and used only by this btree.

large-block-allocator is an optional function that the btree code will call to allocate large simple vectors of (unsigned-byte 8). The function takes two arguments

      (lambda (kind size) ....)
The value of kind is
:read - we're about to read a very large value from the btree
any other value - ignore this call and return nil

size is the number of bytes needed (but returning a larger vector is ok.) The function can return either a vector of at least the desired size or nil. If the function returns nil then the btree code will do the allocation. If a block is returned by the function it will not be cached in the btree's buffer cache.

readhead if given should be an integer. readahead is used to efficiently read the btree file into the operating system's file cache. The fact that btree blocks are read somewhat randomly will fail to trigger the code in the operating system to automatically read the btree file into the cache in the background, thus the need for the readahead argument. The value of readahead specifies how many bytes are read into the file cache at a time. A value such as 20,000,000 is reasonable. Using readahead a file that's being merged has been seen to speed up the merging operation by a factor of 10. Readahead is not so useful if the btree will be accessed infrequently or over a long period of time as the operating system's file cache will soon be replaced by other more active files.

partial-writes specifies whether the btree code can write out just some of the modified buffers when its forced to write out modified buffers due to the buffer cache filling up. Normally all modified buffer are written out since that ensures that the btree on disk is valid most of the time. If you specify true for partial-writes then the btree on disk will be invalid most of the time and will only become valid when you call sync-btree. Specifying true for partial-writes does make the btree code faster as it eliminates many unnecessary writes.

read-only of true will cause the btree file to be opened for reading only. Any action that will cause a btree block to be written will signal an error. Reading objects from the btree is not a problem. Adding new objects to the btree will work as well until a modified btree buffer needs to be written to the disk.

open-btree returns a btree object





(create-btree filename &key (if-exists :supersede)
                            (if-does-not-exist :create)
                            (unique-keys nil) split
                            cache-size compare-function buffer-pool
                            large-block-allocator
                            partial-writes)

Just like open-btree except that the defaults for the keyword arguments are such that a new btree will be created whether or not the btree already exists.





(create-memory-btree &key (unique-keys nil)
			   split
			   buffer-pool
			   compare-function
                           large-block-allocator
                           partial-writes)

Create a btree in memory only. This btree need not be closed as it doesn't have an associated file to close.





(close-btree btree &key abort)
calls sync-btree to write out all modified blocks and then closes down the connection to the btree file on disk. After the btree is closed no btree operations are possible other than btree-user.

If the value of abort is true then the sync-btree is not done.



(sync-btree btree)
writes out all modified blocks to the disk. During normal btree operation a sync-btree will be done automatically if the size of the btree cache in memory exceeds the user-specified limit and cache space must be reclaimed.





Storing and Retrieving data

Keys and values are always simple-vectors of type unsigned-byte 8.

There are two types of functions in the API:

(get-btree btree key)
return the value associated with the given key or nil if the given key isn't present in the database. If unique-keys is false for this database then the value returned is any one of the values associated with that key. In this case if you wish to retrieve all the values associated with the key the program will need to use cursors.





(setf (get-btree btree key) value)
Store the given value in the btree under the given key. Both key and value are simple vectors of type (unsigned-byte 8).

If unique-keys is true and if there is already a value for the given key then that existing value is deleted before the new value is stored.

returns: value





(get-btree-ext btree key kstart kend)
like get-btree except that the key is those bytes in the key vector from index kstart up to but not including kend.
get-btree-ext returns three values

the answer is in the returned vector from indicies start to end-1





(set-btree-ext btree key kstart kend value vstart vend)
like (setf (get-btree ...) ...) except that the key and value used are the subsequences of the key and value vectors within the given limits

The return value is undefined.





Cursors

A cursor is an object that traverses a given btree. Objects in a btree are sorted by key thus a cursor can give the user access to objects in the sorted order.

Cursors float with changes to the tree. If you position a cursor at an item and then insert or delete something in the btree before that item the cursor will continue to point at that item. If you delete the item the cursor is pointing to it will then point to the following item

You do not have to call a function to close a cursor when you're finished with it but you should call

    (unbind-cursor cursor)
to disassociate the cursor from a particular spot in the btree. If that isn't done the cursor object will never be garbage collected until the btree is. Also it will mean that the btree has to perform work to keep the cursor floating correctly.

A cursor can be primed. A primed cursor will treat a request for the next or previous item as a request for the current item at the cursor and will then set the cursor to not-primed mode.

The prime feature allows you to position the cursor and prime it and then use only next commands to read all the data beginning with the data found by the position command. Below is a function to count the items in a btree. If we had not specified :prime t in the call to position-cursor the code would have had to call cursor-get the first time around the loop and then cursor-next subsequently. By priming the cursor we can write the loop using only cursor-next.

(defun items-in-a-btree (btree)
  (let ((cur (create-cursor btree)))
    (position-cursor cur nil :kind :first :prime t)
    (let ((count 0))
      (loop
        (if* (null (cursor-next cur)) then (return))
        (incf count))
      (unbind-cursor cur)
      count)))

(create-cursor btree)
returns a cursor for the btree. The next function that should be called is position-cursor or position-cursor-ext in order to put the cursor at a particular spot in the btree.

A program need not close a cursor when it's finished but it should call unbind-cursor.





(position-cursor cursor key &key value kind prime)
If kind is nil:
If value is nil
place the cursor in the btree at the first occurrence of the given key, or, if the key isn't present, then at the first position after where the key would be placed if the key were present.

return true if the key is present in the btree, else nil.


If value is given
place the cursor at the entry position in the btree where the key and value are as passed in.

return true if a key, value entry was found else return nil. If nil is returned, the location of the cursor is just after the last item for the given key, or just after where an item for the given key would be placed.

other values for kind





(position-cursor-ext cursor key kstart kend &key value vstart vend prime)
like position-cursor except that the key to search for is the subsequence of key from kstart to kend.

Also if value is given then this acts like position-cursor with a value given. In this case vstart defaults to 0 and vend defaults to the length of value.





(cursor-get cursor &key (key t) (value t) (kind :get) prime)

the keyword arguments key and value control whether the key and value are returned (or if nil is returned in place of the key or value not returned).

If kind is





(cursor-next cursor &key (key t) (value t) prime)
advance the cursor to the next position and return two values, the key and the value, using the keyword arguments :key and :value to determine what's returned.

If the cursor was already at the last entry of the btree then nil is returned.





(cursor-previous cursor &key (key t) (value t) prime)
retreat the cursor to the previous position and return two values, the key and the value, using the keyword arguments :key and :value to determine what's returned.

If the cursor was already at the first entry of the btree then nil is returned.





(cursor-get-ext cursor &key (key t) (value t) (kind :get))
this is just like cursor-get except that six values are returned

    key kstart kend  value vstart vend
The key and value vectors returned are the buffers used by the btree code, thus they should not be modified and they are subject to being changed by subsequent btree operations.

kind can be :get, :next, :previous or :first





(cursor-next-ext cursor &key (key t) (value t))
Advance the cursor to the next entry and return values like cursor-get-ext.

Return nil if the cursor is already at the end of the btree.





(cursor-previous-ext cursor &key (key t) (value t))
Retreat the cursor to the previous entry and return values like cursor-get-ext.

Return nil if the cursor is already at the beginning of the btree.





(cursor-delete cursor &key prime)
Delete the key, value at the cursor. This causes the cursor to now point at what was the next item before the delete as the btree automatically compresses to fill in gaps from deleted objects.

prime, if true, will prime the cursor after the delete





(unbind-cursor cursor)
disassociate the cursor from a particular part of the btree. It's imporant to do this when you're done using the cursor for a while. You can also call (position-cursor nil :kind :unbind) to do this.





Miscellaneous

(copy-btree &key from to)
from and to should be open btrees. All the key value pairs from from will be inserted into to.

The insertion is done via the set-btree-ext function so the unique-keys value of the to btree will have an effect.

This function is deprecated in favor of the more general merge-btrees function.





(merge-btrees target-btree source-btrees)
All the items from the source-btrees will be stored in the target-btree. Items will be added in order (using the comparison function for the target-btree). When two source btrees have the same key the item from the btree earlier in the argument list will be inserted first. The unique-keys value for the target-btree will determine if multiple entries with the same key will end up in the target-btree or if only the last item inserted with the same key ends up in the target-btree.

If you're merging btrees to create a btree that will never be modified it makes sense to specify a split value close to 1.0 for the target-btree. This is because the target-btree will be populated with keys in ascending order and thus the new key values will always be stored in the new block after the split.





(analyze-btree btree)
test the btree for internal errors and return a structure describing the contents of the btree and the btree cache. Note that running this function will change the btree cache, so running it a second time will return different cache results





(btree-user-data btree)

This setf'able slot is a place for user code to store whatever data it wants in the btree object.





(dump-btree btree &optional filename)
write out all the data in a btree in lisp and human readable format.





Replace Hook

You can specify a hook function that gets called when a value associated with a key is going to be replaced with another value. This gives the program a chance to modify the old value to include the new value.

You specify the hook function this way:

    (setf (btree-replace-hook btree) 'my-hook-function)

Note that this hook function only gets called when a value is replaced. A value can only be replaced when the btree is created with :unique-keys t specified.

The hook function takes these seven arguments
btree
oldvalue oldvstart oldvend
newvalue newvstart newvend

where oldvalue and newvalue are (unsigned-byte 8) and vectors and the start and end values indicate a range of indicies inside those vectors (i.e. all indicies beginning with Xstart and less than Xend).

The hook function is called when the existing value (denoted by oldvalue) is going to be replaced with the new value (denoted by newvalue)

The hook function can do one of three things.

  1. Do nothing. Let the replacement go as planned. The hook function returns nil to specify this choice.
  2. Modify the old value to include the new value. The hook function can modify the oldvalue in any way it wants but only within the given vector indicies. The hook function returns t to indicate that it has already replaced the value and the btree code should do no further work on this.
  3. The hook function can specify an alternative value to store instead of newvalue. The btree code does the rest of the work. The hook function returns four values to choose this option
    nil
    altvalue - a usb8 vector
    altstart - integer
    altend - integer



Encoding

The btrees store only unsigned-byte 8 (usb8) vectors as keys and values. You'll need to encode whatever objects you choose to store as usb8 vectors. In this section we discuss various ways to encode lisp objects.

Lisp has a built-in function for encoding strings as usb8 vectors.

cl-user(7): (defun enc-string (str)
                 (string-to-octets str :external-format :utf-8 :null-terminate nil))
enc-string
cl-user(8): (enc-string "test")
#(116 101 115 116)
4
cl-user(9):

The utf-8 external format is a good general purpose encoding for any size character.

Since most lisp objects can be printed and then that printed form read by the lisp reader, one option is to print the object to encode as a string and then use string-to-octets to encode it. This is a simple way to encode objects but not particularly fast.

cl-user(9): (defun enc-value (value) (enc-string (write-to-string value)))
enc-value

cl-user(10): (enc-value '(a b c d)) #(40 97 32 98 32 99 32 100 41) 9

Decoding values is accomplished with octets-to-string:

cl-user(13): (defun dec-value (encoded)
                   (read-from-string (octets-to-string encoded :external-format :utf-8)))
dec-value

cl-user(14): (dec-value (enc-value '(a b c d))) (a b c d) 9 cl-user(15):

One word of caution: whenever you're reading forms which could have been modified by someone untrustworthy there's a chance that they introduced uses of the #. (sharp-period) reader macro to cause evaluation at read time. In that case you'll want to bind *read-eval* to nil around the user of the read function.

The enc-value function show above will work for integers but it's probably not the encoder you'll want to use.

The following function encodes a non-negative integer in a fixed sized vector in big-endian format.

(defun enc-pos-int (value bytes &optional arr)
  (let ((res nil))
    (loop
      (if* (zerop value)
         then (return)
         else (push (logand #xff value) res)
              (setq value (ash value -8))))

(if* (> bytes (length res)) then (dotimes (i (- bytes (length res))) (push 0 res)))

(if* (and arr (>= (length arr) (length res))) then (let ((i -1)) (dolist (val res) (setf (aref arr (incf i)) val)) arr) else (make-array (length res) :element-type '(unsigned-byte 8) :initial-contents res))))

Here we try it out:

cl-user(12): (enc-pos-int 10 5)
#(0 0 0 0 10)

cl-user(13): (enc-pos-int 23423410 5) #(0 1 101 105 178)

cl-user(14): (enc-pos-int (1- (expt 2 33)) 5) #(1 255 255 255 255)

If you want to do sorting using the btree's built-in comparison function then you must encode all integers into the same length vector. This does waste space for small numbers. If space is critical you could write an encoder that encoded each number in the fewest bytes and then write your own btree comparsion that understood this encoding.

This function will decode an encoded value:

(defun dec-pos-int (val &optional (start 0) (end (length val)))
  (let ((res 0))
    (dotimes (i (- end start))
      (setq res (+ (aref val (+ start i)) (ash res 8))))

res))

as demonstrated here:

cl-user(16): (dec-pos-int (enc-pos-int 21234 6))
21234

Examples

Using the encoding functions discussed in the previous section we'll show some examples of creating btrees.

Here we create a btree where we store for each number from 0 to 999 their square:

cl-user(6): (use-package :db.btree)
t
cl-user(7): (setq bt (create-btree "foo.bt"))
#<db.btree::btree [1] foo.bt @ #x1001257ff2>
cl-user(8): (dotimes (i 1000)
                 (setf (get-btree bt (enc-pos-int i 4))
                       (enc-pos-int (* i i) 5)))
nil

Now we use the btree to find the square of 25

cl-user(9): (get-btree bt (enc-pos-int 25 4))
#(0 0 0 2 113)
cl-user(10): (dec-pos-int *)
625

A cursor is an object that can move through a btree allowing you to retrieve keys and values and to delete keys and values

When you create a cursor you specify the btree that it will scan.

cl-user(12): (setq cur (create-cursor bt))
#<db.btree::cursor  @ #x71c5d62a>

When a cursor is first created it doesn't point anywhere in the btree. Thus the operations on the cursor just return nil.

cl-user(13): (cursor-get cur)
nil
cl-user(14): (cursor-next cur)
nil

You can specify where a cursor should point in a number of ways. Here we tell the cursor to point at the first element in the btree (and since keys are sorted, this will be a pointer to the lowest key in the key sorting order).

cl-user(15): (position-cursor cur nil :kind :first)
nil

We can retrieve the key and value at the cursor with cursor-get. It returns the key and value as two values:

cl-user(16): (cursor-get cur)
#(0 0 0 0)
#(0 0 0 0 0)

We can tell even without decoding these usb8 arrays that the key and value are both 0.

To advance the cursor to the next value and retrieve it you use cursor-next:

cl-user(17): (cursor-next cur)
#(0 0 0 1)
#(0 0 0 0 1)
cl-user(18): (cursor-next cur)
#(0 0 0 2)
#(0 0 0 0 4)
cl-user(19): (cursor-next cur)
#(0 0 0 3)
#(0 0 0 0 9)
cl-user(20): (cursor-next cur)
#(0 0 0 4)
#(0 0 0 0 16)

You'll note that after positioning the cursor we used cursor-get to retrieve the first value and cursor-next to retrieve subsequent ones. If you're writing a loop to retrieve values it's undesireable to have to call one function the first time through the loop and another function on subsequent calls. Thus cursors can be primed which means that they are in a special state so that a cursor-next will not move the cursor before retrieving the value. Also calling cursor-next will un-prime the cursor.

Here we once again position the cursor at the first item but this time we prime it as well. Then we can just use cursor-next to retrieve all the values:

cl-user(21): (position-cursor cur nil :kind :first :prime t)
nil
cl-user(22): (cursor-next cur)
#(0 0 0 0)
#(0 0 0 0 0)
cl-user(23): (cursor-next cur)
#(0 0 0 1)
#(0 0 0 0 1)
cl-user(24): (cursor-next cur)
#(0 0 0 2)
#(0 0 0 0 4)

You can position the cursor at the last item in the tree and then scan backwards with cursor-previous.

cl-user(25): (position-cursor cur nil :kind :last :prime t)
nil
cl-user(26): (cursor-previous cur)
#(0 0 3 231)
#(0 0 15 58 113)
cl-user(27): (dec-pos-int *)
999
cl-user(28): (cursor-previous cur)
#(0 0 3 230)
#(0 0 15 50 164)
cl-user(29): (dec-pos-int *)
998

You can position the cursor at a particular key. Note that in this case position-cursor returns t indicating that it found the key in the table:

cl-user(36): (position-cursor cur (enc-pos-int 385 4))
t
cl-user(37): (cursor-get cur)
#(0 0 1 129)
#(0 0 2 67 1)
cl-user(38): (dec-pos-int *) 385

If you specify a key not in the btree then position-cursor returns nil and the cursor is set on the next item after the location where the key would have been found. Here we choose a key value 2000 which is bigger than the biggest key in the table: 999. Thus position-cursor returns nil and sets the cursor after the end of the table. cursor-previous brings the cursor to the last key in the table: 999.

cl-user(39): (position-cursor cur (enc-pos-int 2000 4))
nil

cl-user(40): (cursor-get cur) nil

cl-user(41): (cursor-previous cur) #(0 0 3 231) #(0 0 15 58 113)

cl-user(13): (dec-pos-int *) 999

You can position the cursor at a particular key and value as well and position-cursor will return t if it found the pair and nil if it did not:

cl-user(41): (position-cursor cur (enc-pos-int 385 4)
                                  :value (enc-pos-int (* 385 385) 5))
t
cl-user(42): (position-cursor cur (enc-pos-int 385 4)
                                  :value (enc-pos-int (* 200 200) 5))
nil

You use a cursor to specify which values to delete from the btree. In the example below we position the cursor at key value 5. We delete that value at which point the cursor moves to the next key value, 6. We move the cursor back one value and we end up at key value 4 since key value 5 was deleted.

cl-user(47): (position-cursor cur (enc-pos-int 5 4))
t
cl-user(48): (cursor-get cur)
#(0 0 0 5)
#(0 0 0 0 25)
cl-user(49): (cursor-delete cur)
t
cl-user(50): (cursor-get cur)
#(0 0 0 6)
#(0 0 0 0 36)
cl-user(51): (cursor-previous cur)
#(0 0 0 4)
#(0 0 0 0 16)

When you're finished using a cursor for a while it's best to unbind it. This disassociates it from any block in the btree and this allows the cursor to be garbage collected should no references exist to the cursor from the heap.

cl-user(52): (unbind-cursor cur)
#<db.btree::cursor  @ #x71b06c82>

So far all the btree operations have been done on the copy of the btree in memory. In order to write the btree to the disk you must sync-btree:

cl-user(53): (sync-btree bt)
nil

When you're finished with the btree you should close it. The close-btree function will call sync-btree to ensure that the btree is completely written to the disk before the btree file is closed.

cl-user(54): (close-btree bt)
#<db.btree::btree [1] foo.bt @ #x1001139632>

Advanced Examples

If your btrees will have few items in them or will be rarely accessed then you can make use of the examples in the previous section to manipulate your btrees. However if you're going to be storing millions and even billions of items in your btrees or will be repeatedly accessing them then you'll want to program in a more space and time efficient manner. This section will show you how to use the btrees in the most efficient manner.

Opening a Btree

A btree is stored on the disk and parts of it are kept in a cache in the Lisp heap. When you open the btree you can specify the maximum size for the cache. The bigger the cache the more likely the disk block the btree code needs will be in the cache but the less Lisp heap that is available to the rest of the application.

Here we ask for at most a one megabyte cache. Since each btree block is 4096 bytes this cache can hold up to 256 blocks from the btree.

cl-user(16): (setq bt (create-btree "bar.bt" :cache-size (* 1024 1024)))
#<db.btree::btree [2] bar.bt @ #x1001225782>

If you don't specify a cache size then there is no limit to the size of the cache. This is ok for small btrees but if the btree is huge it could mean using up all of the Lisp heap for the btree cache.

The cache consists of lisp vectors with copies of blocks from the disk. When the cache gets larger than the limit specified some blocks are removed from the cache and allowed to be garbage collected. The garbage collection of these vectors can take some time to occur since the lisp vectors are all allocated in old space and only a global garbage collection will recover unused old space.

Rather than require a global garbage collection in order to recover and reuse a cache block you should specify that you want the btree to use a buffer pool to manage the reuse:

cl-user(18): (setq bt (create-btree "bar.bt" :cache-size (* 1024 1024)
                                             :buffer-pool t))
#<db.btree::btree [3] bar.bt @ #x1001229ed2>

If a btree uses a buffer pool then when blocks are freed from the cache they are put in the buffer pool and when a new block is needed the buffer pool is checked before allocating one from the heap.

You can even share a buffer pool between btrees.

cl-user(19): (let ((bp (make-btree-buffer-pool)))
	       (setq bt (create-btree "bar.bt" :cache-size (* 1024 1024)
                                               :buffer-pool bp))
	       (setq bt2 (create-btree "bar2.bt" :cache-size (* 1024 1024)
                                               :buffer-pool bp)))
#<db.btree::btree [5] bar2.bt @ #x1001236602>

Accessing data

Many of the functions in the btree API have two versions: a normal version (e.g. get-btree) and an extended version (e.g. get-btree-ext). The normal version is easier to use but the extended version is faster and less wasteful of space.

Keys and values in the normal version are simple vectors of type (unsigned byte 8). In the extended version keys and values are subsequences of simple vectors of type (unsigned-byte 8).

One big caveat about the use of the extended version: any returned values are only valid until the next call to a btree function on that btree. This is because the returned values are pointers to the btree buffers and on any btree call a given btree buffer may be used to store some other value.

The following example will compare the use of the normal and extended functions. We're going to open the same btree we built in the section above named "Examples." We want to find the number of keys between 0 and 199 that have a given value in the btree. First we'll show how to do this without a cursor and next we'll show how to write this using a cursor.

Using the normal functions we write it this way.

(defun run-normal (bt checknum)
  (let ((found 0))
    (dotimes (keynum 200)
      (let* ((key (enc-pos-int keynum 4))
	     (val (get-btree bt key)))
	(if* (and val
		  (equal checknum (dec-pos-int val)))
	   then (incf found))))
    found))

We're aware that a given key may not appear in the btree so we only call our dec-pos-int function if an actual value was returned. In this function we're allocating a new usb8 vector when we call our enc-pos-int function and the btree get-btree function (should there be a value at that key).

If we use the extended functions we would write it this way:

(defun run-ext (bt checknum)
  (let ((kbuf (make-array 10 :element-type '(unsigned-byte 8)))
	(found 0))
    (dotimes (keynum 200)
      (let* ((key (enc-pos-int keynum 4 kbuf)))
	(multiple-value-bind (value vstart vend)
	    (get-btree-ext bt key 0 4)
	  (if* (and value (equal checknum (dec-pos-int value vstart vend)))
	     then (incf found)))))
    found))

Since our goal is to minimize allocation we've passed a vector to enc-pos-int for it to fill with the encoded value. We call get-btree-ext and tell it the bounds with the vector that contain the key. It returns nil or a vector and the start and end indicies showing where the value is located in the returned vector. dec-pos-int is able to decode just that region of the vector.

When you run the two functions you see that the extended version does a lot less allocation (32 bytes of other bytes vrs 9600, and 802 cons bytes vrs 1802.)

cl-user(14): (time (run-normal bt 16))
; cpu time (non-gc) 10 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  10 msec user, 0 msec system
; real time  1 msec
; space allocation:
;  1,802 cons cells, 9,600 other bytes, 0 static bytes
1

cl-user(15): (time (run-ext bt 16)) ; cpu time (non-gc) 0 msec user, 0 msec system ; cpu time (gc) 0 msec user, 0 msec system ; cpu time (total) 0 msec user, 0 msec system ; real time 1 msec ; space allocation: ; 802 cons cells, 32 other bytes, 0 static bytes 1 cl-user(16):

If you are scanning a btree through a range of key values it's usually more efficient to scan the values in the btree using a cursor.

Using normal functions the code is as follows. We create a cursor and set it at the first key value greater than or equal to zero. Then we scan through the btree until there are no more keys or the key is greater or equal to 200. Each time cursor-next is called two usb8 vectors must be allocated to hold the key and value vectors returned.

(defun cursor-normal (bt checknum) (let ((found 0) (cursor (create-cursor bt))) (position-cursor cursor (enc-pos-int 0 4) :prime t) (loop (multiple-value-bind (key value) (cursor-next cursor)

(if* (or (null key) (>= (dec-pos-int key) 200)) then (return))

(if* (equal checknum (dec-pos-int value)) then (incf found)))) found))

The extended function cursor-next-ext returns pointers to the btree buffers and thus doesn't need to allocate new vectors.

(defun cursor-extended (bt checknum)
  (let ((found 0)
	(cursor (create-cursor bt)))
    (position-cursor cursor (enc-pos-int 0 4) :prime t)
    (loop
      (multiple-value-bind (key kstart kend value vstart vend)
	  (cursor-next-ext cursor)
	(if* (or (null key)
		 (>= (dec-pos-int key kstart kend) 200))
	   then (return))
	(if* (equal checknum (dec-pos-int value vstart vend))
	   then (incf found))))
    found))

The difference in space allocation in the two cases is apparent from the time macro:

cl-user(23): (time (cursor-normal bt 16))
; cpu time (non-gc) 0 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  0 msec user, 0 msec system
; real time  1 msec
; space allocation:
;  12 cons cells, 9,760 other bytes, 0 static bytes
1

cl-user(24): (time (cursor-extended bt 16)) ; cpu time (non-gc) 0 msec user, 0 msec system ; cpu time (gc) 0 msec user, 0 msec system ; cpu time (total) 0 msec user, 0 msec system ; real time 1 msec ; space allocation: ; 12 cons cells, 64 other bytes, 0 static bytes 1 cl-user(25):

Large Btrees and Merging

Btrees can exhibit poor performance when the size of the btree on disk is much larger than the size of memory cache and random keys are accessed. When this occurs most of the time in the btree code is spent waiting to read a block from the disk.

This test program illustrates the problem. In this test we build a btree of 10,000,000 randomly chosen keys. We allocate a 1 MB cache and the btree file grows to 175 MB when it reaches 10,000,000 keys.

(defun build-random-btree (size)
  (let ((bt (create-btree "random.bt"
			  :cache-size (* 1024 1024)
			  :buffer-pool t))
	(kbuf (make-array 10 :element-type '(unsigned-byte 8)))
	(value (enc-pos-int 222 3))
	(last-time (get-universal-time))
	(key))
    (dotimes (i size)
      (setq key (enc-pos-int (random 10000000) 4 kbuf))
      (set-btree-ext bt key 0 4 value 0 (length value))
      (if* (zerop (mod (1+ i) 1000000))
	 then (sync-btree bt)
	      (let ((time (get-universal-time)))
		(format t "~12d  ~d secs~%" (1+ i) (- time last-time))
		(setq last-time time))))
    (close-btree bt)))

In the test program we print the amount of time it took for each 1,000,000 keys we added to the btree. The time printed isn't cumulative - it's just the time to insert the most recent 1,000,0000 entries.

cl-user(4): (build-random-btree 10000000)
     1000000  24 secs
     2000000  76 secs
     3000000  130 secs
     4000000  165 secs
     5000000  254 secs
     6000000  291 secs
     7000000  395 secs
     8000000  344 secs
     9000000  445 secs
    10000000  496 secs
#<db.btree::btree [1] random.bt @ #x10012f48b2>
cl-user(5):

It's pretty clear that the cost per insert is growing quite a bit as the btree grows.

There are a few ways to ease this problem. One would be to allocate as big a cache as you can when you're doing a large number of inserts.

The second method is to sort the keys before you insert them. Then instead of randomly inserting into the btree you'll insert in a way that the btree code is more likely to find the block it needs in the cache.

You can sort the keys in a variety of ways, one of them to use another btree. If you put the new keys in an empty btree they can be inserted quickly. Then you can use the btree merge function to add them to the big btree you're building.

This procedure is shown in this function, where we put each 1,000,000 keys in a btree and then merge it into the big btree. We could put the first 1,000,000 keys in the destination btree right away but for simplicity we insert the first 1,000,000 keys just like the other 9,000,000 keys.

(defun build-merge-random-btree (millions)
  (let* ((bp (make-btree-buffer-pool))
	 (bt (create-btree "random.bt"
			   :cache-size (* 1024 1024)
			   :buffer-pool bp))
	 (kbuf (make-array 10 :element-type '(unsigned-byte 8)))
	 (value (enc-pos-int 222 3))
	 (last-time (get-universal-time))
	 (key))
    (dotimes (i millions)
      (let ((newbt (create-btree "scratch.bt"
				 :cache-size (* 40 1024 1024)
				 :buffer-pool bp
				 :partial-writes t)))
	(dotimes (j 1000000)
		(setq key (enc-pos-int (random 10000000) 4 kbuf))
		(set-btree-ext newbt key 0 4 value 0 (length value)))
	(merge-btrees bt (list newbt))
	(close-btree newbt :abort t))
      (sync-btree bt)
      (let ((time (get-universal-time)))
	(format t "~12d  ~d secs~%" (* (1+ i) 1000000)
		(- time last-time))
	(setq last-time time)))))

Here are the timing for this method. It's dramatically better than the first method. In fact we barely see a slow down as we reach 10,000,000 keys.

cl-user(4): (build-merge-random-btree 10)
     1000000  11 secs
     2000000  16 secs
     3000000  16 secs
     4000000  16 secs
     5000000  17 secs
     6000000  17 secs
     7000000  17 secs
     8000000  17 secs
     9000000  17 secs
    10000000  18 secs
nil
cl-user(5):

Index

analyze-btree
btree-user-data
close-btree
copy-btree
create-btree
create-cursor
create-memory-btree
cursor-delete
cursor-get
cursor-get-ext
cursor-next
cursor-next-ext
cursor-previous
cursor-previous-ext
dump-btree
get-btree
get-btree-ext
merge-btrees
open-btree
position-cursor
position-cursor-ext
set-btree-ext
sync-btree
unbind-cursor